
dc.contributor.advisor      He, Jianghua
dc.contributor.advisor      Chalise, Prabhakar
dc.contributor.author       Zhong, Yi
dc.date.accessioned         2018-10-26T19:33:07Z
dc.date.available           2018-10-26T19:33:07Z
dc.date.issued              2018-05-31
dc.date.submitted           2018
dc.identifier.other         http://dissertations.umi.com/ku:15959
dc.identifier.uri           http://hdl.handle.net/1808/27072
dc.description.abstract     This research focuses on applying statistical learning methods to the analysis of high-dimensional biological data. In our implementation, we primarily use statistical learning methods to select important predictors and to build predictive classification models. Traditionally, cross-validation has been used to determine the tuning or threshold parameter for feature selection. We propose improving on this approach by adding repeated and nested cross-validation. Many previous studies have also used machine learning methods such as the lasso, support vector machine and random forest, each of which has its own strengths and weaknesses. We further propose an ensemble feature selection that combines the results of these three methods, capturing their strengths in order to find a more stable feature subset and to optimize prediction accuracy. We use DNA microarray gene expression datasets to illustrate our methods. The work is presented in the following order: (1) the structure of high-dimensional biological datasets and the statistical methods for analyzing such data; (2) several statistical and machine learning algorithms for analyzing high-dimensional biological datasets; (3) improved cross-validation and ensemble learning methods for achieving better prediction accuracy; and (4) examples using DNA microarray data to illustrate our methods.
dc.format.extent            110 pages
dc.language.iso             en
dc.publisher                University of Kansas
dc.rights                   Copyright held by the author.
dc.subject                  Statistics
dc.subject                  cross-validation
dc.subject                  feature selection
dc.subject                  statistical learning
dc.title                    Feature selection and classification for high-dimensional biological data under cross-validation framework
dc.type                     Dissertation
dc.contributor.cmtemember   Gajewski, Byron
dc.contributor.cmtemember   Wick, Jo
dc.contributor.cmtemember   Hagan, Christy
dc.thesis.degreeDiscipline  Biostatistics
dc.thesis.degreeLevel       Ph.D.
dc.identifier.orcid
dc.rights.accessrights      openAccess
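The abstract describes three steps: tuning a feature-selection parameter by cross-validation, wrapping that tuning in repeated and nested cross-validation, and combining the feature sets produced by lasso, support vector machine and random forest. The following is a minimal, standard-library-only Python sketch of that workflow, offered under stated assumptions rather than as the dissertation's method. The lasso, SVM and random forest models are not implemented. Instead, a hypothetical top-k mean-difference filter with a nearest-centroid classifier stands in as the tuned selector, and k plays the role of the tuning parameter. The data are simulated by the hypothetical make_toy_data. The ensemble rule in ensemble_select (keep a feature if at least min_votes methods chose it) is an assumed voting scheme.

```python
import random
from collections import Counter


def make_toy_data(n=60, p=100, informative=5, shift=1.5, seed=0):
    # Hypothetical stand-in for a gene-expression matrix: n samples x p features,
    # two balanced classes; only the first `informative` features differ by `shift`.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        if label == 1:
            for j in range(informative):
                row[j] += shift
        X.append(row)
        y.append(label)
    return X, y


def rank_features(X, y):
    # Filter score (an assumption): absolute difference of the two class means.
    n0, n1 = y.count(0), y.count(1)
    scores = []
    for j in range(len(X[0])):
        s0 = sum(r[j] for r, lab in zip(X, y) if lab == 0)
        s1 = sum(r[j] for r, lab in zip(X, y) if lab == 1)
        scores.append(abs(s1 / n1 - s0 / n0))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_predict(X_tr, y_tr, X_te, k):
    # Keep the top-k ranked features, then assign each test row to the nearest class centroid.
    feats = rank_features(X_tr, y_tr)[:k]
    cents = {}
    for c in (0, 1):
        rows = [r for r, lab in zip(X_tr, y_tr) if lab == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    preds = []
    for r in X_te:
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(feats, cents[c])) for c in cents}
        preds.append(min(dist, key=dist.get))
    return preds


def stratified_folds(y, n_folds, rng):
    # Shuffle each class separately and deal indices round-robin so every fold has both classes.
    folds = [[] for _ in range(n_folds)]
    for c in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def accuracy_on_folds(X, y, folds, k):
    # Pooled cross-validated accuracy for a fixed tuning value k.
    correct = 0
    for f in folds:
        held = set(f)
        tr = [i for i in range(len(y)) if i not in held]
        preds = fit_predict([X[i] for i in tr], [y[i] for i in tr], [X[i] for i in f], k)
        correct += sum(p == y[i] for p, i in zip(preds, f))
    return correct / len(y)


def repeated_nested_cv(X, y, grid, outer=5, inner=5, repeats=3, seed=1):
    rng = random.Random(seed)
    fold_accs, chosen_k = [], []
    for _ in range(repeats):  # each repeat draws a fresh random outer partition
        for f in stratified_folds(y, outer, rng):
            held = set(f)
            tr = [i for i in range(len(y)) if i not in held]
            X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
            # Inner CV sees only the outer-training rows; the same inner folds are
            # reused for every candidate k so the candidates are compared fairly.
            inner_folds = stratified_folds(y_tr, inner, rng)
            best_k = max(grid, key=lambda k: accuracy_on_folds(X_tr, y_tr, inner_folds, k))
            preds = fit_predict(X_tr, y_tr, [X[i] for i in f], best_k)
            fold_accs.append(sum(p == y[i] for p, i in zip(preds, f)) / len(f))
            chosen_k.append(best_k)
    return sum(fold_accs) / len(fold_accs), chosen_k


def ensemble_select(selected_sets, min_votes):
    # Assumed voting rule: keep features chosen by at least `min_votes` selectors.
    votes = Counter(f for s in selected_sets for f in set(s))
    return sorted(f for f, v in votes.items() if v >= min_votes)


if __name__ == "__main__":
    X, y = make_toy_data()
    acc, ks = repeated_nested_cv(X, y, grid=[1, 5, 20])
    print(f"repeated nested CV accuracy: {acc:.3f}; chosen k per outer fold: {ks}")
    # Hypothetical feature sets standing in for lasso, SVM and random forest output.
    print(ensemble_select([{0, 1, 2, 7}, {0, 1, 3}, {1, 2, 9}], min_votes=2))  # [0, 1, 2]
```

The outer test fold never influences the choice of k, so the averaged outer accuracy is not optimistically biased by the tuning step. Repeating the outer split shows how much both the accuracy estimate and the selected k vary from one partition to the next.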
