Feature selection and classification for high-dimensional biological data under cross-validation framework

Zhong, Yi

dc.contributor.advisor	He, Jianghua
dc.contributor.advisor	Chalise, Prabhakar
dc.contributor.author	Zhong, Yi
dc.date.accessioned	2018-10-26T19:33:07Z
dc.date.available	2018-10-26T19:33:07Z
dc.date.issued	2018-05-31
dc.date.submitted	2018
dc.identifier.other	http://dissertations.umi.com/ku:15959
dc.identifier.uri	http://hdl.handle.net/1808/27072
dc.description.abstract	This research applies statistical learning methods to the analysis of high-dimensional biological data. We use these methods primarily to select important predictors and to build predictive classification models. Traditionally, cross-validation has been used to determine the tuning or threshold parameter for feature selection. We propose improvements to this approach by adding repeated and nested cross-validation techniques. Several machine learning methods, such as the lasso, support vector machines, and random forests, have been used in many previous studies, and each has its own merits and demerits. We also propose an ensemble feature selection method that combines the results of these three methods, capturing their respective strengths in order to find a more stable feature subset and to optimize prediction accuracy. We use DNA microarray gene expression datasets to illustrate our methods. The work is organized as follows: (1) the structure of high-dimensional biological datasets and the statistical methods used to analyze such data; (2) several statistical and machine learning algorithms for analyzing high-dimensional biological datasets; (3) improved cross-validation and ensemble learning methods to achieve better prediction accuracy; and (4) examples using DNA microarray data to illustrate our methods.
dc.format.extent	110 pages
dc.language.iso	en
dc.publisher	University of Kansas
dc.rights	Copyright held by the author.
dc.subject	Statistics
dc.subject	cross-validation
dc.subject	feature selection
dc.subject	statistical learning
dc.title	Feature selection and classification for high-dimensional biological data under cross-validation framework
dc.type	Dissertation
dc.contributor.cmtemember	Gajewski, Byron
dc.contributor.cmtemember	Wick, Jo
dc.contributor.cmtemember	Hagan, Christy
dc.thesis.degreeDiscipline	Biostatistics
dc.thesis.degreeLevel	Ph.D.
dc.identifier.orcid
dc.rights.accessrights	openAccess
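
The repeated and nested cross-validation mentioned in the abstract can be illustrated with a minimal sketch of the index bookkeeping alone. This is not the dissertation's implementation: the function names (`k_fold_indices`, `nested_cv_splits`), the fold counts, and the number of repeats are all assumptions chosen for illustration. The point it shows is that the inner folds, used for tuning a feature-selection threshold, are drawn only from the outer training set, so the outer test fold stays untouched until the final accuracy estimate.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, repeats=2, seed=0):
    """Yield (repeat, outer_train, outer_test, inner_splits).

    `inner_splits` is a list of (inner_train, inner_val) pairs built only
    from `outer_train`. All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    for rep in range(repeats):
        # A fresh shuffle per repeat gives a different outer partition.
        outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
        for i, outer_test in enumerate(outer_folds):
            outer_train = [
                j for f, fold in enumerate(outer_folds) if f != i for j in fold
            ]
            # Inner folds partition the outer training set only.
            inner_folds = k_fold_indices(outer_train, inner_k, rng)
            inner_splits = []
            for v, inner_val in enumerate(inner_folds):
                inner_train = [
                    j for f, fold in enumerate(inner_folds) if f != v for j in fold
                ]
                inner_splits.append((inner_train, inner_val))
            yield rep, outer_train, outer_test, inner_splits
```

In use, a tuning parameter would be chosen by averaging validation performance over `inner_splits`, a model refit on `outer_train` with that parameter, and accuracy recorded on `outer_test`; repeating the whole procedure with new shuffles reduces the dependence of the estimate on any single partition.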

Files in this item

Name: Zhong_ku_0099D_15959_DATA_1.pdf
Size: 1.895Mb
Format: PDF


