USING PRINCIPAL COMPONENT ANALYSIS (PCA) TO OBTAIN AUXILIARY VARIABLES FOR MISSING DATA IN LARGE DATA SETS

Howard, Waylon Justin

dc.contributor.advisor	Little, Todd D.
dc.contributor.author	Howard, Waylon Justin
dc.date.accessioned	2013-02-17T16:35:55Z
dc.date.available	2013-02-17T16:35:55Z
dc.date.issued	2012-08-31
dc.date.submitted	2012
dc.identifier.other	http://dissertations.umi.com/ku:12364
dc.identifier.uri	http://hdl.handle.net/1808/10815
dc.description.abstract	The purpose of this dissertation is to address an important issue in the imputation of missing data in large data sets. The issue can arise in any analysis in which auxiliary variables are used to inform a modern missing data handling procedure (e.g., full information maximum likelihood [FIML], multiple imputation [MI]) in order to support the missing at random assumption, reduce bias, and decrease standard errors. The problem is that researchers recommend an "inclusive strategy" in which as many auxiliary variables as possible are included. However, the model becomes more complex with each additional auxiliary variable, so there is a practical limit to the number of auxiliary variables that can be successfully included. Beyond this limit, the model will fail to converge. Large data projects present a particular challenge because there may be hundreds of potential auxiliary variables available to inform the missing data handling procedure, especially when non-linear information is included. The dissertation is divided into the following sections: 1) a brief discussion of the issue of missing data; 2) a review of the history of missing data, including theory and existing solutions for handling missingness; 3) an assessment of the use of auxiliary variables in missing data handling; 4) a discussion of convergence failure with modern missing data methods; 5) a basic introduction to principal component analysis; 6) the introduction of an alternative strategy for handling a large number of auxiliary variables; and finally, 7) a demonstration of the potential of using principal component scores as auxiliary variables, by applying the approach to the analysis of simulated and empirical data.
dc.format.extent	286 pages
dc.language.iso	en
dc.publisher	University of Kansas
dc.rights	This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
dc.subject	Psychology
dc.subject	Statistics
dc.subject	Auxiliary variables
dc.subject	History
dc.subject	Large datasets
dc.subject	Missing data
dc.subject	Principal component analysis (PCA)
dc.title	USING PRINCIPAL COMPONENT ANALYSIS (PCA) TO OBTAIN AUXILIARY VARIABLES FOR MISSING DATA IN LARGE DATA SETS
dc.type	Dissertation
dc.contributor.cmtemember	Johnson, Paul
dc.contributor.cmtemember	Walker, Dale
dc.contributor.cmtemember	Woods, Carol
dc.contributor.cmtemember	Wu, Wei
dc.thesis.degreeDiscipline	Psychology
dc.thesis.degreeLevel	Ph.D.
kusw.oastatus	na
kusw.oapolicy	This item does not meet KU Open Access policy criteria.
kusw.bibid	8085897
dc.rights.accessrights	openAccess

Files in this item

Name: Howard_ku_0099D_12364_DATA_1.pdf
Size: 7.476Mb
Format: PDF
