dc.contributor.advisor | Little, Todd D. | |
dc.contributor.author | Howard, Waylon Justin | |
dc.date.accessioned | 2013-02-17T16:35:55Z | |
dc.date.available | 2013-02-17T16:35:55Z | |
dc.date.issued | 2012-08-31 | |
dc.date.submitted | 2012 | |
dc.identifier.other | http://dissertations.umi.com/ku:12364 | |
dc.identifier.uri | http://hdl.handle.net/1808/10815 | |
dc.description.abstract | The purpose of this dissertation is to address an important issue in the imputation of missing data in large data sets. The issue can arise in any analysis in which auxiliary variables are used to inform a modern missing data handling procedure (e.g., FIML, MI) to support the missing at random assumption, reduce bias, and decrease standard errors. The problem is that researchers recommend an "inclusive strategy" in which as many auxiliary variables as possible are included. However, the model becomes more complex with each additional auxiliary variable, so there is a practical limit to the number of auxiliary variables that can be successfully included. Beyond this limit, the model will fail to converge. Large data projects can present a challenge because it is possible to have hundreds of potential auxiliary variables to inform the missing data handling procedure, especially when non-linear information is included. The dissertation is divided into the following sections: 1) a brief discussion of the issue of missing data; 2) a review of the history of missing data, including theory and existing solutions for handling missingness; 3) an assessment of the use of auxiliary variables in missing data handling; 4) a discussion of convergence failure with modern missing data methods; 5) a basic introduction to principal component analysis; 6) the introduction of an alternative strategy to address the problem of a large number of auxiliary variables; and finally, 7) a demonstration of the potential of using principal component scores as auxiliary variables, by applying the approach to the analysis of simulated and empirical data. | |
dc.format.extent | 286 pages | |
dc.language.iso | en | |
dc.publisher | University of Kansas | |
dc.rights | This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author. | |
dc.subject | Psychology | |
dc.subject | Statistics | |
dc.subject | Auxiliary variables | |
dc.subject | History | |
dc.subject | Large datasets | |
dc.subject | Missing data | |
dc.subject | Principal component analysis (PCA) | |
dc.title | USING PRINCIPAL COMPONENT ANALYSIS (PCA) TO OBTAIN AUXILIARY VARIABLES FOR MISSING DATA IN LARGE DATA SETS | |
dc.type | Dissertation | |
dc.contributor.cmtemember | Johnson, Paul | |
dc.contributor.cmtemember | Walker, Dale | |
dc.contributor.cmtemember | Woods, Carol | |
dc.contributor.cmtemember | Wu, Wei | |
dc.thesis.degreeDiscipline | Psychology | |
dc.thesis.degreeLevel | Ph.D. | |
kusw.oastatus | na | |
kusw.oapolicy | This item does not meet KU Open Access policy criteria. | |
kusw.bibid | 8085897 | |
dc.rights.accessrights | openAccess | |