USING PRINCIPAL COMPONENT ANALYSIS (PCA) TO OBTAIN AUXILIARY VARIABLES FOR MISSING DATA IN LARGE DATA SETS
Issue Date
2012-08-31
Author
Howard, Waylon Justin
Publisher
University of Kansas
Format
286 pages
Type
Dissertation
Degree Level
Ph.D.
Discipline
Psychology
Rights
This item is protected by copyright and unless otherwise specified the copyright of this thesis/dissertation is held by the author.
Abstract
The purpose of this dissertation is to address an important issue in the imputation of missing data in large data sets. The issue can arise in any analysis in which auxiliary variables are used to inform a modern missing data handling procedure (e.g., full information maximum likelihood [FIML], multiple imputation [MI]) in order to support the missing at random assumption, reduce bias, and decrease standard errors. The problem is that researchers recommend an "inclusive strategy" in which as many auxiliary variables as possible are included. However, the model becomes more complex with each additional auxiliary variable, so there is a practical limit to the number of auxiliary variables that can be successfully included. Beyond this limit, the model will fail to converge. Large data projects can present a challenge because there may be hundreds of potential auxiliary variables to inform the missing data handling procedure, especially when non-linear information is included. The dissertation is divided into the following sections: 1) a brief discussion of the issue of missing data; 2) a review of the history of missing data, including theory and existing solutions for handling missingness; 3) an assessment of the use of auxiliary variables in missing data handling; 4) a discussion of convergence failure with modern missing data methods; 5) a basic introduction to principal component analysis; 6) the introduction of an alternative strategy to address the problem of a large number of auxiliary variables; and finally, 7) a demonstration of the potential of using principal component scores as auxiliary variables by applying this approach to the analysis of simulated and empirical data.
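The general idea named in the abstract, replacing a large set of auxiliary variables with a few principal component scores, can be illustrated with a minimal sketch. This is not the dissertation's own code or procedure. The function name `pca_auxiliary_scores`, the simulated data, the choice of three components, and the assumption that the auxiliary variables are themselves completely observed are all hypothetical and chosen only for illustration.

```python
import numpy as np

def pca_auxiliary_scores(aux, n_components):
    """Return the first n_components principal component scores of aux.

    aux: (n_cases, n_vars) array of auxiliary variables.
    Assumption for this sketch: aux has no missing values.
    """
    aux = np.asarray(aux, dtype=float)
    # Standardize so that no variable dominates because of its scale.
    z = (aux - aux.mean(axis=0)) / aux.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of vt are the component loadings.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    # Share of total variance carried by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
n_cases, n_aux = 500, 100
# Hypothetical data: 100 auxiliary variables driven by 3 latent factors plus noise.
factors = rng.normal(size=(n_cases, 3))
loadings = rng.normal(size=(3, n_aux))
aux = factors @ loadings + rng.normal(scale=0.5, size=(n_cases, n_aux))

scores, explained = pca_auxiliary_scores(aux, n_components=3)
print(scores.shape)      # (500, 3)
print(explained.sum())   # share of variance retained by the 3 components
```

In a workflow of this kind, the few columns of `scores` would be passed to the missing data handling procedure as auxiliary variables in place of the 100 original columns, which keeps the imputation or estimation model small.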
Collections
- Dissertations
- Psychology Dissertations and Theses