Finding Item-Level Causes of Differential Item Functioning: A Hierarchical IRT Model for Explaining DIF
Issue Date
2018-05-31
Author
Brussow, Jennifer
Publisher
University of Kansas
Format
163 pages
Type
Dissertation
Degree Level
Ph.D.
Discipline
Psychology & Research in Education
Rights
Copyright held by the author.
Abstract
This research explored the effectiveness of a hierarchical two-parameter logistic (2-PL) item response theory (IRT) model for explaining differential item functioning (DIF) according to item-level features. Explaining DIF in terms of variance attributable to construct-irrelevant item-level features would allow testing programs to improve item-writing and item-review processes to account for the features shown to predict DIF. Whereas previous research in this area has used classical test theory for scaling and logistic regression for DIF detection, this study explained DIF within a hierarchical IRT model. Latent trait models are more widely used in operational testing programs; additionally, simultaneous estimation allows uncertainty in parameter estimates to be carried through to the estimation of the item-level features' relationship with DIF and is more parsimonious than a two-stage model. This simulation study assessed the parameter recovery and stability of the proposed model across 36 conditions created by varying four factors: (a) the strength of the correlation between the amount of DIF and the item-level features, (b) the proportion of examinees in the reference group, and (c) the mean and (d) the mixture probability of the mixture distribution used to sample items' DIF. The model successfully recovered person and item parameters, differences in the groups' mean ability, and the relationship between the amount of DIF observed in an item and the presence of DIF-related item-level features. Model performance varied with the values of the four factors, especially the proportion of examinees in the reference group, which exhibited meaningful effect sizes in the ANOVAs used to assess the factors' impact on mean squared error (MSE) and affected the model's power to detect DIF. When the reference and focal groups contained equal numbers of examinees, the power to detect DIF increased, but at the expense of higher false positive rates and poorer precision.
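The data-generating idea in the abstract can be sketched as a small simulation: a 2-PL model in which each item's DIF shift applies only to the focal group and is partly predicted by a construct-irrelevant item feature. This is an illustrative sketch only; the feature flag `x`, the regression weight `gamma`, the group-ability gap, and all distributional settings below are hypothetical choices, not the dissertation's actual simulation design.

```python
import numpy as np

rng = np.random.default_rng(7)
n_items, n_persons = 40, 1000

# Hypothetical construct-irrelevant feature flags for each item.
x = rng.binomial(1, 0.3, size=n_items)

# DIF shifts: items carrying the feature tend to show more DIF.
gamma = 0.5                                 # assumed feature-DIF weight
d = gamma * x + rng.normal(0, 0.1, size=n_items)

# 2-PL item parameters.
a = rng.lognormal(0.0, 0.3, size=n_items)   # discriminations
b = rng.normal(0.0, 1.0, size=n_items)      # difficulties

# Persons: reference group (g = 0) vs. focal group (g = 1),
# with an assumed gap in mean ability.
g = rng.binomial(1, 0.5, size=n_persons)
theta = rng.normal(-0.25 * g, 1.0)

# Group-specific 2-PL: the DIF shift d_i raises difficulty for the focal group only.
logit = a[None, :] * (theta[:, None] - b[None, :] - d[None, :] * g[:, None])
p = 1.0 / (1.0 + np.exp(-logit))
responses = rng.binomial(1, p)              # persons x items response matrix
```

In a hierarchical treatment, the regression of `d` on `x` would be estimated jointly with the person and item parameters rather than in a second stage, which is the parsimony and uncertainty-propagation advantage the abstract describes.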
Collections
- Dissertations [4660]
- Educational Psychology Scholarly Works [75]