Finding Item-Level Causes of Differential Item Functioning: A Hierarchical IRT Model for Explaining DIF
Issue Date
2018-05-31
Author
Brussow, Jennifer
Publisher
University of Kansas
Format
163 pages
Type
Dissertation
Degree Level
Ph.D.
Discipline
Psychology & Research in Education
Rights
Copyright held by the author.
Abstract
This research explored the effectiveness of a hierarchical two-parameter logistic (2-PL) item response theory (IRT) model for explaining differential item functioning (DIF) according to item-level features. Explaining DIF in terms of variance attributable to construct-irrelevant item-level features would allow testing programs to improve item-writing and item-review processes to account for the features shown to predict DIF. Whereas previous research in this area has used classical test theory for scaling and logistic regression for DIF detection, this study explained DIF within a hierarchical IRT model. Latent trait models are more widely used in operational testing programs; additionally, simultaneous estimation allows uncertainty in parameter estimates to be carried through to the estimation of the item-level features' relationship with DIF and is more parsimonious than a two-stage model. This simulation study assessed the parameter recovery and stability of the proposed model across 36 conditions created by varying four factors: (a) the strength of the correlation between the amount of DIF and the item-level features, (b) the proportion of examinees in the reference group, and (c) the mean and (d) the mixture probability of the mixture distribution used to sample items' DIF. The model successfully recovered person and item parameters, differences in the groups' mean ability, and the relationship between the amount of DIF observed in an item and the presence of DIF-related item-level features. Model performance varied with the values of the four factors, especially the proportion of examinees in the reference group, which exhibited meaningful effect sizes in the ANOVAs used to assess the factors' impact on mean squared error (MSE) and affected the model's power to detect DIF. When the reference and focal groups contained equal numbers of examinees, the power to detect DIF increased, but at the expense of higher false positive rates and poorer precision.
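The data-generating idea in the abstract can be sketched as a small simulation: a 2-PL model in which each item's DIF shift applies only to the focal group and is partly predicted by a construct-irrelevant item feature. This is an illustrative sketch only; the feature flag `x`, the regression weight `gamma`, the group-ability gap, and all distributional settings below are hypothetical choices, not the dissertation's actual simulation design.

```python
import numpy as np

rng = np.random.default_rng(7)
n_items, n_persons = 40, 1000

# Hypothetical construct-irrelevant feature flags for each item.
x = rng.binomial(1, 0.3, size=n_items)

# DIF shifts: items carrying the feature tend to show more DIF.
gamma = 0.5                                 # assumed feature-DIF weight
d = gamma * x + rng.normal(0, 0.1, size=n_items)

# 2-PL item parameters.
a = rng.lognormal(0.0, 0.3, size=n_items)   # discriminations
b = rng.normal(0.0, 1.0, size=n_items)      # difficulties

# Persons: reference group (g = 0) vs. focal group (g = 1),
# with an assumed gap in mean ability.
g = rng.binomial(1, 0.5, size=n_persons)
theta = rng.normal(-0.25 * g, 1.0)

# Group-specific 2-PL: the DIF shift d_i raises difficulty for the focal group only.
logit = a[None, :] * (theta[:, None] - b[None, :] - d[None, :] * g[:, None])
p = 1.0 / (1.0 + np.exp(-logit))
responses = rng.binomial(1, p)              # persons x items response matrix
```

In a hierarchical treatment, the regression of `d` on `x` would be estimated jointly with the person and item parameters rather than in a second stage, which is the parsimony and uncertainty-propagation advantage the abstract describes.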
Collections
- Dissertations [4660]
- Educational Psychology Scholarly Works [75]