Abstract
In an era of test-based accountability systems, how do we hold accountability tests themselves accountable? Many accountability decisions made today rest on the assumption that test scores reflect the effect of instruction. However, only instructionally sensitive assessments reflect the impact of instruction; instructionally insensitive ones do not. The purpose of this study is to explore the relationship between students' instructional experiences and their scores on standardized achievement test items. The Mantel-Haenszel statistic, logistic regression, and judgmental item-detection approaches were used to identify instructionally sensitive items in the Kansas Mathematics Interim Assessment for seventh graders. The two empirical methods (Mantel-Haenszel and logistic regression) performed very similarly, and both identified many instructionally sensitive items. No strong agreement was found between the empirical and judgmental approaches. The implications of this study for educators and policymakers, its limitations, and directions for further research are discussed.