The Best Subset In Validation Algorithm: Testing Political Scientific Theory Via Predictive Analytics
Rogers, Benjamin Joseph
University of Kansas
Copyright held by the author.
The difficulties arising from using statistical significance to demonstrate the truth and utility of a hypothesis have been known for some time (Cohen 1994). To develop a more rigorous conception of substantive significance, a new means of determining it is presented, and a new technique for its implementation is demonstrated. The predictive approach, as distinguished from the explanatory approach in which significance testing is grounded, concentrates on best determining the values of observations that a given model has not yet seen (Shmueli 2010). The implementation technique, the best subset in validation algorithm (BeSiVa), attempts to make the best prediction possible using all available observations. It divides observations into two separate datasets: training data, used for modeling, and test data, which determines how well models make predictions. BeSiVa then tries to best predict a dependent variable in a randomly selected test set.

BeSiVa is applied to an old question, the choice to vote, as well as two new ones: innumeracy about the proportion of minorities in the United States (Alba, Rumbaut, and Marotz 2005) and support for Donald Trump during his 2016 presidential run. When 656 variables from the GSS were provided to test whether the algorithm regularly selected theoretically relevant independent variables to model turnout, BeSiVa selected a theoretically relevant predictor, voting in the last presidential election, each time. Next, a smaller set of variables that had been theoretically verified as related to turnout in the 2000 presidential election was provided. From these variables, BeSiVa clearly favored sociological and psychological theories of turnout over the more recent mobilization theory. After BeSiVa was shown to select relevant independent variables when analyzing turnout, it was applied to newer questions.
Innumeracy's theoretical origins were extended, showing how religious identification and financial satisfaction predicted an individual's ability to estimate minority proportions. BeSiVa also suggested origins for President Trump's support, grounding it in racial resentment, feelings about President Obama, and concerns about security and immigration. The algorithm's tendency to build theoretically grounded models, even when irrelevant independent variables were provided, demonstrates its ability to produce useful predictive models from relevant predictors.
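The train/test selection idea behind BeSiVa can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the abstract does not specify the model family, the search strategy, or the selection metric, so the sketch assumes a logistic regression fit by batch gradient descent, a greedy forward search over predictors (an exhaustive search over all subsets of 656 variables, 2^656 of them, would be infeasible), and classification accuracy on the held-out test set as the criterion. The function names `fit_logistic`, `accuracy`, `train_test_split`, and `besiva_sketch` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, y, cols, iters=500, lr=0.5):
    # ASSUMPTION: logistic regression fit by plain batch gradient descent.
    # w[0] is the intercept; w[1:] pair with the predictor columns in `cols`.
    w = [0.0] * (len(cols) + 1)
    n = len(rows)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for r, t in zip(rows, y):
            x = [1.0] + [r[c] for c in cols]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - t) * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def accuracy(w, rows, y, cols):
    # Share of observations whose predicted class (p >= 0.5) matches the outcome.
    correct = 0
    for r, t in zip(rows, y):
        z = w[0] + sum(wi * r[c] for wi, c in zip(w[1:], cols))
        correct += int((sigmoid(z) >= 0.5) == bool(t))
    return correct / len(rows)


def train_test_split(rows, y, test_frac=0.3, seed=0):
    # Randomly assign observations to a training set and a test set.
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [y[i] for i in tr],
            [rows[i] for i in te], [y[i] for i in te])


def besiva_sketch(rows, y, test_frac=0.3, seed=0):
    # ASSUMPTION: greedy forward selection judged only by test-set accuracy.
    x_tr, y_tr, x_te, y_te = train_test_split(rows, y, test_frac, seed)
    selected = []
    best = accuracy(fit_logistic(x_tr, y_tr, []), x_te, y_te, [])  # intercept only
    remaining = set(range(len(rows[0])))
    while remaining:
        scores = {}
        for c in sorted(remaining):
            cols = selected + [c]
            scores[c] = accuracy(fit_logistic(x_tr, y_tr, cols), x_te, y_te, cols)
        c_best = max(scores, key=scores.get)
        if scores[c_best] <= best:
            break  # no candidate improves prediction of unseen observations
        selected.append(c_best)
        remaining.remove(c_best)
        best = scores[c_best]
    return selected, best


# Synthetic data: the outcome equals column 0; columns 1 and 2 are noise.
rows = [[i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(200)]
y = [r[0] for r in rows]
selected, score = besiva_sketch(rows, y)
print(selected, score)
```

On this synthetic data the search should keep only column 0, the one relevant predictor, because a variable is added only when it strictly improves accuracy on the test set; the noise columns add nothing once the outcome is already predicted perfectly.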
- Dissertations 
- Political Science Dissertations and Theses 
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.