The Best Subset In Validation Algorithm: Testing Political Scientific Theory Via Predictive Analytics

Rogers, Benjamin Joseph

Issue Date

2017-08-31

Author

Rogers, Benjamin Joseph

Publisher

University of Kansas

Format

194 pages

Type

Dissertation

Degree Level

Ph.D.

Discipline

Political Science

Abstract

The difficulties arising from using statistical significance to demonstrate the truth and utility of a hypothesis have been known for some time (Cohen 1994). To develop a more rigorous conception of substantive significance, a new means of determining it is presented, and a new technique for its implementation is demonstrated. The predictive approach, as differentiated from the explanatory approach (in which significance testing is grounded), concentrates on best determining the value of observations that a given model has not yet seen (Shmueli 2010). The technique for implementation, the best subset in validation algorithm (or BeSiVa), attempts to make the best prediction possible using all available observations. Dividing observations into two separate datasets, training data used for modeling and test data used to judge how well models predict, BeSiVa seeks the model that best predicts a dependent variable in a randomly selected test set. BeSiVa is applied to an old question, the choice to vote, as well as two new ones: innumeracy on the proportion of minorities in the United States (Alba, Rumbaut, and Marotz 2005) and support for Donald Trump during his 2016 presidential run. When 656 variables from the GSS were provided to determine if the algorithm regularly selected theoretically relevant independent variables to model turnout, BeSiVa selected a theoretically relevant predictor, voting in the last presidential election, each time. Next, a smaller selection of variables that had been theoretically verified as related to turnout in the 2000 presidential election was provided. From these variables, BeSiVa clearly favored sociological and psychological theories of turnout over the more recent mobilization theory. After BeSiVa was shown to select relevant independent variables when analyzing turnout, it was applied to newer questions.
Innumeracy's theoretical origins were extended, showing how religious identification and financial satisfaction predicted an individual's ability to estimate minority proportions. BeSiVa also suggested origins for President Trump's support, grounding it in racial resentment, feelings on President Obama, and concerns about security and immigration. The algorithm's tendency to build theoretically grounded models, even when irrelevant independent variables were provided, demonstrates its capacity to produce useful predictive models with relevant predictors.

URI

http://hdl.handle.net/1808/26048

Collections

Dissertations [4889]
Political Science Dissertations and Theses [134]