New Probabilistic Techniques for Classification Problems and an Application

Tan, Yi

Issue Date

2020-05-31

Author

Tan, Yi

Publisher

University of Kansas

Format

109 pages

Type

Dissertation

Degree Level

Ph.D.

Discipline

Business

Abstract

The main focus of this dissertation is to develop new machine learning and statistical methodologies for classification problems, with a real-life application in healthcare. The dissertation has three chapters. In the first chapter, we examine the construction of a hybrid logistic regression–naïve Bayes model, a restricted Bayesian network classifier that combines the two probabilistic models graphically, with the aim of combining the strengths of both. We follow the strategy of balancing the tradeoff between model bias and variance, with the objective of minimizing the sum of these two errors. Specifically, we use training set size as a proxy for model variance and conditional dependence among features as a proxy for model bias. Experimental results show that the resulting hybrid logistic regression–naïve Bayes model is a competitive alternative to a variety of state-of-the-art classifiers. In the second chapter, we focus on regularization, a technique that adds information to the learning algorithm to improve the estimation of the model. Most existing regularization methods (e.g., lasso) rely on a sparsity assumption and reduce a model's variance by shrinking its coefficients toward zero. One limitation of lasso is that, in practice, the sparsity assumption is often violated. Shrinking the coefficients of influential predictors toward zero introduces bias and makes the regression estimates suboptimal. As a consequence, lasso may not perform well when the training set size is relatively large compared with the number of parameters to be estimated. We argue that in such a situation, shrinking the coefficients toward a low-variance, data-driven estimate could be a better strategy. For classification purposes, we propose a naïve Bayes regularized logistic regression, which shrinks its coefficients toward naïve Bayes estimates, a well-known low-variance estimator, instead of zero. This method is motivated by the fact that naïve Bayes and logistic regression converge toward identical classifiers if the conditional independence assumptions of naïve Bayes hold. Simulation and experimental results suggest that this method is highly competitive with a variety of state-of-the-art classifiers. In the third chapter, we collaborate with the U.S. Veterans Affairs' (VA) Eastern Kansas Health Care System to help construct a clinical model that can assist doctors in predicting and diagnosing post-traumatic stress disorder (PTSD). This study is motivated by the need to make the service process at VA hospitals more efficient and to reduce veterans' waiting time. Specifically, we propose a sparsity-enforcing, l1-penalized Bayesian network-based model that addresses three clinical challenges in the veteran PTSD prediction problem: (1) probabilistic classification, (2) a large amount of missing data, and (3) a high-dimensional search space. The proposed model predicts veterans' likelihood of suffering from PTSD better than a variety of state-of-the-art probabilistic classifiers. In addition, our model identifies eight variables that provide the most direct predictive power.
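
The idea of shrinking logistic regression coefficients toward naïve Bayes estimates rather than toward zero can be illustrated with a minimal sketch. This is not the dissertation's implementation, and several choices here are assumptions made only for illustration: binary labels in {0, 1}, a Gaussian naïve Bayes model with a shared per-feature variance (so that its log-odds are linear in the features), a squared l2 penalty on the distance to the naïve Bayes coefficients, and plain gradient descent. The function names and the parameters `lam`, `lr`, and `n_iter` are hypothetical.

```python
import numpy as np


def naive_bayes_linear_coefs(X, y):
    """Gaussian naive Bayes with a shared per-feature variance,
    rewritten as a linear log-odds model: logit = X @ w + b."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class variance of each feature.
    var = (((X0 - mu0) ** 2).sum(axis=0)
           + ((X1 - mu1) ** 2).sum(axis=0)) / len(X)
    w = (mu1 - mu0) / var
    b = np.log(len(X1) / len(X0)) - ((mu1 ** 2 - mu0 ** 2) / (2 * var)).sum()
    return w, b


def log_loss(X, y, w, b):
    """Mean negative log-likelihood of a logistic model."""
    z = X @ w + b
    return np.mean(np.logaddexp(0.0, z) - y * z)


def fit_nb_regularized_logreg(X, y, lam, lr=0.1, n_iter=2000):
    """Minimize log_loss + (lam / 2) * ||w - w_nb||^2 by gradient descent.

    Assumption: squared l2 penalty; the intercept is not penalized.
    """
    w_nb, b_nb = naive_bayes_linear_coefs(X, y)
    w, b = w_nb.copy(), b_nb          # start from the naive Bayes solution
    step = lr / (1.0 + lam)           # smaller steps when the penalty is strong
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (p - y) / len(y) + lam * (w - w_nb)
        grad_b = np.mean(p - y)
        w -= step * grad_w
        b -= step * grad_b
    return w, b, w_nb
```

With `lam = 0` this sketch reduces to ordinary logistic regression started from the naïve Bayes solution; as `lam` grows, the fitted coefficients are pulled toward the naïve Bayes coefficients instead of toward zero, which is the contrast with lasso drawn in the abstract.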

URI

http://hdl.handle.net/1808/31730

