New Probabilistic Techniques for Classification Problems and an Application

Tan, Yi

dc.contributor.advisor	Shenoy, Prakash
dc.contributor.author	Tan, Yi
dc.date.accessioned	2021-07-20T19:55:34Z
dc.date.available	2021-07-20T19:55:34Z
dc.date.issued	2020-05-31
dc.date.submitted	2020
dc.identifier.other	http://dissertations.umi.com/ku:17232
dc.identifier.uri	http://hdl.handle.net/1808/31730
dc.description.abstract	The main focus of this dissertation is to develop new machine learning and statistical methodologies for classification problems, with a real-life application in healthcare. The dissertation has three chapters. In the first chapter, we examine the construction of a hybrid logistic regression–naïve Bayes model, a restricted Bayesian network classifier that combines two probabilistic models in a graphical way, with the aim of combining the strengths of both models. We follow the strategy of balancing the tradeoff between model bias and variance with the objective of minimizing the sum of these two errors. Specifically, we use training set size as a proxy for model variance and conditional dependence among features as a proxy for model bias. Experimental results show that the resulting hybrid logistic regression–naïve Bayes model is a competitive alternative to a variety of state-of-the-art classifiers. In the second chapter, we focus on regularization, a technique that adds information to the learning algorithm to improve the estimation of the model. Most existing regularization methods (e.g., lasso) rely on a sparsity assumption and reduce a model’s variance by shrinking its coefficients towards zero. One limitation of lasso is that, in practice, the sparsity assumption is often violated. Shrinking the coefficients of influential predictors towards zero introduces bias and makes the regression estimates suboptimal. As a consequence, lasso may not perform well when the training set size is relatively large compared to the number of parameters to be estimated. We argue that in such a situation, shrinking the coefficients towards a low-variance, data-driven estimate could be a better strategy. For classification purposes, we propose a naïve Bayes regularized logistic regression, which shrinks its coefficients towards the naïve Bayes estimates, a well-known low-variance estimator, instead of zero.
This method is motivated by the fact that naïve Bayes and logistic regression converge toward identical classifiers if the naïve Bayes conditional independence assumptions hold. Simulation and experimental results suggest that this method is highly competitive with a variety of state-of-the-art classifiers. In the third chapter, we collaborate with the U.S. Veterans Affairs’ (VA) Eastern Kansas Health Care System to help construct a clinical model that can assist doctors in predicting and diagnosing post-traumatic stress disorder (PTSD). This study is motivated by the need to make the service process at VA hospitals more efficient and to reduce veterans’ waiting time. Specifically, we propose a sparsity-enforcing l1-penalized Bayesian network-based model that addresses three clinical challenges presented by the veteran PTSD prediction problem: (1) probabilistic classification, (2) a large amount of missing data, and (3) a high-dimensional search space. The proposed model predicts veterans’ likelihood of suffering from PTSD better than a variety of state-of-the-art probabilistic classifiers. In addition, our model identifies eight variables that provide the most direct predictive power.
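The idea of shrinking logistic regression coefficients towards naïve Bayes estimates rather than towards zero can be illustrated with a minimal sketch. The abstract does not state the exact penalty or how the naïve Bayes target is computed, so everything below is an assumption made for illustration: a squared (ridge-style) penalty, a Gaussian naïve Bayes with pooled per-feature variances (which yields a linear logit), an unpenalised intercept, plain gradient descent, and the hypothetical names `nb_linear_coefs` and `fit_nb_regularized_lr`. It is not the dissertation's implementation.

```python
import math
import random

def nb_linear_coefs(X, y):
    """Gaussian naive Bayes with pooled per-feature variance, expressed as
    logit weights w_j = (mu1_j - mu0_j) / var_j plus an intercept."""
    n, d = len(X), len(X[0])
    n1 = sum(y)
    n0 = n - n1
    mu0 = [sum(x[j] for x, t in zip(X, y) if t == 0) / n0 for j in range(d)]
    mu1 = [sum(x[j] for x, t in zip(X, y) if t == 1) / n1 for j in range(d)]
    var = []
    for j in range(d):
        ss = sum((x[j] - (mu1[j] if t else mu0[j])) ** 2 for x, t in zip(X, y))
        var.append(ss / n + 1e-9)
    w = [(mu1[j] - mu0[j]) / var[j] for j in range(d)]
    b = math.log(n1 / n0) - sum(
        (mu1[j] ** 2 - mu0[j] ** 2) / (2 * var[j]) for j in range(d))
    return w, b

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_nb_regularized_lr(X, y, lam, lr=0.1, steps=500):
    """Minimise mean log-loss + (lam / 2) * ||w - w_nb||^2 by gradient descent.
    Assumption of this sketch: squared penalty, intercept left unpenalised."""
    w_nb, b_nb = nb_linear_coefs(X, y)
    n, d = len(X), len(X[0])
    w, b = list(w_nb), b_nb          # start at the shrinkage target
    step = lr / (1.0 + lam)          # keeps the update stable for large lam
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            gb += err / n
            for j in range(d):
                gw[j] += err * x[j] / n
        for j in range(d):
            w[j] -= step * (gw[j] + lam * (w[j] - w_nb[j]))
        b -= step * gb
    return w, b, w_nb

# Synthetic data (hypothetical): the second feature is a noisy copy of the
# first, so the conditional independence assumption is violated.
random.seed(0)
X, y = [], []
for i in range(200):
    t = i % 2
    z = random.gauss(1.0 if t else -1.0, 1.0)
    X.append([z, z + random.gauss(0.0, 0.3)])
    y.append(t)

w_free, _, w_nb = fit_nb_regularized_lr(X, y, lam=0.0)   # no shrinkage
w_tight, _, _ = fit_nb_regularized_lr(X, y, lam=1e4)     # strong shrinkage
```

With `lam=0.0` the fit moves away from the naïve Bayes weights towards the ordinary maximum-likelihood solution, while a very large `lam` keeps the weights essentially at the naïve Bayes target. Intermediate values trade the low variance of naïve Bayes against the lower bias of unregularized logistic regression.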
dc.format.extent	109 pages
dc.language.iso	en
dc.publisher	University of Kansas
dc.rights	Copyright held by the author.
dc.subject	Business administration
dc.subject	Statistics
dc.subject	Business Analytics
dc.subject	Healthcare Analytics
dc.subject	Machine Learning
dc.subject	Probabilistic Classification
dc.title	New Probabilistic Techniques for Classification Problems and an Application
dc.type	Dissertation
dc.contributor.cmtemember	Sherwood, Ben
dc.contributor.cmtemember	Hillmer, Steve
dc.contributor.cmtemember	Arikan, Mazhar
dc.contributor.cmtemember	Cai, Zongwu
dc.thesis.degreeDiscipline	Business
dc.thesis.degreeLevel	Ph.D.
dc.identifier.orcid	https://orcid.org/0000-0002-9235-0299	en_US
dc.rights.accessrights	openAccess

Files in this item

Name: Tan_ku_0099D_17232_DATA_1.pdf
Size: 771.2Kb
Format: PDF
