
KU ScholarWorks


    New Probabilistic Techniques for Classification Problems and an Application

    View/Open
    Tan_ku_0099D_17232_DATA_1.pdf (771.2Kb)
    Issue Date
    2020-05-31
    Author
    Tan, Yi
    Publisher
    University of Kansas
    Format
    109 pages
    Type
    Dissertation
    Degree Level
    Ph.D.
    Discipline
    Business
    Rights
    Copyright held by the author.
    Abstract
    The main focus of this dissertation is to develop new machine learning and statistical methodologies for classification problems, with a real-life application in healthcare. The dissertation has three chapters.

    In the first chapter, we examine the construction of a hybrid logistic regression–naïve Bayes model, a restricted Bayesian network classifier that combines two probabilistic models in a graphical way, with the aim of combining the strengths of both. We follow the strategy of balancing the tradeoff between model bias and variance, with the objective of minimizing the sum of these two errors. Specifically, we use training set size as a proxy for model variance and conditional dependence among features as a proxy for model bias. Experimental results show that the resulting hybrid logistic regression–naïve Bayes model is a competitive alternative to a variety of state-of-the-art classifiers.

    In the second chapter, we focus on regularization, a technique that adds information to the learning algorithm to improve the estimation of the model. Most existing regularization methods (e.g., lasso) rely on a sparsity assumption and reduce a model's variance by shrinking its coefficients towards zero. One limitation of lasso is that, in practice, the sparsity assumption is often violated. Shrinking the coefficients of influential predictors towards zero introduces bias and makes the regression estimates suboptimal. As a consequence, lasso may not perform well when the training set size is relatively large compared to the number of parameters to be estimated. We argue that in such a situation, shrinking the coefficients towards a low-variance, data-driven estimate could be a better strategy. For classification purposes, we propose a naïve Bayes regularized logistic regression, which shrinks its coefficients towards naïve Bayes estimates, a well-known low-variance estimator, instead of zero. This method is motivated by the fact that naïve Bayes and logistic regression converge toward identical classifiers if the naïve Bayes conditional independence assumptions hold. Simulation and experimental results suggest that this method is highly competitive with a variety of state-of-the-art classifiers.

    In the third chapter, we collaborate with the U.S. Veterans Affairs (VA) Eastern Kansas Health Care System to help construct a clinical model that can assist doctors in predicting and diagnosing post-traumatic stress disorder (PTSD). This study is motivated by the need to make the service process at VA hospitals more efficient and to reduce veterans' waiting time. Specifically, we propose a sparsity-enforcing l1-penalized Bayesian network-based model that addresses three clinical challenges presented in the veteran PTSD prediction problem: (1) probabilistic classification, (2) a large amount of missing data, and (3) a high-dimensional search space. The proposed model provides better predictions of veterans' likelihood of suffering from PTSD than a variety of state-of-the-art probabilistic classifiers. In addition, our model identifies eight variables that provide the most direct predictive power.
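
    The second chapter's idea of shrinking logistic regression coefficients towards naïve Bayes estimates instead of zero can be illustrated with a minimal sketch. It is not the dissertation's algorithm; the abstract does not state the penalty form or the naïve Bayes variant. The sketch assumes a squared (L2) penalty on the distance to the target, a Gaussian naïve Bayes model with one shared variance per feature (which implies a linear log-odds), and plain gradient descent. The function names and the parameters `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def gaussian_nb_linear_weights(X, y):
    """Weights and intercept of the linear log-odds implied by Gaussian
    naive Bayes with one shared (pooled) variance per feature."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class variance; the small constant avoids division by zero.
    var = (((X0 - mu0) ** 2).sum(axis=0)
           + ((X1 - mu1) ** 2).sum(axis=0)) / len(X) + 1e-9
    w = (mu1 - mu0) / var
    b = np.log(len(X1) / len(X0)) + ((mu0 ** 2 - mu1 ** 2) / (2 * var)).sum()
    return w, b

def mean_log_loss(X, y, w, b):
    """Average negative log-likelihood of a logistic model (numerically stable)."""
    z = X @ w + b
    return np.mean(np.logaddexp(0.0, z) - y * z)

def fit_nb_regularized_lr(X, y, lam=1.0, n_iter=5000):
    """Minimize mean_log_loss + (lam / 2) * ||w - w_nb||^2 by gradient descent.

    Assumption: an L2 penalty centred on the naive Bayes weights; the
    intercept is left unpenalized.
    """
    n = len(X)
    w_nb, b_nb = gaussian_nb_linear_weights(X, y)
    w, b = w_nb.copy(), b_nb  # start from the shrinkage target
    # Step size 1/L, where L upper-bounds the curvature of the objective.
    step = 1.0 / (0.25 * (np.linalg.norm(X, 2) ** 2 / n + 1.0) + lam)
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (X @ w + b)))  # stable sigmoid
        grad_w = X.T @ (p - y) / n + lam * (w - w_nb)
        grad_b = np.mean(p - y)
        w -= step * grad_w
        b -= step * grad_b
    return w, b
```

    Under these assumptions, a very large `lam` keeps the fitted coefficients at the naïve Bayes values, while `lam = 0` recovers ordinary logistic regression started from the naïve Bayes solution; intermediate values trade off between the two.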
    URI
    http://hdl.handle.net/1808/31730
    Collections
    • Dissertations [4475]

    Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.




    Contact KU ScholarWorks
    785-864-8983
    KU Libraries
    1425 Jayhawk Blvd
    Lawrence, KS 66045