A Likelihood Based Approach to the Assessment of Large Sample Convergence and Model Based Clustering.
Issue Date
2015-12-31
Author
Bimali, Milan
Publisher
University of Kansas
Format
128 pages
Type
Dissertation
Degree Level
Ph.D.
Discipline
Biostatistics
Rights
Copyright held by the author.
Abstract
The likelihood is a function of the model parameter(s) and the data, defined through a pre-specified probability density function (pdf). The likelihood can therefore be viewed as a model-data combination that can be used to address questions of interest. The relative likelihood function (RLF) is the likelihood function scaled by its value at the mode, so that its maximum is one. Unlike likelihood functions, relative likelihood functions have attracted little attention and use by statisticians. This dissertation explores the properties and applications of relative likelihood functions in examining the large-sample convergence properties of the maximum likelihood estimator (MLE) and in relation to clustering. The dissertation consists of three chapters. The first chapter presents a simulation-based approach to examining the relationship between sample size and the asymptotic behavior of the MLE. The convergence of the observed RLF to the asymptotic RLF is assessed for different sample sizes using two measures of convergence: difference in areas and dissimilarity in shape. The proposed approach is applied to data from the literature as well as to data simulated from different exponential family distributions. The second chapter proposes a novel clustering approach based on the observed RLFs. Observations in the dataset are assumed to follow a known distribution, and observed RLFs are obtained. The observed RLFs are then scaled by the inverse of the asymptotic variance (the Fisher information) evaluated at the mode of the likelihood functions. The weighted RLFs reflect information-based similarity among observations in the data. A data matrix is then constructed by evaluating the weighted RLFs at different values in the parameter space. The data matrix allows for direct application of standard clustering algorithms such as the k-means algorithm.
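The clustering steps described above (observed RLFs, information weighting, a data matrix on a parameter grid, then k-means) might be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's implementation: the Poisson model, the toy counts, the grid, and the function names `poisson_rlf`, `weighted_rlf_matrix`, and `kmeans` are all hypothetical, and the weight is read here as the Fisher information at the MLE (the inverse of the asymptotic variance). The small k-means routine uses a deterministic farthest-point start purely to keep the example reproducible.

```python
import numpy as np

def poisson_rlf(counts, grid):
    """Observed relative likelihood of a Poisson mean, evaluated on `grid`."""
    counts = np.asarray(counts, dtype=float)
    n, total = counts.size, counts.sum()
    mle = total / n  # mode of the likelihood (assumes total > 0)
    # Poisson log-likelihood up to an additive constant
    loglik = total * np.log(grid) - n * grid
    loglik_at_mle = total * np.log(mle) - n * mle
    return np.exp(loglik - loglik_at_mle), mle

def weighted_rlf_matrix(units, grid):
    """One row per unit: RLF on the grid times the Fisher information at the MLE."""
    rows = []
    for counts in units:
        rlf, mle = poisson_rlf(counts, grid)
        fisher_info = len(counts) / mle  # Poisson: I(theta) = n / theta
        rows.append(fisher_info * rlf)
    return np.vstack(rows)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical toy data: three units with small counts, three with large counts.
units = [
    [2, 1, 3, 2, 2], [1, 2, 2, 3, 1], [3, 2, 1, 2, 3],
    [14, 16, 15, 15, 15], [15, 13, 16, 14, 16], [17, 15, 14, 16, 15],
]
grid = np.linspace(0.5, 25.0, 200)  # assumed grid over the parameter space
matrix = weighted_rlf_matrix(units, grid)
labels = kmeans(matrix, k=2)
print(labels)
```

In practice a library routine such as scikit-learn's `KMeans` could replace the hand-written `kmeans`; the point of the sketch is only that each row of `matrix` is an ordinary feature vector once the weighted RLF has been evaluated on a common grid.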
This clustering approach was applied to a simulated dataset based on real data and to datasets simulated from known distributions. The third chapter applies the proposed RLF-based clustering approach to a publicly available gene expression dataset consisting of 70 gene expression profiles used to classify patients into prognostic groups. The agreement between the RLF clustering results and the previous classification is also presented. The clusters obtained are also examined in relation to differences in two clinical features: time to overall survival and time to metastases.
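As a rough illustration of the first chapter's idea, the sketch below compares an observed RLF with its normal-approximation (asymptotic) counterpart at several sample sizes. The abstract names the two convergence measures but does not define them, so the formulas here are assumptions: "difference in areas" is taken as the absolute difference between the areas under the two curves, and "dissimilarity in shape" is stood in for by half the integrated absolute difference between the area-normalized curves. The exponential model, the true rate of 2, the grid, and the names `trapezoid` and `rlf_convergence` are likewise hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule on a grid (written out to avoid NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def rlf_convergence(sample, n_grid=20001):
    """Return (area difference, shape dissimilarity) for an exponential-rate model."""
    sample = np.asarray(sample, dtype=float)
    n, total = sample.size, sample.sum()
    mle = n / total  # MLE of the rate
    grid = np.linspace(1e-6, 6.0, n_grid) * mle  # assumed grid around the MLE

    # Observed RLF: L(rate) / L(mle), with log L = n*log(rate) - rate*total
    observed = np.exp(n * np.log(grid / mle) - total * (grid - mle))

    # Asymptotic RLF: normal approximation using the Fisher information at the MLE
    fisher_info = n / mle**2
    asymptotic = np.exp(-0.5 * fisher_info * (grid - mle) ** 2)

    area_obs = trapezoid(observed, grid)
    area_asy = trapezoid(asymptotic, grid)
    area_diff = abs(area_obs - area_asy)
    # Assumed shape measure: compare the curves after scaling each to unit area
    shape_diss = 0.5 * trapezoid(
        np.abs(observed / area_obs - asymptotic / area_asy), grid
    )
    return area_diff, shape_diss

rng = np.random.default_rng(0)
for n in (5, 50, 500):
    sample = rng.exponential(scale=0.5, size=n)  # scale = 1 / rate, so rate = 2
    area_diff, shape_diss = rlf_convergence(sample)
    print(n, area_diff, shape_diss)
```

Under these assumptions both measures should shrink as the sample size grows, which is the kind of pattern a simulation study of this sort would track across sample sizes and distributions.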
Collections
- Dissertations [4626]
- KU Med Center Dissertations and Theses [464]
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.