A Likelihood Based Approach to the Assessment of Large Sample Convergence and Model Based Clustering.

Bimali, Milan

Issue Date

2015-12-31

Author

Bimali, Milan

Publisher

University of Kansas

Format

128 pages

Type

Dissertation

Degree Level

Ph.D.

Discipline

Biostatistics

Abstract

The likelihood is a function of the model parameter(s) and the data, constructed from a prespecified probability density function (pdf). The likelihood can therefore be viewed as a model-data combination that can be used to address questions of interest. The relative likelihood function is the likelihood function divided by its value at the mode, so that its maximum is one. Unlike likelihood functions, relative likelihood functions have received little attention and use from statisticians. This dissertation explores the properties and applications of relative likelihood functions, both for examining the large-sample convergence properties of the maximum likelihood estimator (MLE) and for clustering. The dissertation consists of three chapters.

The first chapter presents a simulation-based approach to examining the relationship between sample size and the asymptotic behavior of the MLE. The convergence of the observed relative likelihood function (RLF) to the asymptotic RLF is assessed for different sample sizes using two measures of convergence: the difference in areas and the dissimilarity in shape. The approach is applied to data from the literature as well as to data simulated from different exponential family distributions.

The second chapter proposes a novel clustering approach based on observed RLFs. Observations in the dataset are assumed to follow a known distribution, and an observed RLF is obtained for each observation. Each observed RLF is then weighted by the Fisher information (the inverse of the asymptotic variance) evaluated at the mode of its likelihood function. The weighted RLFs reflect information-based similarity among the observations. A data matrix is then constructed by evaluating the weighted RLFs at a set of values in the parameter space, which allows standard clustering algorithms such as the k-means algorithm to be applied directly. This clustering approach was applied to a simulated dataset based on real data and to datasets simulated from known distributions.

The third chapter applies the proposed RLF-based clustering approach to a publicly available gene expression dataset consisting of 70 gene expression profiles used to classify patients into prognostic groups. The agreement between the RLF clustering results and the previous classification is also presented. The resulting clusters are further examined with respect to differences in two clinical outcomes: overall survival time and time to metastasis.
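
The convergence check described for the first chapter can be sketched as follows. This is a minimal illustration under stated assumptions and is not the dissertation's implementation. It assumes an i.i.d. exponential model with rate lam. Under that model the MLE is lam_hat = n / sum(x), and the observed RLF is R(lam) = L(lam) / L(lam_hat) = exp{n[log(lam/lam_hat) - lam/lam_hat + 1]}. The asymptotic RLF is taken to be the normal approximation exp{-I(lam_hat)(lam - lam_hat)^2 / 2}, where I(lam_hat) = n / lam_hat^2 is the observed Fisher information. This record does not define the two convergence measures. Here, "difference in areas" is read as the absolute difference between the areas under the two curves. The largest pointwise gap serves as a hypothetical stand-in for "dissimilarity in shape". All function names are illustrative.

```python
import math
import random


def observed_rlf(lam, lam_hat, n):
    """Observed RLF L(lam) / L(lam_hat) for an i.i.d. exponential(rate=lam) sample of size n.

    With lam_hat = n / sum(x), log R(lam) = n * (log(lam / lam_hat) - lam / lam_hat + 1).
    """
    t = lam / lam_hat
    return math.exp(n * (math.log(t) - t + 1.0))


def asymptotic_rlf(lam, lam_hat, info):
    """Normal approximation to the RLF, centred at the MLE with curvature `info`."""
    return math.exp(-0.5 * info * (lam - lam_hat) ** 2)


def trapezoid(ys, xs):
    """Trapezoidal-rule integral of ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))


def convergence_measures(sample, grid_points=2001, width=8.0):
    """Return (area difference, shape dissimilarity) between observed and asymptotic RLFs."""
    n = len(sample)
    lam_hat = n / sum(sample)
    info = n / lam_hat ** 2  # observed Fisher information at the MLE
    se = 1.0 / math.sqrt(info)
    lo = max(1e-9, lam_hat - width * se)  # the rate parameter must stay positive
    hi = lam_hat + width * se
    xs = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    obs = [observed_rlf(x, lam_hat, n) for x in xs]
    asy = [asymptotic_rlf(x, lam_hat, info) for x in xs]
    # Assumed reading of "difference in areas": |area under observed - area under asymptotic|.
    area_diff = abs(trapezoid(obs, xs) - trapezoid(asy, xs))
    # Hypothetical stand-in for "dissimilarity in shape": largest pointwise gap.
    shape_diff = max(abs(a - b) for a, b in zip(obs, asy))
    return area_diff, shape_diff


def simulate(n, rate=2.0, reps=20, seed=1):
    """Average both measures over `reps` simulated exponential samples of size n."""
    rng = random.Random(seed)
    area_total = shape_total = 0.0
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        area, shape = convergence_measures(sample)
        area_total += area
        shape_total += shape
    return area_total / reps, shape_total / reps


if __name__ == "__main__":
    for n in (10, 50, 250, 1000):
        print(n, simulate(n))
```

Both measures are expected to shrink as n grows, because the observed log-likelihood becomes increasingly quadratic around the MLE. The sample size at which they fall below a chosen tolerance gives one way to judge when the large-sample approximation is adequate.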

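The clustering pipeline described for the second chapter can be sketched in the same spirit. This is a hedged illustration, not the author's code. The record does not say which distribution was used, so the sketch assumes each observation is a vector of m i.i.d. Poisson(lam) counts with a positive total S. Under that assumption, lam_hat is the mean count and the observed RLF is R(lam) = exp{S log(lam/lam_hat) - m(lam - lam_hat)}. The Fisher information at the mode, m / lam_hat, serves as the weight. Each weighted RLF is evaluated on a fixed grid of lam values to form one row of the data matrix. A plain Lloyd's k-means with random restarts is then run on the rows. The helper names, the grid, and the restart scheme are hypothetical choices. A library implementation such as scikit-learn's KMeans could replace the hand-written one.

```python
import math
import random


def poisson_draw(rate, rng):
    """One Poisson(rate) draw via Knuth's multiplication method (fine for small rates)."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def weighted_rlf_row(counts, grid):
    """Fisher-information-weighted observed RLF of one observation, evaluated on `grid`.

    Assumes `counts` are i.i.d. Poisson(lam) replicates with a positive total, so that
    lam_hat = mean(counts) > 0 and the information at the mode is m / lam_hat.
    """
    m, total = len(counts), sum(counts)
    lam_hat = total / m
    weight = m / lam_hat  # Fisher information = inverse asymptotic variance of lam_hat
    return [weight * math.exp(total * math.log(lam / lam_hat) - m * (lam - lam_hat))
            for lam in grid]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_init=10, max_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts; returns labels of the lowest-SSE run."""
    rng = random.Random(seed)
    best_labels, best_sse = None, math.inf
    for _ in range(n_init):
        centers = [list(r) for r in rng.sample(rows, k)]
        labels = None
        for _ in range(max_iter):
            new_labels = [min(range(k), key=lambda c: sq_dist(row, centers[c])) for row in rows]
            if new_labels == labels:
                break
            labels = new_labels
            for c in range(k):
                members = [row for row, lab in zip(rows, labels) if lab == c]
                if members:  # an emptied cluster keeps its previous centre
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        sse = sum(sq_dist(row, centers[lab]) for row, lab in zip(rows, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def rlf_cluster(data, k, grid):
    """Build the weighted-RLF data matrix (one row per observation) and cluster its rows."""
    matrix = [weighted_rlf_row(obs, grid) for obs in data]
    return kmeans(matrix, k)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 15 observations at rate 4 and 15 at rate 9, 20 replicate counts each.
    data = [[poisson_draw(4.0, rng) for _ in range(20)] for _ in range(15)]
    data += [[poisson_draw(9.0, rng) for _ in range(20)] for _ in range(15)]
    grid = [0.5 + 0.25 * j for j in range(60)]
    print(rlf_cluster(data, 2, grid))
```

Because the weight m / lam_hat is larger for low-rate observations, their rows have larger magnitude. The resulting clusters therefore reflect both the location and the precision of each RLF, not the location alone. This is one concrete reading of "information-based similarity".
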
URI

http://hdl.handle.net/1808/21918

Collections

Dissertations
KU Med Center Dissertations and Theses