Methodological Development with Machine Learning and Bayesian Approaches in Cancer and Nutrition Research

Dutta, Sreejata
Abstract
Statistics, the science of data, spans the entire process from planning experiments to interpreting and presenting results, including hypothesis formulation, data collection, analysis, and pattern discovery for informed decision-making. Data analysis serves two primary purposes: inference and prediction. Inference extracts information to understand associations between predictors (or features) and responses, while prediction forecasts future values from the current values of predictors and responses. This research explores both inference and prediction through data and algorithmic modeling, incorporating Bayesian and machine learning methodologies to solve problems in cancer and nutrition research.
Chapter 2 explores Bayesian finite mixture models for adherence estimation and for the design of clinical trials aimed at improving adherence. Bayesian methods use Bayes' theorem to update knowledge about the parameters of a statistical model as new data are observed, providing a probabilistic framework for data analysis that incorporates prior knowledge. Finite mixture models are versatile tools for detecting subgroups within a population. In Chapter 2, we propose a Bayesian finite mixture model to estimate adherence among women with low baseline DHA levels. Using the estimands from this model, we develop a Bayesian adaptive trial design for adherence improvement, which uses a similar Bayesian finite mixture model to determine effect size and leads to irregular interim analyses.
Chapter 3 combines Bayesian and machine learning methodologies to design a predictive study. This chapter introduces a Bayesian Beta-Binomial model for determining test set size based on the precision of model metrics such as sensitivity and specificity. The Beta-Binomial model is also used to find the 95% credible interval for positive predictive value. To demonstrate the model, we apply our method to baseline data from two recent clinical trials investigating the impact of DHA on early preterm birth. The primary predictive goal is to propose a novel model that uses demographic and self-reported dietary information to predict low blood DHA status among pregnant individuals. The final predictive model estimates baseline blood DHA status with a positive predictive value of 59.3% (95% credible interval: 51.8-66.4%).
Chapter 4 focuses on machine learning for rare event prediction using flow cytometry and proteomics data. Machine learning excels at processing vast amounts of data, identifying complex patterns, and making predictions. High-throughput methodologies such as flow cytometry reveal insights into cell processes but pose computational challenges due to data complexity. The computational framework proposed in Chapter 4 integrates machine learning algorithms, applies a "wisdom of the crowd" approach to rare cell population prediction, and introduces the concept of persistent feature structure, which can be used to perform feature selection. The motivation for developing this framework is twofold. First, we want researchers to leverage machine learning to predict rare cell populations in lieu of the laborious manual gating employed in flow cytometry. Second, our framework enables researchers to formulate hypotheses about the importance of biomarkers in rare population identification, which can then be validated through laboratory experiments.
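The Beta-Binomial credible interval mentioned for Chapter 3 can be illustrated with a minimal sketch. It assumes a Beta(1, 1) prior, an equal-tailed interval, and Monte Carlo quantiles, none of which are confirmed by the abstract; the function name `beta_credible_interval` and the counts are hypothetical and are not taken from the thesis.

```python
import random


def beta_credible_interval(successes, failures, prior_a=1.0, prior_b=1.0,
                           level=0.95, draws=100_000, seed=0):
    """Posterior mean and equal-tailed credible interval for a proportion.

    With a Beta(prior_a, prior_b) prior and binomial data, the posterior is
    Beta(prior_a + successes, prior_b + failures). Quantiles are approximated
    by Monte Carlo so that only the standard library is required.
    """
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + failures
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return a / (a + b), lower, upper


# Hypothetical test-set counts (assumed for illustration only):
# 60 true positives and 40 false positives among predicted positives.
mean, lower, upper = beta_credible_interval(60, 40)
print(f"PPV posterior mean {mean:.3f}, 95% CrI ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the interval narrows as the number of predicted positives grows, which is the kind of relationship a precision-based test set size calculation would rely on.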
Date
2024-01-01
Publisher
University of Kansas
Keywords
Biostatistics, Bayesian, Bayesian adaptive trial design, Feature selection, Machine learning, Sizing test set