
Novel Network-Based Models for High-dimensional Data

Yang, Fengwei
Abstract
Modern cancer genomic studies frequently involve high-dimensional omics data with complex network structures, such as gene pathways in transcriptomic profiles and protein-protein interaction networks in proteomic profiles. These network structures, which encompass network topology, edge strengths, and the marginal effects of nodes, provide crucial insights into the relationships among predictors and can significantly improve biomarker discovery and clinical outcome prediction. However, most existing statistical methods fail to incorporate such network information. To address this limitation, we developed a network-based high-dimensional generalized linear model (HDnetGLM) that explicitly integrates network structures into the statistical framework. Our model decomposes the joint effect of each predictor according to its neighbors within the network, which encourages the selection of connected features and improves interpretability. The proposed method is implemented for both logistic and Poisson regression and has strong theoretical properties. In extensive simulation studies, our method outperforms competing techniques in variable selection, parameter estimation, and outcome prediction. We applied this model to breast cancer clinical and RNA sequencing data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression Project (GTEx) to predict breast cancer occurrence, severity, and recurrence. Our analysis identified novel breast cancer risk genes, including ADCY4, COL1A1, and FGF4, and improved prediction accuracy, offering new opportunities for early diagnosis and personalized treatment strategies in oncology.

Beyond generalized linear models, we extended this approach to survival analysis of time-to-event outcomes by incorporating network information into a high-dimensional proportional hazards model (HDnetCox).
This approach allows us to analyze gene expression data in relation to patient survival while accounting for the inherent network structure among predictors. In simulations comparing it with popular Cox-based survival models (LASSO, ridge, and elastic net), our network-informed model consistently outperformed the others in prediction accuracy and variable selection. Applied to TCGA ovarian cancer survival data, the model identified novel risk genes and significantly improved survival prediction, underscoring its potential to unravel complex disease mechanisms and enhance clinical prediction in oncology.

We further demonstrated the versatility of our method by applying it to Alzheimer’s disease (AD) research. Using fluorodeoxyglucose positron emission tomography (FDG-PET) imaging and clinical data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), our HDnetGLM, which leverages the brain’s connectivity network, achieved superior performance in detecting early-stage AD. The model identified ten AD-specific target regions, providing both predictive power and visual evidence of brain degeneration patterns. Network analysis highlighted the importance of incorporating connectivity disruptions into statistical models of neurodegenerative diseases. Our findings offer novel insights into the pathophysiological mechanisms of Alzheimer’s disease and suggest that brain network information can improve diagnosis.

Overall, our model provides a powerful framework for integrating network structures into the analysis of high-dimensional omics and imaging data, advancing biomarker discovery, survival prediction, and disease diagnosis across multiple complex disorders, including cancer and Alzheimer’s disease.
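The abstract describes the core idea, decomposing each predictor's joint effect over its network neighbors, only in words. One standard way to formalize such a decomposition is a latent overlapping group lasso whose groups are the nodes' closed neighborhoods N_j: write beta = sum_j V_j with each V_j supported on N_j, and penalize lam * sum_j w_j * ||V_j||_2. The Python sketch below assumes that formulation, group weights w_j = sqrt(|N_j|), an unpenalized intercept, and a proximal-gradient solver with backtracking. It illustrates the general technique and is not necessarily the dissertation's exact HDnetGLM estimator; all function names are hypothetical.

```python
import numpy as np


def neighborhood_groups(adj):
    """Closed neighborhood N_j = {j} plus every k with adj[j, k] != 0, for each node j."""
    p = adj.shape[0]
    return [np.flatnonzero((adj[j] != 0) | (np.arange(p) == j)) for j in range(p)]


def glm_loss_and_grad(eta, y, family):
    """Mean negative log-likelihood and its gradient with respect to the linear predictor eta."""
    n = y.shape[0]
    if family == "logistic":
        # log(1 + exp(eta)) - y * eta, evaluated stably with logaddexp
        loss = np.mean(np.logaddexp(0.0, eta) - y * eta)
        prob = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        grad = (prob - y) / n
    elif family == "poisson":
        # exp(eta) - y * eta; the constant log(y!) term is dropped
        mu = np.exp(eta)
        loss = np.mean(mu - y * eta)
        grad = (mu - y) / n
    else:
        raise ValueError(f"unknown family: {family}")
    return loss, grad


def fit_network_glm(X, y, adj, lam, family="logistic", max_iter=1000, tol=1e-7):
    """Hypothetical latent-group-lasso GLM: beta = sum_j V_j, with V_j supported on N_j."""
    n, p = X.shape
    groups = neighborhood_groups(adj)
    weights = np.sqrt([len(g) for g in groups])  # assumed group weights sqrt(|N_j|)
    # Duplicate columns so each latent vector V_j has its own copy of X[:, N_j].
    cols = np.concatenate(groups)
    Xd = X[:, cols]
    bounds = np.cumsum([0] + [len(g) for g in groups])
    theta, b0, step = np.zeros(cols.size), 0.0, 1.0
    loss, g_eta = glm_loss_and_grad(b0 + Xd @ theta, y, family)

    for _ in range(max_iter):
        g_theta, g_b0 = Xd.T @ g_eta, g_eta.sum()
        while True:
            # Gradient step, then block soft-thresholding of each latent group (the prox step).
            z = theta - step * g_theta
            new_theta = np.zeros_like(z)
            for j in range(p):
                s, e = bounds[j], bounds[j + 1]
                norm = np.linalg.norm(z[s:e])
                if norm > step * lam * weights[j]:
                    new_theta[s:e] = (1.0 - step * lam * weights[j] / norm) * z[s:e]
            new_b0 = b0 - step * g_b0  # the intercept is not penalized
            new_loss, new_g = glm_loss_and_grad(new_b0 + Xd @ new_theta, y, family)
            d_theta, d_b0 = new_theta - theta, new_b0 - b0
            # Backtracking line search: accept once the quadratic upper bound holds.
            bound = (loss + g_theta @ d_theta + g_b0 * d_b0
                     + (d_theta @ d_theta + d_b0 ** 2) / (2.0 * step))
            if new_loss <= bound + 1e-12:
                break
            step *= 0.5
        theta, b0, loss, g_eta = new_theta, new_b0, new_loss, new_g
        if np.sqrt(d_theta @ d_theta + d_b0 ** 2) < tol:
            break

    # Scatter the latent copies back: beta_k sums V_j[k] over all groups containing k.
    beta = np.zeros(p)
    np.add.at(beta, cols, theta)
    return b0, beta
```

Because each beta_k collects contributions from every neighborhood that contains k, and the group penalty tends to keep a selected V_j nonzero across all of N_j, the estimated support is a union of network neighborhoods. This is one way to favor connected features. Under the same assumptions, a Cox variant in the spirit of HDnetCox would drop the intercept and replace `glm_loss_and_grad` with the negative log partial likelihood and its gradient.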
Date
2024-01-01
Publisher
University of Kansas
Archive Status
This item contains archived web content.
Keywords
Biostatistics