Novel Network-Based Models for High-dimensional Data
Yang, Fengwei
Abstract
Modern cancer-genomic studies frequently involve high-dimensional omics data characterized by
complex network structures, such as gene pathways in transcriptomic profiles and protein-protein
interaction networks in proteomic profiles. These network structures, encompassing network
topology, edge strengths, and the marginal effects of nodes, provide crucial insights into the
relationships among predictors, which can significantly improve biomarker discovery and clinical
outcome prediction. However, most existing statistical methods fail to incorporate such network
information. To address this limitation, we developed a high-dimensional generalized linear model
(HDnetGLM) that explicitly integrates network structures within the statistical framework. Our
model decomposes the joint effect of each predictor based on its neighbors within the network,
encouraging the selection of connected features and improving interpretability.
The proposed method is implemented in both logistic and Poisson regressions and demonstrates
strong theoretical properties. In extensive simulation studies, our method outperforms
competing techniques in terms of variable selection, parameter estimation, and outcome prediction. We
applied this model to breast cancer clinical and RNA sequencing data from The Cancer Genome
Atlas (TCGA) and the Genotype-Tissue Expression Project (GTEx) to predict breast cancer
occurrence, severity, and recurrence. Our analysis identified novel breast cancer risk genes, including
ADCY4, COL1A1, and FGF4, and resulted in improved prediction accuracy, offering new
opportunities for early diagnosis and personalized treatment strategies in oncology.
In addition to cancer genomics, we extended this approach to survival analysis for
time-to-event outcomes by incorporating network information into a high-dimensional proportional
hazards model (HDnetCox). This approach allows us to analyze gene expression data in relation to
patient survival while considering the inherent network structure among predictors. Comparative
simulations with popular survival models (Cox-based LASSO, ridge, and elastic net) revealed
that our network-informed model consistently outperforms the others in prediction accuracy and
variable selection. Applied to TCGA ovarian cancer survival data, the model identified novel risk
genes and significantly improved survival predictions, emphasizing its potential to unravel
complex disease mechanisms and enhance clinical predictions in oncology.
We further demonstrated the versatility of our method by applying it to Alzheimer’s disease
(AD) research. Using fluorodeoxyglucose (FDG)-positron emission tomography (PET) imaging
and clinical data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), our HDnetGLM
model, which leverages the brain’s connectivity network, achieved superior performance in
detecting early-stage AD. The model identified ten AD-specific target regions, providing both predictive
power and visual evidence of brain degeneration patterns. Network analysis highlighted the
significance of incorporating connectivity disruptions into statistical models for neurodegenerative
diseases. Our findings offer novel insights into the pathophysiological mechanisms of Alzheimer’s
disease and suggest the potential for improving diagnosis by utilizing brain network information.
Overall, our model presents a powerful framework for integrating network structures in
high-dimensional omics and imaging data, leading to advancements in biomarker discovery, survival
prediction, and disease diagnosis across multiple complex disorders, including cancer and Alzheimer’s
disease.
Date
2024-01-01
Publisher
University of Kansas
Archive Status
This item contains archived web content.
Files
Yang_ku_0099D_19817.pdf
Adobe PDF, 7.2 MB
- Embargoed until 2174-05-31
Keywords
Biostatistics
