dc.contributor.author | Shi, Honglian | |
dc.contributor.author | Paolucci, Ugo | |
dc.contributor.author | Vigneau-Callahan, Karen E. | |
dc.contributor.author | Milbury, Paul E. | |
dc.contributor.author | Matson, Wayne R. | |
dc.contributor.author | Kristal, Bruce S. | |
dc.date.accessioned | 2012-05-16T19:03:59Z | |
dc.date.available | 2012-05-16T19:03:59Z | |
dc.date.issued | 2004 | |
dc.identifier.citation | Shi H, Paolucci U, Vigneau-Callahan KE, Shestopalov AI, Milbury PE, Matson WR, and Kristal BS. Development of biomarkers based on diet-dependent metabolic serotypes: practical issues in development of expert system-based classification models in metabolomic studies. OMICS J Integr Biol 8 (3): 197-208; 2004. | |
dc.identifier.uri | http://hdl.handle.net/1808/9572 | |
dc.description | This is the publisher's official version, also available electronically from: http://online.liebertpub.com/doi/pdfplus/10.1089/omi.2004.8.197 | |
dc.description.abstract | Dietary restriction (DR)-induced changes in the serum metabolome may be biomarkers for
physiological status (e.g., relative risk of developing age-related diseases such as cancer).
Megavariate analysis (unsupervised hierarchical cluster analysis [HCA]; principal components
analysis [PCA]) of serum metabolites reproducibly distinguish DR from ad libitum fed
rats. Component-based approaches (i.e., PCA) consistently perform as well as or better than
distance-based metrics (i.e., HCA). We therefore tested the following: (A) Do identified subsets
of serum metabolites contain sufficient information to construct mathematical models
of class membership (i.e., expert systems)? (B) Do component-based metrics out-perform
distance-based metrics? Testing was conducted using KNN (k-nearest neighbors, supervised
HCA) and SIMCA (soft independent modeling of class analogy, supervised PCA). Models
were built with single cohorts, combined cohorts or mixed samples from previously studied
cohorts as training sets. Both algorithms over-fit models based on single cohort training sets.
KNN models had >85% accuracy within training/test sets, but were unstable (i.e., values of
k could not be accurately set in advance). SIMCA models had 100% accuracy within all
training sets, 89% accuracy in test sets, did not appear to over-fit mixed cohort training sets,
and did not require post-hoc modeling adjustments. These data indicate that (i) previously
defined metabolites are robust enough to construct classification models (expert systems)
with SIMCA that can predict unknowns by dietary category; (ii) component-based analyses
outperformed distance-based metrics; (iii) use of over-fitting controls is essential; and (iv)
subtle inter-cohort variability may be a critical issue for high data density biomarker studies
that lack state markers. | |
dc.language.iso | en | |
dc.publisher | Mary Ann Liebert, Inc. | |
dc.title | Development of Biomarkers Based on Diet-Dependent Metabolic Serotypes: Practical Issues in Development of Expert System-Based Classification Models in Metabolomic Studies | |
dc.type | Article | |
kusw.kuauthor | Shi, Honglian | |
kusw.kudepartment | Pharmacology and Toxicology | |
kusw.oastatus | fullparticipation | |
dc.identifier.doi | 10.1089/omi.2004.8.197 | |
kusw.oaversion | Scholarly/refereed, publisher version | |
kusw.oapolicy | This item meets KU Open Access policy criteria. | |
dc.rights.accessrights | openAccess | |