Abstract
Background: Despite the widely recognized association between the severity of early preterm birth (ePTB) and
related severe diseases, little is known about potential risk factors for ePTB or about the subpopulations at high
risk of ePTB. This study is motivated by a future confirmatory clinical trial that will assess whether supplementing
pregnant women with docosahexaenoic acid (DHA) affects ePTB prevalence differently in a high-risk subgroup. It aims
to identify potential risk subgroups and risk factors for ePTB, defined as birth at less than 34 weeks of
gestation.
Methods: The analysis data (N = 3,994,872) were obtained from the 2014 Natality public data file of the CDC and NCHS. The
sample was split into independent training and validation cohorts for model generation and model assessment,
respectively. Logistic regression and CART models were used to examine potential ePTB risk predictors and their
interactions, including mothers’ age, nativity, race, Hispanic origin, marital status, education, pre-pregnancy smoking
status, pre-pregnancy BMI, pre-pregnancy diabetes status, pre-pregnancy hypertension status, previous preterm
birth status, infertility treatment usage status, fertility enhancing drug usage status, and delivery payment source.
Results: The logistic regression models with 14 and with 10 ePTB risk factors produced the same C-index (0.646)
in the training cohort. The C-index of the logistic regression model based on 10 predictors was 0.645 for the
validation cohort. Both C-index values indicated good discrimination and acceptable model fit. The CART model
identified preterm birth history and race as the most important risk factors, and revealed that the subgroup with a
preterm birth history and a race designation of Black had the highest risk for ePTB. For the CART model, the C-index
and misclassification rate were 0.579 and 0.034 for the training cohort, and 0.578 and 0.034 for the validation
cohort, respectively.
Conclusions: This study revealed 14 maternal characteristics that reliably identified risk for ePTB through
a logistic regression model, a CART model, or both. Moreover, both models efficiently identify risk subgroups to
inform the design of a future enrichment clinical trial.
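As a rough illustration of the two summary measures reported in the Results, the following Python sketch computes a C-index and a misclassification rate for a binary outcome on made-up data. It is not the authors' code: the function names, the toy values, and the 0.5 classification threshold are all assumptions, since the abstract does not state how predicted risks were turned into classifications.

```python
from itertools import product


def c_index(y_true, y_score):
    """Concordance index for a binary outcome.

    Share of (case, non-case) pairs in which the case received the higher
    predicted risk; tied pairs count as 0.5. For a binary outcome this is
    the same quantity as the area under the ROC curve.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    non_cases = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score, non_case_score in product(cases, non_cases):
        if case_score > non_case_score:
            concordant += 1.0
        elif case_score == non_case_score:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def misclassification_rate(y_true, y_score, threshold=0.5):
    """Fraction of records whose thresholded prediction differs from the outcome.

    The 0.5 default threshold is an assumption made for this sketch.
    """
    predicted = [1 if s >= threshold else 0 for s in y_score]
    wrong = sum(p != y for p, y in zip(predicted, y_true))
    return wrong / len(y_true)


# Hypothetical toy data: 1 = ePTB, 0 = not ePTB; risks are invented.
y = [1, 0, 0, 1, 0, 0]
risk = [0.30, 0.10, 0.30, 0.05, 0.02, 0.20]

print("C-index:", c_index(y, risk))
print("Misclassification rate:", misclassification_rate(y, risk))
```

The two measures answer different questions: the C-index depends only on how the predicted risks rank cases against non-cases, whereas the misclassification rate depends on the chosen threshold and on how common the outcome is.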
Description
A grant from the One-University Open Access Fund at the University of Kansas was used to defray the author's publication fees in this Open Access journal. The Open Access Fund, administered by librarians from the KU, KU Law, and KUMC libraries, is made possible by contributions from the offices of KU Provost, KU Vice Chancellor for Research & Graduate Studies, and KUMC Vice Chancellor for Research. For more information about the Open Access Fund, please see http://library.kumc.edu/authors-fund.xml.
Citation
Zhang, C., Garrard, L., Keighley, J., Carlson, S., & Gajewski, B. (2017). Subgroup identification of early preterm birth (ePTB): informing a future prospective enrichment clinical trial design. BMC Pregnancy and Childbirth, 17, 18. http://doi.org/10.1186/s12884-016-1189-0