Artificial intelligence in preterm birth prediction: a narrative review of current approaches and clinical applicability

Article information

Obstet Gynecol Sci. 2026;69(2):94-102
Publication date (electronic): 2026 March 3
doi: https://doi.org/10.5468/ogs.26043
Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
Corresponding author: YooKyung Lee, MD, PhD, Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, 295 Gangseo-ro, Gangseo-gu, Seoul 07639, Korea, E-mail: assdfg2@naver.com, https://orcid.org/0009-0006-9560-7156
Received 2026 January 24; Revised 2026 February 11; Accepted 2026 February 24.

Abstract

Preterm birth remains the leading cause of neonatal morbidity and mortality worldwide, affecting approximately 13.4 million births annually. Despite advances in our understanding of risk factors, current clinical prediction methods have demonstrated limited accuracy in individual risk stratification. This narrative review examines the current landscape of artificial intelligence (AI) applications for preterm birth prediction and evaluates the methodological quality and clinical applicability across different data modalities. PubMed, Embase, and Web of Science were searched for studies that developed or validated machine learning models for predicting spontaneous preterm birth. AI approaches include electronic health record-based models, deep learning for ultrasound image analysis, cervical texture and radiomics feature extraction, elastography-derived parameters, and multi-omics integration using transformer architectures. Area under the receiver operating characteristic curve values range from 0.61 to 0.89 across modalities. However, systematic reviews have identified significant methodological limitations: 79% of the studies had a high risk of bias according to the prediction model risk-of-bias assessment tool criteria, and the median adherence to the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement was only 49%. Common deficiencies include inadequate sample sizes, a lack of external validation, and failure to report calibration metrics. Although AI-based prediction shows promise, substantial improvements in methodological rigor are required before clinical implementation. Priority areas include rigorous external validation, adherence to TRIPOD+AI reporting standards, and prospective evaluation of clinical utility.

Introduction

Preterm birth, defined as delivery before 37 weeks of gestation, remains the leading cause of neonatal and long-term morbidity worldwide [1]. In 2020, an estimated 13.4 million infants were born preterm globally, accounting for 9.9% of all live births. Despite extensive research and clinical efforts over the preceding decade, no measurable reduction in the preterm birth rate has been achieved. The burden is particularly high in low- and middle-income countries, although high-income nations continue to face persistent rates that have proven resistant to intervention. Preterm birth has lifelong consequences beyond immediate neonatal complications, including an increased risk of neurodevelopmental disabilities, chronic respiratory disease, and cardiovascular morbidity [2].

The etiology of spontaneous preterm birth is multifactorial and involves complex interactions among genetic predisposition, infection and inflammation, uterine overdistension, cervical insufficiency, decidual hemorrhage, and psychosocial stressors [3]. This heterogeneity complicates efforts to develop accurate prediction tools because different pathophysiological pathways may predominate in different women. Current clinical approaches to preterm birth prediction rely primarily on cervical length measurement using transvaginal ultrasound and biochemical markers such as fetal fibronectin [4]. The International Society of Ultrasound in Obstetrics and Gynecology recommends universal cervical length screening between 18 weeks and 24 weeks of gestation, when resources permit, with vaginal progesterone administered to women with a short cervix [4,5]. Standardized cervical assessment protocols, including elastographic measurement techniques, have been developed to improve the reproducibility of these assessments [6].

However, these methods demonstrate only moderate predictive performance. A systematic review of fetal fibronectin testing in symptomatic women reported a positive likelihood ratio of 5.42 for delivery within 7–10 days [7]. The positive predictive value nevertheless remains low in most clinical settings because the prevalence of preterm birth is relatively low, which limits clinical utility for individual patient management. Similarly, cervical length measurements show modest discrimination, with considerable overlap between women who deliver preterm and those who deliver at term.
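The dependence of positive predictive value on prevalence can be made concrete with a short calculation. The sketch below converts a pretest probability into a post-test probability using the positive likelihood ratio of 5.42 cited above [7]. The prevalences of 5% and 10% are illustrative assumptions rather than figures from the cited review, and the function name is hypothetical.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


LR_POSITIVE = 5.42  # positive likelihood ratio reported for fetal fibronectin [7]

# Assumed prevalences of delivery within 7-10 days (illustrative only).
for prevalence in (0.05, 0.10):
    ppv = post_test_probability(prevalence, LR_POSITIVE)
    print(f"prevalence {prevalence:.0%}: post-test probability {ppv:.1%}")
```

Under these assumed prevalences, the post-test probability is roughly 22% and 38%, respectively, so most women with a positive result would still not deliver within the window.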

The application of artificial intelligence (AI) and machine learning to medical prediction has grown substantially over the past decade, with demonstrated success in areas such as radiology, pathology, and cardiovascular risk assessment [8,9]. In obstetrics, systematic reviews have identified machine learning as a tool for predicting pregnancy complications including preterm birth, preeclampsia, and gestational diabetes [10]. Recent reviews have highlighted the expanding role of AI in obstetric practice, including fetal growth assessment, placental pathology analysis, and delivery outcome prediction [11]. Machine learning approaches have also been applied in other areas, such as gynecologic oncology, demonstrating the broad applicability of these methods [12]. The potential use of AI tools, including large language models, in Korean obstetric practice has also been recognized [13].

Machine learning algorithms offer theoretical advantages over traditional statistical methods for predicting preterm birth. They can integrate multiple risk factors simultaneously, identify nonlinear relationships and complex interactions, and extract predictive features from unstructured data such as medical images without the need for manual annotation. Methodological approaches have evolved from conventional algorithms, including logistic regression, random forests, and gradient boosting applied to electronic health record data, to deep learning methods capable of directly analyzing ultrasound images [14]. Recently, transformer-based architectures have enabled the integration of high-dimensional multi-omics data, including cell-free DNA and RNA profiles [15].

However, several challenges remain before AI-based predictions can be translated into routine clinical practice. Systematic reviews have consistently identified methodological limitations, including small sample sizes, lack of external validation, and poor adherence to reporting guidelines such as the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement [16]. This narrative review summarizes the current evidence on AI-based preterm birth prediction, evaluates the methodological quality and clinical applicability across different data modalities, and discusses the gaps that must be addressed for successful clinical translation.

Materials and methods

1. Ethics statement

This was a literature-based study; therefore, neither approval by the Institutional Review Board nor informed consent was required.

2. Study design

This is a narrative review based on a comprehensive search of academic databases.

3. Information sources and search strategy

We searched the PubMed, Embase, and Web of Science databases from January 2015 to December 2025. Search terms included combinations of “preterm birth”, “premature birth”, “preterm delivery”, “preterm labor”, “machine learning”, “artificial intelligence”, “deep learning”, “neural network”, “prediction”, and “risk model”. The reference lists of the identified systematic reviews were screened for additional relevant studies. Studies were included if they developed or validated machine learning or AI models to predict spontaneous preterm birth. We prioritized systematic reviews and meta-analyses, followed by original studies with external validation, large sample sizes, and novel methodological approaches.

Results

1. Systematic reviews and methodological quality

Several systematic reviews have synthesized the rapidly growing literature on AI applications for preterm birth prediction. Representative studies across different data modalities are summarized in Table 1. Sharifi-Heris et al. [17] identified 13 studies using electronic health record data and reported a wide range of area under the receiver operating characteristic curve (AUC) values, with substantial heterogeneity in study populations, feature selection approaches, and validation methods. Yang et al. [18] conducted a comprehensive systematic review of 29 prediction model studies and identified methodological limitations (Table 2). According to the prediction model risk-of-bias assessment tool (PROBAST) criteria, 79% of the studies had a high overall risk of bias, with the analysis domain being the most frequently problematic owing to inadequate sample sizes, selection of predictors based on univariable analysis, and lack of calibration evaluation. The median adherence to the TRIPOD reporting guidelines was only 49%, indicating that many studies failed to report the essential information required for replication and clinical implementation.

Akazawa and Hashimoto [19] conducted a systematic review of 22 studies that used AI for preterm birth prediction and identified electrohysterogram images, biological profiles, metabolic panels from amniotic fluid or maternal blood, and cervical ultrasound images as the primary data types used. They noted that most datasets were insufficient for robust AI model development, with only three studies utilizing databases exceeding 100,000 cases, and that higher predictive accuracy was achieved with metabolic panels and electrohysterogram data. These systematic reviews consistently identified the lack of external validation as a critical gap, with most models being evaluated only on internal test sets from the same institution in which they were developed.

2. Electronic health record-based models

Electronic health records provide readily available data for the development of predictive models without the need for additional testing or specialized equipment. Yu et al. [20] developed a CatBoost model using demographic, obstetric, and laboratory variables from 22,603 singleton pregnancies, achieving an AUC of 0.70 in internal validation. Their model incorporated maternal age, maternal weight and height, parity, first-trimester laboratory values (including hemoglobin and platelet counts), and serial measurements of blood pressure, symphysis fundal height, abdominal circumference, and maternal weight gain in late pregnancy. Feature importance analysis based on Shapley additive explanations (SHAP) identified late-pregnancy diastolic blood pressure, changes in symphysis fundal height and abdominal circumference, maternal weight gain, and aspartate aminotransferase level at registration as the leading predictors.

Zhang et al. [21] compared five machine learning algorithms for preterm birth prediction using clinical data from Chinese hospitals. Their AdaBoost model achieved 100% accuracy for term deliveries but lower accuracy (72.73%) for preterm cases, highlighting the challenge of class imbalance when preterm births represent only 11.7% of the dataset. Kong et al. [22] employed automated machine learning frameworks for large-scale prediction using electronic inpatient discharge data, demonstrating the feasibility of automated feature selection and model optimization. Huang et al. [23] developed a longitudinal model incorporating data from multiple prenatal visits, showing that prediction accuracy improved as gestational age advanced and more clinical information became available.
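The effect of class imbalance on headline accuracy can be illustrated with simple arithmetic. The sketch below assumes a hypothetical cohort of 1,000 pregnancies in which 11.7% are preterm, matching the proportion reported by Zhang et al. [21]. The counts and the function name are illustrative assumptions and are not taken from that study.

```python
def classification_metrics(true_positive: int, false_negative: int,
                           true_negative: int, false_positive: int) -> dict:
    """Accuracy, sensitivity, and specificity with preterm birth as the positive class."""
    total = true_positive + false_negative + true_negative + false_positive
    return {
        "accuracy": (true_positive + true_negative) / total,
        "sensitivity": true_positive / (true_positive + false_negative),
        "specificity": true_negative / (true_negative + false_positive),
    }


# Hypothetical cohort: 1,000 pregnancies, 117 preterm (11.7%).
# A trivial rule that labels every pregnancy "term" never flags a preterm case.
always_term = classification_metrics(true_positive=0, false_negative=117,
                                     true_negative=883, false_positive=0)

for name, value in always_term.items():
    print(f"{name}: {value:.3f}")
```

In this example the trivial rule reaches an accuracy of 0.883 while detecting no preterm births, which is why sensitivity and specificity should be reported alongside overall accuracy in imbalanced datasets.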

In the Korean context, Lee and Ahn [24] applied artificial neural network analysis to data from 596 obstetric patients at the Korea University Anam Hospital. Comparing six machine learning methods, including neural networks, logistic regression, decision trees, random forest, naïve Bayes, and support vector machines, they found that the artificial neural network achieved an accuracy of 0.91 with an AUC of 0.62. Variable importance analysis revealed that the neural network emphasized hypertension, diabetes mellitus, and prior cone biopsy as major predictors, whereas random forest placed more weight on cervical length, maternal age, and prior preterm birth history. This study provides a foundation for the development of locally validated prediction models using Korean population data.
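A high accuracy combined with a modest AUC, as in the study above, is possible because the two metrics measure different things. The AUC equals the probability that a randomly chosen preterm case receives a higher predicted score than a randomly chosen term case. The following sketch computes it by pairwise comparison on toy data. The scores, labels, and function name are hypothetical and do not reproduce any cited model.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (preterm, term) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 2 preterm cases (label 1) among 10 pregnancies.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.30, 0.10, 0.20, 0.15, 0.12, 0.08, 0.05, 0.05, 0.04, 0.03]

auc = auc_from_scores(scores, labels)

# Accuracy at a 0.5 cut-off: every score is below 0.5, so all are predicted term.
predictions = [1 if s >= 0.5 else 0 for s in scores]
accuracy = sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)

print(f"AUC: {auc:.4f}, accuracy at 0.5 cut-off: {accuracy:.2f}")
```

Because accuracy depends on the chosen cut-off and on class proportions whereas the AUC depends only on ranking, the two can diverge substantially in low-prevalence outcomes.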

3. Deep learning for ultrasound image analysis

Deep learning enables automated feature extraction from medical images without manual annotation, thereby potentially capturing visual patterns that are not apparent to human observers. Convolutional neural networks (CNNs) have been successfully applied in various obstetric imaging tasks. Burgos-Artizzu et al. [25] developed a CNN model that analyzed fetal lung ultrasound texture to predict neonatal respiratory morbidity and achieved an accuracy of 91.5%. While not directly predicting preterm birth, this study demonstrates the feasibility of deep learning for extracting clinically relevant features from obstetric ultrasound images.

For cervical assessment specifically, Ohtaka et al. [14] developed a CNN model for predicting preterm delivery in women admitted with threatened preterm labor. By analyzing transvaginal ultrasound images from 59 patients, the best-performing model achieved an accuracy of 71.8% with an AUC of 0.704. Notably, this performance exceeded that of experienced clinicians: two expert physicians achieved accuracies of only 46.5% and 51.7% when visually assessing the same images. This finding suggests that deep learning can extract predictive features from cervical ultrasound that are not readily apparent through conventional visual inspection, potentially capturing the microstructural changes preceding overt cervical shortening. Kloska et al. [26] combined clinical parameters with blood test results and questionnaire data in multimodal machine learning models, demonstrating improved performance compared to single-modality approaches.

4. Cervical texture analysis and radiomics

In addition to cervical length measurements, quantitative analysis of cervical texture may provide additional predictive information reflecting the microstructural changes that precede measurable shortening. Baños et al. [27] demonstrated that ultrasound-derived textural features, including homogeneity, contrast, and entropy, correlated with gestational age and cervical maturation, suggesting that these features could serve as biomarkers for premature cervical ripening. Burgos-Artizzu et al. [28] demonstrated that combining automated cervical length measurement with texture analysis in the mid-trimester improved prediction compared with length alone.
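Texture descriptors such as homogeneity, contrast, and entropy are commonly derived from a gray-level co-occurrence matrix (GLCM), which tabulates how often pairs of gray levels occur at a fixed pixel offset. The sketch below is a minimal pure-Python illustration on tiny toy images. It assumes a GLCM-based definition and a horizontal offset of one pixel, which may differ from the feature extraction used in the cited studies, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def glcm_features(image: Sequence[Sequence[int]], dx: int = 1, dy: int = 0) -> Dict[str, float]:
    """Contrast, homogeneity, and entropy from a normalized co-occurrence matrix."""
    n_rows, n_cols = len(image), len(image[0])
    pair_counts: Counter = Counter()
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                pair_counts[(image[r][c], image[r2][c2])] += 1
    total = sum(pair_counts.values())
    if total == 0:
        raise ValueError("image is too small for the requested offset")

    contrast = homogeneity = entropy = 0.0
    for (i, j), count in pair_counts.items():
        p = count / total                      # joint probability of the gray-level pair
        contrast += p * (i - j) ** 2           # large when neighbors differ strongly
        homogeneity += p / (1 + (i - j) ** 2)  # large when neighbors are similar
        entropy -= p * math.log2(p)            # large when pairs are evenly spread
    return {"contrast": contrast, "homogeneity": homogeneity, "entropy": entropy}


uniform_patch = [[3, 3], [3, 3]]        # toy image: perfectly uniform tissue
checkerboard_patch = [[0, 1], [1, 0]]   # toy image: alternating gray levels

print("uniform:", glcm_features(uniform_patch))
print("checkerboard:", glcm_features(checkerboard_patch))
```

The uniform patch yields zero contrast and entropy with maximal homogeneity, whereas the alternating patch yields higher contrast and entropy, which is the kind of difference such features are intended to quantify.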

Pachtman et al. [29] introduced a cervical heterogeneity index derived from grayscale histogram analysis of transvaginal ultrasound images. They found significantly higher heterogeneity values in women who subsequently delivered preterm than in those who delivered at term, suggesting that cervical tissue disorganization may be detected before overt length changes. These quantitative approaches offer the advantages of objectivity and reproducibility compared with subjective visual assessments.

5. Elastography-based prediction

Cervical elastography measures tissue stiffness, which decreases during cervical ripening and may decline prematurely in women at risk of preterm birth. Both strain elastography and shear wave elastography techniques have been investigated. Angelopoulou et al. [30] conducted a systematic review and meta-analysis of cervical elastography for preterm birth prediction, including 13 studies with 4,087 participants. They reported pooled sensitivity of 0.77 and specificity of 0.73, though substantial heterogeneity existed across studies in elastography techniques, measurement protocols, and outcome definitions.
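Pooled sensitivity and specificity can be re-expressed as likelihood ratios, the scale on which fetal fibronectin performance is usually reported. The sketch below applies the standard definitions to the pooled estimates of 0.77 and 0.73 from the meta-analysis above [30]. The function name is hypothetical, and the calculation ignores the uncertainty and heterogeneity around the pooled values.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (positive, negative) likelihood ratios from sensitivity and specificity."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Pooled estimates for cervical elastography reported in [30].
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.77, specificity=0.73)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

These point estimates correspond to a positive likelihood ratio of about 2.9 and a negative likelihood ratio of about 0.32. They should be read with the cross-study heterogeneity noted above in mind.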

Feng et al. [31] combined first-trimester cervical length with shear wave elastography measurements and achieved improved prediction compared to either parameter alone. This early-pregnancy assessment could enable early identification of high-risk women and timely initiation of preventive interventions. Patberg et al. [32] developed the E-cervix index by integrating multiple elastography-derived parameters measured at 18–22 weeks and demonstrated an incremental predictive value over the standard cervical length assessment. The combination of anatomical (length) and functional (stiffness) cervical assessments represents a promising approach for improved risk stratification.

6. Multi-omics integration and transformer models

Advanced machine learning architectures enable the integration of high-dimensional molecular data that may capture underlying pathophysiological processes. Camunas-Soler et al. [33] analyzed cell-free RNA profiles in maternal blood samples and identified transcriptomic signatures predictive of early and extremely early spontaneous preterm births. Their models achieved AUC of 0.80 for predicting delivery before 35 weeks. The identified genes reflected pathways including placental function, immune regulation, and cervical remodeling, providing biological plausibility for predictive associations.

Zhou et al. [15] developed a transformer-based model that integrated cell-free DNA and RNA sequencing data for preterm birth prediction. In their evaluation using data from 682 pregnancies, the cell-free DNA model alone achieved an AUC of 0.822, and the cell-free RNA model achieved an AUC of 0.851. Notably, integrating both data modalities within the transformer architecture achieved an AUC of 0.890, demonstrating a substantial improvement through multi-omics integration compared with single-modality approaches. Although these results are promising, the requirement for next-generation sequencing and specialized bioinformatics analysis limits their immediate clinical applicability to well-resourced settings.

Conclusion

This review identified substantial growth in AI applications for preterm birth prediction, with approaches spanning electronic health records, ultrasound imaging, cervical texture analysis, elastography, and multi-omics molecular profiling. The reported discrimination metrics are often promising, with AUC values frequently exceeding 0.75 and, for some multi-omics approaches, exceeding 0.85. Deep learning models for cervical ultrasound analysis have demonstrated the ability to extract predictive features that escape expert visual assessment. However, the clinical utility of these models remains uncertain owing to their pervasive methodological limitations.

The finding that 79% of studies had a high risk of bias according to the PROBAST criteria is concerning but consistent with broader patterns in clinical prediction model research [18]. Common methodological deficiencies include inadequate sample sizes leading to overfitting, inadequate handling of class imbalance, failure to account for missing data, and the absence of calibration assessment. A median TRIPOD adherence of only 49% indicates a widespread failure to report the essential methodological details required for replication and clinical implementation.

Compared with established clinical tools, AI-based models show the potential for improved discrimination. While fetal fibronectin testing shows a positive likelihood ratio of 5.42 in symptomatic women [7], several machine learning models have reported AUC values exceeding 0.80. The finding of Ohtaka et al. [14] that their CNN model outperformed experienced clinicians (71.8% vs. 46.5–51.7% accuracy) suggests that AI may capture predictive information from cervical images that humans cannot perceive. However, direct comparisons across studies are limited by differences in populations, outcome definitions (gestational age thresholds varying from 32 weeks to 37 weeks), and clinical contexts (asymptomatic screening vs. symptomatic evaluation).

In the Korean clinical context, the current prediction practices rely primarily on cervical length measurements combined with biomarkers. Park et al. [34] demonstrated that cervicovaginal fluid cytokines, particularly interleukin (IL)-6 and IL-17, could serve as predictive markers in symptomatic women with preterm labor, achieving a performance comparable to or exceeding that of fetal fibronectin. The study by Lee and Ahn [24] represents an important step toward locally validated AI prediction models, although the modest AUC of 0.62 and single-center design indicate the need for further development and validation across multiple Korean institutions.

Several barriers must be overcome before clinical translation. First, external validation across diverse populations is essential but is rarely performed. Models developed and validated at single institutions may not be generalizable to different healthcare settings, patient demographics, or clinical practices. Second, most studies reported only discrimination metrics without calibration assessment; calibration, ensuring that predicted probabilities match observed outcomes, is essential for clinical decision-making [35]. Third, the optimal threshold for classifying high-risk patients depends on the intended clinical action and relative costs of false positives and false negatives, which vary across clinical contexts.
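The second and third barriers can be illustrated numerically. The sketch below groups predicted probabilities into equal-width bins and compares the mean prediction with the observed event rate in each bin, computes the Brier score, and derives a cost-based decision threshold. The toy predictions, the cost values, and all function names are hypothetical. The threshold formula is the standard decision-theoretic result under the assumption that correct decisions carry no cost.

```python
from typing import List, Sequence, Tuple


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def calibration_by_bin(probabilities: Sequence[float], outcomes: Sequence[int],
                       n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Return (mean predicted, observed rate, n) for each non-empty equal-width bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_predicted = sum(p for p, _ in members) / n
            observed_rate = sum(y for _, y in members) / n
            rows.append((mean_predicted, observed_rate, n))
    return rows


def cost_based_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Risk above which intervening has lower expected cost than not intervening."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


# Toy predictions: the low-risk group is under-predicted (0.10 predicted, 0.25 observed).
probabilities = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 1]

for mean_predicted, observed_rate, n in calibration_by_bin(probabilities, outcomes):
    print(f"predicted {mean_predicted:.2f} vs observed {observed_rate:.2f} (n={n})")
print(f"Brier score: {brier_score(probabilities, outcomes):.4f}")

# Assumed costs: a missed preterm birth is weighted 9 times a false alarm.
print(f"threshold: {cost_based_threshold(1.0, 9.0):.2f}")
```

A model can rank patients well and still misstate absolute risk, as in the lowest bin here. The threshold moves with the assumed cost ratio, so a single universal cut-off is unlikely to suit both screening and symptomatic settings.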

Kelly et al. [36] outlined the key challenges in delivering clinical impact with AI in healthcare, including ensuring that model performance generalizes across diverse populations, integrating predictions into clinical workflows, establishing appropriate regulatory pathways, and building clinician trust through interpretability. The TRIPOD+AI statement [37] provides updated guidance for the transparent reporting of prediction models using machine learning methods. The DECIDE-AI guidelines [38] offer a framework for the early-stage clinical evaluation of AI decision support systems before full-scale clinical trials. Adherence to these reporting and evaluation frameworks is essential to advance the field beyond proof-of-concept studies toward clinically useful tools.

AI and machine learning approaches offer the potential to improve preterm birth prediction beyond current clinical methods by integrating diverse data sources and identifying complex patterns that are not apparent through traditional analysis. Current evidence demonstrates the feasibility of using multiple data modalities, including electronic health records, ultrasound imaging, cervical texture, elastography, and molecular biomarkers, with some multi-omics approaches achieving AUC values exceeding 0.85. However, the clinical utility of these models remains unclear owing to their pervasive methodological limitations. With 79% of the studies showing a high risk of bias according to the PROBAST criteria and a median TRIPOD adherence of only 49%, substantial improvement in methodological rigor is needed before clinical implementation can be recommended. Priority areas for future research include rigorous external validation across diverse populations and healthcare settings, adherence to TRIPOD+AI reporting standards, assessment of calibration and clinical utility beyond discrimination, and prospective evaluation of the impact on clinical decision-making and patient outcomes. Only by addressing these gaps can AI-based prediction tools fulfill their potential to improve outcomes in women and infants at risk of preterm birth.

Notes

Conflict of interest

No conflict of interest relevant to this article was reported.

Ethical approval

Not applicable.

Patient consent

Not applicable.

Funding information

None.

References

1. Ohuma EO, Moller AB, Bradley E, Chakwera S, Hussain-Alkhateeb L, Lewin A, et al. National, regional, and global estimates of preterm birth in 2020, with trends from 2010: a systematic analysis. Lancet 2023;402:1261–71.
2. Saigal S, Doyle LW. An overview of mortality and sequelae of preterm birth from infancy to adulthood. Lancet 2008;371:261–9.
3. Goldenberg RL, Culhane JF, Iams JD, Romero R. Epidemiology and causes of preterm birth. Lancet 2008;371:75–84.
4. Coutinho CM, Sotiriadis A, Odibo A, Khalil A, D’Antonio F, Feltovich H, et al. ISUOG practice guidelines: role of ultrasound in the prediction of spontaneous preterm birth. Ultrasound Obstet Gynecol 2022;60:435–56.
5. Romero R, Conde-Agudelo A, Da Fonseca E, O’Brien JM, Cetingoz E, Creasy GW, et al. Vaginal progesterone for preventing preterm birth and adverse perinatal outcomes in singleton gestations with a short cervix: a meta-analysis of individual patient data. Am J Obstet Gynecol 2018;218:161–80.
6. Seol HJ, Sung JH, Seong WJ, Kim HM, Park HS, Kwon H, et al. Standardization of measurement of cervical elastography, its reproducibility, and analysis of baseline clinical factors affecting elastographic parameters. Obstet Gynecol Sci 2020;63:42–54.
7. Honest H, Bachmann LM, Gupta JK, Kleijnen J, Khan KS. Accuracy of cervicovaginal fetal fibronectin test in predicting risk of spontaneous preterm birth: systematic review. BMJ 2002;325:301.
8. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med 2019;380:1347–58.
9. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44–56.
10. Bertini A, Salas R, Chabert S, Sobrevia L, Pardo F. Using machine learning to predict complications in pregnancy: a systematic review. Front Bioeng Biotechnol 2022;9:780389.
11. Ahn KH, Lee KS. Artificial intelligence in obstetrics. Obstet Gynecol Sci 2022;65:113–24.
12. Akazawa M, Hashimoto K, Noda K, Yoshida K. The application of machine learning for predicting recurrence in patients with early-stage endometrial cancer: a pilot study. Obstet Gynecol Sci 2021;64:266–73.
13. Lee Y, Kim SY. Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article. Obstet Gynecol Sci 2024;67:153–9.
14. Ohtaka A, Akazawa M, Hashimoto K. Deep learning algorithm for predicting preterm birth in the case of threatened preterm labor admissions using transvaginal ultrasound. J Med Ultrason (2001) 2024;51:323–30.
15. Zhou S, Guan C, Deng S, Zhu Y, Yang W, Zhang X, et al. A novel sequence-based transformer model architecture for integrating multi-omics data in preterm birth risk prediction. NPJ Digit Med 2025;8:536.
16. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55–63.
17. Sharifi-Heris Z, Laitala J, Airola A, Rahmani AM, Bender M. Machine learning approach for preterm birth prediction using health records: systematic review. JMIR Med Inform 2022;10:e33875.
18. Yang Q, Fan X, Cao X, Hao W, Lu J, Wei J, et al. Reporting and risk of bias of prediction models based on machine learning methods in preterm birth: a systematic review. Acta Obstet Gynecol Scand 2023;102:7–14.
19. Akazawa M, Hashimoto K. Prediction of preterm birth using artificial intelligence: a systematic review. J Obstet Gynaecol 2022;42:1662–8.
20. Yu QY, Lin Y, Zhou YR, Yang XJ, Hemelaar J. Predicting risk of preterm birth in singleton pregnancies using machine learning algorithms. Front Big Data 2024;7:1291196.
21. Zhang Y, Du S, Hu T, Xu S, Lu H, Xu C, et al. Establishment of a model for predicting preterm birth based on the machine learning algorithm. BMC Pregnancy Childbirth 2023;23:779.
22. Kong D, Tao Y, Xiao H, Xiong H, Wei W, Cai M. Predicting preterm birth using auto-ML frameworks: a large observational study using electronic inpatient discharge data. Front Pediatr 2024;12:1330420.
23. Huang C, Long X, van der Ven M, Kaptein M, Oei SG, van den Heuvel E. Predicting preterm birth using electronic medical records from multiple prenatal visits. BMC Pregnancy Childbirth 2024;24:843.
24. Lee KS, Ahn KH. Artificial neural network analysis of spontaneous preterm labor and birth and its major determinants. J Korean Med Sci 2019;34:e128.
25. Burgos-Artizzu XP, Perez-Moreno Á, Coronado-Gutierrez D, Gratacos E, Palacio M. Evaluation of an improved tool for non-invasive prediction of neonatal respiratory morbidity based on fully automated fetal lung ultrasound analysis. Sci Rep 2019;9:1950.
26. Kloska A, Harmoza A, Kloska SM, Marciniak T, Sadowska-Krawczenko I. Predicting preterm birth using machine learning methods. Sci Rep 2025;15:5683.
27. Baños N, Perez-Moreno A, Migliorelli F, Triginer L, Cobo T, Bonet-Carne E, et al. Quantitative analysis of the cervical texture by ultrasound and correlation with gestational age. Fetal Diagn Ther 2017;41:265–72.
28. Burgos-Artizzu XP, Baños N, Coronado-Gutiérrez D, Ponce J, Valenzuela-Alcaraz B, Moreno-Espinosa AL, et al. Mid-trimester prediction of spontaneous preterm birth with automated cervical quantitative ultrasound texture analysis and cervical length: a prospective study. Sci Rep 2021;11:7469.
29. Pachtman SL, Ghorayeb SR, Blitz MJ, Harris K, Vohra N, Sison CP, et al. Ultrasonic assessment of cervical heterogeneity for prediction of spontaneous preterm birth: a feasibility study. Am J Perinatol 2018;35:292–7.
30. Angelopoulou E, Gourounti K, Bolou A, Manesi M, Diamanti A. Cervical elastography as a predictive tool for preterm birth: a systematic review and meta-analysis. Cureus 2025;17:e92505.
31. Feng Q, Chaemsaithong P, Duan H, Ju X, Appiah K, Shen L, et al. Screening for spontaneous preterm birth by cervical length and shear-wave elastography in the first trimester of pregnancy. Am J Obstet Gynecol 2022;227:500e1–14.
32. Patberg ET, Wells M, Vahanian SA, Zavala J, Bhattacharya S, Richmond D, et al. Use of cervical elastography at 18 to 22 weeks’ gestation in the prediction of spontaneous preterm birth. Am J Obstet Gynecol 2021;225:525e1–9.
33. Camunas-Soler J, Gee EPS, Reddy M, Mi JD, Thao M, Brundage T, et al. Predictive RNA profiles for early and very early spontaneous preterm birth. Am J Obstet Gynecol 2022;227:72e1–16.
34. Park S, You YA, Yun H, Choi SJ, Hwang HS, Choi SK, et al. Cervicovaginal fluid cytokines as predictive markers of preterm birth in symptomatic women. Obstet Gynecol Sci 2020;63:455–63.
35. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 2014;35:1925–31.
36. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195.
37. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:e078378.
38. Vasey B, Nagendran M, Campbell B, Clifton DA, Collins GS, Denaxas S, et al. Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI. Nat Med 2022;28:924–33.

Table 1

Summary of representative studies on AI-based preterm birth prediction by data modality

| Data modality | Study | Sample size | Algorithm | Key features | AUC | Validation |
|---|---|---|---|---|---|---|
| EHR | Yu et al. [20] (2024) | 22,603 | CatBoost | Longitudinal prenatal data, antenatal visits, labs | 0.70 | Internal |
| | Zhang et al. [21] (2023) | 5,411 | AdaBoost | Clinical variables | N/A (a) | Internal |
| | Kong et al. [22] (2024) | 715,962 | AutoML (GBM/XGBoost) | Discharge data | 0.82–0.85 | Internal |
| | Huang et al. [23] (2024) | 8,830 | Elastic net LR | Longitudinal prenatal data | 0.62–0.71 | Internal |
| | Lee and Ahn [24] (2019) | 596 | ANN | HTN, DM, cone biopsy, CL | 0.62 | Internal |
| Ultrasound | Ohtaka et al. [14] (2024) | 59 | CNN | Cervical images | 0.70 | Internal |
| | Burgos-Artizzu et al. [25] (2019) | 790 | CNN | Fetal lung texture | N/A (b) | Internal |
| | Burgos-Artizzu et al. [28] (2021) | 633 | CNN | Cervical texture+CL | 0.68–0.77 | Internal |
| | Kloska et al. [26] (2025) | 50 | SVM, LR, XGBoost | Blood tests+questionnaire | N/A (c) | Internal |
| Texture/radiomics | Baños et al. [27] (2017) | 700 images | Texture analysis | Cervical texture features | N/A | Cross-sectional |
| | Pachtman et al. [29] (2018) | 151 | Heterogeneity index | Grayscale histogram | N/A | Internal |
| Elastography | Angelopoulou et al. [30] (2025) | 4,087 (MA) | Strain/SWE | Cervical stiffness | Pooled: 0.82 | Meta-analysis |
| | Feng et al. [31] (2022) | 2,316 | SWE | 1st trimester CL+SWE | 0.69 | Internal |
| | Patberg et al. [32] (2021) | 742 | E-cervix index | Multi-parameter stiffness | N/A (d) | Internal |
| Multi-omics | Camunas-Soler et al. [33] (2022) | 242 | ML+cfRNA | Cell-free RNA signatures | 0.80 | Internal |
| | Zhou et al. [15] (2025) | 682 | Transformer | cfDNA+cfRNA integration | 0.89 | Internal |

AI, artificial intelligence; AUC, area under the receiver operating characteristic curve; EHR, electronic health record; N/A, not applicable; GBM, gradient boosting machine; LR, logistic regression; ANN, artificial neural network; HTN, hypertension; DM, diabetes mellitus; CL, cervical length; CNN, convolutional neural network; SVM, support vector machine; MA, meta-analysis; SWE, shear wave elastography; ML, machine learning; cfRNA, cell-free RNA; cfDNA, cell-free DNA.

(a) Zhang et al. [21] reported accuracy only (non-preterm: 100%; preterm: 72.73%); AUC not reported.

(b) Burgos-Artizzu et al. [25] predicted neonatal respiratory morbidity (not preterm birth directly); AUC not reported for preterm birth endpoint.

(c) Kloska et al. [26] reported accuracy (82%), precision, recall, F1-score; AUC not reported.

(d) Patberg et al. [32] reported odds ratios for elasticity contrast index; AUC not reported.

Table 2

Methodological quality assessment findings from systematic reviews

| Quality domain | Finding | Study |
|---|---|---|
| PROBAST risk of bias | 79% high overall risk of bias | Yang et al. [18] |
| TRIPOD adherence | Median 49% item compliance | Yang et al. [18] |
| External validation | Majority lack external validation (24/29 development-only studies) | Yang et al. [18] |
| Sample size | Frequently inadequate for predictor count | Yang et al. [18] |
| Missing data handling | Often unreported or inappropriate | Yang et al. [18] |
| Insufficient sample size | Most datasets too small for robust model development | Akazawa and Hashimoto [19] |

PROBAST, prediction model risk-of-bias assessment tool; TRIPOD, transparent reporting of a multivariable prediction model for individual prognosis or diagnosis.