Results

The search strategy identified 1723 citations; following removal of duplicates and screening, 52 full-text articles were assessed for eligibility (PRISMA flow diagram, Figure 1). Eleven studies, reporting a total of 11 final prediction models, were included in this review.
The populations of the included studies are shown in Tables 2 and 3. Four studies included only women with placenta praevia, four included only vaginal deliveries, two included women undergoing CS (planned and unplanned) and one encompassed the general obstetric population.
The key findings of the studies are detailed in Table 2, including whether each study should be interpreted as exploratory (requiring more research) or confirmatory (of use in clinical practice), as judged by the primary study authors. All candidate predictors, and the predictors included in the final published models, are listed in Table 3. The settings of the included studies were hospitals across the following countries: Italy, China, France, United States, United Kingdom, South Korea, Netherlands, Spain, Zimbabwe, Denmark and Egypt. The study designs comprised eight cohort studies, of which one used whole-population registry data, and three case-control studies, of which one was nested within a population cohort. The number of participants ranged from 110 in a prospective cohort study to 56,967 in a retrospective cohort.
Although all studies aimed to predict PPH, the chosen outcomes differed. Five studies listed PPH or massive haemorrhage as an outcome, three listed blood transfusion or massive blood transfusion, two reported postpartum blood loss, and one used a combined outcome of peripartum complications encompassing perioperative blood transfusion, uterine artery embolization or caesarean hysterectomy. There was also variation in the definition and method of measurement of each outcome, as shown in Table 2.
The quality of the studies, assessed for risk of bias using the CHARMS checklist, is summarised in Table 4. Overall there was a high risk of bias across the studies. The source of data was deemed to be at low/moderate risk of bias in eight studies owing to the use of a retrospective design for measurement of predictors and outcome. Two studies were at high risk of bias due to the lack of a definition or method of measurement for the outcome to be predicted. There was a high risk of bias for the candidate predictors in three studies because predictors were undefined or required subjective interpretation. Six studies were at high risk of bias for sample size as a result of a low number of events per variable (EPV; the number of outcome events divided by the number of candidate predictors). Risk of bias for missing data was unclear for all papers because none reported on missing data.
Across the 11 studies, a total of 97 unique variables were selected as candidate predictors (range 5-23 per study) and 56 variables were selected as predictors in the final models (range 5-15 per study). The following were found to be predictive in two or more studies: parity (n=4 studies), maternal age ≥35 years (n=4), low antenatal haemoglobin (n=3), antepartum haemorrhage/bleed (n=3), previous CS (n=3), high neonatal weight (n=2), multiple pregnancy (n=2), BMI ≥25 (n=2), anterior placenta (n=2) and retained placenta (n=2).
The predictive ability of the models was evaluated using measures of calibration (the agreement between the predicted probabilities of the outcome and the observed proportions of the outcome) and discrimination (how well the model differentiates between high- and low-risk patients), reported in four and six of the 11 studies respectively. Of the four studies reporting calibration, two used the Hosmer-Lemeshow (H-L) test, with Kim et al. reporting good calibration (p=0.44) and Rubio-Alvarez et al. failing to report a result. However, the H-L test is not recommended for assessing calibration because it indicates neither the direction nor the magnitude of any miscalibration and has limited power in small samples. Biguzzi presented a calibration plot demonstrating overall good performance; however, inadequate information was given on how the curve was developed. Ahmadzia et al. reported calibration plots and the association between the predicted probability of transfusion and the observed incidence in deciles of the risk-score distribution. However, the authors did not report even a Hosmer-Lemeshow test, nor did they demonstrate a suitable calibration plot.14 The plots are described as curves but display only a point for each decile, with no 95% confidence intervals. Ideally the calibration slope should be reported, along with a calibration curve demonstrating the non-parametric relationship between observed outcome and predicted risk.28 Discrimination was reported as the area under the receiver operating characteristic curve (AUC), where 1 is perfect discrimination and 0.5 is no better than a coin toss. The AUC ranged from 0.70 to 0.90 across the studies, as shown in Table 2.
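To illustrate the two performance measures discussed above, the sketch below computes the AUC via the rank-sum (Mann-Whitney) identity and the calibration slope via a logistic regression of the observed outcome on the logit of the predicted risk. This is a minimal sketch using NumPy only; the function names and the example data are illustrative and are not drawn from any of the included studies.

```python
import numpy as np

def auc(y_true, y_score):
    """AUC via the Mann-Whitney rank-sum identity: the probability that a
    randomly chosen event receives a higher predicted risk than a non-event.
    1.0 = perfect discrimination; 0.5 = no better than a coin toss.
    (Assumes no tied scores; ties would require midranks.)"""
    y_true = np.asarray(y_true)
    ranks = np.argsort(np.argsort(np.asarray(y_score))) + 1.0
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def calibration_slope(y_true, p_hat, iters=25):
    """Slope from a logistic regression of the outcome on the logit of the
    predicted risk: ~1.0 indicates good calibration, <1.0 suggests the
    predictions are too extreme (typical of overfitting).  Fitted with plain
    Newton-Raphson, no regularisation."""
    lp = np.log(p_hat / (1 - p_hat))               # linear predictor (logit)
    X = np.column_stack([np.ones_like(lp), lp])    # intercept + slope
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y_true - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta[1]
```

A calibration slope near 1, reported alongside a non-parametric calibration curve of observed outcome against predicted risk, gives the fuller picture recommended above; decile-level observed-versus-predicted points are a coarser version of the same idea.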
Of the 11 studies, four presented models deemed by their primary study authors as ready for use in clinical practice.14,18,19,22 Ahmadzia et al. present an online risk calculator developed in patients who underwent CS, and Dunkerton et al. presented a decision tree, based on Hothorn et al.'s non-parametric recursive partitioning algorithm, also developed in women who underwent CS. Kim et al. presented a scoring system developed in women with placenta praevia, and Rubio-Alvarez et al. present an Excel™ risk tool developed in women with singleton vaginal deliveries. However, Ahmadzia et al. and Dunkerton et al. did not externally validate their models, an important requirement before use in clinical practice.29 The discriminatory performance of the Kim et al. and Rubio-Alvarez et al. models on external validation was good, with AUCs of 0.88 and 0.83 respectively.