Jack Wilkinson1
1 Centre for Biostatistics, Manchester Academic Health Science Centre, Division of Population Health, Health Services Research and Primary Care.
Water, water everywhere, but not a drop to drink. Ewington et al.
[BJOG CURRENT ISSUE] have undertaken a systematic review of
prediction models for fetal macrosomia and large for gestational age.
The authors identified 111 models, described in 58 studies. This finding
alone should give us pause. We might ask whether it is a good use of
resources to have so many models developed for the same purpose, or if
instead it might have been preferable for most of this time, money and
effort to be directed elsewhere. Redundancy is not the only cause for
alarm, however. The review authors note that, of the 111 models
identified, none were ready for clinical implementation. To date, this
massive effort has failed to benefit a single patient.
How can so much effort amount to so little? The authors critically
appraised the included studies using the PROBAST tool (Wolff et al., Ann Intern Med 2019;170(1):51–58), judging only 5 of the 58 studies to be at low risk of
bias. This suggests that, while a huge amount of work has been
undertaken in the development of these models, regrettably, most has not
been done proficiently. The authors drew attention to inadequate methods
of analysis as a recurring limitation, which could include flaws in sample size determination, predictor selection, the representation of predictors in the modelling process, the handling of missing data, and the measurement of model performance (for example, failing to assess model calibration). Even the models that were at low risk of bias were not suitable for implementation. Two relied on predictors that are not
routinely measured, rendering them impracticable, while the remainder
had not yet had their performance assessed in a separate dataset, which
is an essential step in the validation of a model.
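To make concrete what such an assessment involves, the minimal sketch below evaluates a hypothetical model's predicted risks on a separate dataset, checking both discrimination (the c-statistic) and calibration (the calibration intercept and slope). It is purely illustrative: the data are simulated, and the model and variable names are assumptions for the example rather than anything drawn from the review.

```python
# Illustrative sketch only: the data, model and variable names below are
# simulated and do not come from the review or from any published model.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Simulated "external" dataset: the linear predictor a previously developed
# model would produce for each pregnancy, the corresponding predicted risk,
# and the observed binary outcome (1 = large for gestational age).
n = 1000
linear_predictor = rng.normal(-1.5, 1.0, size=n)
predicted_risk = 1 / (1 + np.exp(-linear_predictor))
# The outcome-generating mechanism deliberately differs from the prediction model.
true_prob = 1 / (1 + np.exp(-(0.8 * linear_predictor - 0.3)))
observed = rng.binomial(1, true_prob)

# Discrimination: c-statistic (area under the ROC curve).
c_statistic = roc_auc_score(observed, predicted_risk)

# Calibration: regress the outcome on the logit of the predicted risk.
# An intercept near 0 and a slope near 1 indicate good calibration;
# a slope well below 1 suggests the predictions are too extreme.
logit_risk = np.log(predicted_risk / (1 - predicted_risk))
calibration_fit = sm.Logit(observed, sm.add_constant(logit_risk)).fit(disp=0)
intercept, slope = calibration_fit.params

print(f"c-statistic: {c_statistic:.2f}")
print(f"calibration intercept: {intercept:.2f}, slope: {slope:.2f}")
```

Because the simulated outcome mechanism differs from the prediction model, the calibration slope comes out below 1 here; this is precisely the kind of miscalibration that only becomes visible when a model is confronted with data it was not developed on.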
Massive waste in prediction research appears to be the status quo across
medicine. To some extent, this might be attributable to the naiveté of
researchers, who may be unfamiliar with methodological standards for the
development and evaluation of prognostic models and perhaps also with
the fact that a model developed using flawed methods might do more harm
than good. Prognostic research should not be undertaken without
sufficient methodological expertise. Another reason might be that much
research is done not for patient benefit so much as it is for
professional benefit. This provides an incentive to throw another model
onto an ever-expanding pile in order to add another line to one’s CV.
The fact that it is straightforward to develop a poor, albeit publishable, model exacerbates the issue; all one requires is a dataset, some statistical software, and a gung-ho attitude. It would typically be preferable to consider whether potentially suitable models already
exist, and to undertake external validation studies. Systematic reviews
play an important role here, by detailing everything that has been done
to date. To illustrate, the present review identified two models
developed using sound methodology, which could now be subjected to
external validation. Finally, journals have an important gatekeeping
role to play, by refusing publication of models lacking clear
justification and rigorous methods.