Interpretation
An understanding of the degree of likelihood associated with various
symptoms and features in the clinical history can aid the assessment of
patients with possible endometriosis in primary care.
The negative association between elevated BMI and endometriosis shown in
the complete verification group is consistent with that demonstrated
previously.44 This was not replicated across the other
groups. The difference may reflect a stronger negative correlation between
elevated BMI and endometriosis in the higher-risk populations of the
all-surgical cohorts, who may have
more severe disease. This possibility is consistent with previous
studies demonstrating a significantly lower BMI in those with severe
compared to mild disease and a 12-14% decrease in the likelihood of
endometriosis being diagnosed for each unit increase in BMI
(kg/m²).32,45 The interplay between BMI and
endometriosis pathogenesis, however, remains poorly understood.
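As a rough illustration of what a per-unit decrease of this size implies across several BMI units, the sketch below compounds the reported 12-14% figure. It assumes the effect is multiplicative per unit, as it would be for an odds ratio from a logistic model; the text above does not state this. The function name and the 5-unit difference are hypothetical.

```python
def relative_likelihood(per_unit_decrease: float, bmi_units: float) -> float:
    """Relative likelihood of diagnosis after a BMI increase of `bmi_units`.

    Assumption (not from the source): the per-unit decrease compounds
    multiplicatively, as an odds ratio from a logistic model would.
    """
    if not 0.0 <= per_unit_decrease < 1.0:
        raise ValueError("per_unit_decrease must be in [0, 1)")
    return (1.0 - per_unit_decrease) ** bmi_units


# Hypothetical 5 kg/m^2 difference, using the reported 12% and 14% bounds.
for decrease in (0.12, 0.14):
    print(f"{decrease:.0%} per unit -> {relative_likelihood(decrease, 5):.2f}")
```

Under that assumption, a 5-unit difference in BMI would roughly halve the likelihood of diagnosis (about 0.53 and 0.47 at the two bounds).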
The partial verification and
database/self-reporting groups tended to show better accuracy
measures, which likely reflects the selection of controls. This
effect seems to outweigh the possibility of an undiagnosed disease
burden in those not exposed to a surgical reference standard. The
accuracy of self-reported diagnosis of endometriosis has been assessed
and performs well;46 false attribution of disease in
the self-reporting group may therefore present only a small source of
bias.
Where tests were more specific than sensitive, this may reflect
their correlation with disease severity. Dyschezia and dyspareunia, for
example, have been linked to severe disease because invasive disease
involves a precise anatomical location, but are less often
present in mild cases.47,48 Tests showing greater
sensitivity, such as dysmenorrhea, were less specific and may only
become specific for endometriosis in its more severe forms.
Previous systematic reviews have similarly highlighted the heterogeneity
and poor methodological quality of primary studies, limiting
interpretation of findings.17,49 As our methodology
allowed wide inclusion criteria, we applied a novel grading protocol to
more quantitatively assess limitations. Grading of evidence for index
tests was performed for sensitivity and specificity by application of a
visual pentagon model for grading of test accuracy studies described by
Rogozinska and Khan.50 This methodology is described
in detail elsewhere but, briefly, studies were given a score of 0 to -2
in each of 5 domains: design (study design type); risk of bias (QUADAS-2
risk of bias); indirectness (QUADAS-2 applicability); inconsistency
(visual assessment of inter-study variance in confidence intervals); and
imprecision (width of confidence intervals). The complete verification
group showed the fewest limitations, whilst the database/self-reporting
studies showed very serious limitations. Tests in the investigation
category showed greater limitations, owing to more highly selected
populations and generally higher inter-study inconsistency and
imprecision.
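The five-domain scoring described above can be sketched as a small data structure. This is a minimal illustration, not the published pentagon model: the class name, the example scores, and the idea of summing the domains into a single total are assumptions made here. The only details taken from the text are the five domains and the 0 to -2 range.

```python
from dataclasses import dataclass, fields

# Per-domain scores permitted by the 0 to -2 scale described in the text.
ALLOWED_SCORES = (0, -1, -2)


@dataclass(frozen=True)
class PentagonGrade:
    """Scores for one index test in the five grading domains (hypothetical class)."""
    design: int
    risk_of_bias: int
    indirectness: int
    inconsistency: int
    imprecision: int

    def __post_init__(self):
        # Reject any score outside the 0 to -2 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in ALLOWED_SCORES:
                raise ValueError(f"{f.name} must be one of {ALLOWED_SCORES}, got {value}")

    def total_deduction(self) -> int:
        # Assumption: a plain sum, used here only to compare tests at a glance.
        return sum(getattr(self, f.name) for f in fields(self))


# Hypothetical scores for a single index test; not data from the review.
example = PentagonGrade(design=0, risk_of_bias=-1, indirectness=0,
                        inconsistency=-2, imprecision=-1)
print(example.total_deduction())
```

Keeping the domains as separate fields preserves which domain drives the limitations, which a single summary number would hide.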