Interpretation
An understanding of the degree of likelihood associated with various symptoms and features in the clinical history can aid the assessment of patients with possible endometriosis in primary care.
The negative association between elevated BMI and endometriosis shown in the complete verification group is consistent with that demonstrated previously.44 This was not replicated across other groups. This may reflect a stronger negative correlation between elevated BMI and endometriosis in the higher-risk populations of the all-surgical cohorts, who may have more severe disease. This possibility is consistent with previous studies demonstrating a significantly lower BMI in those with severe compared with mild disease, and a 12-14% decrease in the likelihood of endometriosis being diagnosed for each unit increase in BMI (kg/m²).32,45 The interplay between BMI and endometriosis pathogenesis, however, remains poorly understood.
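To give a sense of scale for the per-unit figure quoted above, the following is a minimal arithmetic sketch, not a result from the cited studies. It assumes that the 12-14% decrease compounds multiplicatively across BMI units (as it would if the figure were an odds ratio of 0.86-0.88 per unit); the function name `relative_likelihood` and the 5-unit difference are hypothetical choices for illustration.

```python
def relative_likelihood(per_unit_decrease: float, bmi_units: float) -> float:
    """Relative likelihood of diagnosis after a BMI increase of `bmi_units`.

    ASSUMPTION: the per-unit decrease compounds multiplicatively, i.e. each
    additional kg/m2 scales the likelihood by (1 - per_unit_decrease).
    """
    return (1.0 - per_unit_decrease) ** bmi_units


# Illustrative comparison for a hypothetical 5 kg/m2 difference in BMI.
for decrease in (0.12, 0.14):
    ratio = relative_likelihood(decrease, 5)
    print(f"{decrease:.0%} per unit over 5 kg/m2: relative likelihood {ratio:.2f}")
```

Under this assumption, a 5 kg/m² higher BMI would correspond to roughly half the likelihood of diagnosis (about 0.53 and 0.47 for the 12% and 14% figures respectively).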
The tendency for data from the partial verification and database/self-reporting groups to show better accuracy measures was likely a reflection of the selection of controls. This effect seems to outweigh the possibility of an undiagnosed disease burden in those not exposed to a surgical reference standard. The accuracy of self-reported diagnosis of endometriosis has been assessed and performs well;46 false attribution of disease in the self-reporting group may therefore present only a small source of bias.
A greater specificity than sensitivity of tests may be associated with their correlation with disease severity. Dyschezia and dyspareunia, for example, have been linked to severe disease owing to the involvement of a precise anatomical location in invasive disease, but are less often present in mild cases.47,48 Tests showing greater sensitivity, such as dysmenorrhea, were also less specific, and may only become specific for endometriosis in more severe forms.
Previous systematic reviews have similarly highlighted the heterogeneity and poor methodological quality of primary studies, limiting interpretation of findings.17,49 As our methodology allowed wide inclusion criteria, we applied a novel grading protocol to assess limitations more quantitatively. Grading of evidence for index tests was performed for sensitivity and specificity by application of a visual pentagon model for grading of test accuracy studies described by Rogozinska and Khan.50 This methodology is described in detail elsewhere but, briefly, studies were given a score of 0 to -2 in each of 5 domains: design (study design type); risk of bias (QUADAS-2 risk of bias); indirectness (QUADAS-2 applicability); inconsistency (visual assessment of inter-study variance in confidence intervals); and imprecision (width of confidence intervals). The complete verification group showed the fewest limitations, whilst the database/self-reporting studies showed very serious limitations. There was greater limitation in the investigation category tests due to more highly selective populations and a generally higher inter-study inconsistency and imprecision.
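The five-domain scoring described above can be represented as a simple record. The sketch below is illustrative only and is not the published pentagon model: the class name `PentagonGrade`, the restriction to the integer scores 0, -1 and -2, and the `total` summary (a plain sum across domains) are assumptions introduced here, since the text states only that each domain is scored from 0 to -2.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PentagonGrade:
    """Downgrade scores for one index test, one field per domain.

    ASSUMPTION: scores are the integers 0 (no limitation), -1 or -2.
    """
    design: int = 0          # study design type
    risk_of_bias: int = 0    # QUADAS-2 risk of bias
    indirectness: int = 0    # QUADAS-2 applicability
    inconsistency: int = 0   # inter-study variance in confidence intervals
    imprecision: int = 0     # width of confidence intervals

    def __post_init__(self) -> None:
        # Reject any score outside the 0 to -2 range given in the text.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in (0, -1, -2):
                raise ValueError(f"{f.name} must be 0, -1 or -2, got {value!r}")

    def total(self) -> int:
        """Hypothetical summary: the sum of all five domain scores."""
        return sum(getattr(self, f.name) for f in fields(self))


# Made-up scores for illustration, not taken from the review's results.
example = PentagonGrade(risk_of_bias=-1, inconsistency=-1, imprecision=-2)
print(example.total())
```

A record of this kind keeps the domain-level scores visible, which matters because two groups with the same overall total may be limited in different domains.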