Test accuracy
Due to heterogeneity in methodology and study quality, meta-analysis was performed on studies from each group separately.
The accuracy of index tests in predicting endometriosis was variable, although results across groups were consistent. Each index test, when positive, increased the likelihood of pelvic endometriosis, apart from a BMI ≥30 kg/m2, which decreased the likelihood of disease. The positive likelihood ratio (LR+) for disease was highest for investigation tests, and there was a trend towards greater specificity than sensitivity. The summary results of bi/univariate meta-analysis are shown in Figure 3. Confidence in the individual sensitivity and specificity of each test is displayed using a visual pentagon model; the methodology for this assessment is described in the discussion and in the legend shown in Figure 4.
Investigation category tests performed best overall. TVUSS finding of endometrioma gave the highest summary LR+ at 21.6, with sensitivity and specificity of 77.2% and 96.4% respectively. Serum CA-125 >35 U/mL showed sensitivity and specificity of 55.8% and 92.7% respectively, with LR+ of 7.63. TVUSS finding of DIE showed sensitivity and specificity of 86.5% and 80.2% respectively, with LR+ of 4.39.
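As a rough check on the figures above, LR+ can be recomputed from sensitivity and specificity using the standard definition LR+ = sensitivity / (1 - specificity). The sketch below is illustrative only and is not the meta-analysis code used in this study; the function name is hypothetical. Summary LR+ values produced by a pooled model may differ slightly from the ratio of the summary sensitivity and specificity, so small discrepancies from the reported values are to be expected.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """Return LR+ = sensitivity / (1 - specificity).

    Both arguments are proportions in [0, 1]; specificity must be below 1,
    otherwise the ratio is undefined.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must be in [0, 1)")
    return sensitivity / (1.0 - specificity)


# (sensitivity, specificity, reported summary LR+) as given in the text above.
reported = {
    "TVUSS endometrioma": (0.772, 0.964, 21.6),
    "Serum CA-125 >35 U/mL": (0.558, 0.927, 7.63),
    "TVUSS DIE": (0.865, 0.802, 4.39),
}

for name, (sens, spec, lr_reported) in reported.items():
    lr = positive_likelihood_ratio(sens, spec)
    print(f"{name}: recomputed LR+ {lr:.2f}, reported {lr_reported}")
```

The recomputed values fall close to the reported summary estimates, which is consistent with the high LR+ for endometrioma being driven mainly by its high specificity.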
Symptom-based tests showed LR+ within a narrow range, from 1.47 (dysmenorrhoea) to 1.93 (dyspareunia), and generally higher specificity than sensitivity. Dyspareunia, which had the highest LR+, showed sensitivity and specificity of 36.3% and 81.1% respectively.
Family history of endometriosis showed an LR+ of 6.25, with high specificity (98.5%) but low sensitivity (9.25%). The finding of BMI ≥30 kg/m2 showed a decreased likelihood of diagnosis of endometriosis (LR+ 0.44).
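To illustrate what an LR+ above or below 1 means in practice, the sketch below converts a pre-test probability into a post-test probability using the standard odds form of Bayes' theorem (post-test odds = pre-test odds × LR). The 10% pre-test probability is a purely hypothetical value chosen for illustration and is not taken from this study; the function name is likewise an assumption.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Uses post-test odds = pre-test odds * likelihood ratio, then converts
    the odds back to a probability.
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# HYPOTHETICAL pre-test probability, for illustration only (not from the study).
pre_test = 0.10

# LR+ values as reported in the text above.
for name, lr in [("Family history", 6.25), ("BMI >=30 kg/m2", 0.44)]:
    post = post_test_probability(pre_test, lr)
    print(f"{name}: LR+ {lr} moves {pre_test:.0%} to {post:.1%}")
```

An LR+ above 1 raises the probability of disease after a positive result, whereas an LR+ below 1, as reported for BMI ≥30 kg/m2, lowers it.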
Hierarchical Summary Receiver Operating Characteristics (HSROC) curves for index tests in each group are shown in Figures S6-8. The HSROC curves show the greatest area under the curve (AUC) for investigation category tests.
In the partial verification group, symptom index tests showed a greater LR+ than in the complete verification group, ranging from 2.47 (dysmenorrhoea) to 7.13 (dyschezia). Specificity was also higher, ranging from 69% (dysmenorrhoea) to 92% (dyschezia).
In the database/self-reporting group, symptom-based index tests performed similarly to those in the other groups. In the partial verification and database/self-reporting groups, BMI ≥30 kg/m2 showed no association with disease, with a 95% CI crossing 1.0. For all other index tests across all groups, the 95% CI lay entirely above 1.0.
The greatest inter-study variability in confidence intervals was seen in the forest plots for the symptom-based tests, notably pelvic pain. The inter-study variance for specificity was generally lower than that for sensitivity, as was the overall width of the confidence intervals. Forest plots for each index test in each group are shown in Supplementary Figures S9-15.
A sensitivity analysis restricted to studies without any high-risk features is shown in Table 1. All included studies are from the complete verification group. Summary accuracy measures are consistent with those for the complete verification group as a whole for the majority of index tests, although sensitivity for TVUSS findings of endometrioma and DIE fell to 69.8% and 73.4% respectively.