Modeling results in the context of other CASP14 groups and
automated model selection
The results presented in Table 1 and Supplementary Table S1 say little
about our success relative to other groups. To investigate our performance in
the CASP14 context, we compared our results (group “Venclovas”) with
those of three other top-performing groups for models designated as
first (model 1). We also included our automatic scoring method
(“VoroMQA-select-new”) as a virtual group, allowing it to make
selections from all CASP14 multimeric models (produced by both automatic
servers and human groups). By doing this we aimed to test the
effectiveness of our automatic model scoring method in the best possible
scenario. For the performance comparison, we used the sum of z-scores of
two interface accuracy measures (ICS and IPS) and two global structure
accuracy measures (lDDT and TM-score) (Figure 2).
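As a rough illustration of how a sum of z-scores ranks groups, the following minimal sketch computes plain per-target z-scores across groups and adds them up for each group. It is not the procedure used to produce Figure 2: the group names, target names and score values are invented, the function names are hypothetical, and the use of the population standard deviation without any outlier handling is an assumption.

```python
from statistics import mean, pstdev

def z_scores(scores_by_group):
    """Plain z-scores of one target's scores across groups (assumed definition)."""
    values = list(scores_by_group.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation; an assumption
    if sigma == 0:
        # All groups tied on this target: nobody gains or loses.
        return {group: 0.0 for group in scores_by_group}
    return {group: (v - mu) / sigma for group, v in scores_by_group.items()}

def summed_z(per_target_scores):
    """Sum each group's z-scores over all targets."""
    totals = {}
    for scores_by_group in per_target_scores.values():
        for group, z in z_scores(scores_by_group).items():
            totals[group] = totals.get(group, 0.0) + z
    return totals

# Invented ICS-like values for three hypothetical groups on two targets.
toy_ics = {
    "T1": {"A": 0.9, "B": 0.5, "C": 0.1},
    "T2": {"A": 0.6, "B": 0.6, "C": 0.3},
}
totals = summed_z(toy_ics)
ranking = sorted(totals, key=totals.get, reverse=True)
```

In this toy case group A has the largest total and group C the smallest. Because z-scores are centered on each target, the totals over all groups sum to zero.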
The comparison revealed that different features of our models were
predicted with different levels of success. In terms of the accuracy of
intersubunit interfaces (ICS and IPS), we achieved the best results. In
particular, we were successful in predicting interface patches (IPS),
whereas the prediction of specific residue-residue contacts (ICS) was
somewhat less successful. On the other hand, the global structure
accuracy of our models was lower than that of the other top-performing
groups. This is especially visible for lDDT, an all-atom
score that largely reflects the accuracy of individual subunits.
Interestingly, our automatic model selection method showed relatively
strong performance, ranking third by each of the four scores.
Although this method performed worse than our human group on both
interface accuracy measures and TM-score, its results according to
all-atom accuracy (lDDT) were noticeably better.
To look at different features in more detail, we examined per-target
z-scores. Z-score values were accumulated progressively for targets
ordered by the best ICS achieved by any group, which can be interpreted
as an estimate of the target difficulty. Figure 3 shows the resulting
plots for the models designated as first (model 1). In addition to the
data for the same top groups and “VoroMQA-select-new”, the plots also
include the data for the best models provided by any predictor group.
The latter curve can serve as a reference, as it represents the
upper limit of what could have been achieved in CASP14.
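The progressive accumulation described above can be sketched as follows. This is only an illustration under stated assumptions: the target names and values are invented, the function name is hypothetical, and the ordering from the highest best ICS (easiest target) to the lowest is assumed, since the text does not specify the direction.

```python
from itertools import accumulate

def cumulative_z(best_ics, group_z, easiest_first=True):
    """Order targets by the best ICS achieved by any group and
    accumulate one group's per-target z-scores in that order."""
    # Higher best ICS is read as an easier target; the direction is an assumption.
    order = sorted(best_ics, key=best_ics.get, reverse=easiest_first)
    running = list(accumulate(group_z[target] for target in order))
    return order, running

# Invented values: best ICS by any group, and one group's z-score per target.
best_ics = {"T1": 0.9, "T2": 0.3, "T3": 0.6}
group_z = {"T1": 1.0, "T2": -0.5, "T3": 2.0}

order, running = cumulative_z(best_ics, group_z)
# order lists targets from easiest to hardest; running holds the curve values.
```

Plotting `running` against the position in `order` for each group would give one curve per group, with the final point equal to that group's summed z-score.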
Interestingly, the per-target analysis (Fig. 3) revealed that the
relative success of different groups was dependent not only on the
evaluation measure as seen in Fig. 2, but also on the prediction
targets. According to the interface prediction accuracy, our group
dominated for most of the targets (Fig. 3A,B). On the other hand, if we
consider the global accuracy of models, the picture is different.
According to TM-score (Fig. 3D), our models are below the
state of the art for about half of the targets, whereas according to lDDT
(Fig. 3C) this is true for nearly all the targets. To see whether our
models as assessed by lDDT were indeed significantly inferior to those
of other top groups, we examined the cumulative raw values
(Supplementary Figure S3). Surprisingly, it turned out that the absolute
differences between the groups, especially if evaluated using lDDT (Fig.
S3F), are relatively small. This indicates that in most cases subunit
structures were of comparable accuracy and that relatively large z-score
differences resulted from small structural improvements. The same
analysis performed with the CAD-score-based analogs of ICS, IPS and lDDT
scores led to similar conclusions (Supplementary Figure S4).
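To illustrate why small raw differences can translate into large z-score differences, the following toy sketch compares two sets of scores: one in which the groups are nearly tied and one in which they are widely spread. All values are invented, and the plain z-score with the population standard deviation is an assumed definition, not necessarily the one used in the assessment.

```python
from statistics import mean, pstdev

def z(value, all_values):
    """Plain z-score of one value within a set of values (assumed definition)."""
    return (value - mean(all_values)) / pstdev(all_values)

# Invented lDDT-like values for three hypothetical groups on one target.
tight = [0.80, 0.81, 0.82]  # groups nearly tied: raw gap of 0.02
wide = [0.40, 0.60, 0.80]   # groups far apart: raw gap of 0.40

gap_tight = z(tight[-1], tight) - z(tight[0], tight)
gap_wide = z(wide[-1], wide) - z(wide[0], wide)
# Both z-score gaps are the same size even though the raw gaps differ 20-fold.
```

Because a z-score rescales by the spread among groups, the best and worst groups are separated by the same number of standard deviations in both cases, which is consistent with the observation that nearly equivalent subunit structures can still produce sizeable z-score differences.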
In addition to individual scores, we analyzed their combinations
reflecting either the interface prediction accuracy or the accuracy of
both the interface and the global structure. We performed this analysis
both for models designated as first (Fig. S5) and for the best-of-five
models (Fig. S6). The analysis of these combinations further
corroborated the above observations on our relative success in interface
prediction and on target-dependent group performance. Interestingly, in
the analysis of best-of-five models our automatic selection method
(VoroMQA-select-new) was the best according to the interface accuracy
(Fig. S6A,C) and close to the top according to the combined accuracy (Fig.
S6B,D). Although VoroMQA-select-new had an important advantage over other
groups in having access to all the models, the results suggest that this
automatic selection procedure is quite robust.