Figure 3: Distribution of the scores shown in Table 1 and Table
2 for A) Top1-pose RMSD, B) Top1-pose lDDT-PLI, C) Top5-pose RMSD, and
D) Top5-pose lDDT-PLI. The lines and the black dots in the bars
represent the median and the mean, respectively.
As expected, the results for Best pocket docking are better than
those for Blind docking, as the search space is restricted. DiffDock
performs well on this set despite offering only a blind docking mode,
as also reported by its authors, who suggest using DiffDock as a
ligand-specific pocket detector34.
The difference between the two scoring metrics is especially apparent when
comparing the blind docking results of GNINA and TankBind in Table 1. The
median RMSD is worse for GNINA, indicating more "severe" failures: RMSD is
an unbounded metric, so poses placed far from the true site drive the value
up without limit. In contrast, lDDT-PLI is bounded; all PLC poses beyond
the thresholds used are assigned a score of 0, so the metric is less
penalized by very bad predictions. In addition, lDDT-PLI does not penalize
parts of the ligand that float in regions not in contact with the protein.
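To make the contrast between the two metrics concrete, the following is a minimal illustrative sketch, not the benchmark's actual implementation: ligand_rmsd and lddt_like_score are hypothetical helpers, and the tolerance thresholds are assumptions (the real lDDT-PLI is computed with specific inclusion radii and threshold values).

```python
import numpy as np

def ligand_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """Heavy-atom RMSD between predicted and reference ligand poses.

    Unbounded: a single pose placed far from the pocket can grow
    without limit and dominate a benchmark's mean.
    """
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=-1))))

def lddt_like_score(pred_dists: np.ndarray, ref_dists: np.ndarray,
                    thresholds: tuple = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Simplified, bounded lDDT-style score over protein-ligand
    contact distances (illustrative thresholds, in angstroms).

    Bounded in [0, 1]: a catastrophically wrong pose simply scores 0,
    so it cannot skew the aggregate the way an unbounded RMSD can.
    """
    diffs = np.abs(pred_dists - ref_dists)
    # Fraction of contacts preserved within each tolerance, averaged
    # over all thresholds.
    return float(np.mean([(diffs < t).mean() for t in thresholds]))
```

A contact-based score of this kind also only considers distances within the interface, which mirrors the point above: ligand atoms far from any protein contact simply contribute no terms.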
All the tools show a significant performance decrease when AlphaFold
models are used as input. This is especially striking for Best pocket
docking, where exact side-chain placement and overall conformation seem
to be crucial for the physics-based docking tools to find the right
ligand pose. Figure 4 illustrates this: the backbone RMSD of the
AlphaFold model is 3.56 Å, and a rearrangement has clearly pushed a
helix into the binding pocket, preventing the correct ligand pose from
being found. This trend is less pronounced for the deep learning tool
DiffDock, as its training relies less on side-chain atoms, although its
performance is still lower than on crystal structures.
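As a side note on how a value like the 3.56 Å backbone RMSD in Figure 4 is typically obtained: the model is superposed onto the crystal structure with an optimal rigid-body fit, and the RMSD is then computed over matched backbone (e.g., Cα) atoms. The sketch below is a hypothetical helper using the Kabsch algorithm, assuming the two structures have already been reduced to matched N × 3 Cα coordinate arrays.

```python
import numpy as np

def backbone_rmsd(model: np.ndarray, ref: np.ndarray) -> float:
    """RMSD over matched C-alpha coordinates after optimal rigid-body
    superposition (Kabsch algorithm)."""
    # Center both coordinate sets on their centroids.
    p = model - model.mean(axis=0)
    q = ref - ref.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    u, s, vh = np.linalg.svd(p.T @ q)
    # Flip the last singular vector if needed so the result is a
    # proper rotation (determinant +1), not a reflection.
    if np.linalg.det(u @ vh) < 0:
        u[:, -1] *= -1
    rot = u @ vh
    return float(np.sqrt(np.mean(np.sum((p @ rot - q) ** 2, axis=1))))
```

Note that a global backbone RMSD of this kind can understate local problems: as in Figure 4, a single displaced helix in the pocket is enough to block the correct pose even when the overall fit looks moderate.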