RESULTS AND DISCUSSIONS
Overall performance
My team, PEZYFoldings, ranked third by GDT-TS (first by the Assessor’s
formulae) in the single-domain category and tenth in the multimer
category. In the ranking that considers all submitted models,
PEZYFoldings ranked fourth by GDT-TS (first by the Assessor’s formulae)
in the single-domain category and fourth in the multimer category. The
improved ranking in the multimer category when all submitted models are
considered suggests that there is room for enhancement in my ranking and
selection process for multimeric structures.
After the competition, all generated models, including unsubmitted ones,
were assessed based on their TM-scores [29] (Figs. S1 and S2). Optimal
chain mappings for the multimer targets were obtained using
US-align [34,35] or MM-align [30], and TM-scores were calculated using
the TM-score software. Among the 93
single-domain targets with available ground truth structures, 10 targets
had superior models displaying significant TM-score differences
(>0.1) compared to the submitted models. Likewise, among
the 36 multimer targets with available ground truth structures, three
targets possessed better models exhibiting substantial TM-score
differences (>0.1) compared to the submitted models. It is
important to note that these results cannot be directly attributed to
the inadequacy of my model ranking and selection procedure, as some
models were not processed in the ranking and selection step. Instead,
the results suggest the full potential of the structure prediction
components.
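
The comparison described above can be illustrated with a minimal Python sketch. It is not the script used for the post-competition assessment: the function name, the dictionary layout, and the TM-score values are hypothetical, and only the 0.1 margin comes from the text.

```python
from typing import Dict, List


def targets_with_better_models(
    submitted_tm: Dict[str, float],
    generated_tm: Dict[str, List[float]],
    margin: float = 0.1,
) -> List[str]:
    """Return targets whose best generated model exceeds the submitted
    model's TM-score by more than `margin`."""
    better = []
    for target, sub_tm in submitted_tm.items():
        # Fall back to the submitted score when no extra models exist.
        candidates = generated_tm.get(target) or [sub_tm]
        if max(candidates) - sub_tm > margin:
            better.append(target)
    return sorted(better)


# Hypothetical TM-scores for illustration only (not CASP15 data).
submitted_tm = {"target_a": 0.62, "target_b": 0.90, "target_c": 0.40}
generated_tm = {
    "target_a": [0.62, 0.75],
    "target_b": [0.90, 0.93],
    "target_c": [0.40, 0.49],
}
print(targets_with_better_models(submitted_tm, generated_tm))
```

In this toy input, only `target_a` has an unsubmitted model that is better by more than 0.1.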
Notable targets
In this section, I will discuss specific targets that are likely to be
of particular interest to readers.
T1130
For T1130, I could obtain hits only from the MGnify [26] database.
However, the sequence identities between the hits and the query were
low. In addition, the target was described as an aphid protein;
therefore, hits from a metagenomic database were suspicious.
Furthermore, the confidence scores of the resulting models were poor. I
performed de novo-like structure prediction using the refinement model
and built approximately 5,000 models. Structures with relatively high
self-confidence scores were submitted; however, these scores were lower
than those of the usual targets. The plDDT of the top structure was
68.99. This did not reach
the level often observed in successful predictions, which tend to have
plDDT above 80. Assessment after the competition showed that all the
produced structures had an insufficient TM-score (Fig. S1). According to
discussions during the CASP15 conference, the two teams with the highest
ranks in the single-domain category had hits for T1130. Therefore, my
poor performance on this target was caused by a deficiency in the
sequence similarity search conditions.
H1137
Due to the large size of H1137, domain parsing was performed through
visual inspection. Initially, the features derived from the MSA were
divided into multiple segments and concatenated into inputs of
approximately 1000-2000 amino acids in total, from which partial
structures were predicted (step 1, Table S3). Utilizing the outcomes of
step 1, I
constructed additional partial structures (step 1.5, Table S3) to verify
the accuracy of my assumptions regarding subunit interactions. These
predictions suggested that the N-terminal regions of s1-s6 interacted
with s8 and s9, while s7 interacted with s8 and s9. Consequently, I
constructed partial structures using: 1) N-terminal regions of s1-s6 and
full-length s8 and full-length s9; 2) middle part of s1-s6; 3)
C-terminal regions of s1-s6; 4) N-terminal regions of s1-s6, N-terminal
regions of s7, full-length s8, and full-length s9; 5) full-length s7;
and 6) GFP domain of s7 (step 2, Table S3). The predicted partial
structures in step 2 were concatenated, and the subunit structures were
extracted. Note that the complete structure of H1137 was intended to be
built by the submission date of H1137; consequently, the structures
submitted as independent subunit structures were taken from partially
concatenated structures.
As with the usual monomer targets, the sum of plDDTs was used as the
selection criterion. Refinement was not performed because the
performance of the refinement model on partial structures was
considered poor and the entire assembly structure was too large to
process.
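
A selection by the sum of plDDTs could look like the following Python sketch. It rests on assumptions not stated in the text: that each model is a PDB-format file with per-residue plDDT stored in the B-factor column (as in AlphaFold2-style output), and that the sum is taken over CA atoms. The function names and the toy records are hypothetical.

```python
from typing import Dict


def sum_plddt(pdb_text: str) -> float:
    """Sum the B-factor column (assumed to hold plDDT) over CA atoms."""
    total = 0.0
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            total += float(line[60:66])
    return total


def select_best(models: Dict[str, str]) -> str:
    """Return the name of the model with the largest plDDT sum."""
    return max(models, key=lambda name: sum_plddt(models[name]))


def ca_line(serial: int, resseq: int, plddt: float) -> str:
    """Build a toy CA record (alanine, chain A, dummy coordinates)."""
    return (
        f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C"
    )


# Hypothetical two-residue models for illustration only.
models = {
    "model_a": "\n".join([ca_line(1, 1, 90.0), ca_line(2, 2, 80.0)]),
    "model_b": "\n".join([ca_line(1, 1, 70.0), ca_line(2, 2, 60.5)]),
}
print(select_best(models))
```

Using the sum rather than the mean only matters when candidate models differ in length, which can happen when partial structures are compared.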
According to the single-domain category results in CASP15, I achieved
Z-scores greater than 2.0 for six targets. Five of these six targets
were the helical domains of H1137 subunits (D2 domains of s1, s2, s3,
s4, and s5), which were challenging to predict as monomers. Hence,
domain parsing that considers the interface was essential for my high
performance.
T1173-D2
For T1173, the semi-automatic protocol did not yield structures that
looked promising compared with the ColabFold [32] results (Fig. 2C,
first and fourth panels). A Quick BLASTP [15,36,37] search showed that
T1173 was part of a longer sequence. Therefore, I extended the query by
196 aa at the N-terminus using the longer sequence and predicted the
structures again. After constructing several structures, I noticed that the quality
of the C-terminal region (based on visual inspection, Fig. 2C, second
panel) was inferior compared to the ColabFold result (Fig. 2C, fourth
panel). I examined the depth of the MSA and observed a highly skewed
distribution; the deepest part had more than 400,000 sequences, while
the last ten aa region had fewer than 1,000 sequences (Fig. 2A).
Consequently, I selected the hits covering the final ten aa, randomly
selected 500 sequences from the original MSA, and combined them to
flatten the depth distribution (Figs. 2A, 2B). The resulting models
appeared satisfactory (Fig. 2C, third
panel). Nevertheless, it is important to note that other groups
submitted more accurate structures. The N-terminal region of D2
(positions 63-113 in the full-length sequence) in my MODEL 1 could not
be aligned with the ground truth structure using TM-score software.
Enhanced structures might have been achieved if I had included more
sequences from positions 63-113.
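
The MSA flattening described for T1173 can be sketched in Python as follows. This is an illustration under assumptions, not the procedure actually run: the MSA is taken to be a list of equal-length aligned strings with the query first and `-` for gaps, "hits in the final ten aa" is read as "hits with at least one aligned residue in the last ten columns", and the function names are hypothetical.

```python
import random
from typing import List


def column_depth(msa: List[str]) -> List[int]:
    """Number of non-gap residues in each alignment column."""
    return [sum(1 for seq in msa if seq[i] != "-") for i in range(len(msa[0]))]


def flatten_msa(
    msa: List[str], tail: int = 10, n_random: int = 500, seed: int = 0
) -> List[str]:
    """Keep the query, every hit covering the last `tail` columns,
    and `n_random` hits sampled at random from the original MSA."""
    hit_indices = list(range(1, len(msa)))
    tail_hits = [
        i for i in hit_indices if any(c != "-" for c in msa[i][-tail:])
    ]
    rng = random.Random(seed)
    sampled = rng.sample(hit_indices, min(n_random, len(hit_indices)))
    keep = sorted(set(tail_hits) | set(sampled))
    return [msa[0]] + [msa[i] for i in keep]


# Toy alignment: five hits cover only the first ten columns,
# one hit covers only the last two columns.
msa = ["ACDEFGHIKLMN"] + ["ACDEFGHIKL--"] * 5 + ["----------MN"]
print(column_depth(msa))
flat = flatten_msa(msa, tail=2, n_random=2)
print(column_depth(flat))
```

In the toy alignment, the C-terminal columns keep their full depth while the over-represented N-terminal columns are thinned, which is the intended flattening effect.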
Assessment of the impact of individual elements
Impact of the extended sequence similarity search
To examine the impact of the extended sequence similarity search
process, I conducted an assessment of predictions after the competition,
focusing on the differences in input MSAs. I employed two types of MSAs
for predicting target structures. The first MSA set comprised MSAs
utilized by PEZYFoldings (PEZY-MSA), while the second set was generated
using the default settings of the AlphaFold2 or AlphaFold-Multimer
pipeline provided by the NBIS-AF2-standard and NBIS-AF2-multimer teams
(NBIS-MSA). I examined targets with a total length of 1200 aa or less.
However, out-of-memory errors occurred for T1124, T1132, and T1174.
Consequently, 54 single-domain targets were investigated. PEZY-MSA had
at least one more sequence than NBIS-MSA, except for the targets
T1133-D1, T1131-D1, T1122-D1, and T1119-D1 (Table S4). Thus, I can
confirm that I obtained more evolutionarily related sequences than the
default settings in over 90% of cases. The number of sequences in
PEZY-MSA for specific targets could be smaller than that in NBIS-MSA
due to: 1) running hhblits [21] against UniRef30 [23] and BFD [24]
separately; 2) not using the UniRef90 [38] database; 3) reducing the
number of iterations against BFD from three to two; 4) using a more
stringent e-value (0.00001) for jackhmmer [19,20] compared to the
default settings (0.0001); and 5) applying hhfilter [22] to hits from
BFD and MGnify. When clustering sequences with a sequence identity
threshold of 62%, a criterion for effective sequence counts used in
previous studies [13,39], I obtained larger values than the default
settings in 43 of the 54 cases (Table S4).
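
To make the 62% identity criterion concrete, the following Python sketch counts clusters in an alignment with a simple greedy scheme. It is only an illustration: the clustering tool and the exact identity definition used for Table S4 are not given here, so the greedy algorithm, the identity computed over columns where both sequences are non-gap, and the function names are all assumptions.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns aligned in both sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def count_clusters(msa: List[str], threshold: float = 0.62) -> int:
    """Greedy clustering: a sequence starts a new cluster unless it is at
    least `threshold` identical to an existing cluster representative."""
    representatives: List[str] = []
    for seq in msa:
        if not any(identity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return len(representatives)


# Toy alignment for illustration only.
msa = [
    "ACDEFGHIKL",  # query, first representative
    "ACDEFGHIKV",  # 90% identical to the query, joins its cluster
    "ACDEFGAAAA",  # 60% identical, below the threshold, new cluster
    "WWWWWWWWWW",  # unrelated, new cluster
]
print(count_clusters(msa))
```

The pairwise loop is quadratic in the number of sequences, so it is suitable only for small alignments; a dedicated clustering tool would be needed at the MSA depths discussed in this section.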
The ΔTM-score (TM-score of structures with PEZY-MSA minus the TM-score
of structures with NBIS-MSA) as a function of Nseq-NBIS-MSA (number of
sequences in NBIS-MSA) is illustrated in Figs. 3C and 3D. Seven and five
targets demonstrated a ΔTM-score >0.05 for MODEL 1 (the
model with the highest confidence) and the best model among the five
generated models, respectively (Fig. 3C, 3D). All targets with a
ΔTM-score >0.05 had an Nseq-NBIS-MSA of less than 1000. The
ΔTM-score for targets with Nseq-NBIS-MSA greater than 1000 was minimal,
which is consistent with the results in the original publication: the
quality of predictions by AlphaFold2 increases until the number of
sequences or Neff reaches approximately 100-1000 [1]. This
trend was also observed in the CASP15 results. Among the 53 targets, I
had nine targets with a Z-score greater than 1.0, and seven out of those
nine targets had an Nseq-NBIS-MSA of less than 1000 (Figure 4E, Table
S4).
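
The relationship between the ΔTM-score and Nseq-NBIS-MSA can be examined with a short Python sketch such as the one below. The record layout, the function name, and all numerical values are hypothetical; only the definition of the ΔTM-score and the 0.05 and 1000 cutoffs are taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetResult:
    name: str
    tm_pezy: float   # TM-score of the structure predicted with PEZY-MSA
    tm_nbis: float   # TM-score of the structure predicted with NBIS-MSA
    nseq_nbis: int   # number of sequences in NBIS-MSA

    @property
    def delta_tm(self) -> float:
        return self.tm_pezy - self.tm_nbis


def improved_targets(
    results: List[TargetResult], min_delta: float = 0.05
) -> List[TargetResult]:
    """Targets whose ΔTM-score exceeds `min_delta`."""
    return [r for r in results if r.delta_tm > min_delta]


# Hypothetical values for illustration only (not the data of Table S4).
results = [
    TargetResult("target_a", 0.80, 0.60, 120),
    TargetResult("target_b", 0.91, 0.90, 25000),
    TargetResult("target_c", 0.55, 0.45, 640),
]
improved = improved_targets(results)
print([r.name for r in improved])
print(all(r.nseq_nbis < 1000 for r in improved))
```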
Impact of the deep-learning-based refinement model
The TM-scores of the models submitted to the competition website were
collected to investigate the refinement model’s effectiveness. The
TM-scores before and after the last refinement are summarized in Fig. 4.
Models subjected to docking or de novo-like structure predictions
were excluded. The refinement model improved the quality of some
predicted structures (Figs. 4A, 4D); however, from the point of view of
the performance in the competition, the differences in the TM-score were
indistinguishable (Figs. 4B, 4C, 4E, 4F). In other words, although the
refined structures had better accuracy than the original structures, the
other structures achieved the same or better levels of accuracy without
refinement. In CASP15, there were seven conventional antibody-antigen or
nanobody-antigen targets. For three of the seven targets, I built models
with an average DockQ [40] score >0.49, which meets the medium-quality
threshold of the CAPRI [41] criteria. As mentioned in the introduction,
the refinement model was anticipated to perform well with antibodies.
However, the results obtained from the model indicate that further
efforts are required to reach the desired level of success.
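
For reference, the DockQ cutoff used above can be expressed as a small Python helper. The 0.49 medium-quality boundary is the one quoted in the text; the 0.23 and 0.80 boundaries are the commonly cited DockQ cutoffs for the acceptable and high classes and are included here as an assumption. The function name and the example scores are hypothetical.

```python
def dockq_class(score: float) -> str:
    """Map a DockQ score to a CAPRI-style quality class."""
    if score >= 0.80:   # assumed cutoff for high quality
        return "high"
    if score >= 0.49:   # medium-quality threshold quoted in the text
        return "medium"
    if score >= 0.23:   # assumed cutoff for acceptable quality
        return "acceptable"
    return "incorrect"


# Hypothetical per-target average DockQ scores for illustration only.
average_dockq = {"target_a": 0.55, "target_b": 0.30, "target_c": 0.10}
medium_or_better = [
    name
    for name, score in average_dockq.items()
    if dockq_class(score) in ("medium", "high")
]
print(medium_or_better)
```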