References

1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589, doi:10.1038/s41586-021-03819-2 (2021).
2. Pereira, J. et al. High-accuracy protein structure prediction in CASP14. Proteins 89, 1687-1699, doi:10.1002/prot.26171 (2021).
3. Jumper, J. et al. Applying and improving AlphaFold at CASP14. Proteins 89, 1711-1721, doi:10.1002/prot.26257 (2021).
4. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv, 2021.10.04.463034, doi:10.1101/2021.10.04.463034 (2022).
5. Lin, Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv, 2022.07.20.500902, doi:10.1101/2022.07.20.500902 (2022).
6. Baek, M. et al. Accurate prediction of protein structures and interactions using a 3-track network. bioRxiv, 2021.06.14.448402, doi:10.1101/2021.06.14.448402 (2021).
7. Wu, R. et al. High-resolution de novo structure prediction from primary sequence. bioRxiv, 2022.07.21.500999, doi:10.1101/2022.07.21.500999 (2022).
8. Pearce, R. & Zhang, Y. Toward the solution of the protein structure prediction problem. J Biol Chem 297, 100870, doi:10.1016/j.jbc.2021.100870 (2021).
9. Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294-298, doi:10.1126/science.aah4043 (2017).
10. Mori, H. et al. PZLAST: an ultra-fast amino acid sequence similarity search server against public metagenomes. Bioinformatics, doi:10.1093/bioinformatics/btab492 (2021).
11. Ishikawa, H. et al. PZLAST: an ultra-fast sequence similarity search tool implemented on a MIMD processor. International Journal of Networking and Computing 12, 446-466, doi:10.15803/ijnc.12.2_446 (2022).
12. Kitts, P.A. et al. Assembly: a resource for assembled genomes at NCBI. Nucleic Acids Res 44, D73-80, doi:10.1093/nar/gkv1226 (2016).
13. Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 13, 1265, doi:10.1038/s41467-022-28865-w (2022).
14. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 43, D6-17, doi:10.1093/nar/gku1130 (2015).
15. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402 (1997).
16. Oda, T., Lim, K. & Tomii, K. Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance. BMC Bioinformatics 18, 288, doi:10.1186/s12859-017-1686-9 (2017).
17. Oda, T. Refinement of AlphaFold-Multimer structures with single sequence input. bioRxiv, 2022.12.27.521991, doi:10.1101/2022.12.27.521991 (2023).
18. Eastman, P. et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput Biol 13, e1005659, doi:10.1371/journal.pcbi.1005659 (2017).
19. Potter, S.C. et al. HMMER web server: 2018 update. Nucleic Acids Res 46, W200-W204, doi:10.1093/nar/gky448 (2018).
20. Eddy, S.R. Accelerated Profile HMM Searches. PLoS Comput Biol 7, e1002195, doi:10.1371/journal.pcbi.1002195 (2011).
21. Remmert, M., Biegert, A., Hauser, A. & Soding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9, 173-175, doi:10.1038/nmeth.1818 (2011).
22. Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 20, 473, doi:10.1186/s12859-019-3019-7 (2019).
23. Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res 45, D170-D176, doi:10.1093/nar/gkw1081 (2017).
24. Steinegger, M., Mirdita, M. & Soding, J. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nat Methods 16, 603-606, doi:10.1038/s41592-019-0437-4 (2019).
25. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 49, D480-D489, doi:10.1093/nar/gkaa1100 (2021).
26. Mitchell, A.L. et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res 48, D570-D578, doi:10.1093/nar/gkz1035 (2020).
27. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119, doi:10.1186/1471-2105-11-119 (2010).
28. Gao, M., Nakajima An, D., Parks, J.M. & Skolnick, J. AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 13, 1744, doi:10.1038/s41467-022-29394-2 (2022).
29. Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702-710, doi:10.1002/prot.20264 (2004).
30. Mukherjee, S. & Zhang, Y. MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic Acids Res 37, e83, doi:10.1093/nar/gkp318 (2009).
31. Letunic, I., Khedkar, S. & Bork, P. SMART: recent updates, new developments and status in 2020. Nucleic Acids Res 49, D458-D460, doi:10.1093/nar/gkaa937 (2021).
32. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679-682, doi:10.1038/s41592-022-01488-1 (2022).
33. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-3152, doi:10.1093/bioinformatics/bts565 (2012).
34. Zhang, C. & Pyle, A.M. A unified approach to sequential and non-sequential structure alignment of proteins, RNAs, and DNAs. iScience 25, 105218, doi:10.1016/j.isci.2022.105218 (2022).
35. Zhang, C., Shine, M., Pyle, A.M. & Zhang, Y. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat Methods 19, 1109-1115, doi:10.1038/s41592-022-01585-1 (2022).
36. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421, doi:10.1186/1471-2105-10-421 (2009).
37. Altschul, S.F. et al. Protein database searches using compositionally adjusted substitution matrices. FEBS J 272, 5101-5109, doi:10.1111/j.1742-4658.2005.04945.x (2005).
38. Suzek, B.E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926-932, doi:10.1093/bioinformatics/btu739 (2015).
39. Kosciolek, T. & Jones, D.T. Accurate contact predictions using covariation techniques and machine learning. Proteins 84 Suppl 1, 145-151, doi:10.1002/prot.24863 (2016).
40. Basu, S. & Wallner, B. DockQ: A Quality Measure for Protein-Protein Docking Models. PLoS One 11, e0161879, doi:10.1371/journal.pone.0161879 (2016).
41. Lensink, M.F., Mendez, R. & Wodak, S.J. Docking and scoring protein complexes: CAPRI 3rd Edition. Proteins 69, 704-718, doi:10.1002/prot.21804 (2007).