Database preparation
Database 1, containing 29360 amino acid sequences, was built with NCBI BLAST using the following query:
Database 2, containing 2416 nucleotide sequences, was created using the NCBI Gene Database (https://www.ncbi.nlm.nih.gov/gene/).
The original databases contained many unwanted entries, such as duplicate sequences and truncated sequences.
Truncated sequences were excluded by length. Hsp60 amino acid sequences were retained only if they contained between 450 and 650 aa residues inclusive, and Hsp60 gene nucleotide sequences only if they contained between 1350 and 1950 bp inclusive. Duplicate amino acid sequences were removed using the criterion 99% ≤ PID ≤ 100%, where PID is the percent identity of the two compared sequences.
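The two filtering steps can be illustrated with a minimal Python sketch. It is not the procedure actually used: the text does not state how PID was computed or which member of a duplicate pair was kept, so the function names, the ungapped `naive_pid` placeholder, and the greedy "keep the first sequence seen" rule are all assumptions made for illustration. Only the numeric thresholds come from the criteria above.

```python
from typing import Callable, Dict

# Thresholds taken from the criteria stated in the text.
AA_MIN, AA_MAX = 450, 650      # amino acid residues
NT_MIN, NT_MAX = 1350, 1950    # nucleotides (bp)
PID_MIN = 99.0                 # percent identity marking a duplicate


def filter_by_length(seqs: Dict[str, str], lo: int, hi: int) -> Dict[str, str]:
    """Keep only sequences whose length lies in the inclusive range [lo, hi]."""
    return {name: s for name, s in seqs.items() if lo <= len(s) <= hi}


def naive_pid(a: str, b: str) -> float:
    """Hypothetical placeholder for PID (assumption, not the authors' method).

    Compares the sequences position by position without gaps and divides
    the number of matches by the length of the longer sequence. A real
    analysis would normally derive PID from a pairwise alignment.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def remove_duplicates(
    seqs: Dict[str, str],
    pid: Callable[[str, str], float] = naive_pid,
    threshold: float = PID_MIN,
) -> Dict[str, str]:
    """Greedy filter: keep a sequence only if its PID with every
    previously kept sequence is below the threshold (assumed rule)."""
    kept: Dict[str, str] = {}
    for name, s in seqs.items():
        if all(pid(s, k) < threshold for k in kept.values()):
            kept[name] = s
    return kept


if __name__ == "__main__":
    # Toy sequences, invented for the example only.
    toy = {
        "full": "M" * 500,
        "near_copy": "M" * 499 + "A",   # one mismatch against "full"
        "distinct": "A" * 500,
        "truncated": "M" * 100,         # shorter than AA_MIN
    }
    by_length = filter_by_length(toy, AA_MIN, AA_MAX)
    unique = remove_duplicates(by_length)
    print(sorted(unique))
```

The greedy pass compares each sequence with all sequences already kept, so its cost grows quadratically with database size; for a database of tens of thousands of sequences a dedicated clustering tool would likely be more practical.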