Figure 1. Symmetric matrix of the average PID values. The matrix contains 60 sub-clusters of Hsp60 sequences from 19 phyla. Sub-cluster labels on the X- and Y-axes follow the format “Phylum #sub-cluster (number of sequences in the sub-cluster)”. Sub-clusters of Viruses have no phylum labels. The Y-axis “Kingdom” groups sub-clusters by a higher taxonomic rank (Kingdom). The black frames and the X-axis “Cluster” mark the four clusters (numbered with Roman numerals) obtained by clustering the 60 sub-clusters. Clustering was performed using the UPGMA algorithm.
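The UPGMA step named in the caption can be illustrated with a minimal sketch. It assumes that the distance between two items is 100 minus their average PID, which is an assumption made here and not stated in the text. The function name `upgma_clusters`, the labels `S1` to `S4`, and the PID values are hypothetical toy data, not values from the study.

```python
from itertools import combinations


def upgma_clusters(labels, dist, k):
    """Merge items by average linkage (UPGMA) until k clusters remain."""
    n = len(labels)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    # Each active cluster id maps to the original item indices it contains.
    clusters = {i: (i,) for i in range(n)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > k:
        # Pick the closest pair of active clusters.
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merged = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # UPGMA update: size-weighted mean of the two old distances.
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance that involves the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(merged)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return [tuple(labels[i] for i in members) for members in clusters.values()]


# Hypothetical toy matrix of average PID values (percent) for four items.
toy_labels = ["S1", "S2", "S3", "S4"]
toy_pid = [
    [100, 70, 30, 28],
    [70, 100, 32, 31],
    [30, 32, 100, 65],
    [28, 31, 65, 100],
]
# Assumption: distance = 100 - PID.
toy_dist = [[100 - p for p in row] for row in toy_pid]
groups = upgma_clusters(toy_labels, toy_dist, k=2)
```

With these toy values, the two high-PID pairs end up in separate groups. A library routine such as average-linkage hierarchical clustering would normally replace this hand-written loop.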
The symmetric matrix contains four clusters. Cluster I contains 258 Hsp60 sequences from Metazoa (multicellular animals). In this cluster, the average PID values are generally above 60% within sub-clusters and above 50% between sub-clusters (Supplementary, PID, SD). The number of identical amino acid residues in the sequences reflects the degree of conservation. Thus, the Hsp60 sequences belonging to cluster I can be considered intermediately to highly conserved, as noted in our previous work14. Cluster II includes 608 Hsp60 sequences of Fungi, Plantae, Protozoa, and Metazoa. Fungal Hsp60s formed a small group with average PID values greater than 60% within sub-clusters and greater than 50% between sub-clusters, as observed for cluster I. Thus, the Hsp60 amino acid sequences from Fungi can also be classified as intermediately to highly conserved. The other sub-clusters in cluster II show intermediate and low sequence conservation, with average PID values ranging from 40±30% to 62±13% within sub-clusters and from 24±4% to 47±19% between them.
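As an illustration of how a PID value relates to the number of identical residues, the following minimal sketch computes the percent identity of two aligned sequences. The function name `percent_identity` and the example strings are hypothetical. The choice of denominator (all alignment columns except those gapped in both sequences) is an assumption, since several PID definitions exist and the one used in the study is not restated here.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues between two aligned sequences.

    Assumption: columns gapped in both sequences are ignored, and every
    other column counts toward the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap and y == gap:
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            identical += 1  # same residue in both sequences
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments, not taken from the dataset.
pid_example = percent_identity("MKA-L", "MRA-L")
```

In this toy pair, three of the four compared columns match. Averaging such values over all sequence pairs within or between sub-clusters would give figures of the kind reported above as mean±SD.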
Cluster III, the largest, contains 18244 sequences (22 sub-clusters), mainly Hsp60s of Bacteria, Plantae, and Archaea. Within 11 of the sub-clusters in cluster III, the average PID values vary from 53±10% (Firmicutes #2) to 73±8% (Cyanobacteria #1), indicating intermediately to highly conserved Hsp60 sequences. However, between the sub-clusters of cluster III, these values are less than 50%. Thus, it can be assumed that, on the whole, the level of conservation in cluster III is low.
Finally, cluster IV, the smallest, contains 110 Hsp60 sequences from Viruses, Bacteria, Protozoa, and Metazoa. In this cluster, the sub-clusters Viruses #1, Apicomplexa #3, Chlamydiae #1, and Apicomplexa #2 show average PID values of more than 50%. On the other hand, the PID values between the sub-clusters of cluster IV are quite low and range from 10±1% (Chlamydiae #1/Apicomplexa #2) to 33±2% (Apicomplexa #2/Apicomplexa #3). Moreover, the average PID values between cluster IV and the other clusters are also low (Figure 1). This extremely low level of conservation in cluster IV may indicate that these Hsp60s have diverged substantially from other Hsp60s.
Some studies2,4,17,25 describe Hsp60 as a highly conserved protein. However, according to the data obtained here, the percentage of identical amino acid residues varies widely. Thus, Hsp60 is not a highly conserved protein.
The metazoan Hsp60 sequences belonging to sub-clusters Arthropoda #2, Arthropoda #3, and Nematoda #1 were not included in cluster I. To explain this, we used the symmetric PID matrix (Supplementary, PID) to calculate the average PID values between the Hsp60 sequences of these sub-clusters and the other Hsp60 sequences belonging to each of the 19 phyla (Figure 2; Supplementary, Metazoan artifacts).
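The averaging described here can be sketched as follows, assuming a full pairwise PID matrix and one phylum label per sequence. The function name `mean_pid_by_phylum`, the labels `phylum_X` and `phylum_Y`, and all numbers are hypothetical toy data. The use of the sample standard deviation is also an assumption, since the text does not specify which SD estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def mean_pid_by_phylum(pid, phylum, query):
    """Average PID between the query sequences and every other sequence,
    grouped by the phylum of the other sequence.

    pid    : square symmetric matrix of pairwise PID values (percent)
    phylum : phylum label for each sequence index
    query  : indices of the sequences in the sub-cluster of interest
    Returns {phylum: (mean, sample SD)}; SD is 0.0 for a single value.
    """
    query = set(query)
    values = defaultdict(list)
    for i in sorted(query):
        for j, label in enumerate(phylum):
            if j in query:
                continue  # skip comparisons inside the query sub-cluster
            values[label].append(pid[i][j])
    return {
        label: (mean(v), stdev(v) if len(v) > 1 else 0.0)
        for label, v in values.items()
    }


# Hypothetical toy data: sequences 0 and 1 form the query sub-cluster.
toy_pid = [
    [100, 80, 40, 44, 30],
    [80, 100, 42, 46, 34],
    [40, 42, 100, 75, 28],
    [44, 46, 75, 100, 29],
    [30, 34, 28, 29, 100],
]
toy_phylum = ["query", "query", "phylum_X", "phylum_X", "phylum_Y"]
summary = mean_pid_by_phylum(toy_pid, toy_phylum, query=[0, 1])
```

Comparing the per-phylum means for a sub-cluster such as Arthropoda #2 against those for typical metazoan sub-clusters would show which groups it most resembles.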