Figure 1. Symmetric matrix of the average PID values. The
matrix contains 60 sub-clusters of Hsp60 sequences from 19 phyla. The X-
and Y-axis items “Sub-cluster” are represented in the following format
“Phylum #sub-cluster (number of sequences in a sub-cluster)”.
Sub-clusters of Viruses have no phylum labels. The Y-axis “Kingdom”
represents sub-clusters united by a higher taxonomic rank (Kingdom). The
black frames and the X-axis “Cluster” show four clusters and their
numbers (Roman numerals), which were obtained by clustering 60
sub-clusters. Clustering was performed using the UPGMA algorithm.
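The UPGMA step named in the caption can be illustrated with a minimal sketch. It assumes that the distance between two sub-clusters is taken as 100 minus their average PID and that the tree is cut when a chosen number of clusters remains; neither choice is stated in the text. The function name `upgma`, the labels A to D, and the PID values are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def upgma(pid, labels, n_clusters):
    """Average-linkage (UPGMA) agglomeration of items described by a PID matrix.

    pid        -- symmetric matrix (list of lists) of average PID values, in percent
    labels     -- one label per row/column of the matrix
    n_clusters -- stop merging when this many clusters remain (assumption)
    """
    n = len(labels)
    # Assumption: distance = 100 - PID, so identical items are at distance 0.
    dist = [[100.0 - pid[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg_dist(c1, c2):
        # UPGMA distance: unweighted mean over all original item pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return [sorted(labels[i] for i in c) for c in clusters]


# Hypothetical 4 x 4 matrix of average PID values (illustrative only).
toy_labels = ["A", "B", "C", "D"]
toy_pid = [
    [100.0, 70.0, 30.0, 35.0],
    [70.0, 100.0, 32.0, 28.0],
    [30.0, 32.0, 100.0, 45.0],
    [35.0, 28.0, 45.0, 100.0],
]
toy_clusters = upgma(toy_pid, toy_labels, n_clusters=2)
print(toy_clusters)
```

For a matrix of 60 sub-clusters this direct implementation is fast enough; library routines for average linkage would be the usual choice for larger inputs.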
Apparently, the symmetric matrix contains four clusters. Cluster I
contains 258 Hsp60 sequences from Metazoa (multicellular animals). It
should be noted that, as a rule, in this cluster the average PID values
are more than 60% and 50% within and between sub-clusters,
respectively (Supplementary, PID, SD). The number of identical amino
acid residues in the sequences reflects the degree of conservatism.
Thus, the Hsp60 sequences belonging to cluster I can be considered
intermediate and highly conserved, as noted in our previous
work [14]. Cluster II includes 608 Hsp60 sequences of
Fungi, Plantae, Protozoa, and Metazoa. It should be noted that fungal
Hsp60s were clustered into a small group with average PID values greater
than 60% and 50% within and between sub-clusters, respectively, as
observed for cluster I. Thus, the Hsp60 amino acid sequences from Fungi
can also be classified as intermediate and highly conserved. The other
sub-clusters in cluster II demonstrate intermediate and low sequence
conservatism, with average PID values ranging from 40±30% to 62±13%
within sub-clusters and from 24±4% to 47±19% between them.
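The within and between sub-cluster averages quoted above (mean ± SD) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: sequences are taken to be already aligned to equal length, columns with a gap in either sequence are skipped, and the population standard deviation is used. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations, product
from statistics import mean, pstdev

GAP = "-"


def pid(a, b):
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def within(sub_cluster):
    """Mean and SD of PID over all sequence pairs inside one sub-cluster.

    Needs at least two sequences.
    """
    values = [pid(a, b) for a, b in combinations(sub_cluster, 2)]
    return mean(values), pstdev(values)


def between(sub_cluster_1, sub_cluster_2):
    """Mean and SD of PID over all pairs drawn from two different sub-clusters."""
    values = [pid(a, b) for a, b in product(sub_cluster_1, sub_cluster_2)]
    return mean(values), pstdev(values)


# Hypothetical aligned fragments, for illustration only.
sub_a = ["MKVLA", "MKVLG", "MKILA"]
sub_b = ["MRTLA", "MRT-A"]

print("within sub_a: %.1f ± %.1f" % within(sub_a))
print("between sub_a and sub_b: %.1f ± %.1f" % between(sub_a, sub_b))
```

Filling a symmetric matrix with `within` on the diagonal and `between` off the diagonal gives a table of the same shape as the one shown in Figure 1.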
The largest cluster III contains 18244 sequences (22 sub-clusters),
mainly including Hsp60 of Bacteria, Plantae, and Archaea. It should be
noted that within the 11 sub-clusters in cluster III, the average PID
values vary from 53±10% (Firmicutes #2) to 73±8% (Cyanobacteria #1),
indicating intermediate and highly conserved Hsp60 sequences. However,
between the sub-clusters of cluster III, these values are less than
50%. Thus, it can be assumed that, on the whole, the level of
conservatism in cluster III is low.
Finally, there is the smallest cluster IV containing 110 Hsp60 sequences
from Viruses, Bacteria, Protozoa, and Metazoa. In this cluster,
sub-clusters of Viruses #1, Apicomplexa #3, Chlamydiae #1, and
Apicomplexa #2 show average PID values of more than 50%. On the other
hand, the PID values between the sub-clusters of cluster IV are quite
low and range from 10±1% (Chlamydiae #1/Apicomplexa #2) to 33±2%
(Apicomplexa #2/Apicomplexa #3). Moreover, the average PID values
between cluster IV and other clusters are also low (Figure 1). This
extremely low level of conservatism of Hsp60 sequences in cluster IV may
indicate that these Hsp60s have diverged substantially from
other Hsp60s.
In some studies [2, 4, 17, 25], Hsp60 is called a highly
conserved protein. However, according to our data, the percentage of
identical amino acid residues varies widely. Thus, Hsp60 as a whole
cannot be regarded as a highly conserved protein.
It should be noted that metazoan Hsp60 sequences belonging to
sub-clusters Arthropoda #2 and #3, and Nematoda #1 were not included
in cluster I. To explain this phenomenon using a symmetric PID matrix
(Supplementary, PID), the average PID values between Hsp60 sequences of
these sub-clusters and other Hsp60 sequences belonging to each of the 19
phyla were calculated (Figure 2; Supplementary, Metazoan artifacts).
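The per-phylum averaging described here can be sketched as below, assuming a precomputed symmetric PID matrix over individual sequences and one phylum label per sequence. The function name `mean_pid_by_phylum`, the matrix values, and the choice to exclude the query sub-cluster's own members from the averages are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_pid_by_phylum(pid_matrix, phyla, query_idx):
    """Average PID between the sequences of one sub-cluster and each phylum.

    pid_matrix -- symmetric matrix of pairwise PID values, in percent
    phyla      -- phylum label for each row/column of the matrix
    query_idx  -- indices of the sequences in the sub-cluster of interest
    """
    query = set(query_idx)
    by_phylum = defaultdict(list)
    for i in query:
        for j, phylum in enumerate(phyla):
            # Assumption: members of the query sub-cluster are not compared
            # with each other.
            if j not in query:
                by_phylum[phylum].append(pid_matrix[i][j])
    return {phylum: mean(values) for phylum, values in by_phylum.items()}


# Hypothetical labels and PID values, for illustration only.
toy_phyla = ["Arthropoda", "Arthropoda", "Chordata", "Firmicutes"]
toy_matrix = [
    [100.0, 55.0, 48.0, 30.0],
    [55.0, 100.0, 62.0, 41.0],
    [48.0, 62.0, 100.0, 44.0],
    [30.0, 41.0, 44.0, 100.0],
]
toy_result = mean_pid_by_phylum(toy_matrix, toy_phyla, query_idx=[0])
print(toy_result)
```

Comparing the resulting per-phylum averages shows which phyla a divergent sub-cluster resembles most closely.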