Figure captions
Figure 1. Symmetric matrix of the average PID values. The
matrix contains 60 sub-clusters of Hsp60 sequences from 19 phyla. The X-
and Y-axis items “Sub-cluster” are represented in the following format
“Phylum #sub-cluster (number of sequences in a sub-cluster)”.
Sub-clusters of Viruses have no phylum labels. The Y-axis “Kingdom”
represents sub-clusters united by a higher taxonomic rank (Kingdom). The
black frames and the X-axis “Cluster” show four clusters and their
numbers (Roman numerals), which were obtained by clustering 60
sub-clusters. Clustering was performed using the UPGMA algorithm.
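The UPGMA step named in this caption can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the labels and PID values are hypothetical, and the conversion of a PID similarity into a distance as 100 - PID is an assumption made only for this example.

```python
def upgma(dist, labels):
    """Average-linkage (UPGMA) agglomeration.

    dist is a symmetric matrix of pairwise distances; returns a list of
    merges as (members_a, members_b, distance_at_merge).
    """
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        size_a, size_b = len(clusters[a]), len(clusters[b])
        new_d = {}
        for k in clusters:
            if k in (a, b):
                continue
            d_ak = d[(min(a, k), max(a, k))]
            d_bk = d[(min(b, k), max(b, k))]
            # UPGMA update: size-weighted mean of the two old distances.
            new_d[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        merges.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical average PID values between three sub-clusters.
labels = ["A", "B", "C"]
pid = [[100, 90, 60], [90, 100, 50], [60, 50, 100]]
# Assumption for this sketch: distance = 100 - PID.
dist = [[100 - v for v in row] for row in pid]
merges = upgma(dist, labels)
```

Cutting the resulting merge list at a chosen distance would yield flat clusters such as the four shown in the figure; the cut-off used by the authors is not stated in the caption.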
Figure 2. Heat map showing normalized average PID values
between Hsp60 sequences belonging to sub-clusters Arthropoda #2,
Arthropoda #3, and Nematoda #1 and Hsp60 sequences belonging to each
of 19 phyla. The average PID values were normalized using the min-max
normalization method. The 19 phyla were sorted by NCBI Taxonomy.
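For reference, min-max normalization rescales a set of values linearly to the range [0, 1]. The sketch below uses hypothetical PID values; returning zeros for a constant row is a convention chosen here, not something stated in the caption.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread to rescale; return zeros by convention (an assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical average PID values of one sub-cluster against four phyla.
pid_row = [42.0, 55.0, 61.0, 48.5]
normalized = min_max_normalize(pid_row)
```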
Figure 3. The average amino acid composition of the Hsp60
sequences from 19 phyla: a - Heat map displaying normalized average PID
values between 19 phyla of Hsp60 sequences; b - The average amino acid
composition of the Hsp60 sequences for each of the 19 phyla compared to
the corresponding proteomic values. In Figure 3a the average values were
normalized using the min-max normalization method. The line “Summary”
presents the average normalized amino acid composition of Hsp60 for 19
phyla. In Figure 3b the amino acid profiles were represented as the
average amino acid composition of Hsp60 for each of 19 phyla compared to
the average amino acid composition of the respective proteomes. The
color scale is structured as follows: Higher/Lower - the amino acid
content in Hsp60 is higher/lower than in proteomes, respectively;
Comparable - the amino acid content in Hsp60 is comparable to the
average proteomic value. The “Summary” line shows the average amino
acid profile of Hsp60 for 19 phyla. The groups were sorted using the
NCBI Taxonomy. Amino acids were sorted by the average amino acid
composition of 19220 Hsp60 sequences.
Figure 4. The average nucleotide composition of the Hsp60 genes
from 17 phyla: a - The average total GC content at each codon
position of Hsp60 sequences and the corresponding genomes; b - The average
content of GC1, GC2, and
GC3 in Hsp60 genes. Phyla were sorted by average total
GC content of Hsp60 sequences. Student’s t-test was used to compare the
average GC content of the Hsp60 sequences and the average GC content of
the corresponding genomes. The difference between two independent
samples of GC values is considered statistically significant if the
p-value is less than 0.05. Statistically indistinguishable average GC
values are marked with “ns” (non-significant).
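The quantities GC1, GC2, and GC3 can be computed from an in-frame coding sequence as sketched below. This is a simplified illustration: it counts every codon, whereas an analysis restricted to synonymous third positions would typically exclude codons such as ATG, TGG, and stop codons. The example sequence is hypothetical.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    n_codons = usable // 3
    return tuple(c / n_codons for c in counts)


# Hypothetical four-codon sequence: ATG GCC GAA TAA
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCGAATAA")
```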
Figure 5. Neutrality plots (GC1,2 vs.
GC3) for Hsp60 genes from 17 phyla. The
GC1,2 values represent the average GC content at the
first and second codon positions (GC1 and
GC2), while GC3 values represent the GC
content at the third synonymous codon position. The solid line
represents the linear regression of GC1,2 versus
GC3; their correlation is described by the
correlation coefficient R and its p-value. The correlation coefficient R
reflects the strength of the impact of GC3 on
GC1,2. The p-value characterizes the significance of R.
Changes in GC3 values are considered to affect the
GC1,2 values when the p-value of R is less than 0.05. In
turn, when the p-value is greater than 0.05, changes in GC3 are
considered random and the R coefficient is not meaningful,
i.e. GC3 and GC1,2 values are not
correlated. The slope ε of the regression line indicates the
neutrality of the codon usage. Neutrality values were calculated as
ε × 100 (%). Slope values, ranging from 0 to 1, were
calculated using least-squares regression analysis. The dashed line
is a complete neutrality plot, which reflects the complete equilibrium
of the nucleotide composition of the gene/genome with directional
mutation pressure. The equilibrium point Ep was
defined as the intersection point of the neutrality plot (regression
line) and the complete neutrality plot. The Ep value reflects the GC3 content of the gene/genome when
the mutation frequencies (AT→GC and GC→AT) are equal. The direction of
the mutational pressure, indicating an imbalance in the frequencies of
the AT→GC and GC→AT mutations, was determined in accordance with the
following conditions: the average GC content value less than the Ep value reflects the AT mutational pressure; the
average GC content value greater than the Ep value reflects the GC mutational pressure. Phyla were sorted by
average total GC content of Hsp60 genes.
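The quantities described in this caption (slope ε, neutrality in percent, and the equilibrium point Ep) can be sketched as follows. The sketch assumes that the complete neutrality plot is the diagonal GC1,2 = GC3, which is the usual convention but is not stated explicitly in the caption; the data points are hypothetical, and the significance test for R is omitted.

```python
def neutrality_fit(gc3, gc12):
    """Least-squares fit of GC12 on GC3.

    Returns (slope, intercept, neutrality_percent, equilibrium_point).
    The equilibrium point is where the fitted line crosses the diagonal
    GC12 = GC3 (assumed here to be the complete neutrality line).
    """
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A line parallel to the diagonal never crosses it.
    ep = intercept / (1.0 - slope) if slope != 1.0 else None
    return slope, intercept, slope * 100.0, ep


def pressure_direction(mean_gc, ep):
    """Rule from the caption: mean GC below Ep -> AT pressure, above -> GC."""
    if mean_gc < ep:
        return "AT"
    if mean_gc > ep:
        return "GC"
    return "balanced"


# Hypothetical per-gene values for one phylum.
gc3_values = [0.2, 0.4, 0.6, 0.8]
gc12_values = [0.40, 0.45, 0.50, 0.55]
slope, intercept, neutrality, ep = neutrality_fit(gc3_values, gc12_values)
```

With these hypothetical values the slope is about 0.25 (25% neutrality) and Ep is about 0.47.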
Figure 6. Neutrality plot for Hsp60 genes of Chordata.
Figure 7. Nc-plots of codon usage bias in Hsp60 genes from the
17 phyla. Gray scatter plots represent ENC values versus
GC3 content for Hsp60 genes from 17 phyla. The black
bell-shaped curves represent the expected effective number of codons
(ENCexp), i.e. the ENC values predicted if codon
usage bias is influenced only by GC3 content (GC content at
the third synonymous position of codons) in the Hsp60 gene. Phyla
were sorted by the average total GC content of Hsp60 genes.
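The bell-shaped expected curve in an Nc-plot is commonly drawn with Wright's (1990) formula, ENCexp = 2 + s + 29 / (s² + (1 - s)²), where s is the GC3 fraction. The caption does not state which formula was used, so the sketch below assumes this standard form.

```python
def enc_expected(gc3):
    """Expected ENC under GC3-only constraint (Wright's formula, assumed here)."""
    s = gc3
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Sample the curve at a few GC3 fractions between 0 and 1.
curve = [(s / 10.0, enc_expected(s / 10.0)) for s in range(0, 11)]
```

Genes whose observed ENC lies well below this curve are usually interpreted as being shaped by factors other than GC3 composition alone.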
Figure 8. Clustering of ENC values for Hsp60 genes from
Chordata. Cluster #1 includes the ENC values of Hsp60 genes of
Mammalia, Aves, Reptilia, and Amphibia. Cluster #2 includes ENC values
of Hsp60 genes of Fish. Clustering was performed using the value of the
GC3 content corresponding to the equilibrium point Ep, which was determined earlier (see “GC-content
and mutation pressure for codon usage”).
Figure 9. Nc-plot of the average codon usage bias in the Hsp60
genes of 17 phyla. The plot space was divided into six regions using
the ENC and GC3 thresholds. The ENC thresholds,
reflecting the level of the Hsp60 gene expression, were as follows: ENC
< 40 for genes with high expression; 40 < ENC ≤ 55
for moderately expressed genes; ENC > 55 for genes with low
expression. The GC3 thresholds reflecting the direction of
the mutational pressure were as follows: GC3 <
0.5 represents the AT-mutation pressure; GC3 > 0.5 represents the GC-mutation pressure. The average ENC
values were grouped according to the obtained regions: Group 1
(Apicomplexa, Firmicutes, and Bacteroidetes); Group 2 (Chlamydiae,
Streptophyta, Nematoda, Mollusca, Cyanobacteria, and Chordata); Group 3
(Euryarchaeota, Arthropoda, and Euglenozoa); Group 4 (Ascomycota,
Proteobacteria, Basidiomycota, Chlorophyta, and Actinobacteria). The
black bell-shaped curve represents the expected effective number of
codons (ENCexp), i.e. predicted ENC values if the
codon bias is influenced only by the GC content at the third synonymous
position of codons (GC3) in the Hsp60 gene. The
horizontal dashed lines (ENC=40 and ENC=55) indicate ENC thresholds for
determining the codon usage bias and gene expression level. The vertical
dashed line indicates the GC3 content of 0.5. The
position of the ENC value relative to this line indicates the direction of
the mutation pressure affecting the Hsp60 genes (depicted by arrows).
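The threshold rules in this caption can be written as a small classifier. The caption leaves two boundary cases unassigned (ENC exactly 40 and GC3 exactly 0.5); in this sketch they are placed in the "moderate" and "none" categories respectively, which is an assumption. The input values are hypothetical.

```python
def classify_enc_gc3(enc, gc3):
    """Assign an (expression_level, mutation_pressure) pair from the thresholds."""
    if enc < 40:
        expression = "high"
    elif enc <= 55:
        # ENC == 40 is not covered by the caption; treated as moderate here.
        expression = "moderate"
    else:
        expression = "low"

    if gc3 < 0.5:
        pressure = "AT"
    elif gc3 > 0.5:
        pressure = "GC"
    else:
        # GC3 == 0.5 is not covered by the caption; no direction assigned.
        pressure = "none"
    return expression, pressure


# Hypothetical average (ENC, GC3) pairs for three phyla.
examples = [(38.0, 0.30), (50.0, 0.70), (57.5, 0.45)]
groups = [classify_enc_gc3(enc, gc3) for enc, gc3 in examples]
```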
Figure 10. Symmetric matrix of p-values of the t-test between
the ENC values of the Hsp60 genes from 17 phyla. Pairs of phyla whose
Hsp60 ENC values are statistically indistinguishable (t-test p-value
greater than 0.05) are marked in black. Pairs of phyla whose Hsp60 ENC
values are statistically different (t-test p-value less than 0.05) are
marked in white. The
black frames and the “Cluster” X-axis represent the four clusters and
their numbers (Roman numerals). Clustering was carried out using the
UPGMA algorithm.
Figure 11. Average values of relative synonymous codon usage
(RSCU) for Hsp60 genes from 17 phyla. The RSCU values for each of 17
phyla were divided into two main groups according to the type of base at
the third synonymous codon position: a - A/T-ending codons; b -
G/C-ending codons. Phyla were divided into three groups by the direction
of the mutational pressure (see Table 2) and sorted by the average total
GC content of Hsp60 genes. The codons were sorted by the average RSCU
value across the 17 phyla.
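RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes the standard genetic code and skips stop codons and single-codon amino acids (Met, Trp); the input sequence is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for amino acids that occur in the sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met, and Trp have no synonymous choice
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            # observed count / (total / number of synonymous codons)
            result[codon] = counts[codon] * len(family) / total
    return result


# Hypothetical sequence encoding four alanines: GCT GCT GCC GCA
values = rscu("GCTGCTGCCGCA")
```

An RSCU of 1 indicates no preference for that codon; values above or below 1 indicate over- or under-use relative to equal usage.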
Figure 12. Summary patterns of relative synonymous codon usage
for Hsp60 genes under different mutational pressures. The
average RSCU values of Hsp60 genes from phyla with the AT, AT/GC, and GC
mutational pressure were divided into two main groups according to the
type of base at the third synonymous codon position: a - A/T-ending
codons; b - G/C-ending codons. The codons were sorted by the average
RSCU value across all 17 phyla.