Discussion:
The SARS-CoV-2 pandemic has already cost more than 333 thousand lives,
with many more deaths being reported each day. So far, the global
community has not been able to predict the virulence, seasonal
variation, carriage and immunity associated with the virus. However, it
is clear that the fatality rate varies by region and that the degree of
virulence varies from person to person. Some regions in Europe and North
America were affected the most, while most of Asia, Africa and Australia
remain less affected. A close analysis of this ssRNA genome has
therefore become a fundamental scientific need.
This study has classified the SARS-CoV-2 viruses circulating in
Southeast Asia into 4 major groups and 2 sub-groups by studying common
non-synonymous (NS) mutations. Group 1 consists of 5 out of 7 Indonesian
sequences, 3 out of 8 sequences from Thailand and the only sequence from
Nepal. Group 2 comprises 40% of the variants in this study. Strains
belonging to this group carry the characteristic co-evolved NS
mutations NSP12_P323L and Spike_D614G. These variants were initially
prevalent in Europe and North America, and now constitute 68% of the
virus sequences worldwide. A recent study analyzed 95 sequences and also found
NSP12_P323L variants to be at a higher frequency, and reported that
this variant was mostly found outside of Wuhan, China (Khailany, Safdar,
& Ozaslan, 2020). Another study suggests that the RNA-dependent
RNA polymerase (RdRp) amino acid (aa) substitution at position 323
(NSP12_P323L) affects RdRp fidelity, which, in turn, increases the
number of mutations within the virus and gives rise to co-evolved
mutations (Pachetti et al., 2020). NSP12_P323L co-evolved with Spike_D614G;
this particular non-silent spike protein mutation generates an
additional elastase cleavage site near the S1-S2 junction and thus
facilitates fusion and cell entry (Koyama, Weeraratne, Snowdon, &
Parida, 2020). This variant (Spike_D614G) was first observed on January
28, 2020 and was initially prevalent in Europe. Within 4 months, it had
outcompeted its ancestral subtype all over the world (Bhattacharyya et
al., 2020). This may explain the high frequency of Group 2 variants in
Southeast Asia and why these variants have subdivided into additional
sub-groups involving co-evolving mutations.
We differentiated Group 2 into 2 subgroups, 2a and 2b, which carry the
N_203-204: RG>KR and NS3_Q57H amino acid substitutions, respectively,
along with NSP12_P323L and Spike_D614G. Several studies (Ayub, 2020;
Lorusso et al., 2020; Yin, 2020) mention a trinucleotide block mutation
(28881-28883: GGG>AAC) that results in 2 amino acid changes
(N_203-204: RG>KR) and affects the Serine-Arginine-rich motif of the N
protein. This trinucleotide block mutation was found in 8 sequences, 3
of which were from Dhaka, Bangladesh. NS3_Q57H mutation variants have
been commonly found in the USA (Mercatelli & Giorgi, 2020) and Europe,
and the mutation is predicted to be deleterious (Issa, Merhi,
Panossian, Salloum, & Tokajian, 2020).
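The link between the trinucleotide block mutation and the two amino acid changes can be checked with a short translation sketch. The N ORF start position (28274) and the reference codons for positions 203 and 204 (AGG and GGA) are assumptions taken from the Wuhan-Hu-1 reference annotation, not values stated in this section; the function names are illustrative.

```python
# Standard genetic code, built in TCAG order (DNA alphabet).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumptions (not given in the text): the N ORF starts at genome
# position 28274, and reference codons 203 and 204 are AGG and GGA.
N_ORF_START = 28274
REF_CODONS = {203: "AGG", 204: "GGA"}


def apply_block_mutation(ref_codons, orf_start, first_pos, alt_bases):
    """Return a copy of ref_codons with alt_bases written from first_pos on."""
    mutated = {num: list(codon) for num, codon in ref_codons.items()}
    for offset, base in enumerate(alt_bases):
        index = first_pos + offset - orf_start  # 0-based offset within the ORF
        codon_number, codon_offset = index // 3 + 1, index % 3
        mutated[codon_number][codon_offset] = base
    return {num: "".join(codon) for num, codon in mutated.items()}


def translate(codons):
    """Translate codons in ascending codon-number order."""
    return "".join(CODON_TABLE[codons[num]] for num in sorted(codons))


# 28881-28883: GGG>AAC
mutant_codons = apply_block_mutation(REF_CODONS, N_ORF_START, 28881, "AAC")
change = f"{translate(REF_CODONS)}>{translate(mutant_codons)}"
print(change)  # RG>KR
```

Under these assumptions, the first two substituted bases fall in codon 203 (AGG to AAA) and the third falls in codon 204 (GGA to CGA), which is why three adjacent nucleotide changes produce two adjacent amino acid changes.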
Group 3 was distinct from the other groups, with 4 co-evolving
mutations. Of these, the NSP6_L37F mutation variant was common
(Mercatelli & Giorgi, 2020); this variant has also been frequently found
in the UK, USA, Australia and India. The other 3 mutations are less
common and are found mostly in India and Australia. Group 4, on the
other hand, is defined by the characteristic NS8_L84S mutation, which
was designated the S type by Tang et al. (2020). This variant was later
reported as the C type by another group (Forster, Forster, Renfrew, &
Forster, 2020) and was clustered as the S clade by GISAID (Fuertes et
al., 2020). Group 4 included 4 Bangladeshi variants, isolated from the
Chittagong district in May 2020, along with 3 of the 8 strain sequences
from Thailand and only one strain sequence from India.
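The grouping described above can be sketched as a simple rule-based assignment on each sequence's set of NS mutations. This is an illustrative simplification, not the procedure used in the study: Group 3 is keyed on NSP6_L37F alone because its other three mutations are not named here, the order in which the rules are checked is an assumption, and sequences without any listed signature (including Group 1, which is not defined by a signature in this section) are returned as "other".

```python
GROUP2_CORE = {"NSP12_P323L", "Spike_D614G"}


def assign_group(mutations):
    """Assign a group label from a collection of NS mutation names.

    Hypothetical helper: the rule order and the single-marker test for
    Group 3 are assumptions made for illustration.
    """
    muts = set(mutations)
    if GROUP2_CORE <= muts:                 # both Group 2 markers present
        if "N_203-204:RG>KR" in muts:
            return "2a"
        if "NS3_Q57H" in muts:
            return "2b"
        return "2"
    if "NSP6_L37F" in muts:
        return "3"
    if "NS8_L84S" in muts:
        return "4"
    return "other"


print(assign_group(["NSP12_P323L", "Spike_D614G", "NS3_Q57H"]))
print(assign_group(["NS8_L84S"]))
```

Requiring both NSP12_P323L and Spike_D614G before testing the sub-group markers mirrors the description of 2a and 2b as carrying their own substitution along with the two Group 2 mutations.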
A recent study conducted with 10,014 sequences identified 13 frequent
non-synonymous mutations (Mercatelli & Giorgi, 2020), while we found
only 7 of them, along with 3 less common mutations, at high frequency in
this region (Figure 2). Most of the spike protein mutations identified
in this region were also observed in Europe and North America. Spike
protein mutations with an aa substitution at position 614, found in 40%
of the studied strains in this region, were also prevalent in Europe and
North America. In contrast, the amino acid substitution at position
1109 of the spike protein, found in one Bangladeshi strain, was shared
globally with one strain from Switzerland. We observed another amino
acid substitution in the spike protein at position 76 (Spike_T76I) in
an Indonesian strain, which was also found in two strains from West
Bengal, India (data not shown). This specific amino acid substitution
was identified on 55 occasions in the global database. Among them, 49
were from Australia, suggesting that this variant might have been
transmitted from Australia.
Another spike protein amino acid substitution (Spike_E471Q) was found
in the receptor binding domain of the spike protein. Glutamate (E) was
replaced by glutamine (Q), a conservative replacement that may not
substantially affect binding to the ACE2 receptor.
Additionally, global mutation distribution statistics showed that the
Spike_A829T mutation was observed in 31 sequences, all of them from
Thailand (Table 2). The NSP2_I120F mutation was found in 9 of the 12
cases from Dhaka, Bangladesh, and the NSP2_D92G mutation was present in
4 out of the 5 sequences from Chittagong, Bangladesh (data not shown).
These cities are separated by a distance of 250 kilometers, suggesting
that viruses carrying novel mutations were circulating in an
area-specific manner.
NS mutation and phylogenetic analyses conducted through the Nextstrain
database were particularly useful for taking a closer look at mutation
variants and their possible routes of transmission. We found the common
N_203-204: RG>KR amino acid substitution in 9 out of 12 strains from
Dhaka, Bangladesh. However, instead of this substitution, a less common
aa substitution at position 202 of the N protein (N_S202N) was observed
in most (5 out of 7) strains from Chittagong. The mutation distribution
database showed that strains carrying the trinucleotide block mutation
in the N protein were prevalent in Europe and that the N_S202N mutant
was found more commonly in recent strains from Saudi Arabia.
Phylogenetic analysis by Nextstrain also revealed that the Chittagong
strains (belonging to the Nextstrain B4 clade and Group 4 of our study)
have a close relationship with the Saudi Arabian strains, while the
Dhaka strains (A2 clade in Nextstrain, Group 2 in this study) are
similar to the European ones.
The geographical heat-map (Figure 5) of these non-synonymous mutations
indicates that most of them were also frequently found in the UK, USA,
Australia, Saudi Arabia and other European countries, suggesting
possible transmission routes to Southeast Asia. Phylogenetic analysis
with 329 genomes from this region by Nextstrain produced a similar
transmission route map (Figure 6). This study also confirmed, through
phylogenetic and mutation analysis, that a high percentage of Group 2
strains in India and Bangladesh are linked to European and North
American strains (A2 clade in Nextstrain analysis).
We could not analyze strains from Maldives, Bhutan and Timor-Leste
because no whole-genome sequence data of the virus were available from
these countries at the time of our analysis. Among the six countries
with available genome sequences of good quality, only India, Bangladesh
and Indonesia have reported a high number of SARS-CoV-2 infections. The
frequency of infection has increased exponentially since mid-April
2020. In our
study, we additionally analyzed 187 sequences (Figure 4), of which 100
(53%) sequences from India, Bangladesh, Thailand, Indonesia and Sri
Lanka showed the characteristic NSP12_P323L and Spike_D614G mutations,
which put them in the Group 2 cluster (Nextstrain clade A2). Group 2
variants were not found earlier than the 10th of March in this study.
The time plot data show that this Group 2 cluster emerged rapidly,
rising from 0% in January and February to 85% in May 2020. In contrast,
Group 1 strains (similar to the ancestral strain) were not found after
the 1st of April, suggesting that the European and North American
strains are the most recent predominant strains in this subcontinent. A
study conducted in early March reported that the NSP12_P323L
(14408C>T) and Spike_D614G (23403A>G) mutations were recurrent in
Europe and had not been detected in Asia until then, supporting our
statement (Pachetti et al., 2020). Along with other co-evolving
mutations, NSP12_P323L and Spike_D614G probably provide variants with
an evolutionary advantage over their ancestral types, allowing them to
survive and circulate in this densely populated region.
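The percentages quoted above are simple proportions, and a time plot of this kind can be derived by computing a group's share of sequences per collection month. The sketch below shows one way to do this; the function name and the toy records are hypothetical and are not the study's data. Only the 100 out of 187 figure comes from the text.

```python
from collections import Counter


def group_share_by_month(records, group):
    """Return {month: percentage of that month's sequences in `group`}."""
    totals, hits = Counter(), Counter()
    for month, assigned_group in records:
        totals[month] += 1
        if assigned_group == group:
            hits[month] += 1
    return {m: 100 * hits[m] / totals[m] for m in sorted(totals)}


# Illustrative (month, group) records only; NOT the study's data.
toy_records = [
    ("2020-02", "1"), ("2020-02", "4"),
    ("2020-03", "1"), ("2020-03", "2"),
    ("2020-05", "2"), ("2020-05", "2"), ("2020-05", "2"), ("2020-05", "4"),
]
shares = group_share_by_month(toy_records, "2")
print(shares)

# Overall share reported in the text: 100 of 187 sequences.
overall = round(100 * 100 / 187)
print(overall)  # 53
```

Computing the share within each month, rather than cumulatively, keeps months with many sequences from masking a change in the dominant group.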
Although a number of earlier studies hypothesized that high temperatures
and high humidity could result in reduced SARS-CoV-2 transmission, the
infection rate of SARS-CoV-2 is already increasing in this subcontinent.
Given that the European and North American variants (Nextstrain clade
A2) are emerging rapidly and that winter is approaching, the next wave
of SARS-CoV-2 may take place in Southeast Asia.