1. Introduction
In December 2019, China reported a disease with pneumonia-like conditions that caused respiratory malfunction as a result of viral infection. The virus later proved lethal, and the outbreak turned into a global pandemic. The World Health Organization (WHO) named the disease “coronavirus disease 2019 (COVID-19)”. Following international standards of nomenclature, the virus was designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) because of its taxonomic and genomic relationship with the species severe acute respiratory syndrome-related coronavirus [1].
In its initial stages the pandemic was confined to China, but Spain, Italy, Brazil, France, the United States of America, Iran, and India were also severely affected within a short period. In 2020 the world saw the most extensive lockdown in history, imposed to reduce the spread of CoV-2, which greatly affected the economies of the world powers. Despite the initial precautions taken by the public, the virus had infected 108 million people around the globe, with 80.8 million recoveries and 2.31 million deaths, by January 2021. The world had previously experienced coronavirus outbreaks caused by MERS-CoV and SARS-CoV, which severely affected the Middle East and other countries in 2012 and 2002, respectively. Coronaviruses spread widely among humans, other mammals, and birds, mainly affecting their respiratory, hepatic, intestinal, and nervous systems [2,3].
Human coronaviruses (HCoVs) were first identified in the mid-1960s. Seven HCoVs are known to date: two α coronaviruses, CoV-229E and CoV-NL63, and five β coronaviruses, CoV-OC43, CoV-HKU1, SARS-CoV, MERS-CoV, and CoV-2 [4,5]. Because CoV-2 belongs to the previously known family of coronaviruses, it retains their structural organization and shows close genomic similarity to SARS-CoV. CoV-2 harbors a linear, single-stranded, positive-sense RNA genome and rapidly infects vertebrates; coronaviruses are named for the crown-like spikes on their surface [6]. Besides these crown-like surface projections, formed by the spike protein (S), the virion contains the envelope protein (E), membrane protein (M), and nucleocapsid protein (N) [7]. These structural proteins are responsible for viral replication and virion-receptor attachment, and are therefore involved in the pathogenicity, spread, and entry of the virus into the host organism.
Within a short period of time the virus has shown its ability to mutate, giving rise to new, resistant, and more pathogenic strains that could be more difficult to counter. A large number of variants may render drugs designed against CoV-2 ineffective or reduce vaccine efficacy. The genome of CoV-2 comprises 12 functional open reading frames (ORFs) and 11 protein-coding genes, with 12 expressed proteins; the 5′-capped mRNA has a GC content of 38%, a 3′ untranslated region (UTR), and a poly-A tail at the 3′ end [6]. The ORFs are arranged on the mRNA of CoV-2 in the order ORF1a, ORF1b, Spike (S), ORF3a, Envelope (E), Membrane (M), ORF6, ORF7a, ORF7b, ORF8, Nucleocapsid (N), and ORF10 [8] (Figure 1). The genome of CoV-2 encodes 16 non-structural proteins (NSPs), four structural proteins, and polyproteins 1a and 1b [9]. Among the NSPs, the replicase and protease are important for viral genome replication and, along with the structural proteins, are potential drug targets [7,10,11].
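As an illustration of the genome layout described above, the following minimal Python sketch records the ORF order given in the text and picks out the four structural genes. The names `ORF_ORDER`, `STRUCTURAL`, `structural_in_order`, and `gc_content` are hypothetical helpers introduced here for illustration only; the short sequence in the usage lines is a toy string, not viral data.

```python
# ORF order (5' to 3') exactly as listed in the text; labels only, no coordinates.
ORF_ORDER = ["ORF1a", "ORF1b", "S", "ORF3a", "E", "M",
             "ORF6", "ORF7a", "ORF7b", "ORF8", "N", "ORF10"]

# The four structural genes named in the text.
STRUCTURAL = {"S": "spike", "E": "envelope", "M": "membrane", "N": "nucleocapsid"}


def structural_in_order(orf_order=ORF_ORDER):
    """Return the structural genes in the order they occur along the genome."""
    return [orf for orf in orf_order if orf in STRUCTURAL]


def gc_content(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T, U) in a sequence."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


if __name__ == "__main__":
    print(structural_in_order())
    print(gc_content("GGCCAATTAT"))  # toy sequence, for illustration only
```

Applied to a full genome sequence, `gc_content` would be expected to return a value close to the 38% reported above, although the exact figure depends on the reference sequence used.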
Although both the structural and non-structural proteins of CoV-2 are important to investigate, here we examined variations in the structural proteins only, because of their potential as drug targets and their importance for vaccines. To our knowledge, this is the first comprehensive study to screen 295,000 complete genomes of SARS-CoV-2 for variants in the structural proteins. Exploring the degree of variation in these important target proteins may help in projecting the pathogenicity and transmission of CoV-2 strains around the world. Extensive variation may lead to conformational changes in the targets and, in turn, to therapeutic failure. Diagnostic accuracy may also be affected if proper screening is not performed. Alternatively, vaccines and antivirals specific to geographic strains might be more effective.
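The kind of variant screening described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes each sample protein sequence has already been aligned to a reference of the same length, treats "-" as a gap and "X" as an ambiguous residue, and reports only amino-acid substitutions. The function names `call_substitutions` and `tally_variants` and the toy sequences are hypothetical.

```python
from collections import Counter


def call_substitutions(reference, sample):
    """List substitutions (e.g. 'K2R') between two pre-aligned protein sequences.

    Positions are 1-based and counted along the ungapped reference.
    Gaps ('-') and ambiguous residues ('X') in the sample are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    subs = []
    pos = 0
    for ref_aa, alt_aa in zip(reference, sample):
        if ref_aa != "-":
            pos += 1  # advance only on real reference residues
        if ref_aa == "-" or alt_aa in ("-", "X"):
            continue  # insertions, deletions, and ambiguous calls are ignored here
        if ref_aa != alt_aa:
            subs.append(f"{ref_aa}{pos}{alt_aa}")
    return subs


def tally_variants(reference, samples):
    """Count how many samples carry each substitution."""
    counts = Counter()
    for sample in samples:
        counts.update(call_substitutions(reference, sample))
    return counts


if __name__ == "__main__":
    ref = "MKTAY"                          # toy reference, not a real viral protein
    samples = ["MKTAY", "MRTAY", "MRTXF"]  # toy aligned samples
    print(tally_variants(ref, samples))
```

A tally of this form, computed per structural protein and per region, is one simple way to compare the degree of variation between targets; handling of insertions, deletions, and sequencing quality would need additional steps that are omitted here.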