2 Materials and methods

2.1 SARS-CoV-2 variants

A total of 14,472,180 SARS-CoV-2 RNA sequences were downloaded from the GISAID database [18] on January 7th 2023 and annotated with Pangolin [19] (program’s version 4.2, Pangolin’s data version 1.17). Among 28,420 variants associated with the City of Berlin (Federal Republic of Germany), 3,755 ones satisfying the following conditions were selected: viral RNA does not contain unrecognized nucleotides; the difference between viral RNA length and the “Wuhan” reference variant (GISAID: EPI ISL 402125) is less than 5%. All selected RNA sequences were collected between the 14th January 2021 and 12th December 2022. All sequences were divided into two groups, “Ancestral variants” and “Omicrons”, according to their GISAID descriptions. “Ancestral variants” included variants of “Alpha”, “Beta”, “Delta”, and others (Table 1 of Supplementary materials) and collected before the 30th January 2022, while all “Omicrons” species were sequenced after 1st December 2021 (Figure 1).

2.2 Highly expressed miRNAs

miRNA-seq read count tables were downloaded from the GDC portal at https: //portal.gdc.cancer.gov/projects/ for colon (COAD) and lung (LUAD) tissues. Following the corresponding clinical description, the dataset samples were referred to as “Normal” (“Solid Tissue Normal”) and “Tumor” (“Primary Tumor”) groups. Only “Normal” samples were used to assess miRNA expression in the analysis: 8 entries in COAD dataset and 46 in the LUAD project. Highly expressed miRNAs were selected independently for each tissue by the following procedure: read counts associated with a particular miRNA by the samples were summed up; then, miRNAs were sorted by the total number of reads; finally, miRNAs accounting for 95% of all reads were selected; As a final expression of the miRNAs, corresponding CPM expressions were averaged by the samples.

2.3 Binding model

To model the binding interaction between miRNA and viral mRNAs, the regions of viral RNA that were a reverse complement to the seed region of miRNA (nucleotides from 2 to 7 on the 5’-end of mature miRNA) were mapped to SARSCoV-2 genomes. Following the common classification [20], such interactions are called “6mer”; they lead to translational inactivation of target mRNAs [21]. In addition to the reverse complementarity of the seed region, miRNA target prediction tools may rely on additional features. For example, TargetScan [22] utilizes context++ score based on site type, 30-supplementary pairing, local AU content and others, MirTarget [23] considers GC content of target site, and UTR length, while RNA22 [24] predicts target sites by placing “Target Islands” within 3’-UTRs, 5’-UTRs or CDS.
Commonly, the overlap of the targets by different software is weak at best [25,
26]. While RNA22 predicts more individual miRNA-mRNA interactions than TargetScan, it misses the majority of classical seed region binding sites in 3’-UTRs. Moreover, the prediction results heavily depend on the tool-specific internal thresholds, with sensitivity and specificity expected to vary depending on particular thresholds. To overcome these limitations, a bottom-up approach based on the mapping of SARS-CoV-2 regions with reverse complement “6mers” matching the seed regions within tissue-specific and relevant host miRNAs was utilized. All SARS-CoV-2 RNA sequences were aligned to the reference “Wuhan” variant using MAFFT tool [27]. After the alignment, we transformed viral RNAs coordinates on the “Wuhan” variant and annotated protein-coding regions using
“Wuhan” annotation (https://www.ncbi.nlm.nih.gov/nuccore/1798174254).

2.4 Statistical analysis

In all comparisons, the significances of the findings were evaluated in the nonparametric Mann–Whitney U tests [28]. In addition, the standard Student’s t-test procedure [29] was used to test the hypothesis that the Spearman correlation [30] was zero. The Monte-Carlo sampling procedure was used to show that the decrease in the occurrence of miRNA binding motifs in the “Omicron” sequence group was specific to the highly expressed miRNAs. More precisely, the expression profile of the highly expressed miRNAs was fixed and the miRNA sequences were randomly replaced with other miRNAs expressed in a tissue. After that, the weighted number of binding motifs in “Omicrons” and “Ancestral” groups were calculated: the number of binding motifs within a particular miRNA was multiplied by the proportion of its read counts, summed up and normalized by the total lengths of the compared viral genomes. Then, for “Ancestral variants” and “Omicrons” groups, the computed numbers were compared using the MannWhitney U test. The described procedure was repeated 1,000 times. Thus, the distribution of statistics characterizing the change in the number of binding motifs between “Ancestral variants” and “Omicrons” was obtained. Finally, using this distribution, the probability that the decrease in the number of binding motifs for highly expressed miRNAs was greater than for randomly selected ones was calculated.

2.5 Other technical details

The statistical analysis (hypothesis testing, correlation computation) was performed with SciPy library [31]. All vector computations were implemented using NumPy library [32] and parallelized using Parallel tool [33]. Miscellaneous computations with tables were produced with Pandas [34]. Finally, figures were made using Matplotlib [35] and Seaborn [36] libraries. The source code of the analysis can be found at https://github.com/zhiyanov/covid-miRNA-evolution.