3.1 Bilberry transcriptome sequencing and functional annotation
Sequencing of the bilberry transcriptomes yielded approximately 56 Gb of raw data, or approximately 8 Gb per sample. The read count was 70,627,656 for the control samples, 82,253,205 for the red light treated samples and 78,192,227 for the blue light treated samples (Table S2). MultiQC analysis indicated that the processed reads were of good quality, with Phred scores >36 (Table S2). Mapping the reads to a recently published V. corymbosum genome (Colle et al., 2019) resulted in 75-77% of total reads mapped, including ~50% mapped uniquely. Using a draft genome of bilberry (Wu et al., 2021, unpublished), which represents the same bilberry ecotype as our samples, enabled unique mapping of 83.5% of the filtered reads. The combined transcriptome assembly generated a total of 671,952 transcripts and 472,876 unigenes, with mean contig lengths of 911 bp and 720 bp, respectively (Table 1). BUSCO analysis revealed that the combined assembly contained 97.4% complete sequences when searched against the 1375 orthologous groups of the embryophyta_odb9 lineage (Table S3). These scores were slightly higher than those of the genome-guided assembly, indicating that the combined transcriptome is a robust assembly; it was therefore used for the subsequent analyses in this study.
In total, 25,316 (61%) of the putative protein IDs of bilberry transcripts showed significant hits in the Swiss-Prot database and 25,280 (61%) in the Pfam database (Figure 2a). Around 60% of the sequences had hits in eggNOG (clusters of orthologous groups), and a relatively high proportion of hits (65%) was obtained from the KEGG database (Figure 2a). The distribution of BLAST hits among the top 25 species showed the highest homology to Rhododendron williamsianum (33%), followed by Camellia sinensis var. (25%) and Actinidia chinensis var. (16%) (Figure S1). Among the top-hit species, a smaller but considerable share of matches obtained from the top DEGs (0.2%-2.6%) showed sequence similarity to V. myrtillus, V. macrocarpon and V. corymbosum.
A total of 49,105 genes were co-expressed across the transcriptomes of the three light treatments (Figure 2b). The Venn diagram of the distribution among treatments shows that 1816 and 1686 genes were uniquely expressed in the red and blue light treatments, respectively (Figure 2b). The BLAST sequence similarity distribution of the query sequences (E-value cut-off 1.0E-5) showed that the number of positives relative to the aligned length was mostly in the range of 70-90%, suggesting a strong match between the assembled query sequences and known sequences from the databases (Figure S2).
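The partition behind a Venn diagram such as Figure 2b can be sketched with plain set operations. The example below is only an illustration under stated assumptions: the gene IDs are made up, the function name `venn_regions` is hypothetical, and the rule used to call a gene "expressed" in a treatment is not specified here, so each treatment is simply represented as a set of detected gene IDs.

```python
def venn_regions(expressed):
    """Partition genes into mutually exclusive Venn regions.

    `expressed` maps a treatment name to the set of gene IDs detected in it.
    Returns a dict mapping a frozenset of treatment names to the set of genes
    detected in exactly those treatments (and in no others).
    """
    regions = {}
    all_genes = set().union(*expressed.values())
    for gene in all_genes:
        # The region key is the exact combination of treatments containing the gene.
        key = frozenset(name for name, genes in expressed.items() if gene in genes)
        regions.setdefault(key, set()).add(gene)
    return regions


# Toy input with made-up gene IDs (not data from this study).
expressed = {
    "control": {"g1", "g2", "g3"},
    "red": {"g1", "g2", "g4"},
    "blue": {"g1", "g3", "g5", "g6"},
}

regions = venn_regions(expressed)
shared_by_all = regions[frozenset(expressed)]   # detected in all three treatments
red_only = regions[frozenset({"red"})]          # detected only under red light
blue_only = regions[frozenset({"blue"})]        # detected only under blue light

print(len(shared_by_all), len(red_only), len(blue_only))
```

With real data, each set would hold the gene IDs passing the chosen expression threshold in that treatment; the size of the all-treatment region then corresponds to the co-expressed gene count, and the single-treatment regions to the uniquely expressed counts.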