3.1 Bilberry transcriptome sequencing and functional annotation
The raw sequencing reads of the bilberry transcriptomes yielded
approximately 56 Gb of data, or approximately 8 Gb per sample. The
number of reads was 70,627,656 for the control samples, 82,253,205 for
the red light treated samples and 78,192,227 for the blue light treated
samples (Table S2). MultiQC analysis showed that the processed reads
were of good quality, with Phred scores >36 (Table S2). Mapping the
reads to the recently published V. corymbosum genome (Colle et al.,
2019) resulted in 75-77% of the total reads being mapped, including ~50%
that mapped uniquely to the genome. Mapping to a draft bilberry genome
(Wu et al., 2021, unpublished), which represents the same bilberry
ecotype as our samples, resulted in unique mapping of 83.5% of the
filtered reads. A total of
671,952 transcripts and 472,876 unigenes were generated from the
combined transcriptome assembly with mean contig lengths of 911 bp and
720 bp, respectively (Table 1). BUSCO analysis showed that the
combined assembly recovered 97.4% of the 1375 orthologous groups of the
embryophyta_odb9 lineage as complete sequences (Table S3). These scores
were slightly higher than those of the genome-guided assembly,
indicating that the combined transcriptome is a robust assembly; it was
therefore used for all subsequent analyses in this study.
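The BUSCO completeness value reported above is a simple proportion: complete BUSCOs (single-copy plus duplicated) divided by the number of orthologous groups searched (n = 1375 for embryophyta_odb9). The Python sketch below reproduces BUSCO's one-line short-summary notation from category counts. The function name is hypothetical, and the counts in the example are invented for illustration; they are not the values in Table S3.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO category counts in the C:[S,D],F,M,n short-summary style."""
    n = single + duplicated + fragmented + missing  # total groups searched
    if n == 0:
        raise ValueError("no BUSCO groups were counted")

    def pct(count: int) -> str:
        # percentage of all searched groups, one decimal place
        return f"{100.0 * count / n:.1f}%"

    complete = single + duplicated  # "complete" = single-copy + duplicated
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")


# Hypothetical counts for a 1375-group lineage (not the study's data).
print(busco_summary(single=1200, duplicated=100, fragmented=25, missing=50))
# prints C:94.5%[S:87.3%,D:7.3%],F:1.8%,M:3.6%,n:1375
```

Because duplicated BUSCOs count as complete, a high C value can coexist with a sizeable D fraction, which is common in de novo transcriptome assemblies that retain isoforms.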
In total, 25,316 (61%) of the putative protein IDs of the bilberry
transcripts had significant hits in the Swissprot database and 25,280
(61%) in the Pfam database (Figure 2a). Around 60% of the sequences had
hits in eggNOG (clusters of orthologous groups), and a relatively high
proportion (65%) had hits in the KEGG database (Figure 2a). The
distribution of BLAST hits among the top-25 species showed the highest
homology with Rhododendron williamsianum (33%), followed by Camellia
sinensis var. (25%) and Actinidia chinensis var. (16%) (Figure S1).
Top-hit species with smaller but notable numbers of matches to the top
DEGs (0.2%-2.6%) included V. myrtillus, V. macrocarpon and V. corymbosum.
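The top-hit species distribution summarizes, for each query sequence, the species of its best BLAST hit. A minimal Python sketch of that tally is shown below, assuming the hits have already been parsed into (query, species, E-value) tuples. The function name, the toy data and the 1.0E-5 default cut-off (borrowed from the similarity analysis reported in this section) are illustrative assumptions, not the annotation pipeline used in this study.

```python
from collections import Counter


def top_hit_species(hits, top_n=25, evalue_cutoff=1e-5):
    """Tally the species of each query's best (lowest E-value) BLAST hit.

    hits: iterable of (query_id, species, evalue) tuples (assumed format).
    Returns (species, n_queries, percent) tuples for the top_n species.
    """
    best = {}  # query_id -> (species, evalue) of its lowest-E-value hit
    for query, species, evalue in hits:
        if evalue > evalue_cutoff:
            continue  # ignore non-significant hits
        if query not in best or evalue < best[query][1]:
            best[query] = (species, evalue)

    counts = Counter(species for species, _ in best.values())
    total = sum(counts.values())
    return [(sp, n, 100.0 * n / total) for sp, n in counts.most_common(top_n)]


# Toy hits for four query sequences (illustrative only).
toy_hits = [
    ("q1", "sp_A", 1e-50), ("q1", "sp_B", 1e-10),
    ("q2", "sp_B", 1e-30),
    ("q3", "sp_A", 1e-40),
    ("q4", "sp_A", 1e-3),   # above the cut-off, so discarded
    ("q4", "sp_C", 1e-60),
]
for species, n, pct in top_hit_species(toy_hits):
    print(f"{species}\t{n}\t{pct:.1f}%")
```

In this sketch the percentages are relative to queries with at least one significant hit; the denominator used for Figure S1 is not stated in the text.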
A total of 49,105 genes were expressed in common across the
transcriptomes of the three light treatments (Figure 2b). The
distribution of genes among the light treatments is visualized in a
Venn diagram, which shows that 1816 and 1686 genes were uniquely
expressed in the red and blue light treatments, respectively (Figure
2b). The BLAST similarity distribution of the query sequences (E-value
cut-off 1.0E-5) showed a high number of hits with 70-90% positives
relative to the aligned length, suggesting strong matches between the
query sequences and known sequences in the databases (Figure S2).
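The common and treatment-specific gene counts behind a Venn diagram such as Figure 2b come from set operations on the genes called as expressed in each treatment. The Python sketch below assumes a per-treatment table of expression values and a hypothetical threshold (more than 1 TPM) for calling a gene expressed; the function names, the threshold and the toy values are illustrative assumptions and are not taken from this study.

```python
def expressed_genes(values: dict[str, float], threshold: float = 1.0) -> set[str]:
    """Genes whose expression value exceeds the (hypothetical) threshold."""
    return {gene for gene, value in values.items() if value > threshold}


def venn_partition(expressed: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Return genes shared by all groups and genes unique to each group."""
    common = set.intersection(*expressed.values())
    unique = {}
    for name, genes in expressed.items():
        # union of the expressed genes of every other group
        others = set().union(*(g for other, g in expressed.items() if other != name))
        unique[name] = genes - others
    return common, unique


# Toy expression values per treatment (illustrative, not study data).
tpm = {
    "control": {"g1": 5.0, "g2": 3.0, "g3": 0.0, "g4": 2.0, "g5": 0.0},
    "red":     {"g1": 4.0, "g2": 0.0, "g3": 7.0, "g4": 1.5, "g5": 0.0},
    "blue":    {"g1": 6.0, "g2": 2.0, "g3": 0.0, "g4": 0.5, "g5": 9.0},
}
expressed = {name: expressed_genes(vals) for name, vals in tpm.items()}
common, unique = venn_partition(expressed)
print(sorted(common))  # ['g1']
print({name: sorted(genes) for name, genes in unique.items()})
# {'control': [], 'red': ['g3'], 'blue': ['g5']}
```

With three treatments the same intersections and differences also yield the pairwise-only regions of the diagram; only the all-shared and single-treatment regions are computed here.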