2.3 Transcriptome assembly
Raw Illumina reads were first quality-assessed using MultiQC
(Andrews, 2010; Ewels et al., 2016); the assessment covered sequence
quality scores (Phred > 30), adapter content and position, GC content,
and ambiguous bases (Ns). Adapter contamination was removed with
Trimmomatic, a tool designed specifically for Illumina NGS data
(Bolger et al., 2014), and residual rRNA reads were then removed with
SortMeRNA (Kopylova et al., 2012). Only the clean, filtered reads were
used in our downstream analysis. A robust transcriptome was constructed
with the Trinity v2.9.0 software pipeline (Grabherr et al., 2011) by
building a combined, redundant over-assembly from a de novo assembly and
a genome-guided assembly, the latter using a draft genome sequence of
the same bilberry ecotype (Wu et al., 2021, unpublished). The draft
genome was indexed, and the reads were aligned to it, using STAR v2.6.1d
(Dobin et al., 2013). The genome-guided Trinity output was concatenated
with the de novo transcriptome to form the combined assembly. The EvidentialGene
tool (Gilbert, 2019) was used to remove the redundancy arising from
combining the assemblies. The reads were further mapped to the published highbush
blueberry (V. corymbosum cv. Draper v1.0) genome (Colle et al.,
2019) using HISAT2 to improve the annotation of the assembly. The
best candidate coding regions were identified using the TransDecoder tool
(http://transdecoder.github.io),
which identifies open reading frames (ORFs) of at least a minimum
length within the reconstructed Trinity transcripts. To assess the
completeness of the transcriptome assemblies, BUSCO v3.0 (Simão et al.,
2015) was used to check for the presence of evolutionarily conserved
single-copy orthologs, with the Embryophyta orthologous database odb_v.10
(https://busco-archive.ezlab.org/v3/) as the reference set.
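
As an illustration of how a BUSCO completeness assessment is usually summarised, the following minimal Python sketch turns per-category ortholog counts into the compact notation that BUSCO reports (C: complete, S: single-copy, D: duplicated, F: fragmented, M: missing, n: total groups searched). The class and function names are hypothetical, and the counts in the usage example are invented for illustration only; they are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """Hypothetical container for the four BUSCO result categories."""
    single: int      # complete and single-copy
    duplicated: int  # complete and duplicated
    fragmented: int  # only partially recovered
    missing: int     # not found in the assembly

    @property
    def total(self) -> int:
        # Every searched ortholog group falls into exactly one category.
        return self.single + self.duplicated + self.fragmented + self.missing


def busco_summary(counts: BuscoCounts) -> str:
    """Format counts as percentages in BUSCO-style short notation."""
    n = counts.total
    if n == 0:
        raise ValueError("at least one BUSCO group is required")

    def pct(x: int) -> float:
        return 100.0 * x / n

    complete = counts.single + counts.duplicated
    return (
        f"C:{pct(complete):.1f}%"
        f"[S:{pct(counts.single):.1f}%,D:{pct(counts.duplicated):.1f}%],"
        f"F:{pct(counts.fragmented):.1f}%,M:{pct(counts.missing):.1f}%,n:{n}"
    )


if __name__ == "__main__":
    # Invented example counts, not data from the bilberry assemblies.
    example = BuscoCounts(single=800, duplicated=100, fragmented=50, missing=50)
    print(busco_summary(example))
```

In a combined, deliberately redundant assembly the duplicated fraction (D) would be expected to be high before redundancy removal, so comparing D before and after that step is one simple way to check that redundancy was reduced without losing complete orthologs.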