2.5.2 RNA-seq data processing, de novo transcriptome assembly, and
transcript abundance estimation
We quality-trimmed the raw sequencing reads by removing adaptors and
low-quality bases with Trimmomatic v 0.36 [75], using a
4-base sliding window and a quality threshold of 25. The trimmed reads were used
for de novo transcriptome reconstruction with Trinity v 2.8.4 [76], following the protocol described in [77].
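The sliding-window criterion used in the trimming step can be illustrated with a minimal Python sketch. It is a simplified illustration of the idea, not Trimmomatic's implementation: the function names are hypothetical, Phred+33 quality encoding is assumed, and the handling of edge cases (for example reads shorter than the window) may differ from the real tool.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_keep_length(scores, window=4, threshold=25):
    """Return how many leading bases to keep.

    The read is scanned from the 5' end; it is cut at the start of the
    first window whose mean quality falls below the threshold.
    """
    n = len(scores)
    if n == 0:
        return 0
    # Assumption: a read shorter than the window is scored as one window.
    last_start = max(n - window, 0)
    for start in range(last_start + 1):
        chunk = scores[start:start + window]
        if sum(chunk) / len(chunk) < threshold:
            return start
    return n


def trim_read(sequence, quality_string, window=4, threshold=25):
    """Trim a read and its quality string with the sliding-window rule."""
    scores = phred33_to_scores(quality_string)
    keep = sliding_window_keep_length(scores, window, threshold)
    return sequence[:keep], quality_string[:keep]
```

With the defaults above (window of 4, threshold of 25), a read is kept in full only if every 4-base window has a mean quality of at least 25.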
Then, we clustered highly redundant transcripts, i.e. transcripts with
> 95% sequence similarity, using CD-HIT v 4.6.6 [78], and selected the longest isoform per gene using
Trinity’s custom script
(get_longest_isoform_seq_per_trinity_gene.pl). The quality of the
assembly was assessed by mapping the trimmed reads back to the assembly
using Bowtie2 v 2.2.9 [79], and its completeness by
searching for orthologues with BUSCO v 3.1.0 [80], using
the viridiplantae-odb10 database as a reference (E-value cutoff for the
BLAST alignments: 1e-06).
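The selection of the longest isoform per gene can be sketched as follows. This is an illustrative Python sketch and not the Trinity script itself: the function names are hypothetical, and it assumes Trinity-style transcript identifiers of the form TRINITY_DN1000_c115_g5_i1, in which the gene identifier is the part preceding the trailing isoform suffix (_i1).

```python
import re


def read_fasta(lines):
    """Yield (transcript_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, dropping the header description.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gene_id(transcript_id):
    """Strip the trailing isoform suffix (e.g. '_i2') from an identifier."""
    return re.sub(r"_i\d+$", "", transcript_id)


def longest_isoform_per_gene(records):
    """Map each gene to its longest (transcript_id, sequence) pair.

    Ties are resolved in favour of the isoform encountered first
    (an assumption of this sketch).
    """
    best = {}
    for transcript_id, sequence in records:
        gene = gene_id(transcript_id)
        if gene not in best or len(sequence) > len(best[gene][1]):
            best[gene] = (transcript_id, sequence)
    return best
```

For example, calling longest_isoform_per_gene(read_fasta(open_handle)) on an assembly file handle would return one representative transcript per gene identifier.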
We estimated transcript abundance within each individual sample using
RSEM [81], run through the wrapper script included in Trinity
(align_and_estimate_abundance.pl). This software first aligns the
sequenced reads back to the transcriptome, and then provides read counts
and normalized expression values for each transcript in each sample.
Finally, we created a count matrix with read counts across all samples.
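The assembly of per-sample results into a single count matrix can be sketched as below. This is a minimal Python illustration under stated assumptions, not the script used in this study: the function name is hypothetical, each sample is assumed to provide a tab-separated table with transcript_id and expected_count columns (as in RSEM's isoform-level output), and transcripts absent from a sample are assumed to have a count of zero.

```python
import csv


def build_count_matrix(sample_tables, id_col="transcript_id",
                       count_col="expected_count"):
    """Combine per-sample count tables into one transcript-by-sample matrix.

    sample_tables maps a sample name to an open file handle (or any
    iterable of lines) containing a tab-separated table with a header.
    Returns the list of sample names (column order) and a dict mapping
    each transcript to its list of counts in that order.
    """
    samples = list(sample_tables)
    counts = {}
    for sample, handle in sample_tables.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            counts.setdefault(row[id_col], {})[sample] = float(row[count_col])
    # Assumption: a transcript missing from a sample gets a count of 0.
    matrix = {
        transcript: [per_sample.get(s, 0.0) for s in samples]
        for transcript, per_sample in sorted(counts.items())
    }
    return samples, matrix
```

Each row of the resulting matrix corresponds to one transcript and each column to one sample, which is the layout typically expected by downstream differential expression tools.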