Bioinformatics
The sequencing yielded on average 87.3 million paired-end sequences per sample (range 77.7–97.8 million sequences). Illumina adapters were removed with SeqPrep 1.2 (St John, 2014) following default settings with parameters -A and -B targeting the adapter sequences with identical selection. Any remaining PhiX sequences in the raw data were removed by mapping the reads using bowtie2 2.3.4.3 (Langmead & Salzberg, 2012) against the PhiX genome (NCBI Reference Sequence: NC_001422.1). Final quality trimming of the data was conducted with Trimmomatic 0.36 (Bolger, Lohse, & Usadel, 2014) with the following parameters: LEADING:20 TRAILING:20 MINLEN:50. The final quality of the trimmed reads were checked with FastQC 0.11.5 (Andrews, 2010) and MultiQC 1.7 (Ewels, Magnusson, Käller, & Lundin, 2016). On average 86.8 million sequences remained (range 77.3–97.2 million sequences) with a Phred quality score of 36–37 per base, and an average read length of 144 bp (range 139–147 bp).
Small subunit (SSU) 18S rRNA sequences were extracted from the quality trimmed data using SortMeRNA 2.1b (Kopylova, Noé, & Touzet, 2012) with the databases supplied with the software. Taxonomic classification was conducted with Kraken2 2.0.7 (D. E. Wood et al., 2019) using paired-end reads against the SILVA SSU r132 NR99 (Quast et al., 2013) (database downloaded 1 March 2019) and NCBI NT database (database downloaded 12 March 2019). To estimate the relative abundance of each taxa Bracken 2.5 was used on the Kraken2 outputs with default settings on the genus level (i.e. a count threshold of 10) (Lu et al., 2017). The Bracken output reports were combined into a biom-format file with the python package kraken-biom 1.0.1 (using parameters: —fmt hdf5 -max D –min G), and the biom-format file was converted to a tax table using the python package biom-format 2.1.7 (McDonald et al., 2012). The final data was normalized as relative abundances (%), and the data was analyzed in the software Explicet 2.10.5 (Robertson et al., 2013). Results for i)Nematoda (NCBI NT classifications, on average 478,331 sequences per sample); ii) Foraminifera (NCBI NT, average 13,913 sequences), and iii) Ciliophora (SILVA, average 774,027 sequences) were extracted and analyzed separately. The NCBI NT database was used for the Nematoda and Foraminifera data because, 1) the SILVA database is known to contain errors in the nematode classifications (Elias Broman et al., 2019; Holovachov, Haenel, Bourlat, & Jondelius, 2017), and the NCBI NT has previously been used to discern differences in nematode communities on a spatial scale in the Baltic Sea (Elias Broman et al., 2019); and 2) the SILVA database gave inaccurate classifications for Foraminifera , resulting in the identification of taxa never discovered in the Baltic Sea (more details in the discussion).