Microbial filtering and pattern comparison
We screened for loci derived from putative microbes in three ways.
First, potential bacterial, viral and human contaminant sequences were
removed via BLAST searches against reference sequences from GenBank,
following Maas et al. (2018) (see their Supplemental Table 1 for the
GenBank data used).
Next, we ran Kraken (Wood and Salzberg, 2014), a fast sequence
classifier that serves as an alternative to BLAST (Altschul et al., 1990),
to classify our loci against bacterial databases with default settings.
Finally, we used BlobTools (Laetsch and Blaxter, 2017) to taxonomically
partition reads and removed loci with >55% GC content, because sponge
microbes are expected to have higher GC content than their sponge hosts
(Horn et al., 2016). All loci identified as microbial were filtered out
using a custom-made Perl script (Bi et al., 2013).
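The GC-content cut-off and the final removal of flagged loci can be illustrated with a minimal Python sketch. It is not the authors' Perl script and does not reproduce BlobTools; it assumes the loci are stored in FASTA format and that the IDs of loci flagged by the BLAST and Kraken screens have already been collected into a set. The function names (`parse_fasta`, `gc_content`, `filter_loci`) are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {locus_id: sequence} dict.
    Assumption: the locus ID is the first whitespace-delimited token of the header."""
    records: Dict[str, str] = {}
    locus_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if locus_id is not None:
                records[locus_id] = "".join(chunks)
            locus_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if locus_id is not None:
        records[locus_id] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T); Ns are ignored."""
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def filter_loci(
    records: Dict[str, str], flagged_ids: Set[str], max_gc: float = 0.55
) -> Tuple[Dict[str, str], Set[str]]:
    """Drop loci flagged by earlier screens (hypothetical ID set) and loci
    whose GC content exceeds max_gc (strictly greater than 55% by default)."""
    kept: Dict[str, str] = {}
    removed: Set[str] = set()
    for locus_id, seq in records.items():
        if locus_id in flagged_ids or gc_content(seq) > max_gc:
            removed.add(locus_id)
        else:
            kept[locus_id] = seq
    return kept, removed
```

A locus with exactly 55% GC is kept, matching the ">55%" wording; whether the original pipeline used a strict or inclusive threshold is not stated.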
Population genetic patterns of the sponge host were contrasted with
microbial community patterns from five populations (B.1, B.2, B.3, P.4,
P.5) studied by Ferreira et al. (2020). Two datasets derived from
filtered 16S rRNA amplicon metabarcoding were downloaded from Ferreira
et al. (2020): the abundance of microbial genera (24 genera in total)
and the presence/absence of the 35 most abundant operational taxonomic
units (OTUs). We compared the host genetic dataset and the associated
microbial community datasets at three levels of variation: (1) among
genetic lineages of the host sponge (Lineages A and B, Lineage B only,
and one sub-lineage within Lineage B, as defined by Becking et al.
(2013)); (2) between two regions >1,400 km apart (Berau and Raja
Ampat); and (3) among lakes within the same region (<250 km apart).
We tested whether microbial community patterns were related to sponge
host population structure using Mantel tests (Legendre and Legendre,
2012) between the Bray-Curtis dissimilarity matrix of the microbial
communities and the pairwise genetic distance (FST) matrix of the
sponge host.
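The Mantel comparison can be sketched in Python with NumPy as below. This is an illustrative sketch, not the implementation used in the study: it assumes a population-by-genus abundance table and a precomputed FST matrix with populations in the same order, uses the Pearson correlation of the upper-triangle entries, and enumerates every permutation of population labels, which is feasible only because there are five populations. The function names (`bray_curtis_matrix`, `mantel_exact`) are hypothetical.

```python
import itertools

import numpy as np


def bray_curtis_matrix(abundance: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarity between rows of an abundance table
    (assumption: rows = populations, columns = microbial genera)."""
    x = np.asarray(abundance, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(x[i] - x[j]).sum()
            den = (x[i] + x[j]).sum()
            d[i, j] = d[j, i] = num / den if den > 0 else 0.0
    return d


def mantel_exact(d1: np.ndarray, d2: np.ndarray):
    """One-sided Mantel test (Pearson r) with an exact permutation p-value.
    Rows and columns of d2 are permuted together over all n! orderings,
    so this is only practical for a handful of populations."""
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)  # each pairwise comparison counted once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    count = 0
    total = 0
    for perm in itertools.permutations(range(n)):
        p = np.array(perm)
        r = np.corrcoef(x, d2[np.ix_(p, p)][iu])[0, 1]
        if r >= r_obs - 1e-12:  # small tolerance for floating-point ties
            count += 1
        total += 1
    return r_obs, count / total
```

With five populations there are only 10 pairwise comparisons and 5! = 120 permutations, so the smallest attainable one-sided p-value is 1/120 (about 0.008); this limits the power of any Mantel test at this sample size.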