Microbial filtering and pattern comparison
We screened for loci of putative microbial origin in three ways. First, potential bacterial, viral and human sequence contamination was removed by BLAST searches (Altschul et al., 1990) against reference sequences from GenBank, following Maas et al. (2018) (see their Supplemental Table 1 for the GenBank data used). Next, we ran Kraken (Wood and Salzberg, 2014), a fast sequence classifier, with default settings to classify our loci against bacterial databases. Finally, we used BlobTools (Laetsch and Blaxter, 2017) to taxonomically partition reads and removed loci with >55% GC content, as we expect sponge microbes to have higher GC content than sponge hosts (Horn et al., 2016). The identified microbial loci were filtered out using a custom Perl script (Bi et al., 2013).
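The GC-content cutoff can be illustrated with a minimal Python sketch. It is not the Perl script or the BlobTools workflow used in the study; the function names (gc_fraction, split_by_gc), the toy sequences, and the decision to ignore ambiguous bases such as N when computing GC content are all assumptions made for illustration. Only the strict >55% threshold comes from the text.

```python
def gc_fraction(seq):
    """Return the G+C fraction among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def split_by_gc(loci, threshold=0.55):
    """Partition loci into (kept, flagged); flagged loci have GC > threshold."""
    kept, flagged = {}, {}
    for name, seq in loci.items():
        target = flagged if gc_fraction(seq) > threshold else kept
        target[name] = seq
    return kept, flagged


# Hypothetical toy loci, not data from the study.
loci = {
    "locus_1": "ATGCATATTAGCAT",  # AT-rich, expected to be kept
    "locus_2": "GCGCGGCCATGCGC",  # GC-rich, expected to be flagged
}

kept, flagged = split_by_gc(loci)
print("kept:", sorted(kept))
print("flagged as putative microbial:", sorted(flagged))
```

A fixed GC threshold is a coarse filter, which is presumably why it is combined here with the two database-driven screens rather than used alone.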
Population genetic patterns of the sponge host were compared with sponge microbial community patterns from five populations studied by Ferreira et al. (2020) (B.1, B.2, B.3, P.4, P.5). Two datasets from filtered 16S amplicon metabarcoding were downloaded from Ferreira et al. (2020): the abundance of microbial genera (24 genera in total), and the presence/absence of the 35 most abundant operational taxonomic units (OTUs). We compared three levels of variation in the host genetic dataset and the associated microbial community dataset: (1) among genetic lineages of the host sponge (Lineage A and B, only Lineage B, and one sub-lineage within Lineage B, as defined by Becking et al. (2013)), (2) between two regions >1,400 km apart (Berau and Raja Ampat), and (3) among lakes within the same region (<250 km). We tested whether microbial community patterns were related to sponge host population structure by running Mantel tests (Legendre and Legendre, 2012) between the Bray-Curtis dissimilarity matrix of the microbial communities and the genetic distance (FST) matrix of the sponge host.
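To make the Mantel procedure concrete, the following Python sketch computes a Bray-Curtis dissimilarity matrix from an abundance table and correlates it with an FST matrix by permutation. It is an illustration only: the text does not state which software, correlation coefficient, or number of permutations was used, so the Pearson statistic, the one-sided test, the 999 permutations, and all abundance and FST values below are assumptions, not data from the study.

```python
import math
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(rows, metric):
    """Square matrix of pairwise distances between rows."""
    n = len(rows)
    return [[metric(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def upper_triangle(m, order):
    """Upper-triangle entries of m after reordering rows and columns by order."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: permute rows/columns of d2, count r >= observed."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical values for five populations (B.1, B.2, B.3, P.4, P.5).
abundance = [[10, 0, 5], [8, 1, 6], [9, 0, 4], [1, 9, 2], [0, 10, 3]]
fst = [
    [0.00, 0.05, 0.04, 0.30, 0.32],
    [0.05, 0.00, 0.06, 0.28, 0.31],
    [0.04, 0.06, 0.00, 0.29, 0.33],
    [0.30, 0.28, 0.29, 0.00, 0.07],
    [0.32, 0.31, 0.33, 0.07, 0.00],
]

r, p = mantel(distance_matrix(abundance, bray_curtis), fst)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With only five populations there are just 120 distinct orderings, so the smallest attainable p-value is bounded from below regardless of how many random permutations are drawn; an exhaustive enumeration would be an equally reasonable design in that case.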