Lake characterization and read statistics
The physical and environmental profiles of the two lagoons and nine
marine lakes are provided in Table 1. In general, we observed higher
temperatures (30.8 ± 1.22 °C) and lower salinities (27.3 ± 2.7 ppt) in
lakes than in lagoons (29 °C and 33.5 ppt). Connection to the surrounding
sea varied among lakes, ranging from highly connected to highly isolated
based on tidal amplitude. For instance, lake P.4 was the most connected,
with a tidal amplitude 80% of that of the surrounding sea, whereas lake
P.1 was the most isolated, with a tidal amplitude only 7% of that of the
surrounding sea.
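Connectivity here is expressed as a lake's tidal amplitude relative to that of the adjacent sea. The short Python sketch below computes that percentage and ranks lakes by it. The function names and the amplitude values in metres are hypothetical, chosen only so that the output reproduces the 80% (P.4) and 7% (P.1) figures reported in the text.

```python
def relative_tidal_amplitude(lake_amplitude_m: float, sea_amplitude_m: float) -> float:
    """Lake tidal amplitude expressed as a percentage of the adjacent sea's amplitude."""
    if sea_amplitude_m <= 0:
        raise ValueError("sea tidal amplitude must be positive")
    return 100.0 * lake_amplitude_m / sea_amplitude_m


def rank_by_connection(amplitudes: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (lake, % of sea amplitude) pairs, most connected lake first."""
    ratios = {
        lake: relative_tidal_amplitude(lake_amp, sea_amp)
        for lake, (lake_amp, sea_amp) in amplitudes.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (lake, sea) amplitudes in metres; only the resulting
# percentages for P.4 (80%) and P.1 (7%) come from the text.
amplitudes = {"P.1": (0.07, 1.00), "P.4": (0.80, 1.00)}
for lake, pct in rank_by_connection(amplitudes):
    print(f"{lake}: {pct:.0f}% of sea tidal amplitude")
```

A ratio rather than an absolute amplitude is used so that lakes measured against different local tidal ranges remain comparable.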
After sequencing and demultiplexing, we obtained 1,127,497,643 reads from
168 sponges, with an average of 7,673,269 reads per individual.
Individuals with fewer than 2,000,000 reads were removed from subsequent
analyses. Based on the calculation table of Peterson et al. (2012), an
estimated genome size of 600 Mb, and a size selection of 425–500 bp, we
expected to recover 27,300 RADtags. However, our de novo reference
retained only 14,442 tags after keeping only RADtags with at least 3X
coverage that were present in at least 70% of individuals.
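The two filters described in this paragraph (a per-individual read threshold, then a per-tag depth and presence threshold) can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names, the in-memory data layout, and the interpretation that a tag counts as "present" in an individual only when it reaches at least 3X depth there are all assumptions.

```python
MIN_READS = 2_000_000  # per-individual read threshold stated in the text
MIN_DEPTH = 3          # assumed: depth a tag needs in an individual to count as present
MIN_PRESENCE = 0.70    # fraction of individuals in which a tag must be present


def filter_individuals(read_counts: dict[str, int], min_reads: int = MIN_READS) -> dict[str, int]:
    """Drop individuals with fewer than `min_reads` reads."""
    return {ind: n for ind, n in read_counts.items() if n >= min_reads}


def filter_tags(depths: dict[str, dict[str, int]], individuals: list[str],
                min_depth: int = MIN_DEPTH, min_presence: float = MIN_PRESENCE) -> list[str]:
    """Keep tags reaching `min_depth` in at least `min_presence` of individuals.

    `depths[tag][individual]` is the read depth; missing entries count as zero.
    """
    if not individuals:
        return []
    kept = []
    for tag, per_ind in depths.items():
        n_present = sum(1 for ind in individuals if per_ind.get(ind, 0) >= min_depth)
        # Compare as a fraction so that e.g. 7/10 matches 0.70 exactly.
        if n_present / len(individuals) >= min_presence:
            kept.append(tag)
    return kept


# Hypothetical sample names, read counts and depths.
reads = {"s1": 7_500_000, "s2": 1_200_000, "s3": 3_000_000}
kept_inds = filter_individuals(reads)  # s2 falls below 2,000,000 reads
depths = {
    "tag1": {"s1": 12, "s3": 5},  # >= 3X in 2 of 2 retained individuals
    "tag2": {"s1": 2, "s3": 9},   # >= 3X in only 1 of 2 (50%)
}
print(sorted(kept_inds), filter_tags(depths, list(kept_inds)))
```

Applying the read threshold first means the 70% presence criterion is evaluated only over individuals that survive it.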
Kraken and Blobtools identified 13 of the 14,442 RADtags as possible
bacterial contamination. These RADtags mapped to Synechococcus sp. (a
cyanobacterium) and were removed from the data set.
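Removing the flagged tags is a simple set exclusion, which leaves 14,442 - 13 = 14,429 RADtags. The sketch below assumes the flagged tag IDs have already been collected into one set; how the Kraken and Blobtools results were combined is not stated, so that step is left out, and the tag IDs are hypothetical.

```python
def remove_flagged_tags(tags: list[str], flagged: set[str]) -> list[str]:
    """Return tags in their original order, excluding those flagged as contaminants."""
    return [tag for tag in tags if tag not in flagged]


# Hypothetical tag IDs; only the counts (14,442 tags, 13 flagged) come from the text.
tags = [f"tag{i}" for i in range(14_442)]
flagged = {f"tag{i}" for i in range(13)}
clean = remove_flagged_tags(tags, flagged)
print(len(clean))  # 14429
```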
After filtering, we retained 125 sponges with 973,697,804 reads in total
and coverage ranging from 3.1X to 82.2X (average 24.0X). In total, 23,742
SNPs were called across all tags; after selecting one SNP per tag, we
retained 4,826 SNPs for subsequent analyses. Depending on the filtering
options (genotype calls or genotype likelihoods, minimum coverage of 3X or
10X, and maximum missing data of 30%, 10%, 5%, or 1%), the number of SNPs
ranged from 56 to 4,826 (Supplemental Table 2).
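The thinning to one SNP per tag and the sweep over missing-data thresholds can be sketched in Python as below. This is illustrative only: the `Snp` record, the rule of keeping the SNP with the lowest position on each tag, and the toy genotypes are assumptions (the text does not say how the single SNP was chosen), and the coverage (3X vs 10X) and genotype-likelihood options are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snp:
    tag: str                         # RADtag the SNP lies on
    pos: int                         # position within the tag
    genotypes: list[Optional[int]]   # alt-allele count per individual; None = missing


def one_snp_per_tag(snps: list[Snp]) -> list[Snp]:
    """Keep one SNP per tag (assumed rule: the one with the lowest position)."""
    best: dict[str, Snp] = {}
    for snp in snps:
        if snp.tag not in best or snp.pos < best[snp.tag].pos:
            best[snp.tag] = snp
    return list(best.values())


def missing_fraction(snp: Snp) -> float:
    """Fraction of individuals with a missing genotype at this SNP."""
    return sum(g is None for g in snp.genotypes) / len(snp.genotypes)


def filter_missing(snps: list[Snp], max_missing: float) -> list[Snp]:
    """Keep SNPs whose missing-genotype fraction does not exceed `max_missing`."""
    return [s for s in snps if missing_fraction(s) <= max_missing]


# Hypothetical genotypes for four individuals.
snps = [
    Snp("tag1", 10, [0, 1, 2, None]),    # 25% missing
    Snp("tag1", 40, [0, 0, 1, 1]),       # second SNP on tag1, removed by thinning
    Snp("tag2", 5, [1, None, None, 2]),  # 50% missing
    Snp("tag3", 22, [2, 2, 1, 0]),       # 0% missing
]
thinned = one_snp_per_tag(snps)
for max_missing in (0.30, 0.10, 0.05, 0.01):
    print(max_missing, len(filter_missing(thinned, max_missing)))
```

Keeping a single SNP per tag reduces physical linkage among markers, and tightening the missing-data threshold trades marker count for completeness, which is why the retained SNP count shrinks as the threshold drops.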