Lake characterization and read statistics
The physical and environmental profiles of the two lagoons and nine marine lakes are provided in Table 1. In general, we observed higher temperatures (30.8°C ± 1.22°C) and lower salinities (27.3 ppt ± 2.7 ppt) in lakes than in lagoons (29°C and 33.5 ppt). Connection to the surrounding sea, as inferred from tidal amplitude, varied among lakes from highly connected to highly isolated. For instance, lake P.4 was the most connected, with a tidal amplitude 80% of that of the surrounding sea, while lake P.1 was the most isolated, with a tidal amplitude only 7% of that of the surrounding sea.
After sequencing and demultiplexing, we obtained 1,127,497,643 reads from 168 sponges. On average, we obtained 7,673,269 reads per individual. Individuals with fewer than 2,000,000 reads were removed from subsequent analyses. Based on the calculation table from Peterson et al. (2012), an estimated genome size of 600 Mb, and a size selection of 425-500 bp, we expected to retain 27,300 RADtags. However, our de novo reference retained only 14,442 tags when keeping RADtags with at least 3X coverage that were present in at least 70% of individuals. Kraken and Blobtools identified 13 of the 14,442 RADtags as possible bacterial contamination. These RADtags mapped to Synechococcus sp., a cyanobacterial genus, and were removed from the data set.
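The two thresholds described above (a minimum read count per individual, and a minimum coverage and presence per RADtag) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names and input structures are hypothetical, and the sketch assumes that "present" means a tag reaches at least 3X coverage in a given individual.

```python
# Illustrative thresholds taken from the text; everything else is hypothetical.
MIN_READS = 2_000_000   # individuals with fewer reads are removed
MIN_COVERAGE = 3        # assumed: per-individual depth needed for a tag to count as present
MIN_PRESENCE = 0.70     # minimum fraction of individuals in which a tag must be present


def filter_individuals(read_counts):
    """Keep individuals with at least MIN_READS reads.

    read_counts: dict mapping individual ID -> number of reads.
    """
    return {ind: n for ind, n in read_counts.items() if n >= MIN_READS}


def filter_tags(coverage, individuals):
    """Keep tags present (depth >= MIN_COVERAGE) in >= MIN_PRESENCE of individuals.

    coverage: dict mapping tag ID -> {individual ID: depth}.
    individuals: list of individual IDs retained after read-count filtering.
    """
    n_individuals = len(individuals)
    kept = []
    for tag, depths in coverage.items():
        n_present = sum(1 for ind in individuals if depths.get(ind, 0) >= MIN_COVERAGE)
        if n_individuals > 0 and n_present / n_individuals >= MIN_PRESENCE:
            kept.append(tag)
    return kept
```

Applying `filter_individuals` first and passing its keys to `filter_tags` mirrors the order in which the thresholds are described in the text.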
After filtering, we retained 125 sponges with 973,697,804 reads in total, with coverage ranging from 3.1X to 82.2X (average 24.0X). In total, 23,742 SNPs were called across all tags, and after selecting one SNP per tag we retained 4,826 SNPs for subsequent analyses. Depending on the filtering options (genotype calls or genotype likelihoods; coverage of 3X or 10X; allowed missing data of 30%, 10%, 5%, or 1%), the number of SNPs varied from 56 to 4,826 (Supplemental Table 2).
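Selecting one SNP per tag can be sketched as follows. The text does not state which SNP was chosen on each tag, so this hypothetical example assumes the SNP at the lowest position is kept; the function name and the input format are likewise illustrative.

```python
def one_snp_per_tag(snps):
    """Reduce a list of SNPs to a single SNP per RADtag.

    snps: iterable of (tag_id, position) tuples.
    Assumption: the SNP with the lowest position on each tag is kept.
    Returns a list of (tag_id, position) tuples sorted by tag ID.
    """
    chosen = {}
    for tag, pos in snps:
        # Keep the first SNP seen for a tag, or replace it with an earlier position.
        if tag not in chosen or pos < chosen[tag]:
            chosen[tag] = pos
    return sorted(chosen.items())
```

Keeping a single SNP per tag is a common way to reduce the effect of physical linkage among SNPs on the same locus, whichever selection rule is used.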