Within-lineage population genetics
Within Lineage B, genetic patterns remained highly similar across all filters (but see supplemental figures and tables for differences). As the conclusions remained the same, all further reported analyses were performed with a filter of 3X coverage and a maximum of 30% missing data, as this retained the most SNPs.
Population genetic diversity varied among lakes (Table 1, Supplemental Table 4). The highest genetic diversity was consistently found for the lagoon populations Bay and DAR, as seen for nucleotide diversity (π) (0.0101 and 0.0095, respectively), and for the expected heterozygosity (He) (0.157 and 0.117, respectively). Lowest genetic diversity was observed in populations P.1 (π = 0.0036, He = 0.054) and P.27 (π = 0.0037, He = 0.038). Population B.3 also showed low heterozygosity (He = 0.034), but relatively high nucleotide diversity (π = 0.0074). However, this may be an artefact of low sample size. When estimating heterozygosity from genotype likelihoods via ANGSD, we found the lowest heterozygosity for the populations P.5 (0.019) and P.27 (0.021).
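As an illustration of the expected heterozygosity statistic reported above, the following minimal sketch computes the mean per-SNP He = 2p(1 − p) from a matrix of alternate-allele counts. It is not the pipeline used in this study: the function name, the input layout (individuals × SNPs, coded 0/1/2 with NaN for missing calls) and the absence of a small-sample correction are all assumptions made for the example.

```python
import numpy as np

def expected_heterozygosity(genotypes):
    """Mean expected heterozygosity across biallelic SNPs (hypothetical helper).

    genotypes: 2-D array (individuals x SNPs) of alternate-allele counts
    (0, 1 or 2); missing calls are np.nan. No sample-size correction is applied.
    """
    g = np.asarray(genotypes, dtype=float)
    n_called = np.sum(~np.isnan(g), axis=0)      # individuals genotyped per SNP
    keep = n_called > 0                          # drop SNPs with no calls at all
    g, n_called = g[:, keep], n_called[keep]
    p = np.nansum(g, axis=0) / (2 * n_called)    # alternate-allele frequency
    he_per_snp = 2 * p * (1 - p)                 # Hardy-Weinberg expectation
    return float(he_per_snp.mean())
```

Because the estimate depends on allele frequencies computed from only the genotyped individuals, values for populations with very few samples can be unstable, which is consistent with the caution expressed above for population B.3.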
The samples clustered per lake and lagoon location (Fig. 3, Supplemental Fig. 1, Supplemental Fig. 2). The first four Principal Components (PCs) in the Principal Component Analysis (PCA) explained 80.5% of total variation (Fig. 3A). PC1, explaining 45.6% of the variation, separated populations by geographic region, with the Raja Ampat lakes being distinct from the lakes in Berau. PC2, explaining 24.4% of the variation, separated lake MIS01 from the other lakes. PC3 and PC4 (explaining 10.5% in total) further separated lagoon DAR and lake P.5, and to a lesser extent P.1 and P.30. In the PC1 versus PC2 plot, the lagoon populations (Bay and DAR) clustered towards the center of the graph, indicating that they are ancestral. For Bay, this pattern continued in the PC3 versus PC4 plot, but not for DAR. Lakes P.27 and P.32 remained closely associated.
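The percentages of variation attributed to each PC above are the kind of quantity obtained from the singular values of a centered genotype matrix. The sketch below shows one generic way to obtain them. It is an assumption-laden illustration, not the software used here: the function name, the mean imputation of missing genotypes and the lack of any variance scaling are choices made only for the example.

```python
import numpy as np

def pca_explained_variance(genotypes, n_components=4):
    """PCA of a genotype matrix via SVD (hypothetical helper).

    genotypes: 2-D array (individuals x SNPs) of allele counts, np.nan = missing.
    Returns (scores, ratio): sample coordinates on the first PCs and the
    fraction of total variance explained by each of them.
    """
    g = np.asarray(genotypes, dtype=float)
    col_means = np.nanmean(g, axis=0)
    g = np.where(np.isnan(g), col_means, g)      # simple mean imputation
    centered = g - g.mean(axis=0)                # center each SNP
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2                            # proportional to PC variances
    ratio = variance / variance.sum()
    scores = u * s                               # sample coordinates per PC
    return scores[:, :n_components], ratio[:n_components]
```

Summing the leading entries of `ratio` gives the cumulative fraction of variation, analogous to the 45.6% + 24.4% + 10.5% = 80.5% reported for the first four PCs.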
The Admixture analysis further supported the pattern of clustering per lake (Fig. 3B). Convergence of likelihood values indicated the number of ancestral populations to be K = 9 (Supplemental Fig. 3, 4). When the putative number of populations was set to 9, all populations were separated apart from B.2, which consisted of a mix of Bay and B.1 genetic lineages. Some admixture of B.1 genetic diversity into the Bay and DAR populations was observed, indicating some genetic connection between these populations. Setting K to 7 or 8 indicated some admixture between P.30 and P.5 (K = 8) or among P.27 and P.32, with Bay being a mixture of other populations (K = 7). Setting K to 10 separated all populations.
Findings from the phylogenetic network were consistent with patterns found for the PCA and Admixture plots (Fig. 3C, Supplemental Fig. 5). The network showed a high fit (fit = 99.2) and a small degree of reticulation (d = 0.153), thus indicating a tree-like structure. The lagoon populations Bay and DAR showed higher reticulation than the marine lake populations, indicating higher intra-population diversity.
Pairwise fixation indices (F’ST) showed high levels of genetic structuring (0.629 ± 0.133) (Fig. 3D). F’ST ranged from 0.182 between Bay and B.2 to 0.778 between P.30 and P.32 (Supplemental Fig. 6, Supplemental Table 5). All pairwise comparisons were significant, except for the comparison between P.32 and B.2, potentially due to sample size (n = 4 and 2, respectively). The migration network among lakes indicated the strongest relative bidirectional migration among lakes in Berau (Fig. 1D). Lagoon population Bay was linked to some degree to all other populations (relative fraction 0.4-1). Within Raja Ampat, bidirectional migration above the threshold of 0.4 was observed between P.5 and four other lakes (P.30, P.32, P.1, and P.4). There was low connectivity among lakes P.27, P.30, P.32 and P.1 in Raja Ampat (<0.4).