Analyses of genetic variation
To test for Fst outliers that may be under selection, we ran BayeScan (Foll & Gaggiotti, 2008) with a false-discovery rate (FDR) of 0.01. We first compared the three geographic regions (north, middle, and south) regardless of disease status to assess whether they needed to be kept separate when comparing disease-status groups. After detecting 125 Fst outliers among geographic groups (Fig. S1), we examined loci associated with disease status by running BayeScan within each region, comparing allele frequencies between that region’s symptomatic and asymptomatic individuals. After identifying outlier loci within each region, we recovered protein-coding genes annotated in the reference genome within 20,000 base pairs of each outlier SNP. We then BLASTed the predicted protein sequences of these genes against the NCBI ’nr’ database (see Genome annotation below).
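The gene-recovery step can be illustrated with a minimal Python sketch. It assumes gene annotations are available as scaffold, start, and end coordinates, and that distance is measured from the SNP to the nearest gene boundary; the `Gene` record and the `genes_near_snp` function are hypothetical names used for illustration only, not part of the pipeline described here.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical annotation record (1-based, inclusive coordinates)."""
    name: str
    scaffold: str
    start: int
    end: int


def genes_near_snp(snp_scaffold: str, snp_pos: int,
                   genes: List[Gene], window: int = 20_000) -> List[str]:
    """Return names of genes on the SNP's scaffold lying within `window` bp."""
    hits = []
    for gene in genes:
        if gene.scaffold != snp_scaffold:
            continue
        # Distance is 0 when the SNP falls inside the gene; otherwise it is
        # the gap between the SNP and the nearest gene boundary.
        if snp_pos < gene.start:
            dist = gene.start - snp_pos
        elif snp_pos > gene.end:
            dist = snp_pos - gene.end
        else:
            dist = 0
        if dist <= window:
            hits.append(gene.name)
    return hits
```

Genes returned by such a search would then supply the predicted protein sequences submitted to the BLAST search.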
Identifying causative loci using single-locus methods, such as Fst, requires a strong signal. The power to identify relevant genomic regions using Fst may be weak if the phenotypic response is highly polygenic, because the signal of differentiation may then be distributed weakly across many loci. Therefore, we used a multivariate approach to identify regions driving differentiation between symptomatic and asymptomatic sea stars. We used discriminant analysis of principal components (DAPC) (Jombart, Devillard, & Balloux, 2010) implemented in the R package ’adegenet’ (Jombart, 2015). Thresholds were arbitrarily selected to separate visibly higher peaks. The positions of outlier loci were then compared across the three geographic regions, and those from at least two different regions that were within 1 Mbp of each other were earmarked as informative for SSWS resilience. An NCBI BLASTp query was then performed on annotated genes found within 20,000 bp of the earmarked SNP loci.
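The cross-region comparison of outlier positions can be sketched as follows. This is an illustrative Python example, not the code used in the study: it assumes each outlier is recorded as a (region, scaffold, position) tuple and that proximity is only meaningful between loci on the same scaffold. The function name `shared_outliers` is hypothetical.

```python
from typing import List, Tuple

Outlier = Tuple[str, str, int]  # (region, scaffold, position in bp)


def shared_outliers(outliers: List[Outlier],
                    max_dist: int = 1_000_000) -> List[Outlier]:
    """Return outliers that lie within `max_dist` bp of an outlier
    detected in a different geographic region on the same scaffold."""
    flagged = set()
    for i, (reg_a, scaf_a, pos_a) in enumerate(outliers):
        for reg_b, scaf_b, pos_b in outliers[i + 1:]:
            if (reg_a != reg_b and scaf_a == scaf_b
                    and abs(pos_a - pos_b) <= max_dist):
                # Both members of a qualifying pair are earmarked.
                flagged.add((reg_a, scaf_a, pos_a))
                flagged.add((reg_b, scaf_b, pos_b))
    return sorted(flagged)
```

The pairwise loop is quadratic in the number of outliers, which is adequate for outlier lists of modest size; a sorted sweep per scaffold would be a natural alternative for larger sets.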
We estimated Weir & Cockerham’s Fst (Weir & Cockerham, 1984) between disease-status groups within their geographic regions using GPAT++ (Shapiro et al., 2013). Significance levels were adjusted using the false discovery rate (FDR = 0.05) in R. To isolate potentially similarly influential genes from different populations, SNPs located within 1 Mbp of a SNP from at least one other population were plotted with Fst across the genome.
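As an illustration of FDR adjustment, the sketch below implements the Benjamini-Hochberg procedure in Python. The text does not state which FDR method was applied in R, so the choice of Benjamini-Hochberg (available in R as `p.adjust` with `method = "BH"`) is an assumption, and the function name `bh_adjust` is hypothetical.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: BH is used here only as a common FDR procedure; the
    method actually applied in the analysis is not specified.
    """
    m = len(pvalues)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag tests whose adjusted p-value is at or below the FDR level."""
    return [q <= fdr for q in bh_adjust(pvalues)]
```

Under this sketch, a SNP would be retained when its adjusted p-value is at or below 0.05.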