Analyses of genetic variation
To test for Fst outliers that may be under selection, we ran BayeScan
(Foll & Gaggiotti, 2008) with a false discovery rate (FDR) of 0.01. We
first compared among the three geographic regions (north, middle, and
south) regardless of disease status to assess whether they needed to be
kept separate for comparison among disease status groups. After
detecting 125 Fst outliers among geographic groups (Fig. S1), we
examined loci associated with disease status by running BayeScan within
each region, comparing allele frequencies between its symptomatic and
asymptomatic individuals. After identifying outlier loci
within each region, we recovered protein-coding genes annotated in the
reference genome within 20,000 base pairs (bp) of each outlier SNP. We
then BLASTed the predicted protein sequences of these genes against the
NCBI ‘nr’ database (see Genome annotation below).
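The gene-recovery step can be sketched as a simple distance filter. This is a minimal illustration rather than the pipeline actually used: the `Gene` record, the function name `genes_near_snps`, and the input layout are hypothetical, and it assumes that "within 20,000 base pairs" means the nearest edge of the gene lies no more than 20,000 bp from the SNP on the same scaffold.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical annotation record (not the authors' data format)."""
    name: str
    scaffold: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_snps(snps, genes, window=20_000):
    """Map each (scaffold, position) SNP to the names of nearby genes.

    A gene counts as nearby when it is on the same scaffold and the gap
    between the SNP and the closest gene edge is at most `window` bp
    (assumed interpretation of "within 20,000 base pairs").
    """
    hits = {}
    for scaffold, pos in snps:
        nearby = []
        for gene in genes:
            if gene.scaffold != scaffold:
                continue
            # Gap is 0 when the SNP falls inside the gene span.
            gap = max(gene.start - pos, pos - gene.end, 0)
            if gap <= window:
                nearby.append(gene.name)
        hits[(scaffold, pos)] = nearby
    return hits


# Toy input; scaffold names and coordinates are invented for illustration.
example_genes = [
    Gene("g1", "scaffold_1", 1_000, 2_000),
    Gene("g2", "scaffold_1", 50_000, 60_000),
    Gene("g3", "scaffold_2", 100, 200),
]
example_snps = [("scaffold_1", 1_500), ("scaffold_1", 30_500)]
example_hits = genes_near_snps(example_snps, example_genes)
```

The nested loop is quadratic in the worst case. For genome-scale annotations, sorting genes per scaffold and using binary search or an interval tree would be the more usual choice.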
Identifying causative loci using single-locus methods, such as Fst,
requires a strong signal. The power to identify relevant genomic regions
using Fst may be weak if the phenotypic response is highly polygenic. In
this case, the signal of differentiation may be distributed weakly
across many loci. Therefore, we used a multivariate approach to identify
such regions driving differentiation between symptomatic and
asymptomatic sea stars. We used discriminant analysis of principal
components (DAPC) (Jombart, Devillard, & Balloux, 2010) as implemented in
the R package ‘adegenet’ (Jombart, 2015). Thresholds were arbitrarily
selected to separate visibly higher peaks. The position of outlier loci
was then compared across the three geographic regions, and those from at
least two different regions that were within 1 Mbp of each other were
earmarked as informative for SSWS resilience. An NCBI BLASTp query was
then performed on annotated genes found within 20,000 bp of the
earmarked SNP loci.
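The cross-region comparison can be sketched as follows. This is only an illustration under stated assumptions: the function name `shared_outliers` and the tuple layout are hypothetical, and it assumes that an outlier is earmarked when an outlier from a different geographic region lies on the same scaffold no more than 1 Mbp away.

```python
def shared_outliers(outliers, max_dist=1_000_000):
    """Keep outliers that have a neighbour from a different region.

    `outliers` is a list of (region, scaffold, position) tuples. An entry
    is kept when some outlier from another region sits on the same
    scaffold within `max_dist` bp (assumed reading of the 1 Mbp rule).
    """
    kept = []
    for region, scaffold, pos in outliers:
        for other_region, other_scaffold, other_pos in outliers:
            if (
                other_region != region
                and other_scaffold == scaffold
                and abs(other_pos - pos) <= max_dist
            ):
                kept.append((region, scaffold, pos))
                break  # one supporting region is enough
    return kept


# Toy input; region labels follow the text, coordinates are invented.
example_outliers = [
    ("north", "scaffold_1", 100_000),
    ("south", "scaffold_1", 900_000),
    ("middle", "scaffold_1", 5_000_000),
    ("north", "scaffold_2", 100_000),
    ("north", "scaffold_2", 200_000),
]
example_shared = shared_outliers(example_outliers)
```

In this toy input, the two `scaffold_2` outliers are close together but come from the same region, so neither is kept. Only outliers supported by a second region pass the filter.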
We estimated Weir & Cockerham’s Fst (Weir & Cockerham, 1984) between
disease-status groups within their geographic regions using GPAT++
(Shapiro et al., 2013). Significance levels were adjusted using the false
discovery rate (FDR = 0.05) in R. To identify genes that may be similarly
influential in different populations, SNPs located within 1 Mbp of a SNP
from at least one other population were plotted with their Fst values
across the genome.
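The FDR adjustment can be illustrated with a short sketch. The text does not name the procedure, so this assumes the Benjamini-Hochberg method. It is written in Python purely for illustration, whereas the analysis itself was run in R. The function name `bh_adjust` and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), and a
    running minimum taken from the largest p-value downward keeps the
    adjusted values monotone.
    """
    m = len(pvalues)
    # Indices ordered from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values, invented for illustration only.
example_p = [0.001, 0.5, 0.02, 0.9]
example_adj = bh_adjust(example_p)
# Indices of tests that pass the FDR = 0.05 threshold used in the text.
example_significant = [i for i, q in enumerate(example_adj) if q <= 0.05]
```

Comparing the adjusted values with 0.05 gives the same decisions as applying the step-up rule to the raw p-values.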