2.6.2 Epigenetic analyses
We calculated the methylation level at each SMP and individual sample and estimated the mean and standard deviation of DNA methylation per group, for each separate sequence context (i.e. CG, CHG and CHH), and across all contexts, using the complete SMP matrix (n=43,365 SMPs) as well as the complete and polymorphic SMP matrix (n=3,769 SMPs). All other analyses were performed for each of the two common garden experiments by comparing control vs. Cd-treated and control vs. Cu-treated plants from all four populations using the matrix of complete and polymorphic SMPs.
To evaluate response to each heavy metal, we carried out a distance-based RDA (dbRDA 94) to test the effect of population (n=4), treatment (n=2; C vs. Cd or C vs. Cu), and their interaction on genome wide DNA methylation with the model: epigenetic distance ~ Population * Treatment (functioncapscale in the vegan package; Oksanen et al., 2020). This constrained multivariate ordination approach provides an estimate of the amount of variation in epigenetic distances explained by our predictors, as well as the constrained ordination axes (RDA axes) used to visualize the structure of the data in a two-dimensional space. Pairwise epigenetic distances were estimated following Wang et al.96, where the distance (dst) between any two samples, s and t, was calculated either based on only the level of methylation or including the variance in methylation as follows:
\begin{equation} d_{\text{st}}=\ \sqrt[2]{\sum_{j=1}^{n}\left\{\frac{1}{2n}\left(x_{\text{sj}}^{m}-\ x_{\text{tj}}^{m}\right)^{2}\ +\ \frac{1}{2n}\left(x_{\text{sj}}^{v}-\ x_{\text{tj}}^{v}\right)^{2}\right\}}\nonumber \\ \end{equation}
Where n is the number of cytosine positions (SMPs);\(x_{\text{sj}}^{m}\) and \(x_{\text{tj}}^{m}\) represent the methylation level of cytosine j in samples s and t respectively; and \(x_{\text{sj}}^{v}\) and \(x_{\text{tj}}^{v}\)represent the methylation variance of cytosine j in samples s and t respectively. The methylation variance (\(x_{\text{ij}}^{v}\)) was estimated as the squared difference between the methylation level of cytosine j in sample i (\(x_{\text{ij}}^{m}\)) and the average methylation of cytosine j across all samples in the group to which sample i belongs (\({\overset{\overline{}}{x}}_{j}^{m}\)). Using the formula above, we also accounted for the variation in methylation profiles within treatment groups.
We ran separate dbRDA models for each common garden experiment (i.e. C vs. Cd and C vs. Cu) for all sequence contexts together and separately for each sequence context. In each of these cases, we first ran the model using the epigenetic distance matrix calculated using only differences in methylation levels between each pair of samples (using only the left-hand side under the square root of the formula above). Then, we ran the model using the distance matrix calculated from both methylation level and methylation variance (full formula). We tested the overall significance, as well as the significance of each predictor and their interaction using a permutation test with 9999 permutations. We calculated the proportion of total variance explained by our models, adjusted by the number of predictors included in the model (adj. R2), with the function RsquareAdj (vegan package) which provides a unique adj. R2 for the full model. We also calculated individual adj. R2 for each predictor by modelling the effect of each predictor while accounting for the others using partial constrained dbRDA with the following models: i) epigenetic distance ~ Treatment + Condition(Population); ii) epigenetic distance ~ Population + Condition(Treatment); iii) epigenetic distance ~ Population:Treatment + Condition(Treatment) + Condition(Population).
We identified differentially methylated cytosine positions (DMPs) as cytosines with significant differences in their mean methylation level between control and Cu-treated, and control and Cd-treated plants using the R package DSS 97. First, we modeled the methylation frequency at each cytosine position within each group using a beta-binomial distribution with arcsine link function and the formula “~ 0 + group” (function DMLfit.multifactor ). This transformation proved useful to model DNA methylation because it reduces the heteroscedasticity of the data by stabilizing its variance98,99. Second, we performed Wald tests to detect differential methylation between groups at each position using the function DMLtest.multifactor which reports adjusted p-values by the Benjamini-Hochberg method (i.e. FDR). Cytosines were considered to be differentially methylated (i.e. DMPs) when the FDR ≤ 0.05 and the methylation change was ≥ 10%.