2.2 Statistical Methods
All data analysis was both performed and visualized in the R programming
environment (version 4.1.0, R Core Team 2013). Diagnostic plots
were used to verify all parametric modeling assumptions, with log
transformations being performed when necessary to satisfy assumptions.
Linear models were fit using numerical plant traits as response
variables (biomass, individual biomass, maximum growth height, leaf
area, leaf dry weight, reproduction, survivorship), with treatments
(site, species, soil type) as predictor variables. Sites were analyzed
categorically representing a wide variety of environmental variables,
rather than across specific variables. Full models with all interaction
combinations, as well as all simplified model structures were compared
in the package MuMIn (version 1.43.17, Bartoń 2020). The
best performing model for each response variable were selected based on
Akaike’s Information Criterion (AIC; Bozdogan 1987; Wagenmakers and
Farrell 2004), except where multiple models were indistinguishable
(𝛿AIC < 2), in which case the simplest model structure
was selected.
To identify which traits were most associated with individual predictors
(species, soil, climate), we created confusion and importance matrices
using the package randomForest (version 4.6-14, Liaw and
Wiener 2001). Out-of-bag error rates (OOB) were derived from confusion
matrices to estimate the relative error of traits in treatment
differentiation. OOB values were standardized through the calculation of
percent difference from random classification. For the three most
deterministic variables, mean decrease accuracies corrected by the
sample size are reported as percentages, representing the estimate of
misclassification that would occur if a variable were removed from the
model. Generalized linear models and analysis of variance (ANOVA) models
were fit on both top multiple linear regression models as well as
physiological variables with high determinant power. Tukey’s Honest
Significant Distance was used post-hoc to identify differences across
treatments using the package multcomp (version 1.4-14,Hothorn et al. 2020). This procedure was also used to test biomass
differences across climate, species, and soil.
To investigate intra-specific trends, individual biomass was calculated
by dividing species biomasses by the number of survivors of each
species. The effects of local soil at a specific climate were isolated
by calculating the difference in trait responses between the local and
reference soil communities. The effects of climate on plant traits were
isolated by comparing reference soil communities across the climates. We
used a Principal Component Analysis (PCA) using the packageFactoMineR to analyze multivariate data and identify
physiological variable contributions to climate and species differences
(version 2.4, Lê, Josse, and Husson 2008). Individuals were
equally weighted, with variables being positively shifted and logit
transformed to standardize relative contributions. PCA dimensional
analysis was performed by calculating correlations across maximum
height, survival, species biomass, reproduction, leaf dry weight, and
leaf area. No derived variables were included in the PCA or dredge
modeling to eliminate issues of covariance.