2.2 Statistical Methods
All data analysis was both performed and visualized in the R programming environment (version 4.1.0, R Core Team 2013). Diagnostic plots were used to verify all parametric modeling assumptions, with log transformations being performed when necessary to satisfy assumptions. Linear models were fit using numerical plant traits as response variables (biomass, individual biomass, maximum growth height, leaf area, leaf dry weight, reproduction, survivorship), with treatments (site, species, soil type) as predictor variables. Sites were analyzed categorically representing a wide variety of environmental variables, rather than across specific variables. Full models with all interaction combinations, as well as all simplified model structures were compared in the package MuMIn (version 1.43.17, Bartoń 2020). The best performing model for each response variable were selected based on Akaike’s Information Criterion (AIC; Bozdogan 1987; Wagenmakers and Farrell 2004), except where multiple models were indistinguishable (𝛿AIC < 2), in which case the simplest model structure was selected.
To identify which traits were most associated with individual predictors (species, soil, climate), we created confusion and importance matrices using the package randomForest (version 4.6-14, Liaw and Wiener 2001). Out-of-bag error rates (OOB) were derived from confusion matrices to estimate the relative error of traits in treatment differentiation. OOB values were standardized through the calculation of percent difference from random classification. For the three most deterministic variables, mean decrease accuracies corrected by the sample size are reported as percentages, representing the estimate of misclassification that would occur if a variable were removed from the model. Generalized linear models and analysis of variance (ANOVA) models were fit on both top multiple linear regression models as well as physiological variables with high determinant power. Tukey’s Honest Significant Distance was used post-hoc to identify differences across treatments using the package multcomp (version 1.4-14,Hothorn et al. 2020). This procedure was also used to test biomass differences across climate, species, and soil.
To investigate intra-specific trends, individual biomass was calculated by dividing species biomasses by the number of survivors of each species. The effects of local soil at a specific climate were isolated by calculating the difference in trait responses between the local and reference soil communities. The effects of climate on plant traits were isolated by comparing reference soil communities across the climates. We used a Principal Component Analysis (PCA) using the packageFactoMineR to analyze multivariate data and identify physiological variable contributions to climate and species differences (version 2.4, Lê, Josse, and Husson 2008). Individuals were equally weighted, with variables being positively shifted and logit transformed to standardize relative contributions. PCA dimensional analysis was performed by calculating correlations across maximum height, survival, species biomass, reproduction, leaf dry weight, and leaf area. No derived variables were included in the PCA or dredge modeling to eliminate issues of covariance.