Statistical analyses
All analyses were conducted in R version 4.1.1 (R Core Team 2021), generally using the same approach; for brevity, here we first describe our general approach before describing each analysis. We used generalized linear mixed effects models with package ‘glmmTMB’ (version 1.1.4) (Brooks et al.2017). All models incorporated the same random effects structure that reflected our nested study design: (1|site/grid/block). ‘Productivity’, which was included as a predictor in all analyses, was rescaled to center on zero with a standard deviation of 1 using the ‘scale’ function to ease model fitting. For any analyses where alternative model structures needed to be compared, we selected the most appropriate model using an AIC comparison, where the model selected had the lowest AIC by at least 2 points, and we used package ‘DHARMa’ (version 0.4.6; (Hartig 2022) to test model assumptions, for example overdispersion. If any issues were detected, we modified our model until all DHARMa tests were satisfied. To assess the significance of parameters in our models, we used the ‘car’ (version 3.0.12) package, starting with type III sums of squares for models including interactions, defaulting to type II sums of squares if interactions were non-significant (Fox and Weisberg 2019). For plotting, fitted relationships between variables based on our fitted models with 95% confidence intervals were extracted using the ‘ggpredict’ function in the ‘ggeffects’ package (version 1.2.3) (Lüdecke 2018).
We first tested patterns at the species level, examining how growth rates varied across productivity. We applied our general approach above to perform separate analyses for each species, modeling seed production as a Poisson-distributed response variable, and productivity, the presence or absence of neighbors, and their interaction as explanatory variables. Species could differ in whether or not productivity was a linear or quadratic predictor, and whether zero inflation was accounted for as an intercept or allowed to change across the gradient (Table S1). When zero-inflated models were used, the model fits two components: the zero component, which is interpreted as a failure to germinate and survive to adulthood, and the conditional component, which is interpreted as the reproductive output of adults (Bontrager & Angert 2019). We repeated these analyses for each species, with occupancy (i.e., presence/absence) as a binomial-distributed response variable.
We next tested how species-specific responses integrated to predict persistence and occupancy of species at a community level. In each ‘abiotic + biotic’ plot, we calculated the proportion of species that could persist (i.e., λ ≥ 1) and the proportion of species that naturally occurred. These proportions were used as a binomially-distributed response variable, including a ‘weight’ of number of species (i.e., number of ‘trials’; usually four but fewer in cases where tubes were lost). Note that here, species richness was observed out of four potential species, warranting a binomial distribution. For this analysis we included the following predictors: productivity (competing models with linear and quadratic relationships), type of data (i.e., three levels, either persistence with neighbors, persistence without neighbors, or occupancy), and their interaction. Separating our proportions by type of data allowed us to identify how biotic interactions altered mismatches between patterns of occupancy and persistence.
Lastly, we coupled our persistence and occupancy data to determine how dispersal limitation, excess, and species-sorting varied across productivity. When dispersal is limited, species are absent from locations that could otherwise support their persistence (outcome 1), whereas when there is excess dispersal relative to the scale of spatiotemporal variation, leading to sink or pseudo-sink populations, species are present in locations that do not support persistence (outcome 2). Dispersal is ‘sufficient’ (Leibold & Chase 2017) when species are present where they can persist and absent where they cannot, also known as ‘species sorting’ (outcome 3). We categorized each species in each plot as falling in one of these three outcomes; these categorical data were then used as a response variable in a multinomial model using the ‘mblogit’ function in the ‘mclogit package’ (version 0.9.6) (Elff 2023). We chose a multinomial model (analogous to a binomial except allowing more than two outcomes) because the three outcomes were dependent and required to sum to one. As with our other analyses, we used productivity, neighbor treatment, and their interaction as explanatory variables. Predictions for this model were computed with ‘predict’ in contrast to our other analyses that used ‘ggpredict’ as the latter was not compatible with multinomial models.