Statistical analyses
All analyses were conducted in R version 4.1.1 (R Core Team 2021),
generally using the same approach; for brevity, here we first describe
our general approach before describing each analysis. We used
generalized linear mixed effects models with package ‘glmmTMB’ (version
1.1.4) (Brooks et al.2017). All models incorporated the same random effects structure that
reflected our nested study design: (1|site/grid/block).
‘Productivity’, which was included as a predictor in all analyses, was
rescaled to center on zero with a standard deviation of 1 using the
‘scale’ function to ease model fitting. For any analyses where
alternative model structures needed to be compared, we selected the most
appropriate model using an AIC comparison, where the model selected had
the lowest AIC by at least 2 points, and we used package ‘DHARMa’
(version 0.4.6; (Hartig
2022) to test model assumptions, for example overdispersion. If any
issues were detected, we modified our model until all DHARMa tests were
satisfied. To assess the significance of parameters in our models, we
used the ‘car’ (version 3.0.12) package, starting with type III sums of
squares for models including interactions, defaulting to type II sums of
squares if interactions were non-significant (Fox and Weisberg 2019).
For plotting, fitted relationships between variables based on our fitted
models with 95% confidence intervals were extracted using the
‘ggpredict’ function in the ‘ggeffects’ package (version 1.2.3)
(Lüdecke 2018).
We first tested patterns at the species level, examining how growth
rates varied across productivity. We applied our general approach above
to perform separate analyses for each species, modeling seed production
as a Poisson-distributed response variable, and productivity, the
presence or absence of neighbors, and their interaction as explanatory
variables. Species could differ in whether or not productivity was a
linear or quadratic predictor, and whether zero inflation was accounted
for as an intercept or allowed to change across the gradient (Table S1).
When zero-inflated models were used, the model fits two components: the
zero component, which is interpreted as a failure to germinate and
survive to adulthood, and the conditional component, which is
interpreted as the reproductive output of adults
(Bontrager & Angert 2019).
We repeated these analyses for each species, with occupancy (i.e.,
presence/absence) as a binomial-distributed response variable.
We next tested how species-specific responses integrated to predict
persistence and occupancy of species at a community level. In each
‘abiotic + biotic’ plot, we calculated the proportion of species that
could persist (i.e., λ ≥ 1) and the proportion of species that naturally
occurred. These proportions were used as a binomially-distributed
response variable, including a ‘weight’ of number of species (i.e.,
number of ‘trials’; usually four but fewer in cases where tubes were
lost). Note that here, species richness was observed out of four
potential species, warranting a binomial distribution. For this analysis
we included the following predictors: productivity (competing models
with linear and quadratic relationships), type of data (i.e., three
levels, either persistence with neighbors, persistence without
neighbors, or occupancy), and their interaction. Separating our
proportions by type of data allowed us to identify how biotic
interactions altered mismatches between patterns of occupancy and
persistence.
Lastly, we coupled our persistence and occupancy data to determine how
dispersal limitation, excess, and species-sorting varied across
productivity. When dispersal is limited, species are absent from
locations that could otherwise support their persistence (outcome 1),
whereas when there is excess dispersal relative to the scale of
spatiotemporal variation, leading to sink or pseudo-sink populations,
species are present in locations that do not support persistence
(outcome 2). Dispersal is ‘sufficient’
(Leibold & Chase 2017) when
species are present where they can persist and absent where they cannot,
also known as ‘species sorting’ (outcome 3). We categorized each species
in each plot as falling in one of these three outcomes; these
categorical data were then used as a response variable in a multinomial
model using the ‘mblogit’ function in the ‘mclogit package’ (version
0.9.6) (Elff 2023). We chose
a multinomial model (analogous to a binomial except allowing more than
two outcomes) because the three outcomes were dependent and required to
sum to one. As with our other analyses, we used productivity, neighbor
treatment, and their interaction as explanatory variables. Predictions
for this model were computed with ‘predict’ in contrast to our other
analyses that used ‘ggpredict’ as the latter was not compatible with
multinomial models.