2.3 Analysis of factors influencing the species richness pattern
2.3.1 Environmental variables
Based on previous studies, we selected 12 environmental predictors in
four categories to evaluate the four hypotheses proposed to explain the
distribution pattern of rodent species richness in China. The hypotheses
and their associated variables are as follows:
- Energy-water: the availability of energy and water can be represented
by many indicators, such as temperature, precipitation, and solar
radiation (Pandey et al., 2020). We selected annual mean temperature
(MAT), annual precipitation (DVL), potential evapotranspiration (PET),
and actual evapotranspiration (AET) as proxy variables. MAT and DVL,
measures of temperature and water availability, were extracted from the
WorldClim database (https://worldclim.org/); PET and AET were obtained
from the Global Land Evaporation Amsterdam Model (GLEAM)
(https://www.gleam.eu/).
- Habitat heterogeneity: the mean elevation (MELV), elevation range
(ELR), and number of vegetation types (VEG) within each grid cell, the
predictors most commonly used to represent habitat heterogeneity, were
selected as habitat heterogeneity factors. These values were obtained
from the Resource and Environment Science and Data Center (RESDC).
- Climate seasonality: temperature seasonality (TES), annual temperature
range (ART), and precipitation seasonality (PRS) were used as proxies
for short-term climate seasonality. All three variables were obtained
from the WorldClim database.
- Human factors: we used the human influence index (HII) and human
footprint index (HFI) as proxy variables representing human-induced
effects. The HII and HFI data were downloaded from the archives of the
Wildlife Conservation Society (http://sedac.ciesin.columbia.edu/data/).
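The habitat heterogeneity predictors listed above (MELV, ELR, VEG) are per-cell summaries of finer-resolution elevation and vegetation layers. The source does not describe how the RESDC data were aggregated, so the following Python sketch is illustrative only: it assumes each fine-resolution pixel has already been assigned to an analysis grid cell, and both the tuple layout and the function name `habitat_heterogeneity` are hypothetical.

```python
from collections import defaultdict


def habitat_heterogeneity(pixels):
    """Summarise fine-resolution pixels into per-cell MELV, ELR and VEG.

    `pixels` is an iterable of (cell_id, elevation_m, vegetation_type)
    tuples; this layout is a hypothetical illustration, not an RESDC format.
    """
    elevations = defaultdict(list)
    veg_types = defaultdict(set)
    for cell_id, elevation, veg in pixels:
        elevations[cell_id].append(elevation)
        veg_types[cell_id].add(veg)
    summary = {}
    for cell_id, elev in elevations.items():
        summary[cell_id] = {
            "MELV": sum(elev) / len(elev),   # mean elevation
            "ELR": max(elev) - min(elev),    # elevation range
            "VEG": len(veg_types[cell_id]),  # number of distinct vegetation types
        }
    return summary


# Hypothetical pixels falling in two grid cells, "A" and "B".
pixels = [
    ("A", 1200.0, "forest"), ("A", 1850.0, "shrubland"), ("A", 1550.0, "forest"),
    ("B", 300.0, "cropland"), ("B", 340.0, "cropland"),
]
cells = habitat_heterogeneity(pixels)
print(cells["B"])  # {'MELV': 320.0, 'ELR': 40.0, 'VEG': 1}
```

Cell "A" spans 650 m of elevation and two vegetation types, so it would score as more heterogeneous than cell "B" on both ELR and VEG.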
2.3.2 Data analysis
Species richness was defined as the number of species in each grid cell.
Because species richness data are usually not normally distributed, they
were square-root transformed before regression analysis to reduce the
influence of skewness on the statistical results. We performed simple
regressions of total, non-endemic, and endemic species richness on each
environmental variable to explore how individual factors explain the
distribution patterns of species richness.
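A minimal Python sketch of one such univariate regression, assuming hypothetical richness counts and annual mean temperature (MAT) values for five grid cells (the study itself ran its analyses in R). It square-root transforms the response and fits y = a + b·x by ordinary least squares; `simple_regression` is a hypothetical helper, not a function from the study.

```python
import math


def simple_regression(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical richness counts and MAT (deg C) for five grid cells.
richness = [4, 9, 16, 25, 36]
mat = [2.0, 5.0, 8.0, 11.0, 14.0]

# Square-root transform the response before fitting.
sqrt_richness = [math.sqrt(s) for s in richness]
a, b, r2_sqrt = simple_regression(mat, sqrt_richness)
_, _, r2_raw = simple_regression(mat, richness)
print(round(b, 4), round(r2_sqrt, 4), round(r2_raw, 4))  # 0.3333 1.0 0.9786
```

In this toy data the raw counts curve upward with MAT, and the square-root transform makes the relationship exactly linear, which is why R² rises from about 0.979 to 1.0. Real richness data will of course not straighten out this neatly.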
To evaluate the relative importance of the predictor variables, we
separated the environmental factors into four distinct predictor sets
based on our main research objectives: (a) energy-water (EW), (b)
climatic seasonality (CS), (c) habitat heterogeneity (HH), and (d) human
factors (HE). Because the predictors were highly correlated, we reduced
collinearity by performing a principal component analysis (PCA) on each
predictor set. To account for potential nonlinear relationships between
the response variables and environmental factors, the squared terms of
the predictor variables were also included in the PCA. We extracted the
first two principal components of each predictor set, which together
explained 94%, 87%, 99%, and 99% of the variance in the energy-water,
habitat heterogeneity, climatic seasonality, and human factor sets,
respectively (Table S3).
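The per-set PCA can be sketched as follows, under stated assumptions. The study used the R package psych; this Python version swaps in a hand-rolled power iteration with deflation on the correlation matrix, which recovers the leading components of a small symmetric matrix. The six-cell MAT and precipitation values are invented, and the squared terms are appended as extra columns before standardization, mirroring the description above. Each component's share of variance is its eigenvalue divided by the number of variables, the quantity usually reported as percent variance explained.

```python
import math


def standardize(col):
    """Centre a column to mean 0 and scale it to sample SD 1."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def correlation_matrix(z):
    """Correlation matrix of columns that are already standardized."""
    n, p = len(z[0]), len(z)
    return [[sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def leading_eigenpairs(m, k, iters=1000):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    p = len(m)
    a = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(p)]  # arbitrary non-uniform start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Deflate: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


# Hypothetical energy-water predictors for six grid cells.
mat = [2.0, 6.0, 9.0, 13.0, 17.0, 21.0]
prec = [300.0, 520.0, 610.0, 900.0, 1250.0, 1600.0]
# Squared terms enter the PCA as extra columns to capture nonlinearity.
cols = [mat, prec, [x * x for x in mat], [x * x for x in prec]]
z = [standardize(c) for c in cols]
pairs = leading_eigenpairs(correlation_matrix(z), k=2)

# The trace of a correlation matrix equals the number of variables.
explained = [lam / len(cols) for lam, _ in pairs]
# PC scores: project each cell's standardized values onto each eigenvector.
scores = [[sum(z[j][t] * vec[j] for j in range(len(z))) for t in range(len(mat))]
          for _, vec in pairs]
print([round(e, 3) for e in explained])
```

Because the scores are projections of standardized data, each score vector has mean 0 and variance equal to its eigenvalue; rescaling them to unit variance gives the standardized components used in the regression step.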
To make the model coefficients comparable, the principal components
extracted from all predictor sets were standardized (mean = 0, standard
deviation = 1). We then fitted multiple linear regression models by
ordinary least squares (OLS) to identify the predictors that best explain
each of the three richness response variables. The optimal model was
identified by backward variable selection based on the Akaike
Information Criterion (AIC), using the stepAIC function in R. Variance
inflation factors (VIF) were used to test for multicollinearity among
predictor variables, and only predictors with VIF < 5 were retained
(Dormann et al., 2013) (Table S4). Because spatial autocorrelation can
affect the explanatory power of regression models, the spatial
autocorrelation of the residuals of the multiple regression models was
assessed using Moran's I. As the residuals of all multiple regression
models showed significant spatial autocorrelation (p < 0.001), we further
fitted spatial simultaneous autoregressive error models (SAR) using the
predictor variables from the optimal models. The explanatory power of the
predictor variables for species richness was measured using pseudo-R²
(the squared correlation coefficient between the values predicted by the
non-spatial component of the model and the observed values) (Kissling &
Carl, 2007). The relative importance of the predictor variables was also
compared using standardized regression coefficients.
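For OLS residuals e_i on n grid cells with spatial weights w_ij, Moran's I is I = (n / W) Σ_i Σ_j w_ij (e_i - ē)(e_j - ē) / Σ_i (e_i - ē)², where W = Σ_i Σ_j w_ij. The Python sketch below assumes binary rook-contiguity weights on a small regular grid and a one-sided permutation test; the source does not state the neighbourhood definition or the test used (in R, spdep offers moran.test and moran.mc). The residual values and all function names are hypothetical.

```python
import random


def rook_neighbours(n_rows, n_cols):
    """Binary rook-contiguity neighbour lists for a grid indexed row-major."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nb = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    nb.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = nb
    return neighbours


def morans_i(values, neighbours):
    """Global Moran's I with binary weights (w_ij = 1 for neighbouring cells)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(dev[i] * dev[j] for i, nb in neighbours.items() for j in nb)
    w_sum = sum(len(nb) for nb in neighbours.values())
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def permutation_test(values, neighbours, n_perm=999, seed=1):
    """One-sided permutation p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbours) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


nb = rook_neighbours(4, 4)
# Hypothetical residuals with a smooth spatial trend (positive autocorrelation).
trend_resid = [float(r + c) for r in range(4) for c in range(4)]
# A checkerboard pattern gives strong negative autocorrelation.
checker_resid = [1.0 if (r + c) % 2 == 0 else -1.0 for r in range(4) for c in range(4)]

i_trend, p_trend = permutation_test(trend_resid, nb)
i_checker = morans_i(checker_resid, nb)
print(round(i_checker, 3))  # -1.0
```

A significantly positive I for the residuals, as with the trend example, is the signal that motivates switching from OLS to a spatial error model.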
Finally, we performed variance partitioning to assess the pure effects
of the predictor sets and their joint contributions to the distribution
patterns of species richness. Venn diagrams were used to show the pure
and shared effects of the predictor sets.
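For two predictor sets A and B, variance partitioning reduces to arithmetic on the R² of three models: set A alone, set B alone, and both sets together. The Python sketch below shows that arithmetic with invented R² values; vegan's varpart, which the study used, extends the same logic to up to four sets and reports fractions based on adjusted R² (so a shared fraction can come out slightly negative). The functions `adjusted_r2` and `partition_two_sets` are hypothetical helpers.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def partition_two_sets(r2_a, r2_b, r2_ab):
    """Split explained variation between predictor sets A and B.

    r2_a, r2_b: (adjusted) R^2 of models with only set A or only set B;
    r2_ab: (adjusted) R^2 of the model with both sets.
    """
    return {
        "pure_A": r2_ab - r2_b,         # explained only by A
        "pure_B": r2_ab - r2_a,         # explained only by B
        "shared": r2_a + r2_b - r2_ab,  # jointly explained by A and B
        "unexplained": 1.0 - r2_ab,
    }


# Hypothetical R^2 values for energy-water (A) and habitat heterogeneity (B)
# models fitted to 1000 grid cells, each set contributing two PCs.
n = 1000
r2_a = adjusted_r2(0.40, n, 2)
r2_b = adjusted_r2(0.30, n, 2)
r2_ab = adjusted_r2(0.55, n, 4)
parts = partition_two_sets(r2_a, r2_b, r2_ab)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

The four fractions always sum to 1, which is what lets them be drawn as the regions of a Venn diagram.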
Statistical analysis for this study was performed in R 4.0.5
(http://www.r-project.org). The "psych" package was used for principal
component analysis, the "MASS" package for optimal model selection, the
"vegan" package for variance partitioning, and the "spdep" package for
spatial autoregressive model building.