2.3 Analysis of factors influencing the species richness pattern
2.3.1 Environmental variables
Based on previous studies, we selected 12 environmental predictors in four categories to test four hypotheses proposed to explain the distribution pattern of rodent species richness in China. The hypotheses and their associated variables are as follows:
  1. Energy-water: the availability of energy and water can be measured with many indicators, such as temperature, precipitation, and solar radiation (Pandey et al., 2020). We selected annual mean temperature (MAT), annual precipitation (DVL), potential evapotranspiration (PET), and actual evapotranspiration (AET) as proxy variables. MAT and DVL, as measures of temperature and water availability, were extracted from the WorldClim database (https://worldclim.org/); PET and AET were obtained from the Global Land Evaporation Amsterdam Model (GLEAM) (https://www.gleam.eu/).
  2. Habitat heterogeneity: we selected mean elevation (MELV), elevation range (ELR), and the number of vegetation types (VEG) within each grid cell, the predictors most commonly used to represent habitat heterogeneity. These data were obtained from the RESDC.
  3. Climate seasonality: temperature seasonality (TES), annual temperature range (ART), and precipitation seasonality (PRS) were used as proxies for short-term climate seasonality. All three variables were downloaded from the WorldClim database.
  4. Human factors: we used the human influence index (HII) and the human footprint index (HFI) as proxies for human-induced effects. The HII and HFI data were downloaded from the archives of the Wildlife Conservation Society (http://sedac.ciesin.columbia.edu/data/).
2.3.2 Data analysis
Species richness was defined as the number of species in each grid cell. Because species richness data are usually not normally distributed, richness values were square-root transformed before regression analysis to reduce the influence of skewness on the statistical results. To explore how well individual factors explain the distribution patterns of species richness, we performed simple regressions of total species richness, non-endemic species richness, and endemic species richness on each environmental variable.
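The transformation and single-variable screening described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' R code; the function names (`simple_regression_r2`, `univariate_screen`) and the simulated grid-cell data are hypothetical.

```python
import numpy as np


def simple_regression_r2(y, x):
    """Fit y = b0 + b1 * x by ordinary least squares; return (b0, b1, R^2)."""
    X = np.column_stack([np.ones_like(x), x])
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = b0 + b1 * x
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return b0, b1, 1.0 - ss_res / ss_tot


def univariate_screen(richness, predictors):
    """Square-root transform richness, then regress it on each predictor in turn."""
    y = np.sqrt(richness)
    return {name: simple_regression_r2(y, x) for name, x in predictors.items()}


# Hypothetical data: 200 grid cells and two made-up predictors.
rng = np.random.default_rng(42)
mat = rng.normal(10, 5, size=200)
prs = rng.normal(50, 10, size=200)
richness = rng.poisson(np.exp(1.5 + 0.08 * mat)).astype(float)
results = univariate_screen(richness, {"MAT": mat, "PRS": prs})
for name, (b0, b1, r2) in results.items():
    print(f"{name}: slope = {b1:.3f}, R^2 = {r2:.3f}")
```

In the study, the same screen would be repeated for each of the three response variables (total, non-endemic, and endemic richness).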
To evaluate the relative importance of the predictors, we grouped the environmental factors into four predictor sets based on our main research objectives: (a) energy-water (EW), (b) climatic seasonality (CS), (c) habitat heterogeneity (HH), and (d) human factors (HE). Because the predictors were highly correlated, we removed collinearity by performing a principal component analysis (PCA) within each predictor set. To account for possible nonlinear relationships between the response variables and the environmental factors, the squared terms of the predictors were also included in each PCA. We extracted the first two principal components of each predictor set, which together accounted for 94%, 87%, 99%, and 99% of the variance in the energy-water, habitat heterogeneity, climatic seasonality, and human factor sets, respectively (Table S3).
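A minimal sketch of a PCA within one predictor set, with squared terms appended, is given below. It assumes the PCA is run on standardized variables (that is, on the correlation matrix) and that squares are taken of the raw values; both are assumptions, as are the function name `pca_with_squares` and the simulated data. It uses a plain SVD in NumPy and applies no rotation, whereas psych's `principal()` applies a varimax rotation by default, so scores from the original analysis may differ.

```python
import numpy as np


def pca_with_squares(predictors, n_components=2):
    """PCA of one predictor set after appending the squared predictors.

    predictors: array of shape (n_cells, n_vars).
    Returns (scores, explained): component scores of shape (n_cells, n_components)
    and the proportion of total variance explained by each retained component.
    """
    # Assumption: squares of the raw values are appended as extra columns.
    X = np.column_stack([predictors, predictors ** 2])
    # Standardize columns so the PCA works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; s**2 / (n - 1) are the component variances.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = s ** 2 / (Z.shape[0] - 1)
    explained = variances / variances.sum()
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


# Hypothetical energy-water set: four correlated variables for 300 grid cells.
rng = np.random.default_rng(0)
latent = rng.normal(size=300)
ew = np.column_stack([latent + rng.normal(scale=0.3, size=300) for _ in range(4)]) + 10
scores, explained = pca_with_squares(ew)
print(scores.shape, explained.round(3), "cumulative:", explained.sum().round(3))
```

The cumulative proportion of the two retained components is the quantity reported per predictor set in Table S3.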
To make the model coefficients comparable, the principal components extracted from all predictor sets were standardized (mean = 0, standard deviation = 1). We fitted multiple linear regression models by ordinary least squares (OLS) to identify the predictors that best explain each of the three response variables. The optimal model was selected by backward elimination based on the Akaike Information Criterion (AIC), using the stepAIC function in R. Variance inflation factors (VIF) were used to check for multicollinearity among predictors, and only predictors with VIF < 5 were retained (Dormann et al., 2013) (Table S4).
Because spatial autocorrelation can affect the explanatory power of regression models, we assessed the spatial autocorrelation of the residuals of the multiple regression models using Moran's I. As the residuals of all multiple regression models showed significant spatial autocorrelation (p < 0.001), we further fitted spatial simultaneous autoregressive error models (SAR) using the predictors of the optimal models. The explanatory power of the predictors for species richness was measured with pseudo-R² (the squared correlation between the observed values and the values predicted by the non-spatial component of the model) (Kissling & Carl, 2007). The relative importance of the predictors was compared using standardized regression coefficients.
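The OLS steps above (standardization, backward AIC selection, VIF screening, Moran's I of residuals, and pseudo-R²) can be sketched as below. This is an illustrative NumPy version, not the authors' R workflow: the AIC uses the n·log(RSS/n) + 2k form that R's `extractAIC()` applies to linear models, Moran's I uses hypothetical binary distance-band weights (`max_dist`) and omits a significance test, and the SAR error model itself is not fitted. All function names, column names, and data are hypothetical.

```python
import numpy as np


def standardize(a):
    """Center each column to mean 0 and scale to standard deviation 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def ols_fit(y, X):
    """OLS with intercept; returns coefficients, residuals and AIC."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    n, k = len(y), A.shape[1]
    # n*log(RSS/n) + 2k: the AIC form R's extractAIC() uses for lm fits.
    aic = n * np.log(resid @ resid / n) + 2 * k
    return coef, resid, aic


def backward_aic(y, X, names):
    """Backward elimination: drop the predictor whose removal lowers AIC most."""
    keep = list(range(X.shape[1]))
    best = ols_fit(y, X[:, keep])[2]
    while keep:
        trials = [(ols_fit(y, X[:, [j for j in keep if j != i]])[2], i) for i in keep]
        aic, drop = min(trials)
        if aic >= best:
            break  # no single removal improves AIC
        best = aic
        keep.remove(drop)
    return [names[i] for i in keep], best


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the rest."""
    out = []
    for j in range(X.shape[1]):
        _, resid, _ = ols_fit(X[:, j], np.delete(X, j, axis=1))
        r2 = 1.0 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


def morans_i(values, coords, max_dist):
    """Global Moran's I with binary weights w_ij = 1 if 0 < d_ij <= max_dist."""
    z = values - values.mean()
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = ((d > 0) & (d <= max_dist)).astype(float)
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)


def pseudo_r2(observed, predicted):
    """Squared Pearson correlation; for a SAR model, pass the non-spatial prediction X @ beta."""
    return np.corrcoef(observed, predicted)[0, 1] ** 2


# Hypothetical example: 400 grid cells, four standardized principal components.
rng = np.random.default_rng(1)
n = 400
coords = rng.uniform(0, 20, size=(n, 2))
X = standardize(rng.normal(size=(n, 4)))
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = ["EW_PC1", "EW_PC2", "CS_PC1", "HH_PC1"]

selected, best_aic = backward_aic(y, X, names)
cols = [names.index(s) for s in selected]
coef, resid, _ = ols_fit(y, X[:, cols])
print("selected:", selected)
print("VIF:", vif(X[:, cols]).round(2))
print("Moran's I of residuals:", round(morans_i(resid, coords, 2.0), 3))
print("pseudo-R2 of OLS fit:", round(pseudo_r2(y, y - resid), 3))
```

Because the predictors are standardized, the fitted coefficients in `coef` can be compared directly as standardized regression coefficients.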
Finally, we performed variance partitioning to quantify the pure effects of each predictor set and their shared contributions to the distribution patterns of species richness. Venn diagrams were used to show the pure and shared effects of the predictor sets.
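For two predictor sets, the fractions of a variance partitioning follow from the adjusted R² of three regressions: the pure effect of A is R²(A+B) - R²(B), the pure effect of B is R²(A+B) - R²(A), and the shared fraction is R²(A) + R²(B) - R²(A+B). The sketch below assumes adjusted R², as vegan's `varpart` reports, and is simplified to two sets, whereas the study partitioned four; the function names and data are hypothetical.

```python
import numpy as np


def adj_r2(y, X):
    """Adjusted R^2 of an OLS regression of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    n, p = len(y), A.shape[1] - 1
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def varpart_two(y, XA, XB):
    """Pure, shared, and unexplained fractions for two predictor sets."""
    r_ab = adj_r2(y, np.column_stack([XA, XB]))
    r_a, r_b = adj_r2(y, XA), adj_r2(y, XB)
    return {
        "pure_A": r_ab - r_b,        # explained by A after accounting for B
        "pure_B": r_ab - r_a,        # explained by B after accounting for A
        "shared": r_a + r_b - r_ab,  # jointly explained (can be negative)
        "unexplained": 1.0 - r_ab,
    }


# Hypothetical sets A (two variables) and B (one variable) sharing a latent driver.
rng = np.random.default_rng(7)
n = 500
common = rng.normal(size=n)
XA = np.column_stack([common + rng.normal(scale=0.5, size=n), rng.normal(size=n)])
XB = (common + rng.normal(scale=0.5, size=n)).reshape(-1, 1)
y = XA[:, 0] + XA[:, 1] + XB[:, 0] + rng.normal(size=n)
parts = varpart_two(y, XA, XB)
print({k: round(v, 3) for k, v in parts.items()})
```

The four fractions sum to one by construction, which is what a two-circle Venn diagram displays.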
Statistical analyses were performed in R 4.0.5 (http://www.r-project.org). The "psych" package was used for principal component analysis, the "MASS" package for optimal model selection, the "vegan" package for variance partitioning, and the "spdep" package for spatial autoregressive model building.