2.2 Distribution pattern calculation
We compiled a database of rodent species distributions in China. Distribution data were obtained mainly from the following sources: 1) the research results of Zhou et al. (2002) and Xing et al. (2008); 2) the National Zoological Museum of China (NZMC); 3) the Global Biodiversity Information Facility (GBIF); and 4) distribution and collection records published in books and the literature (Jiang et al., 2015; Ge et al., 2018; Liu et al., 2019; Li et al., 2019; Cheng et al., 2021; Jackson et al., 2022). After null values, offset (outlying) coordinates, and redundant records were removed, 237 rodent species in two orders were included in the analysis, comprising 67 endemic and 170 non-endemic species (Jiang et al., 2015; Wei et al., 2021) (Table S1).
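The record-cleaning step can be illustrated with a minimal Python sketch. It assumes each record is a hypothetical `(species, longitude, latitude)` tuple and interprets the three filters as follows: null values are records with a missing field, offset values are points falling outside an approximate bounding box of China (`CHINA_BBOX`, a hypothetical constant rather than the study's actual criterion), and redundant data are duplicate species-coordinate pairs. The authors' actual cleaning rules may differ.

```python
from typing import Optional

# Hypothetical record layout: (species, longitude, latitude); None marks a missing value.
Record = tuple[str, Optional[float], Optional[float]]

# Approximate extent of China (lon_min, lon_max, lat_min, lat_max) in decimal degrees.
# Assumption: "offset" records are taken to be points falling outside this box.
CHINA_BBOX = (73.0, 136.0, 3.0, 54.0)


def clean_records(
    records: list[Record],
    bbox: tuple[float, float, float, float] = CHINA_BBOX,
) -> list[Record]:
    """Drop records with missing fields, out-of-range points, and duplicates."""
    lon_min, lon_max, lat_min, lat_max = bbox
    seen: set[tuple[str, float, float]] = set()
    cleaned: list[Record] = []
    for species, lon, lat in records:
        # Null values: any missing field.
        if not species or lon is None or lat is None:
            continue
        # Offset values (assumed definition): coordinates outside the bounding box.
        if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max):
            continue
        # Redundant data: same species at the same coordinates (rounded to ~10 m).
        key = (species, round(lon, 4), round(lat, 4))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((species, lon, lat))
    return cleaned


if __name__ == "__main__":
    # Made-up example records, not data from the study.
    raw = [
        ("Rattus norvegicus", 116.40, 39.90),
        ("Rattus norvegicus", 116.40, 39.90),  # duplicate
        ("Rattus norvegicus", None, 39.90),    # missing longitude
        ("Mus musculus", 2.35, 48.85),         # outside the China box
        ("Mus musculus", 121.47, 31.23),
    ]
    print(clean_records(raw))
```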
MaxEnt (v3.4.1) was used for ecological niche modeling (ENM) of potential rodent habitat in China. Because MaxEnt requires at least five unique occurrence coordinates per species to produce reliable results, a minimum of six points was adopted as the criterion for modeling species distributions in this study. The potential habitats of the 210 rodent species with six or more distribution points were simulated with ENM to estimate the potential species richness of rodents in China. Based on the characteristics of the distribution data and rodent habits, 26 environmental variables were selected, falling into five categories: climate, topography, vegetation, soil, and human activity intensity (Table S2). Chinese administrative vector boundaries were obtained from the Data Center for Resources and Environmental Sciences at the Chinese Academy of Sciences (RESDC) (http://www.resdc.cn) and converted to a 1 km × 1 km grid.
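The six-point criterion amounts to partitioning species by their number of unique occurrence coordinates. The sketch below illustrates this under assumptions: `split_by_sample_size` and the `(species, longitude, latitude)` record layout are hypothetical, and uniqueness is judged on raw coordinates rather than on 1 km grid cells.

```python
from collections import defaultdict

MIN_POINTS = 6  # minimum number of unique occurrence points required for MaxEnt here


def split_by_sample_size(records, min_points=MIN_POINTS):
    """Split species into (modeled with MaxEnt, mapped from occurrence cells only).

    `records` is an iterable of hypothetical (species, longitude, latitude) tuples;
    uniqueness is judged on raw coordinates (an assumption).
    """
    unique_points = defaultdict(set)  # species -> set of (lon, lat)
    for species, lon, lat in records:
        unique_points[species].add((lon, lat))
    modeled = sorted(s for s, pts in unique_points.items() if len(pts) >= min_points)
    grid_only = sorted(s for s, pts in unique_points.items() if len(pts) < min_points)
    return modeled, grid_only


if __name__ == "__main__":
    # Made-up records: sp_A has 6 unique points, sp_B only 2.
    recs = [("sp_A", float(x), 30.0) for x in range(100, 106)]
    recs += [("sp_B", 110.0, 25.0), ("sp_B", 111.0, 25.0), ("sp_B", 110.0, 25.0)]
    print(split_by_sample_size(recs))
```

Under this rule, the study's 237 species divide into the 210 modeled species and the 27 species represented only by the grid cells containing their records.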
Correlations among the environmental variables were tested using the ENMTools package (Warren et al., 2021) in R 4.0.5 (http://www.r-project.org). To reduce model complexity, only variables that were not highly correlated (r < 0.7) were retained for model prediction (Table S2). The random test percentage was set to 25%, 10 replicate sub-models were generated with MaxEnt's bootstrap option, and the mean output of the 10 sub-models in each pixel was taken as the final prediction for the species. Because species differ in their environmental tolerance, a species-specific suitability threshold was derived from the predicted suitability at the known distribution records. The habitat suitability value at each occurrence point was extracted from the predicted suitability map and, assuming these values followed a normal distribution, their mean μ and standard deviation σ were calculated. The value μ − σ was then used as the threshold to convert each species' continuous probability map into a 0/1 binary distribution map. Model accuracy was evaluated using the receiver operating characteristic (ROC) curve; the area under the curve (AUC; Hanley & McNeil, 1982) serves as a measure of model performance, with higher values indicating better discrimination. For species with AUC values below 0.8, the MaxEnt settings were optimized using the ENMeval package in R (Muscarella et al., 2014) and the model was rerun.
For the 27 species with fewer than six distribution points, the distribution range was defined as the grid cells containing the recorded points, and this range layer was converted into a 0/1 binary distribution map. Finally, the binary distribution maps of all 237 species were overlaid on the grid, and the number of species present in each grid cell was counted to produce the species richness map.
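Stacking the binary maps into a richness map is a cell-wise sum. The sketch assumes every species map is a 0/1 grid of identical dimensions held as nested Python lists, and that occurrence points of the sparsely recorded species have already been snapped to (row, column) indices of the 1 km grid; `cells_to_binary` and `species_richness` are hypothetical helpers. A real analysis would typically operate on raster arrays instead.

```python
Grid = list[list[int]]  # hypothetical 0/1 map: rows of grid cells


def cells_to_binary(cells: list[tuple[int, int]], n_rows: int, n_cols: int) -> Grid:
    """0/1 map marking only the grid cells that contain occurrence records."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for r, c in cells:
        grid[r][c] = 1
    return grid


def species_richness(binary_maps: list[Grid]) -> Grid:
    """Overlay 0/1 species maps and count the species present in each cell."""
    n_rows, n_cols = len(binary_maps[0]), len(binary_maps[0][0])
    richness = [[0] * n_cols for _ in range(n_rows)]
    for grid in binary_maps:
        for r in range(n_rows):
            for c in range(n_cols):
                richness[r][c] += grid[r][c]
    return richness


if __name__ == "__main__":
    # Two modeled species (thresholded MaxEnt maps) on a made-up 2 x 3 grid.
    species_a = [[1, 1, 0], [0, 1, 0]]
    species_b = [[0, 1, 1], [0, 1, 1]]
    # One sparsely recorded species known only from cell (1, 1).
    species_c = cells_to_binary([(1, 1)], n_rows=2, n_cols=3)
    print(species_richness([species_a, species_b, species_c]))
```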