2.3 Construction of the species distribution model
Ten modeling algorithms (GLM, GBM, CTA, RF, GAM, ANN, SRE, FDA, MARS and MAXENT) provided by the "Biomod2" package were used to predict the potential distribution of T. chinense. All models used default parameters except MAXENT.
The prediction accuracy of the MAXENT model is affected by its parameter settings. We tested the complexity and performance of the MAXENT model under different settings of the regularization multiplier (RM) and feature classes (FC) using the kuenm package in R 3.6.3 (Cobos et al., 2019). Candidate models were created by combining 17 RM values with all 31 possible combinations of five FCs (L: linear, Q: quadratic, H: hinge, P: product and T: threshold). The optimal model was chosen using the corrected Akaike information criterion (AICc): the candidate with the minimum AICc value (delta AICc = 0) was considered optimal (Cobos et al., 2019). The optimized MAXENT parameters were RM = 3 and FC = LQPT.
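The candidate grid and the delta AICc rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the kuenm workflow itself. The 17 RM values are not listed in the text, so the grid below is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

# Feature classes named in the text: linear, quadratic, hinge, product, threshold.
FEATURES = "LQHPT"

# ASSUMPTION: the text reports 17 RM values but does not list them.
# This grid (0.1 to 1.0 in steps of 0.1, then 2, 3, 4, 5, 6, 8, 10) is illustrative only.
RM_VALUES = [round(0.1 * i, 1) for i in range(1, 11)] + [2, 3, 4, 5, 6, 8, 10]


def feature_class_combinations(features=FEATURES):
    """Return every non-empty combination of feature classes, e.g. 'L', 'LQ', 'LQPT'."""
    return [
        "".join(combo)
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


def candidate_grid(rm_values=RM_VALUES, features=FEATURES):
    """Pair every RM value with every feature-class combination."""
    return [(rm, fc) for rm in rm_values for fc in feature_class_combinations(features)]


def delta_aicc(aicc_by_model):
    """Subtract the smallest AICc from every candidate's AICc."""
    best = min(aicc_by_model.values())
    return {model: value - best for model, value in aicc_by_model.items()}


def select_optimal(aicc_by_model):
    """Return the candidate whose delta AICc is 0, i.e. the minimum AICc."""
    return min(aicc_by_model, key=aicc_by_model.get)


if __name__ == "__main__":
    print(len(feature_class_combinations()))  # 2**5 - 1 non-empty subsets
    print(len(candidate_grid()))              # 17 RM values x 31 FC combinations
```

In practice `select_optimal` would receive the AICc value of each fitted candidate, keyed by its (RM, FC) pair. Those values come from fitting MAXENT and are not reproduced here.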
For species distribution modeling with the Biomod2 package, 70% of the occurrence data were selected as training data and the remaining 30% were used as testing data. This split was repeated five times. To reduce spatial bias and better simulate the actual distribution of the species, we generated 5,000 pseudo-absence points, repeated this three times, and fitted models to each set. In total, 150 layers were generated (10 algorithms × 5 data splits × 3 pseudo-absence sets). We evaluated each model using the true skill statistic (TSS) and the area under the receiver operating characteristic curve (AUC) (Bucklin et al., 2015; X. Zhang et al., 2020). The closer the TSS and AUC values are to 1, the more reliable the prediction (Zhao et al., 2021; Freitas et al., 2019). Only models with high average TSS (≥ 0.8) and AUC (≥ 0.9) values were used to calculate the final species distribution layer.
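The two evaluation metrics and the selection thresholds can be illustrated with a short Python sketch. It is a simplified stand-in, not the Biomod2 implementation. It assumes that predictions have already been converted to binary presence/absence for TSS, and it computes AUC with the rank-based (Mann-Whitney) formulation. All function names are hypothetical.

```python
def tss(observed, predicted):
    """True skill statistic: sensitivity + specificity - 1, for 0/1 labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def auc(observed, scores):
    """AUC as the probability that a presence scores higher than an absence.

    Ties between a presence and an absence count as half a win.
    """
    presences = [s for o, s in zip(observed, scores) if o == 1]
    absences = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(
        1.0 if p > a else 0.5 if p == a else 0.0
        for p in presences
        for a in absences
    )
    return wins / (len(presences) * len(absences))


def keep_for_final_layer(mean_scores, tss_min=0.8, auc_min=0.9):
    """Keep models whose average TSS and AUC both meet the thresholds in the text."""
    return [
        name
        for name, score in mean_scores.items()
        if score["TSS"] >= tss_min and score["AUC"] >= auc_min
    ]


# Number of model runs implied by the design in the text.
N_ALGORITHMS, N_SPLITS, N_PSEUDO_ABSENCE_SETS = 10, 5, 3
N_RUNS = N_ALGORITHMS * N_SPLITS * N_PSEUDO_ABSENCE_SETS
```

Here `mean_scores` is assumed to map a model name to its TSS and AUC averaged over the repeated runs, for example `{"RF": {"TSS": ..., "AUC": ...}}`. A model must pass both thresholds to be kept.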