Environmental Variables and Model Construction
For Species Distribution Modeling (SDM), we compiled 93 occurrence records from three sources: (a) our own field surveys carried out in different parts of the study area in 2021-2022 (16 occurrence points); (b) the Global Biodiversity Information Facility (GBIF) and direct observations up to 18 September 2022 (48 occurrence points); and (c) distribution records compiled from published books and papers (29 occurrence points). After duplicate records were removed, 77 occurrence points remained and were used for distribution modeling.
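The text does not say how duplicates were identified. Two common interpretations are shown below as a minimal sketch: dropping records with identical (rounded) coordinates, and keeping at most one record per 30 arc-second predictor cell. Both functions, the rounding precision, and the example coordinates are hypothetical and are not the authors' procedure.

```python
from math import floor

CELLS_PER_DEG = 120  # 30 arc-seconds = 1/120 degree (assumed to match the predictor grid)

def dedupe_exact(records, ndigits=4):
    """Drop records whose (lon, lat), rounded to `ndigits`, was already seen."""
    seen, kept = set(), []
    for lon, lat in records:
        key = (round(lon, ndigits), round(lat, ndigits))
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat))
    return kept

def dedupe_by_cell(records, cells_per_deg=CELLS_PER_DEG):
    """Keep only the first record that falls in each raster cell."""
    seen, kept = set(), []
    for lon, lat in records:
        # Column counted east from -180, row counted south from +90.
        key = (floor((lon + 180) * cells_per_deg), floor((90 - lat) * cells_per_deg))
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat))
    return kept

# Hypothetical records: one exact duplicate and one near-duplicate in the same cell.
records = [(51.4013, 35.7017), (51.4013, 35.7017), (51.4021, 35.7022), (48.3013, 38.2017)]
print(len(dedupe_exact(records)), len(dedupe_by_cell(records)))  # 3 2
```

Cell-level deduplication is the stricter option: it also removes records that are distinct points but would contribute identical predictor values.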
We extracted the 19 bioclimatic variables (bio1-bio19) for the H. leucisculus habitat from the WorldClim global climate dataset at 30 arc-second (~1 km) resolution (http://www.worldclim.org). In addition, slope derived from the digital elevation model (DEM) of Iran was used as an additional geographical input, and the Human Footprint Model (HFM; Sanderson et al., 2002) was used to evaluate anthropogenic effects on the H. leucisculus habitat.
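The "~1 km" figure follows from simple arithmetic: 30 arc-seconds is 1/120 degree, and one degree of latitude spans roughly 111 km. The sketch below, assuming a spherical Earth (mean radius 6371 km) and a global grid with its upper-left corner at (-180, 90), computes the cell that contains an occurrence point and the approximate cell size at a given latitude. The function names are hypothetical, and the 43200 x 21600 grid layout is an assumption about the WorldClim rasters rather than something stated in the text.

```python
from math import cos, floor, pi, radians

CELLS_PER_DEG = 120                 # 30 arc-seconds = 1/120 degree
N_COLS = 360 * CELLS_PER_DEG        # 43200 columns (assumed global extent)
N_ROWS = 180 * CELLS_PER_DEG        # 21600 rows
EARTH_RADIUS_KM = 6371.0            # mean radius of a spherical Earth

def cell_index(lon, lat):
    """(row, col) of the cell containing (lon, lat); rows run south from +90,
    columns run east from -180."""
    if not (-180 <= lon < 180 and -90 < lat <= 90):
        raise ValueError("coordinate outside the grid extent")
    return floor((90 - lat) * CELLS_PER_DEG), floor((lon + 180) * CELLS_PER_DEG)

def cell_size_km(lat):
    """Approximate (east-west, north-south) size of one cell at latitude `lat`."""
    km_per_deg = 2 * pi * EARTH_RADIUS_KM / 360   # ~111.2 km per degree
    height = km_per_deg / CELLS_PER_DEG           # ~0.93 km at every latitude
    width = height * cos(radians(lat))            # shrinks toward the poles
    return width, height

print(cell_index(51.4013, 35.7017))   # (6515, 27768)
print(cell_size_km(0.0))              # both values ~0.927 km at the equator
```

At Iranian latitudes (roughly 25-40 degrees N) the east-west cell width is therefore somewhat smaller than 1 km, which is worth bearing in mind when the layers are later reprojected to UTM.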
All layers were projected onto the UTM grid with the WGS 1984 datum. These bioclimatic and environmental variables have previously been used to assess freshwater fish species distributions (Warren et al., 2013; Hong et al., 2022). Multicollinearity among the bioclimatic predictors was tested with Principal Component Analysis (PCA) and by calculating pairwise correlation coefficients, and only variables with r < 0.80 were retained for the distribution models in the present study. The final set of variables was: BIO1 = annual mean temperature; BIO7 = temperature annual range; BIO8 = mean temperature of wettest quarter; BIO9 = mean temperature of driest quarter; BIO13 = precipitation of wettest month; BIO14 = precipitation of driest month; and BIO16 = precipitation of wettest quarter. The future distribution of H. leucisculus for 2080 (the average of the 2061-2080 period) was projected under two Shared Socio-economic Pathways (SSP126 and SSP585) using the MRI-ESM2 model from CMIP6.
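The r < 0.80 screening can be illustrated with a greedy filter. Each candidate variable is visited in a chosen priority order and kept only if its absolute Pearson correlation with every variable already kept is below the threshold. A companion function reports the share of variance carried by each principal component of the standardized predictors. This is a minimal sketch that rests on several assumptions: that absolute r is compared, that the visiting order decides which member of a correlated pair survives, and that the data are synthetic. The variable "bio5" in the example is a hypothetical near-copy of "bio1" and is not part of the study's variable set.

```python
import numpy as np

def select_uncorrelated(X, names, threshold=0.80):
    """Greedily keep each column whose |Pearson r| with every column already
    kept is below `threshold`. Columns are visited in order, so list the
    preferred variables first."""
    r = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(r[j, k]) < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

def pca_variance_ratio(X):
    """Share of total variance per principal component of the standardized
    predictors (eigenvalues of the correlation matrix, largest first)."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    return eigvals / eigvals.sum()

# Synthetic values at 500 hypothetical points; "bio5" is built as a near copy of "bio1".
rng = np.random.default_rng(0)
bio1 = rng.normal(size=500)
bio5 = bio1 + rng.normal(scale=0.1, size=500)
bio12 = rng.normal(size=500)
X = np.column_stack([bio1, bio5, bio12])

print(select_uncorrelated(X, ["bio1", "bio5", "bio12"]))  # ['bio1', 'bio12']
print(pca_variance_ratio(X).round(3))
```

Here the first principal component absorbs most of the shared bio1/bio5 signal. That is the same redundancy the correlation filter removes by discarding one variable of the pair.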
For model fitting and evaluation, a large number of pseudo-absence points were generated with the Create Random Points tool in ArcGIS 10.8 to improve prediction accuracy. An ensemble modeling approach (Thuiller et al., 2009) was used to model the distribution of H. leucisculus in R v. 4.1.3 (R Development Core Team, 2014) with the biomod2 package (Thuiller et al., 2016). Nine modeling techniques were applied: Generalized Linear Model (GLM), Generalized Boosting Method (GBM), Maximum Entropy (MaxEnt), Classification Tree Analysis (CTA), Artificial Neural Network (ANN), Surface Range Envelope (SRE), Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), and Flexible Discriminant Analysis (FDA).
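The two steps above can be sketched outside ArcGIS and biomod2 as follows. Pseudo-absences are drawn uniformly at random inside a bounding box, and any point that falls in a grid cell already holding a presence record is rejected. Per-model suitability values are then combined with an evaluation-weighted mean. This is illustrative only. The bounding box, presence points, model outputs and weights are hypothetical. Rejecting presence cells is an assumption about how random points were filtered. The weighted mean is one simple ensemble rule, not necessarily the one the authors used, since the text does not specify their ensemble method.

```python
import random
from math import floor

CELLS_PER_DEG = 120  # match the 30 arc-second predictor grid (assumption)

def cell_of(lon, lat):
    """Grid cell (col, row) containing a point, counted from (-180, 90)."""
    return floor((lon + 180) * CELLS_PER_DEG), floor((90 - lat) * CELLS_PER_DEG)

def random_pseudo_absences(n, bbox, presences, seed=None):
    """Draw n uniform random points inside bbox = (min_lon, min_lat, max_lon, max_lat),
    rejecting any point in a cell that holds a presence record.
    Assumes presences occupy only a small part of bbox (otherwise this loops long)."""
    rng = random.Random(seed)
    occupied = {cell_of(lon, lat) for lon, lat in presences}
    min_lon, min_lat, max_lon, max_lat = bbox
    points = []
    while len(points) < n:
        p = (rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
        if cell_of(*p) not in occupied:
            points.append(p)
    return points

def weighted_ensemble(predictions, scores):
    """Evaluation-weighted mean of per-model suitability values.
    predictions: {model: [p_site1, p_site2, ...]}; scores: {model: weight}."""
    total = sum(scores[m] for m in predictions)
    n_sites = len(next(iter(predictions.values())))
    return [sum(scores[m] * predictions[m][i] for m in predictions) / total
            for i in range(n_sites)]

# Hypothetical inputs: a rough bounding box, two presences, two models' outputs.
presences = [(51.4013, 35.7017), (48.3013, 38.2017)]
pa = random_pseudo_absences(1000, (44.0, 25.0, 63.5, 40.0), presences, seed=1)
preds = {"GLM": [0.2, 0.8], "RF": [0.4, 0.6]}
tss_scores = {"GLM": 0.6, "RF": 0.8}
print(len(pa), weighted_ensemble(preds, tss_scores))
```

With these weights, the better-scoring RF model pulls the ensemble toward its own predictions: about 0.31 and 0.69 for the two sites.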
Model performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), Cohen's Kappa (KAPPA), and the True Skill Statistic (TSS) (Allouche et al., 2006; Zipkin et al., 2012).
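biomod2 computes these metrics internally. The plain-Python sketch below only makes their standard definitions concrete. TSS is sensitivity + specificity - 1. Kappa compares observed agreement with the agreement expected by chance from the confusion-matrix margins. AUC is computed as the probability that a randomly chosen presence receives a higher score than a randomly chosen absence. The observations, scores, and the 0.5 threshold used to binarize predictions are hypothetical. In practice the threshold is often chosen to maximize the metric being reported.

```python
def confusion(obs, pred):
    """Counts (tp, fp, fn, tn) for 0/1 observations and 0/1 predictions."""
    tp = sum(o == 1 and p == 1 for o, p in zip(obs, pred))
    fp = sum(o == 0 and p == 1 for o, p in zip(obs, pred))
    fn = sum(o == 1 and p == 0 for o, p in zip(obs, pred))
    tn = sum(o == 0 and p == 0 for o, p in zip(obs, pred))
    return tp, fp, fn, tn

def tss(obs, pred):
    """True Skill Statistic = sensitivity + specificity - 1."""
    tp, fp, fn, tn = confusion(obs, pred)
    return tp / (tp + fn) + tn / (tn + fp) - 1

def kappa(obs, pred):
    """Cohen's Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    tp, fp, fn, tn = confusion(obs, pred)
    n = tp + fp + fn + tn
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (po - pe) / (1 - pe)

def auc(obs, scores):
    """ROC AUC as the probability that a random presence outscores a random
    absence (Mann-Whitney formulation; ties count one half)."""
    pres = [s for o, s in zip(obs, scores) if o == 1]
    absn = [s for o, s in zip(obs, scores) if o == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0 for p in pres for a in absn)
    return wins / (len(pres) * len(absn))

obs = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 threshold
print(round(auc(obs, scores), 3), round(kappa(obs, pred), 3), round(tss(obs, pred), 3))
# 0.889 0.667 0.667
```

AUC is threshold-independent, whereas Kappa and TSS depend on the cutoff used to convert continuous suitability into presence/absence. Unlike Kappa, TSS does not depend on prevalence, which is the argument made for it by Allouche et al. (2006).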