Environmental Variables and Model Construction
For Species Distribution Modeling (SDM), we collected 93 occurrence
records from three sources: (a) our field surveys (16 occurrence
points) carried out in different parts of the study area (2021-2022),
(b) the Global Biodiversity Information Facility (GBIF) and direct
observations up until the 18th of September 2022 (48 occurrence points),
and (c) distribution records compiled from published books and papers (29
occurrence points). We then removed duplicate records, which reduced
the dataset to 77 occurrence points that were used for distribution
modeling.
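The duplicate-removal step can be illustrated with a minimal sketch. The coordinates, the record layout, and the rounding precision below are hypothetical and serve only as an illustration; the text does not state how duplicates were identified in the study.

```python
def deduplicate(records, decimals=4):
    """Keep the first record seen for each rounded (lon, lat) pair.

    The rounding precision is an assumption made for this sketch.
    """
    seen = set()
    unique = []
    for source, lon, lat in records:
        key = (round(lon, decimals), round(lat, decimals))
        if key not in seen:
            seen.add(key)
            unique.append((source, lon, lat))
    return unique


# Hypothetical records: (source, longitude, latitude)
records = [
    ("field", 51.4200, 35.7000),
    ("GBIF", 51.4200, 35.7000),        # same location as the field record
    ("literature", 48.5100, 36.6800),
]

unique = deduplicate(records)
print(len(records), "records reduced to", len(unique))
```

Records are kept in input order, so listing field data first gives it priority over GBIF or literature records at the same location.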
We extracted the 19 bioclimatic variables (bio1-bio19) for the
H. leucisculus habitat from the 30 arc-seconds (~1 km) resolution
WorldClim global climate dataset (http://www.worldclim.org). In
addition, slope data derived from the digital elevation model (DEM) of
Iran were used as an additional geographical input, and a Human
Footprint Model (HFM) (Sanderson et al., 2002) was used to evaluate
anthropogenic effects on the H. leucisculus habitat.
All layers were projected onto the UTM grid with the WGS 1984 datum.
These bioclimatic and environmental variables are used in assessing
freshwater fish species distributions (Warren et al., 2013; Hong et al.,
2022). The bioclimatic variables were tested for multicollinearity
among predictors using Principal Component Analysis (PCA) and by
calculating correlation coefficients, and an r < 0.80 criterion was
used to select which variables to include in the distribution models
for the present study. The final
set of variables was as follows: BIO1 = annual mean temperature; BIO7 =
temperature annual range; BIO8 = mean temperature of wettest quarter;
BIO9 = mean temperature of driest quarter; BIO13 = precipitation of
wettest period; BIO14 = precipitation of driest period; and BIO16 =
precipitation of wettest quarter. The future distribution pattern of
H. leucisculus for the year 2080 (average of the 2061-2080 period)
was estimated under two Shared Socio-economic Pathways (SSPs), SSP126
and SSP585, using the MRI-ESM2 model from CMIP6.
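The r < 0.80 screening of predictors described above can be sketched as a pairwise Pearson correlation filter. This is a minimal illustration under stated assumptions, not the procedure used in the study: the variable values are invented, the greedy keep-in-order rule is an assumption, and the PCA step is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_uncorrelated(table, threshold=0.80):
    """Greedy filter (an assumption of this sketch): keep a variable only if
    |r| with every variable already kept is below the threshold."""
    kept = []
    for name in table:
        if all(abs(pearson_r(table[name], table[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values sampled at five locations
values = {
    "bio1": [12.1, 14.3, 16.0, 18.2, 20.5],
    "bio5": [24.0, 28.5, 32.1, 36.3, 41.0],       # almost linear in bio1
    "bio12": [300.0, 150.0, 420.0, 210.0, 380.0],
}

selected = select_uncorrelated(values)
print(selected)
```

Because the filter is greedy, the result depends on the order in which variables are considered; ranking variables by ecological relevance before filtering is one common way to make that order deliberate.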
For model fitting and evaluation, many pseudo-absence points were
generated with the Create Random Points tool in ArcGIS 10.8 to provide
more accurate predictions. An ensemble modelling approach (Thuiller et
al., 2009) was used to model the distribution of H. leucisculus in
R v. 4.1.3 (R Development Core Team, 2014) with the BIOMOD2 package
(Thuiller et al., 2016). Nine modelling techniques were applied: the
Generalized Linear Model (GLM), Generalized Boosting Method (GBM),
Maximum Entropy (MaxEnt), Classification Tree Analysis (CTA),
Artificial Neural Network (ANN), Surface Range Envelope (SRE), Random
Forest (RF), Multivariate Adaptive Regression Splines (MARS), and
Flexible Discriminant Analysis (FDA).
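As a rough analogue of the random pseudo-absence generation mentioned above, the sketch below draws uniform random points inside a bounding box in projected (UTM, metre) coordinates. It is written in Python rather than ArcGIS and does not reproduce the Create Random Points tool. The bounding box, the presence coordinates, the number of points, and the minimum distance to presences are all hypothetical; in particular, the exclusion buffer is an assumption that the text does not mention.

```python
import math
import random


def sample_pseudo_absences(presences, bbox, n, min_dist, seed=0, max_attempts=100_000):
    """Draw up to n uniform random points inside bbox = (xmin, ymin, xmax, ymax),
    rejecting points closer than min_dist to any presence point."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    xmin, ymin, xmax, ymax = bbox
    points = []
    attempts = 0
    while len(points) < n and attempts < max_attempts:
        attempts += 1
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in presences):
            points.append((x, y))
    return points


# Hypothetical UTM coordinates in metres
presences = [(500000.0, 4000000.0), (520000.0, 4010000.0)]
bbox = (450000.0, 3950000.0, 550000.0, 4050000.0)

pseudo_absences = sample_pseudo_absences(presences, bbox, n=50, min_dist=5000.0)
print(len(pseudo_absences))
```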
Model performance was evaluated using the area under the receiver
operating characteristic curve (AUC/ROC), Cohen's Kappa (KAPPA), and
the True Skill Statistic (TSS) (Allouche et al., 2006; Zipkin et al.,
2012).
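To make the three evaluation metrics concrete, the following sketch computes them from observed presences/absences and predicted suitability scores using their standard definitions: TSS = sensitivity + specificity - 1, Cohen's Kappa from observed and chance agreement, and AUC as the probability that a presence scores higher than an absence. The data and the 0.5 threshold are hypothetical, and the code is not the BIOMOD2 implementation.

```python
def confusion_counts(observed, predicted, threshold=0.5):
    """Return (tp, fp, fn, tn) after thresholding predicted scores."""
    tp = fp = fn = tn = 0
    for obs, score in zip(observed, predicted):
        call = score >= threshold
        if obs == 1 and call:
            tp += 1
        elif obs == 1:
            fn += 1
        elif call:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def tss(tp, fp, fn, tn):
    """True Skill Statistic: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def kappa(tp, fp, fn, tn):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


def auc(observed, predicted):
    """AUC as the fraction of presence/absence pairs ranked correctly
    (ties count as one half)."""
    pos = [s for o, s in zip(observed, predicted) if o == 1]
    neg = [s for o, s in zip(observed, predicted) if o == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical evaluation data: 1 = presence, 0 = pseudo-absence
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

counts = confusion_counts(observed, predicted)
print("TSS:", tss(*counts), "Kappa:", kappa(*counts), "AUC:", auc(observed, predicted))
```

TSS and Kappa depend on the chosen threshold, whereas AUC is threshold-independent, which is one reason the three are often reported together.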