After performing the PCA, we selected the attributes that showed the highest
variance among all principal components as input for the RFA, totaling 12
attributes: aridity index, precipitation seasonality, water table depth
(WTD), height above the nearest drainage (HAND), reservoir area,
hydrological disturbance index, streamflow elasticity, porosity,
permeability, hydraulic conductivity, mean elevation, and mean slope.
The precipitation seasonality indicates the relative timing of the
precipitation and temperature seasonal cycles. Values
of this attribute close to +1 indicate the occurrence of summer
precipitation while values close to -1 indicate winter precipitation
(Almagro et al., 2020). Additionally, we added the Brazilian biomes
(Amazon, Cerrado, Caatinga, Atlantic Forest, Pantanal, and Pampa) and
soil texture classes (clay, clay loam, loam, sandy clay, sandy loam, and
sandy clay loam) as categorical variables to the analysis by using
the one-hot encoding method (Pedregosa et al., 2011). This method
converts each category into a binary indicator variable, so that no
ordinal ranking is imposed among the categories.
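As an illustration of the encoding step, the following minimal sketch applies scikit-learn's `OneHotEncoder` to a few catchment records. The records, the `catchments` array, and the column labels are hypothetical and are not taken from the study's script.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical catchment records (biome, soil texture); illustrative only.
catchments = np.array([
    ["Amazon", "clay"],
    ["Cerrado", "sandy loam"],
    ["Caatinga", "sandy clay"],
    ["Cerrado", "clay"],
])

# Each distinct category becomes its own 0/1 column, so no category is
# ranked above another.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(catchments).toarray()

# Build readable column labels from the categories learned by the encoder.
column_names = [
    f"{variable}={category}"
    for variable, categories in zip(["biome", "soil_texture"], encoder.categories_)
    for category in categories
]

for name, column in zip(column_names, encoded.T):
    print(name, column.astype(int))
```

Every row of `encoded` contains exactly two ones, one for the biome and one for the soil texture of that catchment.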
We applied the classifier and regressor classes of the Random Forest
algorithm (Pedregosa et al., 2011) to a total of 24 attributes (the 12
selected attributes plus the 12 one-hot encoded biome and soil texture
categories). The classifier related the attributes to ECI values by the
majority vote across the decision trees, while the regressor averaged
the predictions of the decision trees in the ensemble. We also
applied 10-fold cross-validation and tested different hyperparameters,
such as the number of trees in the ensemble and the maximum depth of the
trees, to control the quality of the forest. All analyses were carried out
using a Python script available at
http://doi.org/10.5281/zenodo.4247710.
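The workflow described above can be sketched with scikit-learn as follows. This is not the authors' script: the data are synthetic, the binary ECI class (sign of ECI) is an assumed labeling, and the hyperparameter grid values are placeholders chosen only to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 24 attributes and the ECI (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 24))
eci = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=120)
eci_class = (eci > 0).astype(int)  # hypothetical labeling: sign of ECI

# Placeholder grid over the number of trees and the maximum tree depth.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# Classifier: each candidate is scored by accuracy under 10-fold CV.
clf_search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=10
)
clf_search.fit(X, eci_class)

# Regressor: each candidate is scored by R^2 under 10-fold CV.
reg_search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=10
)
reg_search.fit(X, eci)

print("classifier:", clf_search.best_params_, clf_search.best_score_)
print("regressor:", reg_search.best_params_, reg_search.best_score_)

# Impurity-based importances of the refitted best regressor (sum to 1).
importances = reg_search.best_estimator_.feature_importances_
print("most important attribute index:", int(np.argmax(importances)))
```

`GridSearchCV` refits the best configuration on the full data set, so `best_estimator_` can be inspected directly for attribute importances.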
RESULTS
The effective area of about 16% of the studied catchments was more
than double their corresponding topographic area (dark blue circles on
the coast). On the other hand, the effective area of 13% of the
catchments was smaller than half of the topographic area (dark red
circles in the northeast) (Figure 3; the histogram is available in S2). A
clear pattern was noted in the Caatinga, Cerrado, and Atlantic Forest
biomes, whereas we did not observe a clear tendency in the ECI sign in the
Amazon, Pampa, and Pantanal biomes. In the Caatinga (a predominantly
semiarid region) and Cerrado biomes, our analysis showed that
catchments had an effective area smaller than the topographic area,
whereas most catchments in the Atlantic Forest biome had an effective
area larger than the topographic area.