Discussion
The t-SNE map provides a novel and intuitive way to visualize
similarities across large sets of stream hydrographs, as ADHs with
similar shapes remain close on the 2-D map. Here, we propose that the
distance between points on the 2-D map can be used as a similarity
metric among watersheds, or, for a given watershed, that the space it
occupies on the map can be used to infer its relative variability in
flows. In terms of grouping watersheds, t-SNE is superior to PCA with
respect to the separability of ADHs from different flow regimes.
Furthermore, t-SNE is particularly suited to large data sets, requires
less computational power, and is more interpretable than conventional
visualization tools (e.g. a pairwise similarity matrix). While we have
only used data from western North America, t-SNE can be applied more
broadly to larger or more constrained data sets.
If new hydrographs are obtained, their ADHs can be projected onto the
t-SNE map with the trained encoder, allowing them to be quickly
associated with like counterparts, and information such as flow regime
type, seasonal pattern, and dominant hydrological processes can be
estimated from their nearest neighbors on the map. In this way, the
approach can be used for comparative analysis, flow regime
classification and regionalization and potentially for change detection.
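This nearest-neighbour inference step can be sketched as follows. The
function and argument names, the choice of k = 10 neighbours, and the
majority vote are illustrative assumptions; the trained encoder is
treated as a given mapping from an ADH to 2-D map coordinates:

```python
import numpy as np
from collections import Counter

def project_and_classify(adh, encoder, map_xy, map_labels, k=10):
    """Project a new ADH onto the t-SNE map with the trained encoder,
    then infer its flow regime by majority vote among the k nearest
    labelled ADHs already on the map."""
    xy = encoder(adh)                        # 2-D coordinates for the new ADH
    d = np.linalg.norm(map_xy - xy, axis=1)  # distance to every mapped ADH
    nearest = np.argsort(d)[:k]              # indices of the k closest ADHs
    votes = Counter(map_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The same k nearest neighbours could equally be used to transfer other
attributes, such as seasonal pattern or dominant process, to the new
hydrograph.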
A Novel Similarity Metric
To confirm the validity of t-SNE distance as a similarity metric, the
pairwise t-SNE distances between ADHs on the 2-D map are compared with
the corresponding cross correlations (Xcorr), a conventional metric of
similarity between time series. Xcorr is calculated for every pair of
ADHs. For each ADH, the average Xcorr with all other ADHs is 0.20,
while that with its 10 nearest neighbors is 0.87. This demonstrates
that ADHs that are close on the t-SNE map indeed share considerable
similarity in flow regime pattern. Regression analysis between t-SNE
distance and Xcorr indicates a significant relationship
(p-value < 0.01) with a Spearman R of -0.80.
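A minimal sketch of this comparison, assuming Xcorr is the zero-lag
correlation between standardized hydrographs (the lag convention is our
assumption) and using a ties-free rank correlation for simplicity:

```python
import numpy as np

def xcorr(a, b):
    """Zero-lag cross correlation (Pearson r) of two standardized series."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

def spearman(x, y):
    """Spearman rank correlation (assumes no ties, for simplicity)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def tsne_distance_vs_xcorr(adhs, map_xy):
    """Compare pairwise t-SNE map distance with pairwise Xcorr over all
    unique ADH pairs and report their Spearman rank correlation."""
    n = len(adhs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    d = np.array([np.linalg.norm(map_xy[i] - map_xy[j]) for i, j in pairs])
    c = np.array([xcorr(adhs[i], adhs[j]) for i, j in pairs])
    return spearman(d, c)
```

A strongly negative rank correlation, as reported above, means larger
map distances correspond to lower cross correlations.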
Nash-Sutcliffe Efficiency (NSE) is another widely used metric that
measures the consistency between hydrographs. The relationship between
t-SNE distance and NSE is also statistically significant, with a
Spearman R of -0.75.
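For reference, NSE compares the residual sum of squares between two
hydrographs against the variance of the reference series:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of the residual sum
    of squares to the variance of the reference series. NSE = 1 is a
    perfect match; NSE = 0 means no more skill than the reference mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```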
The Cross Correlation Matrix (CCM) is an alternative method to
visualize pairwise similarity between hydrographs, yet it is
impractical for large datasets. Our dataset produces a
17110-by-17110 CCM with more than 140 million entries (after removing
twins), making it computationally expensive to generate and difficult
to interpret at this scale. The t-SNE map, on the other hand, provides
an intuitive and efficient way to visualize the similarity of ADHs,
which is indicated simply by their distance on the map.
Misclassification of Mixed Regimes
Misclassification is a ubiquitous feature of machine learning
algorithms, and Class 2 had the highest misclassification rate in our
procedure. This is unsurprising, as Class 2 is a mixed regime with
both a snowmelt-driven freshet and high flows in winter; it is
effectively a superposition of Class 7 and Class 1 and lies between
them on the 2-D map. Depending upon winter temperatures, the ADHs in
Class 2 tend to shift towards Class 1 in warm years and Class 7 in
cold years, which is expected, as warm anomalies bring rain (which
prevails in the PNW) and cold anomalies enhance snow accumulation and
melt (such as in the Canadian Rockies). We presume that watersheds
that are more sensitive to climate anomalies are also more subject to
misclassification.
Labeling Strategy
The performance of the t-SNE map rests on the separability of ADHs
among different flow regimes, and the accuracy of the KNN
classification provides a quantitative measure of this separability.
The quality of the labelled ADHs is critical, as it directly affects
classification accuracy. While our strategy allowed a large number of
samples to be labelled in a practical time frame, there is the
potential for mislabeling ADHs from atypical years. Hydrological
variability can impart a large variance on ADHs, such that in some
years hydrographs have flow patterns unrepresentative of their
labelled class. This inherent variability also explains why some
samples from each class lie apart from the majority on the t-SNE map.
Manually labelling ADHs is time consuming, and considerable
subjectivity and process knowledge are required for meaningful
classification. To remove subjectivity, Generative Adversarial
Networks (GANs) have been used to artificially generate samples that
closely mimic real ones (Goodfellow et al., 2014). GAN samples
preserve the main patterns of the training data while introducing some
random variation, and they have considerable potential to create
high-quality labelled samples of ADHs. Fed a limited number of ADHs, a
GAN can generate an effectively unlimited number of samples for each
flow regime, enlarging the sample size and reducing the influence of
human subjectivity. While not used in this work, we suggest GANs are a
promising tool for improving classification.
Optimal Encoder Selection
An effective encoder is a critical component of this approach as it
determines the reliability of the mapping function between ADHs and the
t-SNE data points and allows insertion of new data onto the t-SNE map.
In this work, we tested 55 encoder models with various network
architectures and activation functions (see Table 2) before selecting
the optimal encoder (i.e. the one with minimum MAE) for the ADH
dataset. Here, we demonstrate the sensitivity of the encoders to
network depth and activation function type.
In many deep learning applications, deeper networks prevail due to their
strong capability in recognizing and processing complicated patterns of
data (Goodfellow et al., 2016). However, in our case, the encoder’s
performance did not consistently improve with network depth. A clear
reduction in MAE was observed when increasing the number of layers
from one to four, but deepening the network further showed no
improvement (Fig. 9). The relatively small number of layers required
is likely due to the relative simplicity of ADHs compared with
photographic images, which are the most common subject of DL research.
Another way to enhance encoder performance was to widen the layers of
the network: increasing the nodes of the initial layer from 128 to
1024 provided a marked improvement in MAE (Fig. 9). However, further
expansion to 2048 nodes only marginally improved the encoder's
performance, suggesting wider layers are unnecessary. In the trials,
the nine-layer network with an initial layer of 2048 nodes (N2048-L9,
see Encoder 39 in Table 2) produced the lowest MAE on the testing set,
so this architecture was adopted for the final encoder.
Using LeakyReLU as the activation function instead of ordinary ReLU
substantially improved the encoder's performance. Decreasing alpha
from 0.4 to 0.02, we found the testing-set MAE declined steadily,
reaching a minimum at alpha = 0.07 and increasing slightly below that
value (Fig. 9). With an appropriate alpha, LeakyReLU reduced the MAE
by 10%. Our final encoder employs LeakyReLU with alpha = 0.07 as the
activation function.
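The selected design can be illustrated with a plain NumPy forward
pass. The halving layer widths down to the 2-D output, the 366-value
input length, and the random (untrained) weights are illustrative
assumptions; in practice the weights are fitted so the output
reproduces the t-SNE map coordinates:

```python
import numpy as np

def leaky_relu(x, alpha=0.07):
    """LeakyReLU: positive inputs pass through unchanged, negative
    inputs are scaled by alpha instead of being zeroed out."""
    return np.where(x > 0, x, alpha * x)

def build_encoder(widths=(366, 2048, 1024, 512, 256, 128, 64, 32, 16, 2),
                  seed=0):
    """Forward pass of a fully connected encoder in the spirit of
    N2048-L9: a daily-flow ADH in, 2-D t-SNE coordinates out."""
    rng = np.random.default_rng(seed)
    params = [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
              for m, n in zip(widths[:-1], widths[1:])]

    def encoder(adh):
        h = np.asarray(adh, dtype=float)
        for i, (W, b) in enumerate(params):
            h = h @ W + b
            if i < len(params) - 1:   # linear output layer, LeakyReLU elsewhere
                h = leaky_relu(h, alpha=0.07)
        return h                      # (x, y) position on the t-SNE map

    return encoder
```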
Figure 9: Changes in encoder performance with network width and depth
(left) and the alpha of the LeakyReLU activation function (right).
Gaps in loss between the training and testing sets are common in
supervised learning, as models tend to perform less reliably on
"unseen" data; however, large gaps are often considered a sign of
overfitting. In this case, there was a relatively large gap between
the training and testing MAE, so the Dropout algorithm (Srivastava et
al., 2014) was applied to the top layers of the selected network to
improve model generalization. During training, Dropout deactivates a
number of randomly selected nodes (neurons) at each iteration. As a
result, it prevents models from depending exclusively on certain
neurons and thereby improves generalization. In this case, we found
that Dropout effectively narrowed the MAE gap between the training and
testing sets. However, the reduction was driven mostly by an
increasing training-set loss, while the testing-set loss remained
largely the same, indicating little improvement in model
generalization. Presumably, a more effective way to enhance encoder
generalization is to increase the diversity and sample size of the ADH
datasets.
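The Dropout mechanism described above amounts to the following sketch;
the "inverted dropout" rescaling is the common modern convention and
is assumed here:

```python
import numpy as np

def dropout(h, rate=0.5, training=True, rng=None):
    """Inverted dropout: during training, randomly zero a fraction
    `rate` of activations and rescale survivors by 1/(1 - rate) so the
    expected activation is unchanged; at test time, do nothing."""
    if not training or rate == 0.0:
        return h
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(h.shape) >= rate   # True for neurons that survive
    return h * mask / (1.0 - rate)
```

Because each iteration samples a fresh mask, no neuron can be relied
upon exclusively, which is what discourages co-adaptation.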
Potential Uses of t-SNE Maps
t-SNE is a powerful clustering technique that has so far seen very
limited application to hydrological data sets (Mazher, 2020; Liu et
al., 2021), yet there are many potential uses as 'big data' and
machine learning emerge in hydrology. First, the t-SNE map provides an
intuitive way to visualize the similarity of ADHs and other large time
series data sets, and the encoder improves its practical use by
turning t-SNE from a non-parametric into a parametric method. This
overcomes a major limitation of t-SNE by allowing new data to be
projected onto an existing map as they become available. This approach
to information visualization can benefit a broad range of research
that seeks to establish similarity among time series, and need not be
restricted to streamflow hydrographs. For example, watershed
classification, regionalization and streamflow change detection are
all potential future areas of research using this methodology. t-SNE
can also be used to infer similarity in other time series and signals
to assess natural groupings and patterns.
Perhaps the most obvious use of t-SNE is in watershed classification,
as the parametric t-SNE technique can effectively identify natural
groupings of watersheds and place new data within these groupings.
While there has been considerable research on hydrological
classification over the past few decades (Wagener et al., 2007), t-SNE
is a novel data-driven approach that allows fast identification of
homogeneous and similar regimes, and process inference can be quickly
transferred to nearest neighbours. If locations are sufficiently close
on the t-SNE map, it may be possible to reconstruct hydrographs of
ungauged watersheds according to their nearest neighbours (Patil and
Stieglitz, 2012).
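One way such a reconstruction could be sketched is an
inverse-distance-weighted average of the k nearest gauged ADHs on the
map; the weighting scheme and k = 5 are our illustrative choices, not
taken from Patil and Stieglitz (2012):

```python
import numpy as np

def reconstruct_adh(xy, gauged_xy, gauged_adhs, k=5):
    """Estimate the ADH of an ungauged watershed at map position `xy`
    as the inverse-distance-weighted mean of the k nearest gauged ADHs
    on the t-SNE map."""
    d = np.linalg.norm(gauged_xy - xy, axis=1)
    idx = np.argsort(d)[:k]                  # k nearest gauged watersheds
    w = 1.0 / (d[idx] + 1e-9)                # inverse-distance weights
    return np.average(gauged_adhs[idx], axis=0, weights=w / w.sum())
```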
Another potential application of t-SNE is identifying information
redundancy, which is important in designing hydrometric and other
monitoring networks (Coulibaly et al., 2013). For example, the cluster
of highly homogeneous PNW watersheds on the t-SNE map suggests high
consistency in their hydrographs and potential information redundancy
in the network. However, the redundancy detected by t-SNE should be
further verified with entropy-based approaches (Singh, 1997; Mishra
and Coulibaly, 2009).