Discussion

The t-SNE map provides a novel and intuitive way to visualize similarities for large sets of stream hydrographs, as ADHs with similar shapes remain close on the 2-D map. Here, we propose that the distance between points on the 2-D map can be used as a similarity metric among watersheds, or, for a given watershed, that the space it occupies on the map can be used to infer its relative variability in flows. In terms of grouping watersheds, t-SNE is superior to PCA with respect to the separability of ADHs from different flow regimes. Furthermore, t-SNE is particularly suited to large data sets, requires less computational power, and is more interpretable than conventional visualization tools (e.g., a pairwise similarity matrix). While we have only used data from western North America, t-SNE can be applied more broadly to larger or more constrained data sets.
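As a minimal illustration of this comparison, the sketch below embeds an ADH matrix with both t-SNE and PCA using scikit-learn; the array name, file name, and t-SNE hyperparameters are placeholder assumptions rather than the settings used in this study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# adhs: (n_samples, 365) array of annual daily hydrographs (e.g. normalized
# daily flows); "adhs.npy" is a hypothetical file name.
adhs = np.load("adhs.npy")

# Two-dimensional embeddings for visual comparison of regime separability.
tsne_xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(adhs)
pca_xy = PCA(n_components=2).fit_transform(adhs)
```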
If new hydrographs are obtained, their ADHs can be projected onto the t-SNE map with the trained encoder, allowing them to be quickly associated with like counterparts; information such as flow regime type, seasonal pattern, and dominant hydrological processes can then be estimated from their nearest neighbours on the map. In this way, the approach can be used for comparative analysis, flow regime classification and regionalization, and potentially for change detection.
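A minimal sketch of this projection step is given below, assuming a trained encoder that maps a 365-day ADH to 2-D map coordinates and a k-nearest-neighbour vote over labelled points already on the map; the function and variable names are illustrative, not taken from this study's code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def classify_new_adh(encoder, new_adh, map_xy, map_labels, k=10):
    """encoder: trained model whose .predict() maps a (1, 365) ADH to (1, 2) map
    coordinates; map_xy: (n, 2) positions of existing ADHs on the map;
    map_labels: their flow-regime classes; new_adh: (365,) daily flows."""
    xy = np.asarray(encoder.predict(new_adh.reshape(1, -1)))  # position on the 2-D map
    knn = KNeighborsClassifier(n_neighbors=k).fit(map_xy, map_labels)
    regime = knn.predict(xy)[0]          # majority vote of the k nearest neighbours
    return xy[0], regime
```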

A Novel Similarity Metric

To confirm the validity of t-SNE distance as a similarity metric, the pairwise t-SNE distances between ADHs on the 2-D map were compared with their cross correlations (Xcorr), a conventional metric of similarity between time series. Xcorr was calculated for every pair of ADHs. For a given ADH, the average Xcorr with all other ADHs is 0.20, whereas the average with its 10 nearest neighbours is 0.87, demonstrating that ADHs close together on the t-SNE map indeed share considerable similarity in flow regime pattern. Regression analysis between t-SNE distance and Xcorr indicates a significant relationship (p-value < 0.01), with a Spearman R of -0.80. The Nash-Sutcliffe Efficiency (NSE), another widely used metric of consistency between hydrographs, also shows a statistically significant relationship with t-SNE distance, with a Spearman R of -0.75.
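The comparison can be sketched as follows, with zero-lag Pearson correlation standing in for Xcorr and a standard NSE formulation; this is an illustrative reconstruction of the analysis (applied to a manageable subset of ADHs), not the authors' script.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def xcorr(a, b):
    """Zero-lag cross correlation (Pearson r) between two ADHs."""
    return np.corrcoef(a, b)[0, 1]

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency between two hydrographs."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def tsne_vs_xcorr(adhs, map_xy):
    """adhs: (n, 365) ADHs; map_xy: (n, 2) t-SNE coordinates of the same ADHs."""
    n = len(adhs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    tsne_dist = pdist(map_xy)                # same (i < j) pair ordering as `pairs`
    xc = np.array([xcorr(adhs[i], adhs[j]) for i, j in pairs])
    return spearmanr(tsne_dist, xc)          # (Spearman rho, p-value)
```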
A Cross Correlation Matrix (CCM) is an alternative way to visualize pairwise similarity between hydrographs, but it is impractical for large datasets. Our dataset produces a 17110-by-17110 CCM with more than 140 million unique entries (after removing the symmetric duplicates), making it computationally expensive to generate and difficult to interpret at this scale. In contrast, the t-SNE map provides an intuitive and efficient way to visualize the similarity of ADHs, which is indicated simply by their distance on the map.

Misclassification of Mixed Regimes

Misclassification is a ubiquitous feature of machine learning algorithms, and Class 2 had the highest misclassification rate in our procedure. This is unsurprising, as Class 2 is a mixed regime with both a snowmelt-driven freshet and high flows in winter; it is effectively a superposition of Class 7 and Class 1 and lies between them on the 2-D map. Depending upon winter temperatures, the ADHs in Class 2 tend to shift towards Class 1 in warm years and Class 7 in cold years, which is expected as warm anomalies bring rain (which prevails in the PNW) and cold anomalies enhance snow accumulation and melt (such as in the Canadian Rockies). We presume that watersheds that are more sensitive to climate anomalies are also more subject to misclassification.

Labeling Strategy

The performance of the t-SNE map is based on the separability of ADHs among different flow regimes, and the accuracy of the KNN classification provides a quantitative measure of this separability. The quality of the labelled ADHs is critical, as it directly affects classification accuracy. While our strategy allowed a large number of samples to be labelled in a practical time frame, there is the potential for mislabelling ADHs in atypical years. Hydrological variability can impart a large variance on ADHs, such that in some years hydrographs have flow patterns that are unrepresentative of their labelled class. This inherent variability also explains why some samples from each class sit apart from the majority on the t-SNE map.
Manually labelling ADHs is time consuming, and meaningful classification requires considerable process knowledge and involves subjectivity. To reduce subjectivity, Generative Adversarial Networks (GANs) have been used to artificially generate samples that closely mimic real ones (Goodfellow et al., 2014). GAN samples preserve the main patterns of the training data while introducing some random variation, and they have considerable potential for creating high-quality labelled samples of ADHs. Fed with a limited number of ADHs, a GAN can generate an effectively unlimited number of samples for each flow regime, enlarging the sample size and reducing the influence of human subjectivity. While not used in this work, we suggest GANs are a promising tool for improving classification.
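For concreteness, the sketch below outlines how a simple GAN could be trained on labelled ADHs from a single flow-regime class to generate synthetic hydrographs. It follows the standard generator/discriminator training scheme rather than any implementation from this study; the layer widths, latent dimension, scaling of flows, and learning rates are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

N_DAYS, LATENT = 365, 32

# Generator: latent noise vector -> synthetic 365-day ADH (flows assumed scaled to [0, 1]).
generator = models.Sequential([
    layers.Input(shape=(LATENT,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(N_DAYS, activation="sigmoid"),
])

# Discriminator: ADH -> logit of "real" vs "generated".
discriminator = models.Sequential([
    layers.Input(shape=(N_DAYS,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(1),
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_adhs):
    """One adversarial update on a batch of real ADHs from one labelled class."""
    noise = tf.random.normal([tf.shape(real_adhs)[0], LATENT])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_adhs = generator(noise, training=True)
        real_out = discriminator(real_adhs, training=True)
        fake_out = discriminator(fake_adhs, training=True)
        # Discriminator learns to label real ADHs 1 and generated ADHs 0.
        d_loss = bce(tf.ones_like(real_out), real_out) + bce(tf.zeros_like(fake_out), fake_out)
        # Generator learns to make the discriminator label its output 1.
        g_loss = bce(tf.ones_like(fake_out), fake_out)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))

# After looping train_step over batches of real ADHs, synthetic samples of the class
# are drawn with: generator(tf.random.normal([n_samples, LATENT]), training=False)
```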

Optimal Encoder Selection

An effective encoder is a critical component of this approach, as it determines the reliability of the mapping between ADHs and the t-SNE data points and allows new data to be inserted onto the t-SNE map. In this work, we tested 55 encoder models with various network architectures and activation functions (see Table 2) before selecting the optimal encoder (i.e., the one with minimum MAE) for the ADH dataset. Here, we demonstrate the sensitivity of the encoders to network depth and activation function type.
In many deep learning applications, deeper networks prevail due to their strong capability to recognize and process complicated patterns in data (Goodfellow et al., 2016). However, in our case the encoder's performance did not consistently improve with network depth. A clear reduction in MAE was observed when increasing the number of layers from one to four, but deepening the network further showed no improvement (Fig. 9). The relatively small number of layers required is likely due to the relative simplicity of ADHs compared with photographic images, which are the most common application in DL research.
Another way to enhance encoder performance was to widen the layers of the network: increasing the number of nodes in the initial layer from 128 to 1024 provided a marked improvement in MAE (Fig. 9). However, further expansion to 2048 nodes only marginally improved the encoder's performance, suggesting wider layers are unnecessary. In the trials, we found that the nine-layer network with an initial layer of 2048 nodes (N2048-L9, see Encoder 39 in Table 2) produced the lowest MAE on the testing set, so this architecture was employed for the final encoder.
Using LeakyReLU as the activation function instead of ordinary ReLU substantially improved the encoder's performance. Decreasing the value of α from 0.4 to 0.02, we found that the MAE of the testing set declined continuously and reached a minimum at α = 0.07, with a slight increase below that value (Fig. 9). With an appropriate α, LeakyReLU reduced the MAE by 10%. Our final encoder employed LeakyReLU with α = 0.07 as the activation function.
Figure 9: Changes in encoder performance with the width and depth of the network (left) and with the α of the LeakyReLU activation function (right).
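A sketch of an encoder in the spirit of the selected N2048-L9 configuration is shown below: nine trainable layers, a 2048-node first layer, LeakyReLU activations with α = 0.07, and MAE as the loss. The intermediate layer widths and training settings are assumptions for illustration; the actual configurations are listed in Table 2.

```python
from tensorflow.keras import layers, models

def build_encoder(n_days=365, widths=(2048, 1024, 512, 256, 128, 64, 32, 16)):
    """Nine trainable layers in total: eight hidden Dense layers plus the 2-D output."""
    model = models.Sequential([layers.Input(shape=(n_days,))])
    for w in widths:
        model.add(layers.Dense(w))
        model.add(layers.LeakyReLU(0.07))   # alpha = 0.07: small slope for negative inputs
    model.add(layers.Dense(2))              # the ADH's coordinates on the t-SNE map
    model.compile(optimizer="adam", loss="mae")   # MAE used for model selection
    return model
```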
Gaps in loss between the training and testing sets are very common in supervised learning, as models tend to perform less reliably on "unseen" data; however, large gaps are often a sign of overfitting. In this case, there was a relatively large gap between the MAE of the training and testing sets, so the Dropout algorithm (Srivastava et al., 2014) was applied to the top layers of the selected network to improve model generalization. During training, Dropout deactivates a number of randomly selected nodes (neurons) at each iteration, preventing the model from depending exclusively on particular neurons and thereby improving generalization. We found that Dropout effectively narrowed the gap in MAE between the training and testing sets; however, the reduction was mostly driven by an increasing loss on the training set, while the loss on the testing set remained largely the same, indicating that model generalization improved little. Presumably, a more effective way to enhance encoder generalization is to increase the diversity and sample size of the ADH dataset.
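To make the mechanism concrete, the toy example below shows Dropout's behaviour in Keras: during training a random subset of inputs is zeroed (and the remainder rescaled), while at inference the layer passes inputs through unchanged. The 0.5 rate here is arbitrary and not the rate used in our encoder.

```python
import tensorflow as tf

drop = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 10))
print(drop(x, training=True).numpy())    # a random subset zeroed; the rest scaled by 1 / (1 - rate)
print(drop(x, training=False).numpy())   # inference: inputs pass through unchanged
```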

Potential Uses of t-SNE Maps

t-SNE is a powerful clustering and visualization technique that has seen very limited application to hydrological data sets (Mazher, 2020; Liu et al., 2021), yet there are many potential uses as 'big data' and machine learning emerge in hydrology. First, the t-SNE map provides an intuitive way to visualize the similarity of ADHs and other large time series data sets, and the encoder improves its practical use by turning t-SNE from a non-parametric into a parametric method. This overcomes a major limitation of t-SNE by allowing new data to be projected onto an existing map as they become available. This approach to information visualization can benefit a broad range of research that seeks to establish similarity among time series and need not be restricted to streamflow hydrographs. For example, watershed classification, regionalization, and streamflow change detection are all potential areas of future research using this methodology, and t-SNE can also be used to infer similarity in other time series and signals to assess natural groupings and patterns.
Perhaps the most obvious use of t-SNE is in watershed classification, as the parametric t-SNE technique can effectively identify natural groupings of watersheds and place new data within these groupings. While there has been considerable research on hydrological classification over the past few decades (Wagener et al., 2007), t-SNE is a novel data-driven approach that allows fast identification of homogeneous and similar regimes, and process inference can be quickly transferred to nearest neighbours on the map. If the locations are close enough on the t-SNE map, it may be possible to reconstruct hydrographs of ungauged watersheds from their nearest neighbours (Patil and Stieglitz, 2012). Another potential use of t-SNE is identifying information redundancy, which is important in designing hydrometric and other monitoring networks (Coulibaly et al., 2013). For example, a cluster of highly homogeneous PNW watersheds on the t-SNE map suggests strong consistency among their hydrographs and potential information redundancy in the network. However, any information redundancy detected by t-SNE should be further verified with entropy-based approaches (Singh, 1997; Mishra and Coulibaly, 2009).
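These two uses could be prototyped as in the sketch below, which averages the ADHs of the nearest gauged neighbours to approximate an ungauged site and flags gauge pairs that sit within a small map distance of one another. How an ungauged watershed's map position is obtained (e.g., from regionalized catchment attributes) is assumed rather than specified here, and the distance threshold is arbitrary; flagged pairs would still require entropy-based verification as noted above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def reconstruct_adh(target_xy, gauged_xy, gauged_adhs, k=5):
    """Estimate an ADH at a target map position as the mean ADH of its k nearest
    gauged neighbours. target_xy: (2,); gauged_xy: (n, 2); gauged_adhs: (n, 365)."""
    nn = NearestNeighbors(n_neighbors=k).fit(gauged_xy)
    _, idx = nn.kneighbors(np.asarray(target_xy).reshape(1, -1))
    return gauged_adhs[idx[0]].mean(axis=0)       # mean hydrograph of the k neighbours

def redundancy_candidates(gauged_xy, radius=1.0):
    """Pairs of gauges closer than `radius` on the t-SNE map: candidates for redundancy."""
    nn = NearestNeighbors(radius=radius).fit(gauged_xy)
    graph = nn.radius_neighbors_graph(gauged_xy, mode="connectivity")
    i, j = graph.nonzero()
    return [(a, b) for a, b in zip(i, j) if a < b]  # unique pairs, self-matches excluded
```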