Results

t-SNE vs PCA

Following normalization of the ADHs, PCA and t-SNE were applied to the labelled data (Fig. 3). The PCA map suffered from the "crowding issue", with points clustered within a relatively narrow region of the 2D map. In contrast, points on the t-SNE map were more evenly dispersed while still forming distinct clusters. Classes 1, 2 and 7 were notably separate on both the PCA and t-SNE maps, and Class 6 was more separable on the t-SNE map. Classes 3-5 were mixed together on both maps, yet on the t-SNE map they formed a single cluster that was isolated from the other clusters. All three of these classes were dominated by gauges in the northern part of the domain (Alaska, Yukon, Northwest Territories, and northern British Columbia), highlighting their similarity (Fig. 2). Consequently, we merged Classes 3-5 into one class (labelled Class 5) for subsequent analyses.
Figure 3: 2D map of PCA and t-SNE with labelled ADHs of seven flow regimes. Color of data points indicates flow regime type.
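As an illustration of the comparison above, the following minimal sketch projects a set of labelled, normalized hydrographs onto 2D PCA and t-SNE maps and merges Classes 3-5 into Class 5. It assumes scikit-learn is available and uses synthetic hydrographs in place of the real ADHs; the array shapes, the perplexity value, and the variable names are illustrative assumptions rather than the settings used in this study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for normalized ADHs (NOT the study data):
# 7 classes x 30 hydrographs, each with 365 daily values.
n_per_class, n_days = 30, 365
labels = np.repeat(np.arange(1, 8), n_per_class)
days = np.arange(n_days)

# Give each class a seasonal peak on a different day (purely illustrative).
peak_day = 40 * labels
X = np.exp(-0.5 * ((days[None, :] - peak_day[:, None]) / 25.0) ** 2)
X += 0.05 * rng.standard_normal(X.shape)

# Merge Classes 3-5 into a single class labelled 5.
merged = np.where(np.isin(labels, [3, 4, 5]), 5, labels)

# 2D maps: linear projection (PCA) and nonlinear embedding (t-SNE).
# perplexity=30 is an assumed value; it must be smaller than the sample count.
pca_xy = PCA(n_components=2).fit_transform(X)
tsne_xy = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=0).fit_transform(X)
```

Plotting `pca_xy` and `tsne_xy` coloured by `merged` would give maps of the kind shown in Fig. 3, although the synthetic classes here will separate far more cleanly than real ADHs.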
Based on the arrangement of ADHs on the 2D maps and on KNN classification accuracy, t-SNE was superior to PCA. The PCA- and t-SNE-transformed data were classified using KNN, with classification accuracies of 84.5% (PCA) and 90.3% (t-SNE), indicating that the t-SNE map had enhanced separability among the flow regimes. A confusion matrix (Tab. 1) indicated that misclassification most often occurred between ADHs from Class 2 (the interior PNW) and Class 7 (predominantly Idaho), which are geographically proximal. This is reasonable, as both flow regimes were characterized by a large spring freshet and winter storm events, yet flows in Class 7 typically had a more dominant freshet and fewer winter events than those in Class 2. In certain years, unusually high or low winter snow and rain resulted in ADHs that resembled the other class. This highlights the challenge of subjectively labelling ADHs for classification, and suggests that refining the selection of labelled ADHs for these two classes may be warranted.
Figure 4: KNN-classified data points on the PCA and t-SNE maps. Black points represent misclassified ADHs.
Table 1: Confusion matrix of KNN classification with t-SNE datapoints
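The KNN evaluation described above can be sketched as follows, to show how a classification accuracy and a confusion matrix of the kind in Tab. 1 are obtained from 2D map coordinates. The text does not state the number of neighbours, the distance metric, or the validation scheme, so k = 5, Euclidean distance, and leave-one-out validation are assumptions here. The 2D points are synthetic, with the centres of Classes 2 and 7 placed close together to mimic their overlap.

```python
import math
import random
from collections import Counter

def knn_predict(train_xy, train_labels, query, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(
        range(len(train_xy)),
        key=lambda i: math.dist(train_xy[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out(points, labels, k=5):
    """Predict each point from all the others; return the list of predictions."""
    preds = []
    for i, query in enumerate(points):
        rest_xy = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        preds.append(knn_predict(rest_xy, rest_labels, query, k))
    return preds

def confusion_matrix(true, pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: j for j, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true, pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Synthetic 2D map coordinates (NOT the study data), 40 points per class.
# Classes 2 and 7 are deliberately placed close together.
random.seed(0)
centres = {1: (0.0, 0.0), 2: (5.0, 5.0), 5: (-6.0, 4.0),
           6: (6.0, -6.0), 7: (6.0, 6.0)}
classes = sorted(centres)
points, labels = [], []
for c in classes:
    cx, cy = centres[c]
    for _ in range(40):
        points.append((random.gauss(cx, 1.0), random.gauss(cy, 1.0)))
        labels.append(c)

preds = leave_one_out(points, labels, k=5)
cm = confusion_matrix(labels, preds, classes)
accuracy = sum(t == p for t, p in zip(labels, preds)) / len(labels)
```

In this sketch the off-diagonal counts of `cm` should concentrate in the Class 2 and Class 7 cells, because those are the only clusters that overlap. Running the same functions on the PCA and t-SNE coordinates of the labelled ADHs would allow the two maps to be compared on equal terms.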