Results
t-SNE vs PCA
Following normalization of the ADHs, PCA and t-SNE were
applied to the labelled data (Fig. 3). The PCA map suffered from the
"crowding issue", with points clustered within a relatively narrow
region of the 2D map. In contrast, the t-SNE points were more
evenly dispersed across the 2D map while still forming distinct clusters. Classes
1, 2 and 7 were notably separate on both the PCA and t-SNE maps, with Class
6 more separable on the t-SNE map. Classes 3-5 were mixed together on
both maps, yet they formed a single cluster on the t-SNE map that was
isolated from the other clusters. All three of these classes were
dominated by gauges in the northern part of the domain (Alaska, Yukon,
Northwest Territories, and northern British Columbia), highlighting
their similarity (Fig. 2). Consequently, we merged Classes 3-5 into one
(as Class 5) for subsequent analyses.
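The merging step can be illustrated with a minimal sketch. The mapping and function names below are hypothetical, and the example labels are made up for illustration; only the rule itself (Classes 3-5 become Class 5) comes from the text.

```python
# Hypothetical mapping: Classes 3 and 4 are folded into Class 5.
# Class 5 keeps its own label; all other classes are left unchanged.
MERGE_MAP = {3: 5, 4: 5}


def merge_classes(labels, mapping=MERGE_MAP):
    """Return a new list in which merged classes carry the shared label."""
    return [mapping.get(label, label) for label in labels]


# Made-up example labels, one per labelled ADH.
labels = [1, 2, 3, 4, 5, 6, 7, 3]
merged = merge_classes(labels)
print(merged)
```

After this step, five distinct labels remain (1, 2, 5, 6 and 7).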
Figure 3: 2D PCA and t-SNE maps of the labelled ADHs for the seven flow
regimes. The color of each data point indicates the flow regime type.
Based on the ADH arrangement on the 2D map and the KNN classification accuracy,
t-SNE was superior to PCA. The PCA- and t-SNE-transformed data were
classified using KNN, with classification accuracies of 84.5% (PCA) and
90.3% (t-SNE), indicating that the t-SNE map had enhanced separability among
the flow regimes. A confusion matrix (Tab. 1) indicated that
misclassification most often occurred among ADHs from Class 2 (the
interior PNW) and Class 7 (predominantly Idaho), which are
geographically proximal. This is reasonable, as both flow regimes were
characterized by a large spring freshet and winter storm events, yet
flows in Class 7 typically had a more dominant freshet and fewer winter
events than those in Class 2. In certain years, greater or lesser amounts of
snow and rain in the winter resulted in ADHs resembling the other class.
This highlights the challenge of subjectively labelling ADHs for
classification, and suggests that refining the selection of labelled
ADHs for these two classes may be warranted.
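The comparison described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the configuration used in this study: the data are synthetic blobs standing in for normalized, labelled ADHs, and the number of neighbours (k = 5), the 5-fold cross-validation, the t-SNE perplexity and the helper names embed_2d and knn_evaluate are all assumed. Because t-SNE cannot project unseen points, each embedding is fitted on all labelled data and KNN is then evaluated on the 2D coordinates.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for normalized, labelled ADHs (one row per hydrograph).
X, y = make_blobs(n_samples=300, n_features=50, centers=5, random_state=0)


def embed_2d(X, method):
    """Project the data onto a 2D map with PCA or t-SNE."""
    if method == "pca":
        return PCA(n_components=2).fit_transform(X)
    if method == "tsne":
        # The perplexity value is an assumption; it must be below n_samples.
        return TSNE(n_components=2, perplexity=30.0,
                    random_state=0).fit_transform(X)
    raise ValueError(f"unknown method: {method}")


def knn_evaluate(Z, y, k=5, folds=5):
    """Cross-validated KNN on the 2D coordinates (k and folds are assumed)."""
    knn = KNeighborsClassifier(n_neighbors=k)
    y_pred = cross_val_predict(knn, Z, y, cv=folds)
    # Confusion matrix rows are true classes, columns are predicted classes.
    return accuracy_score(y, y_pred), confusion_matrix(y, y_pred)


results = {}
for method in ("pca", "tsne"):
    Z = embed_2d(X, method)
    acc, cm = knn_evaluate(Z, y)
    results[method] = {"embedding": Z, "accuracy": acc, "confusion": cm}
    print(f"{method}: accuracy = {acc:.3f}")
    print(cm)
```

Off-diagonal entries of each confusion matrix count the misclassified points, which is how pairs of classes that are frequently confused with one another can be identified.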
Figure 4: KNN-classified data points on PCA and t-SNE map. Black points
represent misclassified ADHs.
Table 1: Confusion matrix of KNN classification with t-SNE data points