Materials and Methods

Our data collection and processing are done entirely in the free, cloud-based geospatial platform Google Earth Engine (GEE) (Gorelick et al., 2017). GEE allows us to analyze large spatial datasets in the cloud without downloading the source imagery, which means our methods scale without having to manage storage or bandwidth issues. Our methods are optimized to run quickly on GEE’s framework by calculating all change metrics using raster operations and then reducing the data to a table format before exporting. We used Python 3 and Geopandas for post processing and analysis of the vector data (Jordahl et al., 2021).

Water classification data

Accurate classification of water and land from satellite images is foundational to our calculations of bank migration. As such, we compare two widely used and validated water occurrence datasets based on the Landsat series of satellites. The first is the Joint Research Centre’s (JRC) yearly water classification dataset, where each pixel is marked as either no data, not water, seasonal water, or permanent water (Pekel et al., 2016). Pixels marked seasonal water are pixels that were classified as water for at least one month but not every month with valid observations. The permanent water classification represents pixels that were observed to be water in all months with valid observations. We considered both permanent and seasonal water classifications for our binary water masks because permanent water classification is sensitive to omission errors and often results in disconnected river reaches. The other water classification dataset we use, by Pickens et al. (2020), is a classification tree model applied to the Landsat archive starting in 1999. Unlike the JRC dataset, each pixel in the yearly composites from the Pickens dataset represent the percentage of images that were classified as water. We selected a 70% occurrence threshold in the Pickens dataset which approximated the JRC seasonal and permanent classifications well. Visual inspection of this threshold showed stable river forms with minimal flooding or disconnected river reaches. The JRC dataset ranges from 1984 to 2020 at the time of writing and is currently updated annually. However, data are not equally available through time and space. Notably, some regions in northeast Siberia do not have any images before 1999 due to limited downlink capability and no on-board image storage on the Landsat-5 satellite. The launch of Landsat-7 in April 1999 increased spatial coverage when operating in tandem with Landsat-5 and included on-board image storage, enabling the capture of up to 10 scenes to be stored and transmitted later when the satellite was in range of a ground station (Goward et al., 2001). As such, we begin our analysis in 2000, when Landsat-7 records became fully available. Still, there are unavoidable problems with optical remote sensing that impact the quality and continuity of the Landsat record. Clouds, smoke, and the seasonal lack of sunlight at high latitudes can all limit the utility of images. Any dataset that uses optical satellite images, including the two water occurrence datasets used in this study, are prone to incompleteness and biases because of these problems and limitations. Even so, temporal continuity and relatively high resolution make the Landsat archive the most viable satellite dataset for morphological analyses of the world’s large rivers over time.

River data

We rely on the SWOT River Database (SWORD, Altenau et al., 2021) to inform our search area for river classification and to organize our pixel-level data into a vector product that includes river topology, unique identifiers, and other morphological variables. SWORD is based on the Global River Widths from Landsat dataset (Allen and Pavelsky, 2018) but with several improvements that increase the topological continuity, add additional reach identifiers that relate to other hydrographic datasets, and resample reaches to a more uniform length of approximately ten kilometers. Within each reach, the river is divided into nodes approximately every 200 meters that track the river centerline. We also use SWORD to filter small river reaches that we will not be able to reliably measure with Landsat. We limit migration calculations to river reaches with a mean width greater than 150 m, or approximately 5 Landsat pixels. We based this threshold on manual inspection of river classifications as well as our uncertainty estimates (section 2.5).

River identification

Rivers occupy only about 0.6% of earth’s unglaciated land surface area, and as such, about the same percentage of pixels in satellite imagery (Allen and Pavelsky, 2018). We rely on the SWORD river centerline dataset to limit the scope of our search for rivers and greatly reduce the volume of pixels that we must process and classify. Identifying a river in a multispectral image or binary watermask is visually easy but often challenging with automated methods. In most environments, rivers display a unique shape and or color when compared to lakes in the floodplains. Rivers and lakes take on a wide range of morphologies, but, in general, rivers form sinuous channels that traverse long distances, whereas lakes are rounder and isolated. Identifying river pixels is often done by intersecting the river centerline with a watermask, such as in the RivWidthCloud algorithm (Yang et al., 2020). However, if the static centerline is inaccurate, whether by error or because of riverbank migration, using the intersection with a static centerline to identify the movement of rivers will often fail. Other approaches for identifying rivers include selecting polygons with high length to width ratios (Isikdogan et al., 2017) or polygons that intersect the edge of an image (Kyzivat et al., 2019). We combine these methods by selecting the polygon with the largest perimeter within the area buffered on the prior centerline dataset. We buffer the river centerline by 10 rivers widths, which approximates the meander amplitude (Leuven et al., 2018), and thus the approximate maximum distance the channel will move before reversing direction. Selecting the largest polygon by perimeter would identify many lakes and reservoirs in a global search, but we largely avoid this issue by limiting our search to the area within 10 river widths of the static river centerline and by excluding any pixels that are included in the HydroLakes dataset (Messager et al., 2016). These methods allow us to produce global annual maps of river pixels from 2000-2021 suitable for detection of bank migration over time.

Bank Migration

We use an area-based approach to quantifying riverbank erosion and accretion, similar to the SCREAM method from Rowland et al., (2016). This approach is advantageous for global applications because it can better represent multithreaded rivers than centerline-based methods do, and raster operations are much faster than vector operations in GEE. We measured riverbank erosion and accretion by comparing river inundation masks from different years and counting the pixels that transition between river and land. Pixels that transition from river to land are marked as accretion, and pixels that transition from land to river are marked as erosion (Figure 1b). Our first step is to calculate the area and bank characteristics of the individual images. We locate the bank pixels by dilating the river mask by one pixel, subtracting the original river mask, and then calculate the bank aspect by convolving the bank pixels with a directional kernel (Figure 1c). We measure the standard deviation of local bank aspect as a proxy for bank curvature for each side of the banks. Measuring bank curvature typically uses the radius of a best fit circle for a section of river, which is a slow process in the GEE framework. Next, we calculate the pixel-scale magnitude and direction of change using the same methods in Rowland et al. (2016; Figure 1d). The magnitude of erosion is calculated for each bank pixel in the first image as the shortest path to any of the bank pixels in the second image, where each path is constrained to only traverse pixels that were eroded. For islands and bars that have entirely eroded, the distance is calculated from the island perimeter to the island centroid. The result of these distance calculations is a raster surface that slopes between the banks in the two images, and we use the gradient of this surface to calculate the direction of erosion. Accretion is calculated in the same manner with the images switched and the paths constrained to the accreted area. These change metrics are relatively simple, efficient to calculate, and describe the change in pixel-by-pixel detail. While the raster data preserve all the detail in the original river masks and our change calculations, that fidelity comes with high cost when exporting and analyzing the data. We exported this data only in limited cases while developing the methods and visualizing the results over small regions and are not available globally.
To produce the structured global river erosion data product we summarize the raster data to a vector product by reducing the pixels to their nearest river centerline node in SWORD (Figure 1e). In our final data, the vector dataset includes the original SWORD hydrography data (centerline location, upstream and downstream connections, Pfafstetter basin codes, flow accumulation values, slope, mean width, sinuosity, etc.) and a summary of the change calculations (total river area in time 1 and time 2, accreted area, eroded area, average rate of change, average direction of change, variance in direction, and uncertainty estimates) at >2.2 million nodes spaced approximately 200 m apart. We further aggregate the data up to the reach and watershed scales for analysis, which allows flexibility in future analysis. For example, while we present average bank migration values at large scales, others may be interested in the maximum value for a particular reach. The high-resolution node data also allows for resampling the data over custom reaches and extents, beyond the reach and basin structure in SWORD.