Materials and Methods
Our data collection and processing are done entirely in the free,
cloud-based geospatial platform Google Earth Engine (GEE) (Gorelick et
al., 2017). GEE allows us to analyze large spatial datasets in the cloud
without downloading the source imagery, which means our methods scale
without having to manage storage or bandwidth issues. Our methods are
optimized to run quickly on GEE’s framework by calculating all change
metrics using raster operations and then reducing the data to a table
format before exporting. We used Python 3 and Geopandas for post
processing and analysis of the vector data (Jordahl et al., 2021).
Water classification data
Accurate classification of water and land from satellite images is
foundational to our calculations of bank migration. As such, we compare
two widely used and validated water occurrence datasets based on the
Landsat series of satellites. The first is the Joint Research Centre’s
(JRC) yearly water classification dataset, where each pixel is marked as
either no data, not water, seasonal water, or permanent water (Pekel et
al., 2016). Pixels marked seasonal water are pixels that were classified
as water for at least one month but not every month with valid
observations. The permanent water classification represents pixels that
were observed to be water in all months with valid observations. We
considered both permanent and seasonal water classifications for our
binary water masks because permanent water classification is sensitive
to omission errors and often results in disconnected river reaches. The
other water classification dataset we use, by Pickens et al. (2020), is
a classification tree model applied to the Landsat archive starting in
1999. Unlike the JRC dataset, each pixel in the yearly composites from
the Pickens dataset represent the percentage of images that were
classified as water. We selected a 70% occurrence threshold in the
Pickens dataset which approximated the JRC seasonal and permanent
classifications well. Visual inspection of this threshold showed stable
river forms with minimal flooding or disconnected river reaches. The JRC
dataset ranges from 1984 to 2020 at the time of writing and is currently
updated annually. However, data are not equally available through time
and space. Notably, some regions in northeast Siberia do not have any
images before 1999 due to limited downlink capability and no on-board
image storage on the Landsat-5 satellite. The launch of Landsat-7 in
April 1999 increased spatial coverage when operating in tandem with
Landsat-5 and included on-board image storage, enabling the capture of
up to 10 scenes to be stored and transmitted later when the satellite
was in range of a ground station (Goward et al., 2001). As such, we
begin our analysis in 2000, when Landsat-7 records became fully
available. Still, there are unavoidable problems with optical remote
sensing that impact the quality and continuity of the Landsat record.
Clouds, smoke, and the seasonal lack of sunlight at high latitudes can
all limit the utility of images. Any dataset that uses optical satellite
images, including the two water occurrence datasets used in this study,
are prone to incompleteness and biases because of these problems and
limitations. Even so, temporal continuity and relatively high resolution
make the Landsat archive the most viable satellite dataset for
morphological analyses of the world’s large rivers over time.
River data
We rely on the SWOT River Database (SWORD, Altenau et al., 2021) to
inform our search area for river classification and to organize our
pixel-level data into a vector product that includes river topology,
unique identifiers, and other morphological variables. SWORD is based on
the Global River Widths from Landsat dataset (Allen and Pavelsky, 2018)
but with several improvements that increase the topological continuity,
add additional reach identifiers that relate to other hydrographic
datasets, and resample reaches to a more uniform length of approximately
ten kilometers. Within each reach, the river is divided into nodes
approximately every 200 meters that track the river centerline. We also
use SWORD to filter small river reaches that we will not be able to
reliably measure with Landsat. We limit migration calculations to river
reaches with a mean width greater than 150 m, or approximately 5 Landsat
pixels. We based this threshold on manual inspection of river
classifications as well as our uncertainty estimates (section 2.5).
River identification
Rivers occupy only about 0.6% of earth’s unglaciated land surface area,
and as such, about the same percentage of pixels in satellite imagery
(Allen and Pavelsky, 2018). We rely on the SWORD river centerline
dataset to limit the scope of our search for rivers and greatly reduce
the volume of pixels that we must process and classify. Identifying a
river in a multispectral image or binary watermask is visually easy but
often challenging with automated methods. In most environments, rivers
display a unique shape and or color when compared to lakes in the
floodplains. Rivers and lakes take on a wide range of morphologies, but,
in general, rivers form sinuous channels that traverse long distances,
whereas lakes are rounder and isolated. Identifying river pixels is
often done by intersecting the river centerline with a watermask, such
as in the RivWidthCloud algorithm (Yang et al., 2020). However, if the
static centerline is inaccurate, whether by error or because of
riverbank migration, using the intersection with a static centerline to
identify the movement of rivers will often fail. Other approaches for
identifying rivers include selecting polygons with high length to width
ratios (Isikdogan et al., 2017) or polygons that intersect the edge of
an image (Kyzivat et al., 2019). We combine these methods by selecting
the polygon with the largest perimeter within the area buffered on the
prior centerline dataset. We buffer the river centerline by 10 rivers
widths, which approximates the meander amplitude (Leuven et al., 2018),
and thus the approximate maximum distance the channel will move before
reversing direction. Selecting the largest polygon by perimeter would
identify many lakes and reservoirs in a global search, but we largely
avoid this issue by limiting our search to the area within 10 river
widths of the static river centerline and by excluding any pixels that
are included in the HydroLakes dataset (Messager et al., 2016). These
methods allow us to produce global annual maps of river pixels from
2000-2021 suitable for detection of bank migration over time.
Bank Migration
We use an area-based approach to quantifying riverbank erosion and
accretion, similar to the SCREAM method from Rowland et al., (2016).
This approach is advantageous for global applications because it can
better represent multithreaded rivers than centerline-based methods do,
and raster operations are much faster than vector operations in GEE. We
measured riverbank erosion and accretion by comparing river inundation
masks from different years and counting the pixels that transition
between river and land. Pixels that transition from river to land are
marked as accretion, and pixels that transition from land to river are
marked as erosion (Figure 1b). Our first step is to calculate the area
and bank characteristics of the individual images. We locate the bank
pixels by dilating the river mask by one pixel, subtracting the original
river mask, and then calculate the bank aspect by convolving the bank
pixels with a directional kernel (Figure 1c). We measure the standard
deviation of local bank aspect as a proxy for bank curvature for each
side of the banks. Measuring bank curvature typically uses the radius of
a best fit circle for a section of river, which is a slow process in the
GEE framework. Next, we calculate the pixel-scale magnitude and
direction of change using the same methods in Rowland et al. (2016;
Figure 1d). The magnitude of erosion is calculated for each bank pixel
in the first image as the shortest path to any of the bank pixels in the
second image, where each path is constrained to only traverse pixels
that were eroded. For islands and bars that have entirely eroded, the
distance is calculated from the island perimeter to the island centroid.
The result of these distance calculations is a raster surface that
slopes between the banks in the two images, and we use the gradient of
this surface to calculate the direction of erosion. Accretion is
calculated in the same manner with the images switched and the paths
constrained to the accreted area. These change metrics are relatively
simple, efficient to calculate, and describe the change in
pixel-by-pixel detail. While the raster data preserve all the detail in
the original river masks and our change calculations, that fidelity
comes with high cost when exporting and analyzing the data. We exported
this data only in limited cases while developing the methods and
visualizing the results over small regions and are not available
globally.
To produce the structured global river erosion data product we summarize
the raster data to a vector product by reducing the pixels to their
nearest river centerline node in SWORD (Figure 1e). In our final data,
the vector dataset includes the original SWORD hydrography data
(centerline location, upstream and downstream connections, Pfafstetter
basin codes, flow accumulation values, slope, mean width, sinuosity,
etc.) and a summary of the change calculations (total river area in time
1 and time 2, accreted area, eroded area, average rate of change,
average direction of change, variance in direction, and uncertainty
estimates) at >2.2 million nodes spaced approximately 200 m
apart. We further aggregate the data up to the reach and watershed
scales for analysis, which allows flexibility in future analysis. For
example, while we present average bank migration values at large scales,
others may be interested in the maximum value for a particular reach.
The high-resolution node data also allows for resampling the data over
custom reaches and extents, beyond the reach and basin structure in
SWORD.