Introduction
Changes in global temperatures have resulted in more frequent and
intense rainfall events, which can lead to flooding in areas that are
not equipped to handle large amounts of water. Deforestation can also
contribute to increased flooding. Trees and other vegetation play a key
role in regulating the flow of water, and their removal can result in
more water running off the surface and into rivers and streams.
Flood monitoring with satellite images is an effective way to detect and
track floods. In this approach, satellite imagery is analyzed over time to
detect changes in water levels and identify flooded areas. We can train
algorithms to detect these changes and flag flooded areas based on a set
of predefined criteria.
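As a concrete illustration of such a criterion, the following sketch computes the normalized difference water index (NDWI) from the green and near-infrared (NIR) bands of a Sentinel-2 scene and thresholds it to produce a water mask. The band file names and the threshold value are illustrative assumptions, not part of any SageMaker API.

import numpy as np
import tifffile

# Illustrative file names for Sentinel-2 band 3 (green) and band 8 (NIR)
green = tifffile.imread("B03.tif").astype("float32")
nir = tifffile.imread("B08.tif").astype("float32")

# NDWI = (green - NIR) / (green + NIR); values above ~0 typically indicate water
ndwi = (green - nir) / (green + nir + 1e-9)
water_mask = ndwi > 0.0  # threshold is an assumption; tune it per scene
print(f"Water pixels detected: {water_mask.sum()}")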
Amazon SageMaker
geospatial capabilities make it easier for data scientists and machine
learning (ML) engineers to build, train, and deploy ML models using
geospatial data. These capabilities also include pre-trained models, such
as a land cover segmentation model that can be run with a simple API call
and leveraged to analyze changes in water levels.
Solution Overview
Let’s further understand how SageMaker geospatial capabilities make it
easy to build, train, and deploy models using geospatial data. For the
flood monitoring use case, we use Sentinel-2 data from the Amazon
Sustainability Data Initiative (ASDI). In this blog post, we first show
you how to use the new SageMaker geospatial capabilities to visualize
geospatial images from Sentinel-2, and then process these images to
segment and classify the water coverage. This helps analyze flooding in a
defined area.
Prerequisites
To get hands-on experience with all the features described in this post,
complete the following prerequisites:
Ensure that you have an AWS account, secure access to log in to the
account via the AWS Management Console, and AWS Identity and Access
Management (IAM) permissions to use Amazon SageMaker and Amazon Simple
Storage Service (Amazon S3) resources.
Onboard to a SageMaker domain and access Studio to use notebooks. For
instructions, refer to Onboard to Amazon SageMaker Domain. If you’re
using an existing Studio domain, upgrade to the latest version of Studio.
Solution walkthrough with Jupyter notebook using SageMaker geospatial API
Data access
The new geospatial capabilities in SageMaker offer easy access to
geospatial data such as Sentinel-2 and Landsat 8. Built-in geospatial
dataset access saves weeks of effort otherwise lost to collecting data
from various data providers and vendors. We use an Amazon SageMaker
Studio notebook with a SageMaker geospatial image for our analysis,
following the steps outlined in Getting Started with Amazon SageMaker
geospatial capabilities.
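Before querying any data, we create a boto3 client for the sagemaker-geospatial service. The following minimal setup sketch defines the sg_client and execution_role variables used throughout the rest of the walkthrough; the variable names are our own.

import boto3
import sagemaker

session = boto3.Session()
execution_role = sagemaker.get_execution_role()

# Client for the SageMaker geospatial APIs used in this post
sg_client = session.client(service_name="sagemaker-geospatial")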
The amazon-sagemaker-examples GitHub repository
(https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-geospatial/lake-mead-drought-monitoring)
contains similar notebooks that served as the basis for this post. You
can easily query data using SageMaker geospatial capabilities. We first
create a bounding box around the Mississippi River to represent the area
of interest (AreaOfInterest). To choose data from June 2018 to June 2019,
we use the TimeRangeFilter. Cloud cover can often block our view of the
location, so to obtain less cloudy images we select a subset by setting
the upper bound for cloud coverage to 20%.
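As a sketch of what this query could look like, the following uses the search_raster_data_collection API. The raster data collection ARN and the bounding-box coordinates are illustrative placeholders; you can list the available collections (including Sentinel-2) with list_raster_data_collections to find the real ARN.

# Hypothetical ARN; look up the actual Sentinel-2 collection with
# sg_client.list_raster_data_collections()
data_collection_arn = "arn:aws:sagemaker-geospatial:us-west-2:aws:raster-data-collection/public/sentinel-2-l2a"

# Illustrative bounding box (longitude/latitude) around a stretch of the
# Mississippi River; replace with your own area of interest
coordinates = [
    [-90.85, 35.00],
    [-90.85, 35.40],
    [-90.30, 35.40],
    [-90.30, 35.00],
    [-90.85, 35.00],
]

search_params = {
    "Arn": data_collection_arn,
    "RasterDataCollectionQuery": {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {"Coordinates": [coordinates]}
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2018-06-01T00:00:00Z",
            "EndTime": "2019-06-01T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [
                {"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 20}}}
            ]
        },
    },
}
results = sg_client.search_raster_data_collection(**search_params)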
Model inference
After the data has been identified, the next stage is to extract water
bodies from the satellite images. Normally, to distinguish different
kinds of physical material on the earth’s surface, such as water bodies,
vegetation, and snow, we would need to train a land cover segmentation
model from scratch. Training a model from scratch is time and resource
intensive, involving data labeling, model training, and deployment.
SageMaker geospatial capabilities provide a pre-trained land cover
segmentation model that can be run with a simple API call.
Rather than you downloading the data to a local machine for inference,
SageMaker does all the heavy lifting. In an Earth Observation Job (EOJ),
we simply specify the data setup and model configuration. SageMaker
automatically downloads and prepares the satellite image data for the
EOJ, then runs model inference. The EOJ can take several minutes to many
hours to complete, depending on the workload (the number of images run
through model inference). You can monitor the job status using the
get_earth_observation_job function.
# Reuse the AOI coordinates from the query above; the original post elides
# the exact values
date_range = {
    "StartTime": "2018-06-01T00:00:00Z",
    "EndTime": "2019-06-01T23:59:59Z",
}
Perform land cover segmentation on the images returned from the
Sentinel-2 dataset:
eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": data_collection_arn,
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {"Coordinates": [coordinates]}
            }
        },
        "TimeRangeFilter": date_range,
        "PropertyFilters": {
            "Properties": [
                {"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 20}}}
            ]
        },
    }
}
eoj_config = {"LandCoverSegmentationConfig": {}}

response = sg_client.start_earth_observation_job(
    Name="mississippi-flood-monitoring",  # job name is illustrative
    ExecutionRoleArn=execution_role,
    InputConfig=eoj_input_config,
    JobConfig=eoj_config,
)
eoj_arn = response["Arn"]

job_details = sg_client.get_earth_observation_job(Arn=eoj_arn)
{k: v for k, v in job_details.items() if k in ["Arn", "Status", "DurationInSeconds"]}
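Because the job can run for a long time, a simple polling loop (our own sketch, not part of the original post) can wait for a terminal state:

import time

# Poll the EOJ until it reaches a terminal state
while True:
    status = sg_client.get_earth_observation_job(Arn=eoj_arn)["Status"]
    print(f"EOJ status: {status}")
    if status in ("COMPLETED", "FAILED", "STOPPED"):
        break
    time.sleep(30)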
Analysis
The results of the EOJ can be exported to an Amazon Simple Storage
Service (Amazon S3) bucket using the export_earth_observation_job
function. The data in Amazon S3 is then used in a subsequent analysis to
determine the water surface area. SageMaker also simplifies dataset
management: instead of crawling thousands of files in the S3 bucket, we
can share the EOJ results using the job ARN. Each EOJ becomes an asset in
the data catalog, as results can be grouped by the job ARN.
sagemaker_session = sagemaker.Session()
s3_bucket_name = sagemaker_session.default_bucket()  # Replace with your own bucket if needed
s3_bucket = session.resource("s3").Bucket(s3_bucket_name)
prefix = "eoj_flood_detection"  # Replace with the desired S3 prefix

export_bucket_and_key = f"s3://{s3_bucket_name}/{prefix}/"
eoj_output_config = {"S3Data": {"S3Uri": export_bucket_and_key}}
export_response = sg_client.export_earth_observation_job(
    Arn=eoj_arn,
    ExecutionRoleArn=execution_role,
    OutputConfig=eoj_output_config,
)
Next, we analyze changes in the water level of the Mississippi River. We
download the land cover masks to our local instance and calculate the
water surface area using open-source libraries. SageMaker saves the model
outputs in Cloud Optimized GeoTIFF (COG) format.
import os
from glob import glob
from urllib.parse import urlparse

import cv2
import numpy as np
import tifffile
import matplotlib.pyplot as plt
from botocore import UNSIGNED
from botocore.config import Config

# Local directories for the downloaded masks and source images
# (not defined in the original snippet; adjust as needed)
mask_dir = "./masks"
image_dir = "./images"

# Download land cover masks
os.makedirs(mask_dir, exist_ok=True)
os.makedirs(image_dir, exist_ok=True)
image_paths = []
for s3_object in s3_bucket.objects.filter(Prefix=prefix).all():
    path, filename = os.path.split(s3_object.key)
    if "output" in path:
        mask_name = mask_dir + "/" + filename
        s3_bucket.download_file(s3_object.key, mask_name)
        print("Downloaded mask: " + mask_name)

# Download source images for visualization
# (tci_urls is assumed to hold the true color image URLs collected from
# the earlier raster data collection query)
for tci_url in tci_urls:
    url_parts = urlparse(tci_url)
    img_id = url_parts.path.split("/")[-2]
    tci_download_path = image_dir + "/" + img_id + "_TCI.tif"
    cogs_bucket = session.resource(
        "s3", config=Config(signature_version=UNSIGNED, region_name="us-west-2")
    ).Bucket(url_parts.hostname.split(".")[0])
    cogs_bucket.download_file(url_parts.path[1:], tci_download_path)
    print("Downloaded image: " + img_id)
print("Downloads complete.")
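With the masks downloaded, the water surface area can be estimated by counting water-class pixels in each mask. The following is a minimal sketch: the water class index (WATER_CLASS) and the 10 m Sentinel-2 pixel size are assumptions that should be verified against the land cover segmentation model's class mapping in the SageMaker geospatial documentation.

# Estimate water surface area per mask by counting water-class pixels
WATER_CLASS = 6  # assumption; check the model's class mapping
PIXEL_AREA_M2 = 10 * 10  # Sentinel-2 bands used here have 10 m resolution

for mask_path in sorted(glob(mask_dir + "/*.tif")):
    mask = tifffile.imread(mask_path)
    water_km2 = (mask == WATER_CLASS).sum() * PIXEL_AREA_M2 / 1e6
    print(f"{os.path.basename(mask_path)}: {water_km2:.2f} km^2 of water")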