Meanwhile, HPC applications have evolved from numerical simulations to
workloads that include Artificial Intelligence (AI) and analytics. For
example, scientists at the Oak Ridge National Laboratory (ORNL) Health
Data Sciences Institute are developing AI-based natural language
processing tools that extract information from textual pathology reports
on Summit, the USA's most powerful supercomputer, chosen for the vast
amount of memory it provides to its compute cores.
Similarly, the High Luminosity Large Hadron Collider (HL-LHC) will
extend the capabilities of the LHC, allowing deeper investigation of
phenomena fundamental to the nature of the universe. Scheduled for
installation in 2025, these enhancements will lead to annual data
generation rates of tens of petabytes, with reduced datasets in the
petabyte range being used for analysis. These applications are often
read-intensive and may rely on latency-sensitive transfers, each
consisting of a small amount of data. This marks a dramatic shift in how
HPC storage systems are used. While some emerging read-intensive
workloads may be able to exploit structure within the data to construct
efficient retrieval plans based on caching or prefetching techniques, AI
workloads and many data analytics routines inherently access data
without any predictable ordering. According
to the Department of Energy’s 2020 AI for Science report \cite{Stevens_2020}:
“AI training workloads, in contrast, must read large datasets (i.e.,
petabytes) repeatedly and perhaps noncontiguously for training. AI
models will need to be stored and dispatched to inference engines, which
may appear as small, frequent, random operations.”
Figure 1 shows examples of prevalent access patterns for
analytics, which are characterized by this lack of ordering.
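To make this access pattern concrete, the sketch below is a hypothetical
example, not drawn from any of the applications above; the file name,
record size, and batch size are illustrative assumptions. It shows the
kind of small, noncontiguous reads a training job issues when fetching
randomly shuffled samples from a dataset file:
\begin{verbatim}
/* Hypothetical sketch of the small, random reads issued when fetching
 * shuffled training samples; file name and sizes are illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define RECORD_SIZE 4096        /* small, fixed-size sample records */
#define SAMPLES_PER_BATCH 256   /* samples fetched per training step */

int main(void)
{
    int fd = open("training_shard.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    off_t nrecords = lseek(fd, 0, SEEK_END) / RECORD_SIZE;
    if (nrecords == 0) { close(fd); return 1; }
    char *buf = malloc(RECORD_SIZE);

    /* Each read lands at an unpredictable offset, so sequential
     * prefetching and large-block caching offer little help. */
    for (int i = 0; i < SAMPLES_PER_BATCH; i++) {
        off_t rec = rand() % nrecords;
        if (pread(fd, buf, RECORD_SIZE, rec * RECORD_SIZE) != RECORD_SIZE) {
            perror("pread");
            break;
        }
        /* ... deserialize the sample and feed it to the training loop ... */
    }

    free(buf);
    close(fd);
    return 0;
}
\end{verbatim}
Each such read is latency-bound rather than bandwidth-bound, which is
precisely the regime addressed by the technology trends discussed next.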
Two technology trends have emerged as crucial to data-driven scientific
discovery. First, the high-speed networks used within scientific
computing platforms provide extremely low-latency access to remote
systems, supporting billions of message injections per second and direct
access to remote system memory via remote direct memory access (RDMA)
operations. Second, solid-state drives (SSDs) accessed through the
Non-Volatile Memory Express (NVMe) interface provide more than 1,000
times the performance of traditional hard disk drives for the small
random reads used within data-intensive workloads. Interestingly, while
HPC storage systems broadly leverage both high-speed networks and SSDs,
this adoption was not driven by the need to provide low-latency access
to remote storage, but by simulations' need for fast point-to-point
communication between processes and for high-throughput access to
storage. With the advent of new
read-heavy analysis workloads, however, low-latency remote storage
access is now also a key enabling technology for new data-driven
approaches to computational science.
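As a rough, back-of-the-envelope illustration of the more-than-1,000-fold
gap cited above (using typical device characteristics rather than
measurements from any specific system): a hard disk whose small random
reads are dominated by a seek of several milliseconds sustains on the
order of a couple of hundred operations per second, while an NVMe SSD
servicing such reads in tens of microseconds across many parallel queues
sustains hundreds of thousands:
\[
\underbrace{\tfrac{1}{5\ \mathrm{ms}} = 200\ \mathrm{IOPS}}_{\text{seek-bound hard disk}}
\quad\text{vs.}\quad
\underbrace{\approx 5\times10^{5}\ \mathrm{IOPS}}_{\text{NVMe SSD}}
\quad\Rightarrow\quad \text{a ratio of roughly } 2{,}500\times.
\]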