Key points
- Simple machine learning frameworks present a new, efficient, flexible
tool to predict and study the Madden-Julian oscillation (MJO)
- Shallow artificial neural networks (ANNs) predict a real-time MJO
index out to ~17 days in winter and
~10 days in summer
- Varying ANN input and using explainable artificial intelligence
methods offer insights into the MJO and key regions for prediction
skill
Abstract: Few studies have utilized machine learning techniques
to predict or understand the Madden-Julian oscillation (MJO), a key
source of subseasonal variability and predictability. Here we present a
simple framework for real-time MJO prediction using shallow artificial
neural networks (ANNs). We construct two ANN architectures that predict
a real-time MJO index using latitude-longitude maps of tropical
variables. These ANNs make skillful MJO predictions out to
~17 days in October-March and ~10 days
in April-September, and efficiently capture aspects of MJO
predictability found in more complex, computationally-expensive models.
Varying model input and applying ANN explainability techniques further
reveal sources and regions important for ANN prediction skill. This
simple machine learning framework can be more broadly adapted and
applied to predict and understand other climate phenomena.
Plain Language Summary: The Madden-Julian oscillation (MJO)
– a large-scale, organized pattern of wind and rain in the tropics
– is important for making weather and climate predictions weeks to
months into the future. Many different models have been used to study
the MJO, but few works have examined how machine learning and artificial
intelligence methods might inform our ability to predict and understand
the MJO. In this work, we show how two different types of machine
learning models, called artificial neural networks, perform at
predicting the MJO. We demonstrate that simple artificial neural
networks make skillful MJO predictions beyond 1-2 weeks into the future,
and perform better than simpler statistical methods. We also highlight
how neural networks can be used to explore sources of prediction skill
and inform understanding of the MJO, via changing what variables the
model uses and applying techniques that illuminate important regions for
skillful predictions. Because our neural networks perform relatively
well, are simple to implement, are computationally affordable, and can
be used to inform understanding, we believe these methods may be more
broadly applicable to study other important climate phenomena aside from
just the MJO.
Introduction
The Madden-Julian oscillation (MJO), a planetary-scale,
eastward-propagating coupling of tropical circulation and convection
(Madden and Julian
1971, 1972; Zhang 2005), is a key source of subseasonal-to-seasonal
(S2S) predictability (Vitart et al. 2017; Kim et al. 2018). Skillful MJO
prediction has important societal implications (Meehl et al. 2021;
Vitart et al. 2017; Kim et al. 2018), and extensive research has
explored using both empirical/statistical models and initialized
dynamical forecast models to predict and study the MJO (e.g. Waliser
2012; Vitart et al. 2017; Kim et al. 2018; Meehl et al. 2021; and
references therein). Before the late 2000s, statistical models showed
superior MJO prediction skill (~2 weeks; Waliser 2012;
Kang and Kim 2010), but dynamical models have continually improved and
several now skillfully predict the MJO beyond one month (Vitart 2014;
Vitart 2017; Kim et al. 2018). In contrast, statistical MJO modeling has
stagnated, and statistical MJO models still almost exclusively use
traditional linear methods (e.g. Kang and Kim 2010). New approaches are
needed to drive statistical MJO modeling forward.
Non-linear machine learning (ML) techniques have proven skillful at
predicting various climate and weather phenomena (Gagne et al. 2014;
Lagerquist et al. 2017; McGovern et al. 2017; Weyn et al. 2019; Rasp et
al. 2020; Ham et al. 2019; Mayer and Barnes 2021), but limited work has
considered MJO prediction. Recent ML studies have identified the MJO
(Toms et al. 2020b), reconstructed past MJO behavior (Dasgupta et al.
2020), or bias-corrected dynamical model output of MJO indices (Kim et
al. 2021), but only one study to our knowledge has examined MJO
prediction solely using ML
(Love and Matthews 2009). It
is thus timely to establish ML frameworks for predicting the MJO and
begin to quantify ML model performance. Equally important is
demonstrating that ML models are useful for more than just prediction:
they invite experimentation and can inform physical understanding of the
MJO. We highlight this under-appreciated aspect of ML modeling through
experiments changing input data, the exploration of ML model
architecture, and the application of tools from the field of explainable
AI (XAI; McGovern et al. 2019; Toms et al. 2020a; Mamalakis et al.
2021).
This paper addresses three aspects of using machine learning to study
the MJO: (1) developing ML frameworks, (2) analyzing ML model
performance, and (3) demonstrating how ML can inform physical
understanding. We prioritize simple techniques (i.e. shallow,
fully-connected artificial neural networks; ANNs) to establish a
benchmark for future ML modeling, ensure our approach is broadly
accessible, and facilitate applying XAI tools. Further, while our focus
is the MJO, the concept and methods we describe are widely transferable
to other areas in Earth science, and may help inform ML modeling of
other climate phenomena.
Data and Methodology
Data
The predictors of our ANN models are latitude-longitude maps of
processed tropical variables from 20°N-20°S. The predictand is the
Real-time Multivariate MJO (RMM) index (Wheeler and Hendon 2004), which
tracks the MJO using an empirical orthogonal function (EOF) analysis of
outgoing longwave radiation (OLR) and zonal wind at 850 and 200 hPa.
The index consists of two time series (“RMM1” and “RMM2”) that
together represent the strength and location of the MJO. Plotted on a
2-D plane, the RMM phase angle describes the location, or “phase”, of
the MJO (e.g. Fig. 1), while the RMM amplitude
(\(\sqrt{\text{RMM1}^{2} + \text{RMM2}^{2}}\)) measures MJO strength. RMM
has known limitations (Roundy et al. 2009; Straub 2013) and other MJO
indices exist (Kiladis et al. 2014; Kikuchi et al. 2012; Ventrice et al.
2013), but RMM represents a logical starting point as it is a
widely-used benchmark index suitable for real-time forecasts.
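The relationship between the two RMM time series, the amplitude, and the phase can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study: the function names are hypothetical, and the phase numbering assumes the conventional phase diagram in which phase 1 begins at the negative RMM1 axis and phases increase counter-clockwise in 45° sectors. The class label (0 for weak days, 1-8 for active days) assumes the amplitude threshold of 1 used for the classification ANN described later.

```python
import numpy as np

def rmm_amplitude(rmm1, rmm2):
    """RMM amplitude, sqrt(RMM1^2 + RMM2^2)."""
    return np.hypot(rmm1, rmm2)

def rmm_phase(rmm1, rmm2):
    """MJO phase (1-8), counted counter-clockwise in 45-degree sectors.

    Assumes phase 1 starts at the negative RMM1 axis (180 degrees).
    """
    angle = np.arctan2(rmm2, rmm1)  # radians in [-pi, pi]
    sector = np.floor((angle + np.pi) / (np.pi / 4.0)).astype(int)
    return sector % 8 + 1           # wrap the angle = +pi edge case back to phase 1

def mjo_class(rmm1, rmm2, threshold=1.0):
    """Class label: 0 for weak days, otherwise the phase (1-8)."""
    active = rmm_amplitude(rmm1, rmm2) > threshold
    return np.where(active, rmm_phase(rmm1, rmm2), 0)
```

For example, a day with RMM1 = 1.5 and RMM2 = 0.5 has an amplitude of about 1.58 and falls in phase 5 under this convention.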
Other data are from three sources: OLR is from the NOAA
Interpolated OLR dataset (Liebmann and Smith 1996), sea-surface
temperature (SST) is from the NOAA OI SST V2 High Resolution dataset
(Reynolds et al. 2007), and all other variables are from ERA-5
reanalysis (Hersbach et al. 2020). Additional data from the ERA-20C
dataset (Poli et al. 2016) are used in the Supplemental Material, as
described therein. Data are daily from January 1, 1979 (1982 for SST) to
December 31, 2019, and interpolated onto a common 2.5° x 2.5° grid. Data
are pre-processed similarly to the RMM input variables by subtracting
the daily climatological mean and first three harmonics of the seasonal
cycle, and then the mean of the previous 120 days (Wheeler and Hendon 2004). Variables are not
averaged latitudinally because we are interested in how the 2-D
structure is utilized by the ANNs (though sensitivity tests exploring
latitudinal averaging are discussed in Supplemental Text S2). We also
normalize each variable by subtracting the tropics-wide, all-time mean
and dividing by the tropics-wide, all-time standard deviation at each
grid point.
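The pre-processing steps described above can be sketched as follows. This is an illustrative approximation, not the processing code used here: the function names are hypothetical, the seasonal cycle is fitted by least squares with an assumed 365.25-day period, the 120-day window is assumed to exclude the current day, and the normalization is assumed to use a single mean and standard deviation computed over all grid points and times.

```python
import numpy as np

def remove_seasonal_cycle(x, day_of_year, n_harmonics=3, period=365.25):
    """Subtract the time mean and leading annual harmonics (least-squares fit).

    x: array (time, ...) of daily values; day_of_year: array (time,).
    """
    phase = 2.0 * np.pi * np.asarray(day_of_year, dtype=float) / period
    columns = [np.ones_like(phase)]
    for k in range(1, n_harmonics + 1):
        columns += [np.cos(k * phase), np.sin(k * phase)]
    design = np.stack(columns, axis=1)      # (time, 1 + 2 * n_harmonics)
    flat = x.reshape(x.shape[0], -1)        # one column per grid point
    coeffs, *_ = np.linalg.lstsq(design, flat, rcond=None)
    return (flat - design @ coeffs).reshape(x.shape)

def remove_previous_mean(x, window=120):
    """Subtract the mean of the preceding `window` days.

    The first `window` days have no complete history and are dropped.
    """
    zero = np.zeros((1,) + x.shape[1:])
    csum = np.concatenate([zero, np.cumsum(x, axis=0)], axis=0)
    prev_mean = (csum[window:-1] - csum[:-window - 1]) / window
    return x[window:] - prev_mean

def standardize(x):
    """Normalize with one mean and one standard deviation over all points and times."""
    return (x - x.mean()) / x.std()
```

Applied in sequence to each variable, these steps yield the anomaly maps that serve as ANN input in this sketch.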
Artificial Neural Networks
We explored two ANN architectures to study the MJO (Fig. 1): a
“regression model” and a “classification model”. We present a
high-level overview of both architectures here; for details see
Supplemental Text S1.
Both ANN architectures input the processed latitude-longitude maps of
one or more variables from a single day, and output information about
the RMM index N days into the future (Fig. 1). A separate ANN is
trained for each N from 0 to 20 days. The difference between the
regression and classification ANNs is the nature of their outputs. The
regression ANN (not to be confused with a linear regression model)
outputs numerical values of RMM1 and RMM2 (Figs. 1a, S1). The
classification model outputs the probability that the MJO is in each of
nine classes: active (RMM amplitude > 1) in one of the
eight canonical MJO phases (Wheeler and Hendon 2004) or weak. The
classification ANN outputs a probability for each class, and the
prediction is the class with the highest probability (e.g. Figs. 1b,
S2).
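To make the two output types concrete, the sketch below implements the forward pass of a shallow, fully-connected network in NumPy with a single hidden layer (16 nodes, as specified in this section). It is a schematic under stated assumptions, not the trained models: the ReLU activation, the weight initialization, and applying the ridge (L2) penalty to all weights are assumptions, and the parameter names are hypothetical. With three variables on a 2.5° grid from 20°S to 20°N (17 latitudes by 144 longitudes, assuming both edges are included), the flattened input has 17 × 144 × 3 = 7344 values.

```python
import numpy as np

def init_params(n_inputs, n_outputs, n_hidden=16, seed=0):
    """Random initial weights for one hidden layer (hypothetical initialization)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_outputs)),
        "b2": np.zeros(n_outputs),
    }

def forward(params, x, classify):
    """Forward pass for a batch x of shape (n_samples, n_inputs)."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU (assumed)
    out = hidden @ params["W2"] + params["b2"]
    if not classify:
        return out                                  # regression: (RMM1, RMM2)
    out = out - out.max(axis=1, keepdims=True)      # stabilize the softmax
    expo = np.exp(out)
    return expo / expo.sum(axis=1, keepdims=True)   # classification: probabilities

def ridge_penalty(params, strength):
    """L2 penalty added to the training loss (applied to all weights here)."""
    return strength * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
```

A regression network would be built with `n_outputs=2` and a classification network with `n_outputs=9`; the predicted class is the column with the largest probability.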
Both the regression and classification ANNs have a single hidden layer
of 16 nodes, and use ridge regularization and early stopping to avoid
overfitting (see Supplemental Text S1). They are trained on daily data
from June 1, 1979 (1982 for SST) to December 31, 2009. For the
classification ANN, since weak MJO days are the most common class
(~39% of all days), we avoid class imbalance by randomly
subsampling weak MJO days during training so they are 11% of all
training days. Weak days are not subsampled over the validation period.
Both ANNs are evaluated over the period January 1, 2010 to November 30,
2019, with forecasts initialized daily.
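The weak-day subsampling can be illustrated with a short sketch. It assumes class labels in which 0 marks weak days (a hypothetical convention) and keeps every active day; the number of weak days retained follows from requiring that they make up a fraction f of the retained days, i.e. n_weak = f × n_active / (1 − f).

```python
import numpy as np

def subsample_weak_days(labels, weak_fraction=0.11, weak_label=0, seed=0):
    """Return sorted indices of training days to keep.

    All active days are kept; weak days are randomly thinned so that they
    make up roughly `weak_fraction` of the retained days.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    weak = np.flatnonzero(labels == weak_label)
    active = np.flatnonzero(labels != weak_label)
    n_keep = int(round(weak_fraction * active.size / (1.0 - weak_fraction)))
    n_keep = min(n_keep, weak.size)  # cannot keep more weak days than exist
    kept_weak = rng.choice(weak, size=n_keep, replace=False)
    return np.sort(np.concatenate([active, kept_weak]))
```

Because a fraction of 11% is close to 1/9, weak days end up about as frequent as an equal share among the nine classes would imply. The same function would not be applied over the validation period.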
ANN performance is slightly improved if the models are trained
separately on different seasons (e.g. Supplemental Text S1, Fig. S3),
which allows the ANNs to learn more season-specific patterns. This is
likely important for the MJO due to its seasonal shifts in behavior,
strength, and structure (Hendon and Salby 1994; Hendon et al. 1999;
Zhang and Dong 2004), and we found splitting the data into two six-month
periods (October-March, or “boreal winter”, and April-September, or
“boreal summer”) provided a good trade-off between seasonal
specificity and number of training samples.
Finally, in some instances we trained multiple ANNs for the same seasons
and lead times, varying only the random initial training weights to
ensure convergence of our results and quantify sensitivity to
initialization. For more information regarding the details of our data,
ANN architecture, and ANN parameters see Supplemental Text S1.
Sensitivity tests varying ANN parameters and input data pre-processing
are discussed in Supplemental Text S2.
To assess model skill in the regression ANN, we utilize the bivariate
correlation coefficient (BCC; Vitart et al. 2017; Kim et al. 2018), with
a value greater than 0.5 denoting skill. In the classification ANN,
skill is measured by the accuracy of its MJO phase predictions. ANN
performance is also assessed relative to a statistical “persistence
model” that forecasts the average counter-clockwise propagation of the
RMM index, holding RMM amplitude fixed from its initial state (see
Supplemental Text S1). Thus, the persistence model forecasts represent
the average RMM behavior corresponding to the slow eastward propagation
of MJO signals.
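The skill metric and the persistence benchmark can be sketched as below. The BCC follows its standard definition as the correlation between observed and predicted (RMM1, RMM2) vectors pooled over all forecasts at a given lead. The persistence forecast rotates the initial RMM vector counter-clockwise at a fixed rate; the rate used in this study is given in Supplemental Text S1, so `degrees_per_day` is left here as a hypothetical parameter to be supplied by the user. Function names are illustrative.

```python
import numpy as np

def bivariate_correlation(obs, pred):
    """BCC between observed and predicted RMM; arrays of shape (n_forecasts, 2)."""
    num = np.sum(obs[:, 0] * pred[:, 0] + obs[:, 1] * pred[:, 1])
    den = np.sqrt(np.sum(obs ** 2)) * np.sqrt(np.sum(pred ** 2))
    return num / den

def persistence_forecast(rmm_init, lead_days, degrees_per_day):
    """Rotate initial (RMM1, RMM2) counter-clockwise, holding amplitude fixed."""
    theta = np.deg2rad(degrees_per_day * lead_days)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return rmm_init @ rot.T

def skill_horizon(bcc_by_lead, threshold=0.5):
    """Longest lead (index into bcc_by_lead) reached before the BCC first
    falls to or below `threshold`; -1 if even lead 0 is not skillful."""
    bcc = np.asarray(bcc_by_lead, dtype=float)
    below = np.flatnonzero(bcc <= threshold)
    return int(below[0]) - 1 if below.size else bcc.size - 1
```

Evaluating `bivariate_correlation` separately at each lead from 0 to 20 days, for both the ANN and the persistence forecasts, gives the curves from which a skill horizon is read.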
Layer-wise Relevance Propagation
To demonstrate how the classification ANN correctly captures regions of
importance for predicting the MJO, we use an ANN explainability
technique called layer-wise relevance propagation (LRP; Bach et al.
2015; Samek et al. 2016; Montavon et al. 2019). LRP has begun to gain
traction in Earth science as a tool for understanding ANNs (Toms et al.
2020a; Toms et al. 2020b; Barnes et al. 2020; Mayer and Barnes 2021;
Mamalakis et al. 2021; Madakumbura et al. 2021), and more technical
detail on LRP is provided in Supplemental Text S1 and the above-cited
works. Broadly, LRP is an algorithm applied to a trained ANN. After a
prediction is made, LRP back-propagates that prediction’s output through
the ANN in reverse. Ultimately, LRP returns a vector of the same size as
the input (here a latitude-longitude map), where the returned quantity,
termed the “relevance”, shows which input nodes (i.e. which
latitude-longitude points) were most important in determining that
prediction.
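As an illustration of the back-propagation step, the sketch below applies one common LRP variant (the epsilon rule) to a network with a single ReLU hidden layer. The specific LRP rule and settings used in this study are described in Supplemental Text S1; the choice of the epsilon rule, the ReLU activation, and starting from the pre-softmax score of the target class are assumptions made here for illustration, and the function name is hypothetical.

```python
import numpy as np

def lrp_epsilon(x, W1, b1, W2, b2, target, eps=1e-6):
    """Relevance of each input value for output node `target` (LRP-epsilon).

    x: (n_inputs,); W1: (n_inputs, n_hidden); W2: (n_hidden, n_outputs).
    """
    # Forward pass, keeping pre-activations for the backward relevance pass.
    z1 = x @ W1 + b1
    a1 = np.maximum(0.0, z1)
    z2 = a1 @ W2 + b2

    # Start from the pre-softmax score of the class being explained.
    r_out = np.zeros_like(z2)
    r_out[target] = z2[target]

    # Output layer -> hidden layer: share relevance in proportion to a_j * w_jk.
    s2 = r_out / (z2 + eps * np.where(z2 >= 0, 1.0, -1.0))
    r_hidden = a1 * (W2 @ s2)

    # Hidden layer -> input layer: share relevance in proportion to x_i * w_ij.
    s1 = r_hidden / (z1 + eps * np.where(z1 >= 0, 1.0, -1.0))
    return x * (W1 @ s1)
```

Reshaping the returned vector to the latitude-longitude grid of each input variable gives one relevance map per variable. With zero biases and small `eps`, the relevance summed over all inputs approximately equals the score being explained, which is a useful sanity check.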
Artificial Neural Network Performance
Overall Model Performance
In this subsection we focus on the regression and classification ANNs
that input three variables simultaneously: OLR, zonal wind at 850 hPa,
and zonal wind at 200 hPa (Fig. 1). This input combination is among the
best-performing across the experiments we conducted, and uses the same
variables that comprise RMM. Additional inputs are discussed in Section
3.2 and Supplemental Text S2.
Overall, the winter and summer regression ANNs show prediction skill,
respectively, of ~17 days and ~11 days
(Fig. 2a), with small spread across a 10-member ensemble. This is at the
forefront of statistical MJO prediction techniques (e.g. Kang and Kim
2010; Kim et al. 2018), which is impressive given the simplicity of the
ANNs, the real-time nature of the model, and the lack of substantial
input pre-processing. Both winter and summer regression ANNs have a
higher BCC than the persistence model after 2-3 days and make skillful
predictions out to longer leads than the persistence model. This
demonstrates that the ANNs learn not only to identify the MJO and
propagate it east, but also capture more nuanced MJO behavior. The
higher skill in winter versus summer is consistent with results in most
dynamical models (e.g., Vitart 2017), indicating that ANNs are able to
reproduce aspects of MJO predictability seen in more complex dynamical
modeling set-ups. The regression ANN skill also shows relatively small
sensitivity to initial MJO phase (Fig. S4), with somewhat higher skill
(~18-19 days) across MJO events initialized in phases
1-3 and lower skill (~14-15 days) for phases 6 and 8.
Whether this difference is statistically significant, and what may explain
it, was not examined in detail.
A strength of the regression ANN is the quantitative information it
provides about MJO location and strength, but a prevalent source of
error is a decrease in the ANN-predicted MJO amplitude at lead times
past a few days, especially in phases 4-7 (Fig. 1a, Figs. S1, S4).
Minimizing this bias is an open challenge, and was one motivation for
exploring a classification ANN architecture that focuses on the MJO
phase alone.
Similar to the regression ANN, the classification ANN shows higher skill
in winter than in summer (Fig. S3) and outperforms the persistence model
after 3-4 days (Fig. 2b). It further shows accuracy well above random
chance for all lead times (Figs. 2b, S5), indicating the model possesses
skill out to several weeks. At short lead times, where the
classification model is primarily identifying the MJO (similar to Toms
et al. 2020b), the phase of active MJO events is predicted with an
accuracy of ~70-80% (Fig. 2b). Most incorrectly
predicted active MJO events at short leads are near the boundary between
two RMM phases and predictions are often incorrect by only one phase
(Fig. S5). The lead-0 accuracy here is comparable to Toms et al.
(2020b), despite differences in our input variables, data
pre-processing, MJO index, and ANN complexity. At longer leads, accuracy
decreases monotonically, but stays above 20% for active MJO events out
to 20 days (Fig. 2b). In contrast to performance predicting active MJO
events, the classification ANN struggles to predict weak MJO days (Figs.
2b, S5), with ~40% accuracy identifying weak days at
lead 0, and worse performance at longer leads (Fig. S5). The
improvement in ANN accuracy forecasting active MJO days versus all days
is on the order of 15% (Fig. 2b), even out past 2-week lead times.
The tradeoffs between classification and regression ANN architectures
make choosing a “better” model difficult. The regression model outputs
more precise RMM information and is more readily comparable to existing
statistical and dynamical models, but struggles to predict strong MJO
amplitudes at long leads. This is true even when the regression model
was re-trained using fewer weak MJO days to emphasize strong MJO events,
as little change in performance was seen (Fig. S6). The classification
ANN shows the opposite tendency, overestimating the percentage of active
MJO days and struggling to accurately predict weak MJO events. Further,
accuracy of MJO phase is not a commonly used metric of model skill, and
the classification ANN (as set up here) cannot provide precise
information about MJO strength and location.
Still, results for both ML architectures show that aspects of the MJO
are skillfully predicted beyond two weeks in winter. A range of
sensitivity tests (Supplemental Text S2 and Figs. S7, S8), including
increasing the amount of training data using 20th-century reanalysis,
showed comparable performance, though these tests were not exhaustive and
did not extend beyond simple ANN architectures. Also note that while our
primary goal here is to study ML modeling of the MJO, these simple ANNs
are not yet competitive with most S2S dynamical models (e.g. Vitart
2017; Kim et al. 2018). It remains to be seen whether future ML research
might change this, but as the next section illustrates, ANNs can be used
as a tool for more than just prediction, and may help spur new
discoveries or generate new hypotheses.
Efficiency and Utility of ANNs
In addition to examining the overall prediction skill, we also explored
how ANNs can continue to inform our understanding of the MJO and MJO
predictability. For example, ANNs reproduce aspects of MJO prediction
skill found in dynamical models, but they do so at a fraction of the
cost and capture more than just seasonal changes in skill (Fig. 2a).
Both regression and classification models show higher skill for
initially stronger MJO events (Figs. 2c, S9): the regression ANN
skillfully predicts MJO events which are initially strong or very strong
(RMM amplitude > 1.5) out to ~20 days in
winter, while skill predicting weak events is only ~10
days. ANNs also capture more mysterious aspects of MJO predictability,
such as the sensitivity to the phase of the stratospheric quasi-biennial
oscillation (Marshall et al. 2017; Lim et al. 2019; Wang et al. 2019;
Martin et al. 2021): the wintertime regression ANN skill during QBO
easterly months is nearly 20 days, whereas during QBO westerly months
skill is only 15 days (Fig. S10).
Another utility of the classification ANN is the “model confidence,”
which describes the ANN’s surety that a given prediction is correct.
Model confidence has clear utility for forecasters, could drive future
work in probabilistic MJO prediction (Marshall et al. 2016), and may be
useful in improving understanding of MJO predictability. Regardless of
lead time, our classification ANNs are reliable – in the sense that ANN
confidence corresponds well with model accuracy – which indicates that
model confidence is a useful and meaningful output (Fig. 2d).
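One way to check this kind of reliability is to bin predictions by confidence and compare the mean confidence with the observed accuracy in each bin. The sketch below assumes that confidence is the largest predicted class probability and uses equal-width bins; both choices, and the function name, are illustrative assumptions rather than the exact procedure behind Fig. 2d.

```python
import numpy as np

def reliability_curve(probs, true_class, n_bins=5):
    """Mean confidence, accuracy, and sample count in equal-width confidence bins.

    probs: (n_samples, n_classes) predicted probabilities, where column k
    corresponds to class k; true_class: (n_samples,) observed class indices.
    Bins with no samples are reported as NaN.
    """
    confidence = probs.max(axis=1)
    correct = probs.argmax(axis=1) == true_class
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(confidence, edges) - 1, 0, n_bins - 1)

    mean_conf = np.full(n_bins, np.nan)
    accuracy = np.full(n_bins, np.nan)
    count = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        sel = which == b
        count[b] = sel.sum()
        if count[b] > 0:
            mean_conf[b] = confidence[sel].mean()
            accuracy[b] = correct[sel].mean()
    return mean_conf, accuracy, count
```

For a reliable model, the accuracy in each populated bin lies close to the mean confidence in that bin.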
Furthermore, ANN confidence relates to physical aspects of the MJO:
confidence is closely associated with initial MJO amplitude (correlation
coefficients of ~0.5-0.7 depending on lead), with higher
confidence associated with higher initial MJO amplitude (Fig. 2d).
Research using ANN confidence to identify predictable states of the
atmosphere has recently shown promise in the context of MJO
teleconnections to the extra-tropics (Barnes et al. 2020; Mayer and
Barnes 2021), and thus may lead to new insights into MJO predictability
or behavior.
Simple ANNs can also be used to efficiently explore the impact of
certain variables on MJO prediction. We illustrate this through
classification ANN experiments inputting various combinations of one to
three different variables, targeting leads 0, 5, and 10 days for
brevity. Overall, model performance varies widely depending on input
(Fig. 3). For example, across 1-variable ANNs (Fig. 3a) 850 hPa
meridional wind and sea-surface temperature (SST) models show much
poorer performance than other inputs. In the case of the SST model, this
suggests the ocean state alone (when processed to highlight subseasonal
variability) does not contain MJO signals the ANN is able to leverage,
consistent with findings that sub-seasonal SST variability does not
drive the MJO (e.g. Newman et al. 2009). In the case of meridional wind,
while the MJO possesses signals in meridional wind associated with
Rossby wave gyres (Zhang
2005), we hypothesize that skill may be low because these signals lack
the global-scale coherence seen in variables like zonal wind and OLR and
captured by RMM.
The most accurate models at short leads are those that input 850 hPa
and/or 200 hPa zonal winds (Fig. 3). This is consistent with literature
showing that MJO circulation tends to drive the RMM index (Straub 2013;
Ventrice et al. 2013), an aspect of RMM the ANN has organically learned.
Interestingly, skill identifying the MJO at short leads does not
necessarily imply similar performance predicting the MJO at longer
leads. For example, at lead 0 the 850 hPa and 200 hPa zonal wind model
clearly has the highest accuracy among 2-variable models (Fig. 3b), but at
leads 5 and 10 its accuracy overlaps with other configurations. The best
performing models at longer leads are those that include information
about zonal wind and the large-scale thermodynamic or moisture signature
of the MJO, as measured for example by OLR or column water vapor.
Further, RMM input variables are not always clearly superior at leads 5
and 10: a model with total column water, 200 hPa zonal wind and 200 hPa
temperature performs as well as or slightly better than the model with
200 and 850 hPa zonal wind and OLR (Fig. 3c).
Finally, while more input variables tend to improve model performance
(Fig. 3), tests showed no substantial improvement using 4 or more inputs
(Fig. S11), at least among the variables considered here. Whether this
is due to the limited complexity of our ANNs, the amount of training
data, or because new, meaningful information is difficult to leverage
with more variables is not known. Additional variables (perhaps with
different preprocessing) will continue to be explored, but these initial
tests provide a proof-of-concept for the kind of experimentation that
ANNs afford.
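The input-variation experiments amount to enumerating variable combinations and building a flattened input matrix for each. A minimal sketch follows; the dictionary layout and function names are hypothetical, and training and evaluating an ANN on each stacked input is omitted.

```python
from itertools import combinations

import numpy as np

def input_combinations(variables, max_size=3):
    """Yield every combination of 1 to `max_size` variable names."""
    names = sorted(variables)
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            yield combo

def stack_inputs(variables, combo):
    """Flatten and concatenate the chosen maps into one input vector per day.

    variables: dict mapping name -> array of shape (time, lat, lon).
    Returns an array of shape (time, len(combo) * lat * lon).
    """
    flat = [variables[name].reshape(variables[name].shape[0], -1) for name in combo]
    return np.concatenate(flat, axis=1)
```

Looping over `input_combinations` and training one network per combination and lead time is inexpensive for shallow ANNs, which is what makes this style of experiment practical.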
ANN Explainability Through LRP
A further advantage of ANNs versus other MJO modeling frameworks is
the ability to apply XAI tools like LRP, which can illuminate sources of
model prediction skill. As an example, Figure 4 shows wintertime
composite LRP maps using the classification ANN from Section 3.1. LRP
maps are shown for lead times of 0 and 10 days, composited across
correct ANN predictions when the observed MJO is in phase 5 at the time
of verification. Composites are further restricted to those events when
model confidence exceeds the 60th percentile (see Supplemental Text S1
for more detail).
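The compositing procedure can be sketched as a sequence of masks over per-forecast relevance maps. The exact procedure is given in Supplemental Text S1; here the confidence percentile is assumed to be computed over the correct predictions for the target phase, confidence is assumed to be the largest class probability, and the function name and array layout are hypothetical.

```python
import numpy as np

def composite_relevance(relevance, probs, observed_class,
                        target_class=5, percentile=60.0):
    """Average relevance maps over confident, correct forecasts of one phase.

    relevance: (n, lat, lon) maps for one input variable; probs: (n, n_classes)
    with column k holding the probability of class k; observed_class: (n,)
    class at verification time. Returns (composite map, number of cases used).
    """
    predicted = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    hits = (predicted == observed_class) & (observed_class == target_class)
    if not hits.any():
        raise ValueError("no correct predictions for the target class")
    cutoff = np.percentile(confidence[hits], percentile)
    keep = hits & (confidence > cutoff)
    if not keep.any():
        raise ValueError("no predictions exceed the confidence cutoff")
    return relevance[keep].mean(axis=0), int(keep.sum())
```

Calling this once per input variable and lead time produces one composite map for each panel pair of the kind shown in Figure 4.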
The LRP plots confirm that the classification ANN focuses on regions
central to the MJO. At lead 0, OLR relevance highlights suppressed
Indian Ocean convection and active conditions around the Maritime
Continent (Fig. 4a,b), whereas wind fields focus on low-level westerly
anomalies around the Maritime Continent (Fig. 4c,d) and upper level
signals in the central and east Pacific (Fig. 4e,f). At lead 10, LRP
shows how the ANN accounts for eastward MJO propagation: the maximum
relevance for OLR is shifted west relative to lead 0, highlighting
strong convection in the eastern Indian Ocean (Fig. 4g,h). The lead-10
model also focuses on a small region of strong low-level convergence
near the equatorial Maritime Continent, and upper-level easterly
anomalies in the western Indian Ocean (Figs. 4i-l). LRP thus provides
information about how the ANN identifies the MJO and what signals across
variables are most associated with future MJO behavior. The unique
information LRP provides may also be used to explore sources of MJO
prediction skill under different large-scale states or for case studies
of particular events.
Discussion & Conclusions
Motivated by a lack of recent progress in statistical MJO modeling and
the ability of machine learning methods to skillfully predict other
climate and weather phenomena, here we demonstrate how simple machine
learning frameworks can be used to predict the MJO. We established two
straightforward neural network architectures (a regression and
classification approach) that use shallow ANNs to predict an MJO index.
The regression ANN shows prediction skill out to ~17
days in winter and ~11 days in summer, which is high
skill for a statistical approach, and both ANN architectures set
benchmarks for continued ML modeling of the MJO. Simple ANNs can also
efficiently reproduce aspects of MJO predictability found in
more complex, computationally-expensive dynamical models, making them
affordable tools to continue to study the MJO and MJO predictability.
Explainable AI tools can also help illuminate sources and regions of ANN
model skill.
This work illustrates how simple ANNs can be used not only for
prediction, but also as tools for hypothesis testing and experimentation
that might drive new discoveries or scientific insights. While our focus
here is on the MJO, the framework we establish is widely applicable to a
range of different climate phenomena, especially oscillations that can
be represented as simple indices. The performance, affordability,
accessibility, and explainability of simple ANNs thus recommend their
continued adoption by the climate community.