Abstract

This paper presents the Epidemic Volatility Index (EVI), an early warning tool for emerging epidemic waves. EVI is based on the volatility of the newly reported cases per unit of time, ideally per day, and issues an early warning when the rate of change of this volatility exceeds a threshold. EVI is conceptually simple, and its application to data from the current COVID-19 pandemic revealed a consistent and stable performance in detecting upcoming waves. Results from the COVID-19 epidemic in Italy and New York are presented here, while daily updated predictions for all countries and each US state are available online. The application of EVI to other epidemics and syndromic surveillance tasks, in combination with existing early warning systems, will enhance our ability to act fast and optimise the containment of outbreaks.

Introduction

Early warning tools are crucial for the timely application of intervention strategies and the mitigation of the adverse health, social and economic effects of epidemics. Sentinel networks, combined with information technology infrastructures in public health \cite{heffernan2004syndromic}, provide data for detecting spatial and temporal aberrations in the expected number of cases for groups of clinical signs and symptoms \cite{Brett2020}. Several modelling frameworks exist for the analysis of such data. One example is the moving epidemic method, which is used, among other things, to monitor the start of influenza epidemics \cite{vega2013influenza}. Methods based on seasonality patterns, on the link between pathogens and meteorological parameters \cite{abeku2004malaria} and/or on the measurement of vector indices for vector-borne pathogens \cite{chang2015re} are also available.
    Once an epidemic erupts, growth models can be used to predict the course of the outbreak and quantify its consequences; the advantages and limitations of these methods have been extensively discussed \cite{chowell2016mathematical}. Machine learning algorithms have also been used, most recently in the current COVID-19 pandemic \cite{wang2020prediction}. Correlating the number of COVID-19 cases with parameters obtained through big data approaches can predict a future rise in the number of cases. For example, monitoring digital data streams can provide an early signal of a rise in COVID-19 cases and deaths in the following 2 to 3 weeks \cite{Kogan2021}. All models have limitations arising from the imperfect nature of the data. Open, better and more detailed data are imperative for deploying models with improved accuracy and predictive ability, which will be more useful for the timely application of appropriate control measures against the COVID-19 pandemic \cite{Vespignani_2020}.
    Our work introduces the Epidemic Volatility Index (EVI), which is inspired by the use of volatility indices in the stock market \cite{Fernandes_2014,Brenner_1989}. EVI is based on the moving standard deviation of the newly reported cases during an epidemic. We first present the rationale of EVI and then give an example application with COVID-19 data from Italy and New York. Daily updated predictions, with a 48-hour lag for confirmation purposes, are available online for all countries and each US state. In all instances, the results revealed a firm and consistent ability of EVI to predict COVID-19 epidemic waves.

Methods

The Epidemic Volatility Index

  EVI is calculated over a rolling window of time series epidemic data (i.e., the number of new cases per day). At each step, the window is shifted forward over the time series by one observation. Let \(y=\left\{y_1,\ y_2,\ldots,\ y_N\right\}\) be a time series of length \(N\), and let \(m\) be the rolling window size, that is, the number of consecutive observations per rolling window. With \(0<m\le m_{\max}\) and \(0<m_{\max}\le N\), there are \(N-m+1\) consecutive rolling windows, indexed by \(t\).
    At each step \(t\), EVI uses the standard deviation \(\sigma_t\) of the newly reported cases within the window of size \(m\), \(y_{j_t}=\left\{y_{1_t},\ y_{2_t},\ldots,\ y_{m_t}\right\}\):
\[\sigma_t=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_{i_t}-\overline{y}_t\right)^2}\]
with \(\overline{y}_t\) the mean of the \(t^{th}\) window. EVI is then calculated as the relative change of \(\sigma_t\) between two consecutive rolling windows:
\[EVI_{t-1,t}=\frac{\sigma_t-\sigma_{t-1}}{\sigma_t}\]
    An increase in the future number of cases is expected, and an early warning is issued, if \(EVI_{t-1,t}\) reaches or exceeds a threshold \(c\) \(\left(c\in\left[0,1\right]\right)\) and the number of cases observed at time point \(t\), \(y_t\), is at least equal to the mean of the cases reported in the previous week:
\[Ind_{EVI_{t-1,t}} = \begin{cases} 1 & \text{if } EVI_{t-1,t} \geq c \;\wedge\; y_t \geq \overline{\mu}_{t:t-7} \\ 0 & \text{otherwise} \end{cases}\]
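The definitions above can be sketched in Python. This is a minimal illustration, not the authors' R implementation: it assumes 0-based indexing, takes the window for time point \(t\) to be \(y_{t-m+1},\ldots,y_t\), takes the previous-week mean to be the mean of the seven observations before \(t\), and sets EVI to 0 when \(\sigma_t=0\), a case the equations leave undefined. The rolling standard deviation returns \(N-m+1\) values, one per window. The function names are hypothetical.

```python
import math


def rolling_sd(y, m):
    """Population standard deviation (divisor m) of every window of m
    consecutive observations; returns len(y) - m + 1 values."""
    if not 0 < m <= len(y):
        raise ValueError("window size must satisfy 0 < m <= len(y)")
    sds = []
    for k in range(len(y) - m + 1):
        window = y[k:k + m]
        mean = sum(window) / m
        sds.append(math.sqrt(sum((v - mean) ** 2 for v in window) / m))
    return sds


def evi_series(y, m):
    """EVI between consecutive windows: (sigma_t - sigma_{t-1}) / sigma_t.
    Assumption: EVI is set to 0 when sigma_t is 0 (undefined in the text)."""
    sds = rolling_sd(y, m)
    return [(cur - prev) / cur if cur > 0 else 0.0
            for prev, cur in zip(sds, sds[1:])]


def early_warning(y, t, m, c):
    """Ind_EVI at 0-based time index t, using only y[0..t].
    Assumption: the previous-week mean is taken over y[t-7..t-1]."""
    if t < max(m, 7):
        return 0  # not enough history for two windows and a weekly mean
    evi = evi_series(y[:t + 1], m)[-1]
    week_mean = sum(y[t - 7:t]) / 7
    return int(evi >= c and y[t] >= week_mean)


# Usage: a flat series followed by a jump triggers a warning at the jump.
cases = [1, 1, 1, 1, 1, 1, 1, 1, 5]
print(early_warning(cases, t=8, m=2, c=0.5))  # 1
```

Both conditions matter: the same jump in volatility produced by a drop (for example from 5 cases to 1) gives a large EVI but no warning, because \(y_t\) falls below the previous-week mean.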

Case definition and desired accuracy

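    The user should specify the minimum rise in cases that, if present, should be detected. For example, a case definition can be a rise in the mean number of cases between two consecutive weeks that exceeds a threshold, expressed through the ratio of the mean of the previous week to the mean of the following week:
\[\frac{\overline{\mu}_{t:t-7}}{\overline{\mu}_{t:t+7}}\le r\]
with \(0\le r\le1\), so that smaller values of \(r\) correspond to larger rises.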
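As a worked example of the ratio: with \(r=\frac{1}{1.2}\approx 0.833\), a previous-week mean of 100 cases counts as a rise only if the following-week mean is at least 120 cases, because \(100/120\approx 0.833\). A following-week mean of 130 gives \(100/130\approx 0.769\le r\), whereas 110 gives \(100/110\approx 0.909>r\). The sketch below labels a time point accordingly. It assumes that the previous week is \(y_{t-7},\ldots,y_{t-1}\) and the following week is \(y_{t+1},\ldots,y_{t+7}\), which the notation \(\overline{\mu}_{t:t-7}\) and \(\overline{\mu}_{t:t+7}\) does not pin down; the function name is hypothetical.

```python
def case_indicator(y, t, r):
    """Return 1 if the case definition holds at 0-based index t, 0 if not,
    and None if the surrounding weeks are not fully observed.

    Assumption: previous week = y[t-7..t-1], following week = y[t+1..t+7].
    A rise is declared when previous_mean / following_mean <= r (0 <= r <= 1).
    """
    if t < 7 or t + 7 >= len(y):
        return None
    previous_mean = sum(y[t - 7:t]) / 7
    following_mean = sum(y[t + 1:t + 8]) / 7
    if following_mean == 0:
        return 0  # no cases in the following week cannot be a rise
    return int(previous_mean / following_mean <= r)


r_20 = 1 / 1.2  # rise of at least 20 percent
r_50 = 1 / 1.5  # rise of at least 50 percent
series = [100] * 7 + [0] + [130] * 7
print(case_indicator(series, 7, r_20), case_indicator(series, 7, r_50))  # 1 0
```

A 30 percent rise satisfies the 20 percent definition but not the stricter 50 percent one, which is why the choice of \(r\) changes which time points count as positive.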
    The accuracy of EVI, given the specified case definition, depends on \(m\) and \(c\), which should be selected to achieve a desired accuracy target. Several strategies are available. One is to select the \(m\) and \(c\) values that simultaneously maximize the sensitivity \(\left(Se\right)\) and the specificity \(\left(Sp\right)\) of EVI, that is, that maximize the Youden index \(\left(J=Se+Sp-1\right)\) \cite{Fluss_2005}, and thus minimize the overall number of false results (i.e., both false positive and false negative early warnings). Another is to select \(m\) and \(c\) such that the highest \(Se\) (or \(Sp\)) is achieved while \(Sp\) (or \(Se\)) does not drop below a critical value (e.g., 95\%). More advanced receiver operating characteristic (ROC) curve analysis can also be performed \cite{Zweig_1993}, and critical values can be selected based on indices that quantify the relative cost of false positive (i.e., falsely predicting an upcoming epidemic wave) to false negative (i.e., failing to predict an upcoming epidemic wave) warnings, such as the misclassification cost term \(\left(MCT\right)\).
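The first two strategies can be sketched as follows, assuming that \(Se\) and \(Sp\) have already been estimated for every candidate pair \((m,c)\) and stored in a dictionary keyed by that pair. The numbers in the usage example are hypothetical, chosen only to show that the two criteria can pick different pairs. Ties are resolved in favour of the first pair encountered, which is an arbitrary choice; the function names are hypothetical.

```python
def youden(se, sp):
    """Youden index J = Se + Sp - 1."""
    return se + sp - 1


def select_by_youden(table):
    """Return the (m, c) pair with the largest Youden index.
    table maps (m, c) -> (Se, Sp)."""
    return max(table, key=lambda mc: youden(*table[mc]))


def select_max_se(table, sp_min=0.95):
    """Return the (m, c) pair with the highest Se among pairs whose Sp is at
    least sp_min, or None if no pair meets the constraint."""
    feasible = [mc for mc, (se, sp) in table.items() if sp >= sp_min]
    if not feasible:
        return None
    return max(feasible, key=lambda mc: table[mc][0])


# Hypothetical accuracy estimates for three (m, c) pairs.
table = {(7, 0.2): (0.90, 0.70), (14, 0.5): (0.80, 0.90), (21, 0.8): (0.60, 0.97)}
print(select_by_youden(table))  # (14, 0.5)
print(select_max_se(table))     # (21, 0.8)
```

The reverse criterion (highest \(Sp\) while \(Se\) stays above a floor) follows by swapping the roles of the two entries in each tuple.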

Generation of an early warning

    Every time a new time point \(t\) is observed, the model uses all cases observed up to \(t\) to decide whether to issue an early warning at \(t\). The steps are:
  1. The cases observed up to \(t\) are analyzed for all candidate values of the window size \(\left(m\in\left[1,m_{\max}\right]\right)\) and of the threshold \(\left(c\in\left[0,1\right]\right)\).
  2. For each \((m,c)\) combination, \(Se_{t_{m,c}}\) and \(Sp_{t_{m,c}}\) are estimated for the predefined case definition (Eq. 4).
  3. The values \(m'\) and \(c'\) that give the best combination of \(Se_{t_{m',c'}}\) and \(Sp_{t_{m',c'}}\) are selected.
  4. For \(m'\) and \(c'\), the value of \(Ind_{EVI_{t-1,t}}\) is determined at the most recent time point \(t\), and a decision is made on whether to issue a warning signal.

Accuracy and Predictive Values

    Further, at each time point \(t\), the probability of a rise in future cases given that an early warning was issued, and the probability of no such rise given that no early warning was issued, can be calculated as the positive \(\left(PV_t+\right)\) and negative \(\left(PV_t-\right)\) predictive value, respectively:
\[PV_t+=P(D+\mid T+)=\frac{p_{1:t}Se_{t_{m',c'}}}{p_{1:t}Se_{t_{m',c'}}+\left(1-p_{1:t}\right)\left(1-Sp_{t_{m',c'}}\right)}\]
\[PV_t-=P(D-\mid T-)=\frac{\left(1-p_{1:t}\right)Sp_{t_{m',c'}}}{\left(1-p_{1:t}\right)Sp_{t_{m',c'}}+p_{1:t}\left(1-Se_{t_{m',c'}}\right)}\] where \(p_{1:t}\) is the proportion of events satisfying the condition of Eq. 4 up to time point \(t\).
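The two predictive values are Bayes' theorem applied with prevalence \(p_{1:t}\). For example, with \(p_{1:t}=0.2\), \(Se=0.9\) and \(Sp=0.8\), \(PV_t+=0.18/0.34\approx0.53\) and \(PV_t-=0.64/0.66\approx0.97\). A minimal sketch follows; the function name is hypothetical, and it returns None when a denominator is zero, a case the equations leave undefined.

```python
def predictive_values(p, se, sp):
    """Return (PV+, PV-) for prevalence p, sensitivity se and specificity sp.
    Either value is None when its denominator is zero."""
    pos_den = p * se + (1 - p) * (1 - sp)
    neg_den = (1 - p) * sp + p * (1 - se)
    pv_pos = p * se / pos_den if pos_den > 0 else None
    pv_neg = (1 - p) * sp / neg_den if neg_den > 0 else None
    return pv_pos, pv_neg


pv_pos, pv_neg = predictive_values(p=0.2, se=0.9, sp=0.8)
print(round(pv_pos, 3), round(pv_neg, 3))  # 0.529 0.97
```

Since \(p_{1:t}\) changes as the series grows, the same \(Se\) and \(Sp\) can yield different predictive values at different time points.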
    Once the entire time series has been observed, the overall sensitivity \(Se_{EVI}\) can be estimated as the number of time points at which an early warning was issued and the case definition (Eq. 4) held, divided by the number of time points at which the case definition held. Similarly, the overall specificity \(Sp_{EVI}\) is the number of time points at which no early warning was issued and the case definition did not hold (i.e., the expected rise in cases was not observed), divided by the number of time points at which the case definition did not hold:
\[Se_{EVI}=\frac{P\left(T+\cap D+\right)}{P\left(D+\right)}=P\left(T+\mid D+\right),\ Sp_{EVI}=\frac{P\left(T-\cap D-\right)}{P\left(D-\right)}=P\left(T-\mid D-\right)\]
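Once the whole series has been observed, the overall values reduce to counting. A minimal sketch, assuming two equally long 0/1 sequences of warnings and case-definition outcomes from which the time points where the case definition cannot be evaluated have already been removed; the function name is hypothetical.

```python
def overall_se_sp(warnings, cases):
    """Overall Se = #(T+ and D+) / #D+ and Sp = #(T- and D-) / #D-.
    Returns None for a value whose denominator is zero."""
    if len(warnings) != len(cases):
        raise ValueError("warnings and cases must have the same length")
    true_pos = sum(1 for w, d in zip(warnings, cases) if w and d)
    true_neg = sum(1 for w, d in zip(warnings, cases) if not w and not d)
    n_pos = sum(1 for d in cases if d)
    n_neg = len(cases) - n_pos
    se = true_pos / n_pos if n_pos else None
    sp = true_neg / n_neg if n_neg else None
    return se, sp


print(overall_se_sp([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))  # (0.5, 0.75)
```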

Sensitivity analysis

    The performance of EVI depends on the specified case definition (i.e., \(r\)) and the desired accuracy. Ideally, when historical data are available, various case definitions and values of \(r\) should be explored to identify the combinations that provide optimal monitoring of an epidemic.

Example application

    The COVID-19 pandemic, which began in China and was first reported to the WHO China Country Office on December 31, 2019 \cite{world2020pneumonia}, is currently the most serious threat to global health and the economy \cite{Fauci_2020}. Data on confirmed COVID-19 cases were retrieved from the COVID-19 Data Repository maintained by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University \cite{Dong2020}. The number of daily confirmed new cases of COVID-19 for each country from January 22, 2020, until April 13, 2021, was analyzed. Because of artificial variability in reporting between working days and weekends, the 7-day moving average of the cases, rather than the observed daily counts, was analyzed. For the analysis, \(m_{max}\) was restricted to 30 days, so that the potentially higher volatility of earlier epidemic waves would not influence the volatility estimates for the most recent data or the ability of EVI to predict upcoming, and possibly milder, epidemic waves.
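The smoothing step can be illustrated with a trailing 7-day moving average. The text does not state whether a trailing or centered window was used, so the trailing window here is an assumption, as is dropping the first six days, for which a full week is not yet available; the function name is hypothetical. A series with a strictly repeating weekly reporting pattern becomes constant after smoothing, which is the purpose of the step.

```python
def moving_average_7(y):
    """Trailing 7-day moving average: value k is the mean of y[k-6..k].
    The first six days, without a full preceding week, are dropped."""
    return [sum(y[k - 6:k + 1]) / 7 for k in range(6, len(y))]


# Hypothetical counts with a repeating weekly reporting pattern, two weeks long.
weekly_pattern = [1, 1, 1, 1, 1, 8, 8] * 2
print(moving_average_7(weekly_pattern))  # eight values, all 3.0
```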
    The case definition was an increase of 20 percent or more in the mean number of cases between two consecutive weeks \(\left(r=\frac{1}{1.2}\right)\). For sensitivity analysis, the detection of an increase of 50 percent or more \(\left(r=\frac{1}{1.5}\right)\) was considered. Data were analyzed separately for each country and for each state of the United States of America that had recorded more than 20,000 cases in total by April 13, 2021.

Statistical software

    All models were run in R \cite{team2020}, using the packages readxl \cite{wickham2019package}, ggplot2 \cite{wickham2011ggplot2}, cowplot \cite{wilke2019package} and readr \cite{wickham2015package}.

Results

    Results for Italy, one of the most severely affected EU countries \cite{livingston2020coronavirus}, and New York, the epicenter of the pandemic in the U.S. \cite{thompson2020covid}, are presented in the main manuscript. Daily updated results for all countries and each US state are available online at http://83.212.174.99:3838.
    Confirmed COVID-19 cases for Italy and New York State from January 22, 2020, until April 13, 2021, are shown in Figures 1 and 2, respectively. Red dots correspond to time points at which an early warning was issued according to \(Ind_{EVI_{t-1,t}}\), and grey dots to time points without an early warning. The positive and negative predictive values at each time point are shown in Figures 3 and 4, respectively.