Materials-Methods:
This time-series analysis of COVID-19 data covered the period from the identification of the first case in the USA on January 21, 2020 to August 14, 2020. Data for this cohort study were obtained on August 14, 2020 from the Situation Dashboard on the European Centre for Disease Prevention and Control website and from the Coronavirus Disease 2019 Situation Reports on the official reports page of the WHO website [8]. The data analyzed were the daily confirmed cases in the USA between these dates.
Modeling consisted of two important steps: (1) building a time-series model on data from January 21, 2020 to August 7, 2020 and (2) validating the fitted model by forecasting the number of confirmed cases from August 8, 2020 to August 14, 2020.
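The two-step design can be sketched as a date-based split of the daily series. The snippet below is a minimal Python illustration, not the code used in the study (which was run in R); the helper names `daily_dates` and `split_series` are hypothetical, and the values are placeholders rather than real case counts.

```python
from datetime import date, timedelta

# Study windows as stated in the text.
SERIES_START = date(2020, 1, 21)
TRAIN_END = date(2020, 8, 7)
VALID_END = date(2020, 8, 14)


def daily_dates(start, end):
    """Return every calendar date from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


def split_series(series, train_end):
    """Split a {date: value} mapping into training and validation parts."""
    train = {d: v for d, v in series.items() if d <= train_end}
    valid = {d: v for d, v in series.items() if d > train_end}
    return train, valid


# Placeholder values only (hypothetical): the day index stands in for the
# daily confirmed-case counts, which are not reproduced here.
series = {d: i for i, d in enumerate(daily_dates(SERIES_START, VALID_END))}
train, valid = split_series(series, TRAIN_END)
print(len(train), len(valid))
```

With these dates the training window spans 200 days and the validation window spans the final 7 days.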
Before building the time-series model, stationarity was evaluated with the augmented Dickey-Fuller (ADF) unit root test, and visual inspection was used to assess trends. If the series was not stationary, log transformation and differencing were used to de-trend it.
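The de-trending step can be illustrated with a short Python sketch. It covers only the log transformation and first differencing; the ADF test itself is not implemented here and would normally come from a statistics library. The function names and the example series are hypothetical and do not represent study data.

```python
import math


def log_transform(series):
    """Natural log of each value; every value must be positive."""
    return [math.log(x) for x in series]


def difference(series, lag=1):
    """Lagged difference: x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series growing 10% per step (not study data).
growth = [100.0 * 1.1 ** t for t in range(8)]
detrended = difference(log_transform(growth))
print([round(v, 4) for v in detrended])
```

For an exponentially growing series, the log-differenced values are all approximately log(1.1), so the trend is removed and a constant level remains.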
Mathematically, a simple ARIMA model is written as Wₜ = μ + (θ(B)/ψ(B))αₜ, where Wₜ is the response series Yₜ or a difference of the response series, μ is the mean term, θ(B) is the MA operator, ψ(B) is the AR operator, B is the backshift operator (that is, BXₜ = Xₜ₋₁), and αₜ is the independent disturbance, also known as the random error [10]. The parameters of the ARIMA model were estimated by maximum likelihood.
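To make the operator notation concrete, the following Python sketch expands the model for one AR and one MA term. It assumes the polynomial forms ψ(B) = 1 − ψ₁B and θ(B) = 1 − θ₁B, a sign convention the text does not state, so that Wₜ − μ = ψ₁(Wₜ₋₁ − μ) + αₜ − θ₁αₜ₋₁. The function name `arma11` and the input values are hypothetical.

```python
def arma11(mu, psi1, theta1, disturbances):
    """Generate W_t from W_t = mu + (theta(B) / psi(B)) * alpha_t.

    Assumes psi(B) = 1 - psi1 * B and theta(B) = 1 - theta1 * B, with the
    series and the disturbances equal to their resting values before t = 0.
    """
    values = []
    prev_dev = 0.0    # W_{t-1} - mu
    prev_alpha = 0.0  # alpha_{t-1}
    for alpha in disturbances:
        dev = psi1 * prev_dev + alpha - theta1 * prev_alpha
        values.append(mu + dev)
        prev_dev, prev_alpha = dev, alpha
    return values


# A single unit shock at t = 0 followed by no further disturbances.
path = arma11(mu=10.0, psi1=0.5, theta1=0.2, disturbances=[1.0, 0.0, 0.0])
print([round(w, 4) for w in path])
```

After the single shock the series decays back toward the mean μ, with the AR coefficient setting the decay rate and the MA coefficient adjusting the first step after the shock.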
Autocorrelation and partial autocorrelation functions were used to determine the components of the ARIMA model (p,d,q). ARIMA models are traditionally built with the Box-Jenkins approach, an iterative process with three steps: identification, estimation of parameters, and diagnostic checking, in which the models with the lowest BIC and AIC values are used for forecasting. In this study, however, the best model was selected with the “auto.arima()” function in the “forecast” library of the statistical program, which uses the Hyndman and Khandakar algorithm [11]. The “auto.arima()” function takes a step-wise approach to finding the best-fitting model: it compares candidate models with appropriate and optimized parameters, selects the one with the lowest AIC, and produces point forecasts from it. The aim of the function is to choose a parsimonious model.
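The selection rule can be sketched in Python as a comparison of information criteria across candidate models. This is not the Hyndman and Khandakar algorithm and not the “auto.arima()” implementation; it only shows how the lowest AIC or BIC picks a model once each candidate has been fitted. The log-likelihoods, parameter counts, and sample size below are invented for illustration and are not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
# These numbers are invented for illustration and are not study results.
candidates = {
    "ARIMA(1,1,0)": (-1500.0, 2),
    "ARIMA(1,1,1)": (-1490.0, 3),
    "ARIMA(2,1,2)": (-1489.5, 5),
}
n_obs = 200  # hypothetical number of observations

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
print(best_by_aic, best_by_bic)
```

Both criteria reward a higher likelihood and penalize extra parameters, which is why the largest model in this made-up comparison is not selected despite its slightly better fit.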
Forecasting accuracy was evaluated by the percentage error (PE), defined as the difference between forecast and confirmed cases divided by confirmed cases, and by the mean absolute percentage error (MAPE), the mean of the absolute PE values [5]. A p value <0.05 was considered significant, and statistical analysis was conducted using R 4.0.0 (R Core Team, Vienna, Austria).
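The two accuracy measures can be written out as a short Python sketch. It assumes that PE is expressed as a percentage (multiplied by 100) and that MAPE averages the absolute PE values, which is the usual definition. The function names and example numbers are hypothetical and are not forecasts from this study.

```python
def percentage_errors(forecast, confirmed):
    """PE for each day: 100 * (forecast - confirmed) / confirmed."""
    return [100.0 * (f - c) / c for f, c in zip(forecast, confirmed)]


def mape(forecast, confirmed):
    """Mean of the absolute percentage errors."""
    errors = percentage_errors(forecast, confirmed)
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical two-day example (not study data).
forecast = [110.0, 90.0]
confirmed = [100.0, 100.0]
print(percentage_errors(forecast, confirmed), mape(forecast, confirmed))
```

Taking absolute values keeps over-forecasts and under-forecasts from cancelling: in this example the signed errors average to zero, while the MAPE is 10%.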