3.1. Data Collection
Data associated with the variables were collected from different official sources for the top 42 countries, which together account for 448,989 COVID-19 cases as of 26th March 2020, or 84.78% of the total. Date-wise infections, recoveries, and deaths were collected from the website of the World Health Organization (WHO). The data for infrastructure-centred variables, such as the number of hospitals and the number of doctors, were taken from [51]. The environment-based variables, such as average temperature and humidity since the onset of COVID-19, were taken from [52]. The day-wise distribution of COVID-19 cases extracted from WHO was used to identify countries that show signs of containment of the virus, based on a novel exponential growth modelling approach. Raw data from the sources were also consolidated, and the following variables were calculated so that they could be used to train machine learning models: physicians per thousand individuals, hospitals per thousand individuals, percentage of lockdown days since the first contact, cases per million population, deaths per million population, days since the first case, serious cases per thousand infections, average temperature since the first infection, and average humidity since the first infection.
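The consolidation step described above amounts to a handful of ratio calculations per country. The following is a minimal sketch of how these derived variables could be computed, not the pipeline actually used in this study. The record layout, the field names, and the function `derive_features` are hypothetical, and the example values are purely illustrative. The percentage of lockdown days is assumed here to mean lockdown days divided by days since the first case. The two weather variables are omitted because they are plain averages of daily readings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CountryRaw:
    # Hypothetical schema: the field names are assumptions for illustration.
    name: str
    population: int
    physicians: int
    hospitals: int
    cases: int
    deaths: int
    serious_cases: int
    first_case: date
    lockdown_days: int


def derive_features(raw: CountryRaw, as_of: date) -> dict:
    """Compute the per-capita and per-infection ratios for one country."""
    days_since_first_case = (as_of - raw.first_case).days
    thousands = raw.population / 1_000        # population in thousands
    millions = raw.population / 1_000_000     # population in millions

    # Assumed definition: share of days under lockdown since the first case.
    if days_since_first_case > 0:
        lockdown_pct = 100.0 * raw.lockdown_days / days_since_first_case
    else:
        lockdown_pct = 0.0

    # Guard against division by zero for countries with no recorded cases.
    if raw.cases > 0:
        serious_per_thousand = 1_000 * raw.serious_cases / raw.cases
    else:
        serious_per_thousand = 0.0

    return {
        "physicians_per_thousand": raw.physicians / thousands,
        "hospitals_per_thousand": raw.hospitals / thousands,
        "lockdown_days_pct": lockdown_pct,
        "cases_per_million": raw.cases / millions,
        "deaths_per_million": raw.deaths / millions,
        "days_since_first_case": days_since_first_case,
        "serious_per_thousand_infections": serious_per_thousand,
    }


# Illustrative record with made-up numbers (not taken from the data set).
example = CountryRaw(
    name="Exampleland",
    population=10_000_000,
    physicians=25_000,
    hospitals=500,
    cases=2_000,
    deaths=50,
    serious_cases=100,
    first_case=date(2020, 2, 15),
    lockdown_days=10,
)
features = derive_features(example, as_of=date(2020, 3, 26))
```

Applying `derive_features` to each country's record yields one feature row per country, which can then be stacked into the training matrix.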
Table 1: Country-wise Information on Infrastructure, Weather, Policy, and Infection as of 26th March 2020