Climate extremes factor attribution: a small data challenge in ML realm

PRASHANT Dave

doi:10.1002/essoar.10505147.1

Abstract

The identification of factors driving climate extremes has conventionally relied on physical models evaluated using global climate models and/or on statistical analysis.
However, owing to the lack of spatial historical records, both of these approaches face a data insufficiency challenge. Moreover, identifying the primary drivers of climate extremes from a larger set of factors poses another challenge. Bagging machine learning models in conjunction with synthetic sampling techniques can address both of these challenges.
Here, I demonstrate the applicability of three synthetic sampling techniques along with Random Forest (RF) to identify the main drivers, and their spatial locations, affecting heatwave days over India for the period 1979-2013. The three sampling techniques used to generate balanced data are undersampling, oversampling and the synthetic minority oversampling technique (SMOTE). The RF model with SMOTE identified the most important factors with greater precision and recall ($F_1$ score of 0.85) than the RF model with the other sampling techniques. Geopotential height at 500 hPa along with sensible heat fluxes were identified as important factors characterizing Indian heatwave days. The work has implications for any climate extreme that lacks balanced data and has significantly fewer observations than factors.
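The core idea of SMOTE and of the $F_1$ score mentioned above can be illustrated with a minimal sketch. It is not the author's implementation, which the abstract does not describe. The function names `smote` and `f1_score`, the neighbour count `k`, and the toy two-feature data are all assumptions made for illustration, and the Random Forest step is omitted. In practice a library implementation (for example the SMOTE class in imbalanced-learn together with scikit-learn's random forest classifier) would likely be preferred.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 4 "heatwave" days (minority), 10 "normal" days.
heatwave = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
normal = [[-1.0 + 0.1 * i, 0.0 - 0.05 * i] for i in range(10)]

# Oversample the minority class until both classes have the same size.
new_points = smote(heatwave, n_new=len(normal) - len(heatwave), k=3)
balanced_X = normal + heatwave + new_points
balanced_y = [0] * len(normal) + [1] * (len(heatwave) + len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, the balanced set stays inside the region already occupied by the minority class, whereas plain oversampling only duplicates existing rows.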