Azim Ahmadzadeh

and 3 more

Strong solar flares are rare events, which makes flare classification a rare-event problem. Solar energetic particle events are even rarer space weather events, as only a few instances are recorded each year. With the unprecedented growth in the use of machine learning algorithms for rare-event classification and forecasting problems, proper evaluation of rare-event models has become a necessary skill for domain experts. This task remains an outstanding challenge, as both the learning process and the metrics used for quantitative verification can easily obscure or skew the true performance of models and yield misleading, biased results. To help mitigate this effect, we introduce a bounded semimetric space that provides a generic representation for any deterministic performance-verification metric. This space, named Contingency Space, can be easily visualized and sheds light on models’ performance as well as on metrics’ distinct behaviors. An arbitrary model’s performance maps to a unique point in this space, which allows comparison of multiple models at the same time for a given metric. Using this geometrical setting, we show the difference between a metric’s interpretation of performance and the true performance of the model. From this perspective, models that are seemingly different but practically identical, or only marginally different, can be easily spotted. By tracking a learner’s performance at each epoch, we can also compare different learners’ learning paths, which provides a deeper understanding of the algorithms used and the challenges they face during learning. Moreover, in the Contingency Space, a given verification metric can be represented by a geometrical surface, which allows a visual comparison between different metrics, a task that without this concept could be done only by tedious algebraic comparison of the metrics’ formulae.
Moreover, using such surfaces, we can for the first time see and quantify the impact of data scarcity (intrinsic to rare-event problems) on different metrics. This extra knowledge provides the information we need to choose an appropriate metric for evaluating rare-event models.
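The abstract gives no code; a minimal sketch of the underlying idea, assuming the natural (FPR, TPR) parameterization of the unit square, might look like the following. The function names here are illustrative, not the paper’s API.

```python
# Sketch: a confusion matrix collapses to a point in the unit square,
# and any deterministic metric becomes a surface over that square.
# Function names (to_point, tss, accuracy) are illustrative only.

def to_point(tp, fn, fp, tn):
    """Collapse a confusion matrix to (FPR, TPR) coordinates."""
    tpr = tp / (tp + fn)            # true positive rate (y-axis)
    fpr = fp / (fp + tn)            # false positive rate (x-axis)
    return fpr, tpr

def tss(fpr, tpr):
    """True Skill Statistic as a surface over the unit square."""
    return tpr - fpr

def accuracy(fpr, tpr, pos_ratio):
    """Accuracy surface; depends on class imbalance (pos_ratio = P/(P+N))."""
    return tpr * pos_ratio + (1.0 - fpr) * (1.0 - pos_ratio)

# Two seemingly different models can land on the same point:
a = to_point(tp=90, fn=10, fp=200, tn=800)
b = to_point(tp=9, fn=1, fp=20, tn=80)
print(a, b)                 # both (0.2, 0.9): identical TSS of 0.7
# Accuracy, by contrast, is skewed by scarcity: with 1% positives the
# "always negative" corner (0, 0) already scores 0.99.
print(accuracy(0.0, 0.0, pos_ratio=0.01))
```

Evaluating a metric such as `tss` or `accuracy` over a grid of (FPR, TPR) points yields the surface the abstract describes, making the two metrics’ disagreement visible rather than algebraic.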

Atharv Yeolekar

and 6 more

Solar Energetic Particles (SEPs) can be associated with solar flares and coronal mass ejections (CMEs) and exhibit energy spectra ranging from a few keV to many GeV. These events can occur without any notable indication and alter the radiation environment of the inner solar system, which can potentially create precarious conditions for humans in space, damage sensitive electronics inside spacecraft, and trigger radio blackouts. Identifying the physical parameters observed by the Solar Dynamics Observatory (SDO) that are most critical for detecting SEPs can allow for a swift response to their adverse effects. The SDO provides a profusion of high-quality time series data that captures both the modulating background of magnetic activity and the inherently dynamic pre-flare and post-flare phases, in contrast to the non-representative point-in-time measurements employed earlier; this makes the selection of vital parameters for solar flare classification using machine learning algorithms a well-fitted problem in this realm. The primary issue in dealing with multivariate time series (mvts) data is the large number of physical parameters sampled at a rapid cadence, which makes the dimensionality very high and hinders the learning process. Moreover, manually selecting vital parameters is a tedious and costly task, and experts may not always agree on the results. In response, we examined feature subset selection using multiple algorithms on both the mvts data and the statistical features derived from mvts segments (vectorized data). We used the SWAN-SF (Space Weather Analytics for Solar Flares) benchmark dataset, collected from May 2010 to September 2018, to conduct our experiments. This comprehensive study yields a stable scheme for recognizing the critical physical parameters, which boosts the learning process and can serve as a blueprint for forecasting future solar flare episodes.
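The “vectorized data” route mentioned above can be sketched in a few lines: summarize each mvts segment with per-parameter statistics, then rank the resulting features by a separability score. The Fisher score used here is one common choice, not necessarily the paper’s; the parameter names and all values are toy examples.

```python
# Sketch: vectorize mvts segments into statistical features, then rank
# features by Fisher score (between-class separation over within-class
# spread). Parameter names and values are illustrative toy data.
import statistics

def vectorize(segment):
    """Turn one mvts segment (dict: parameter -> list of samples)
    into a flat feature vector of per-parameter statistics."""
    feats = {}
    for name, series in segment.items():
        feats[f"{name}_mean"] = statistics.mean(series)
        feats[f"{name}_std"] = statistics.pstdev(series)
    return feats

def fisher_score(pos_vals, neg_vals):
    """Separability of one feature between the two classes."""
    mp, mn = statistics.mean(pos_vals), statistics.mean(neg_vals)
    vp, vn = statistics.pvariance(pos_vals), statistics.pvariance(neg_vals)
    return (mp - mn) ** 2 / (vp + vn + 1e-12)

# Toy segments: one parameter separates the classes, one does not.
flaring = [vectorize({"TOTUSJH": [5.0 + 0.1 * i, 5.2 + 0.1 * i, 5.1 + 0.1 * i],
                      "USFLUX":  [1.0, 1.1, 0.9]}) for i in range(3)]
quiet   = [vectorize({"TOTUSJH": [1.0 + 0.1 * i, 1.2 + 0.1 * i, 1.1 + 0.1 * i],
                      "USFLUX":  [1.0, 1.1, 0.9]}) for i in range(3)]

scores = {f: fisher_score([x[f] for x in flaring], [x[f] for x in quiet])
          for f in flaring[0]}
best = max(scores, key=scores.get)
print(best)   # TOTUSJH_mean: the only feature whose class means differ
```

Real selection on SWAN-SF involves far more parameters and statistics, but the ranking step has this shape: every candidate feature gets a score, and a subset of top-scoring features feeds the classifier.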

Shreejaa Talla

and 3 more

A halo Coronal Mass Ejection (CME) can have a devastating impact on Earth by damaging satellites and electrical transmission facilities and by disrupting radio transmissions. To predict the orientation of the magnetic field associated with an occurring CME, and therefore the occurrence of a geomagnetic storm, the sign of filaments’ magnetic helicity can be used. With the deluge of image data produced by ground-based and space-borne observatories and the unprecedented success of computer vision algorithms in detecting and classifying objects (events) in images, identification of filaments’ chirality appears to be a well-fitted problem in this domain. More specifically, deep learning algorithms with a Convolutional Neural Network (CNN) backbone are built to attack exactly this type of problem. The main challenge is that these supervised algorithms are data-hungry: their large numbers of model parameters demand millions of labeled instances to learn. Datasets of filaments with manually identified chirality, however, are costly to build. This scarcity stems primarily from the tedious task of data annotation, especially since identification of filaments’ chirality requires domain expertise. In response, we created a pipeline for augmenting filaments based on existing labeled instances. This Python toolkit provides an unlimited resource of augmented (new) filaments with labeled magnetic-helicity signs. Using an existing dataset of manually labeled, H-alpha-based filaments as input seeds, collected from August 2000 to 2016 from Big Bear Solar Observatory (BBSO) full-disk solar images, we generate new filament instances by passing labeled filaments through a pipeline of chirality-preserving transformation functions. This augmentation engine is fully compatible with PyTorch, a popular deep learning library, and generates data according to users’ requirements.
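The key constraint on such a pipeline is geometric: rotations preserve a filament’s handedness, while mirror reflections invert it, so a flip is only admissible if the chirality label is flipped with it. A minimal sketch of that logic, with hypothetical function and label names (the actual toolkit’s API is not reproduced here):

```python
# Sketch of chirality-aware augmentation. Rotations preserve handedness;
# a mirror flip inverts it, so the label must be flipped too.
# Names (rotate180, hflip, augment, "L"/"R" labels) are illustrative.
import random

def rotate180(img):
    """Rotate a 2-D pixel array (list of rows) by 180 degrees."""
    return [row[::-1] for row in img[::-1]]

def hflip(img):
    """Mirror the image left-right."""
    return [row[::-1] for row in img]

def augment(img, chirality, rng=random):
    """Return a transformed copy with a consistent chirality label."""
    if rng.random() < 0.5:
        return rotate180(img), chirality               # handedness preserved
    # A reflection swaps left- and right-handedness:
    flipped_label = "L" if chirality == "R" else "R"
    return hflip(img), flipped_label

seed = [[0, 1],
        [2, 3]]
print(rotate180(seed))   # [[3, 2], [1, 0]]
```

A pipeline restricted to label-preserving transforms, as the abstract describes, would simply exclude the flip branch (or keep it and emit the corrected label, doubling the usable augmentations).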

Azim Ahmadzadeh

and 5 more

Despite more than 20 years of substantial advances, solar flare prediction remains a largely outstanding problem, partly because of the scarcity of major flares. Effective flare prediction, if ever achieved, would help mitigate substantial projected economic damage, with a long-range magnitude of 1 to 2 trillion dollars for the US alone. Prediction could also help mitigate, or even prevent, serious health risks to astronauts exposed to flares’ electromagnetic and particulate radiation. While many recent flare prediction studies have opted to employ machine learning techniques to better tackle the problem, a lack of understanding of how to properly treat the data often leads to overly optimistic results. We use the recently generated GSU solar flare benchmark dataset, called Space Weather ANalytics for Solar Flares (SWAN-SF), to show how a ‘mediocre’ forecast model can be turned into an ‘impressive’ one simply by overlooking some basic practices in data mining and machine learning. The benchmark is a multivariate time series collection, extracted from magnetographic measurements in the solar photosphere, and spans over eight years of the Solar Dynamics Observatory Helioseismic and Magnetic Imager (SDO/HMI) era. We briefly explain the data collection process and the sampling and slicing of the time series, and then outline a series of experiments using machine learning models to illustrate common mistakes, fallacies, and pitfalls in forecasting rare events. We particularly elaborate on how and why imbalanced datasets impact models’ performance in general, and how different under- or over-sampling methodologies and weighting practices can yield accurate-looking but weak models. In conclusion, we aim to draw attention to the impact of these practices on flare forecasting models and to show how to train models by prioritizing statistical robustness over relative prediction accuracy.
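One of the sampling practices discussed above can be sketched in a few lines: random undersampling of the majority (non-flaring) class to a 1:1 ratio. Everything below is a toy illustration with made-up counts, not SWAN-SF itself; the pitfall is that a model trained and scored on the balanced set sees a 50/50 prior rather than the true rare-event climatology.

```python
# Toy sketch of random undersampling of the majority class. Counts are
# invented; SWAN-SF partitions are not reproduced here.
import random

def undersample(instances, labels, rng):
    """Balance a binary dataset by discarding majority-class instances."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = pos + rng.sample(neg, k=len(pos))   # enforce a 1:1 class ratio
    return [instances[i] for i in keep], [labels[i] for i in keep]

rng = random.Random(0)
X = list(range(1000))
y = [1] * 10 + [0] * 990                       # 1% rare events
Xb, yb = undersample(X, y, rng)
print(sum(yb), len(yb))                        # 10 positives of 20 total
# Reporting raw accuracy on this balanced set, instead of on test data
# that keeps the true 1% climatology, is exactly the kind of
# over-optimism the experiments above illustrate.
```

The same caution applies to oversampling and class weighting: whatever is done to the training distribution, the evaluation must reflect the operational class imbalance.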