Results
In this section we compute the information content/entropy using the statistics of the ROC curve, and the time series precision. In Figure 2, we first address the skill and information content of the method outlined in Figure 1 for a continuum of future time windows \(T_{W}\in[0.125,8.5]\) years.
Skill. Figure 2a shows the same ROC diagram as in Figure 1e for a future time window of \(T_{W}=1\) year. As discussed previously, the red curve is the true positive rate (TPR), which ranges from 0 to 1. The diagonal line is the no-skill reference, the expected true positive rate for random forecasts. To illustrate this, an ensemble of 50 random time series was obtained from the state variable time series \(\Theta(t)\) using a bootstrap procedure of random sampling with replacement; the resulting ROC curves are shown as the cyan curves grouped near the diagonal line.
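The ROC construction and the bootstrap ensemble can be illustrated with a minimal sketch in Python. The variable names, and the alarm rule that a window is predicted to contain a large earthquake when \(\Theta(t)\geq T_{H}\), are our illustrative assumptions rather than code from the original analysis.

```python
import numpy as np

def roc_curve(theta, event_flags, n_thresholds=200):
    """TPR and FPR versus threshold T_H for the alarm rule theta >= T_H.
    event_flags[i] is True if the future window T_W starting at time t_i
    contains a large earthquake (assumes both classes are present)."""
    thresholds = np.linspace(theta.min(), theta.max(), n_thresholds)
    tpr, fpr = [], []
    for th in thresholds:
        alarm = theta >= th
        tp = np.sum(alarm & event_flags)
        fp = np.sum(alarm & ~event_flags)
        fn = np.sum(~alarm & event_flags)
        tn = np.sum(~alarm & ~event_flags)
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return np.array(fpr), np.array(tpr), thresholds

def bootstrap_roc_ensemble(theta, event_flags, n_members=50, seed=None):
    """ROC curves for an ensemble of random time series, each obtained by
    sampling theta with replacement (bootstrap), as for the cyan curves."""
    rng = np.random.default_rng(seed)
    return [roc_curve(rng.choice(theta, size=theta.size, replace=True),
                      event_flags)
            for _ in range(n_members)]
```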
The skill, which is the area under the ROC curve, is shown in Figure 2b as a function of the future time window \(T_{W}\), for fixed EMA \(N\)-value and \(\lambda\)-value. Figure 2c shows the skill index SKI defined in (1), also as a function of \(T_{W}\). Both Figures 2b,c indicate that there is a maximum in skill at \(T_{W}=0.625\) years, and no skill at \(T_{W}=6.875\) years, where the skill curve crosses the no-skill (dashed horizontal) line.
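Given the TPR and FPR arrays from the sketch above, the skill can be estimated by trapezoidal integration; repeating the calculation for each window length \(T_{W}\) (by recomputing the event flags for that window) traces out the curve in Figure 2b. The skill index SKI of equation (1) is not reproduced here.

```python
def roc_skill(fpr, tpr):
    """Skill = area under the ROC curve (trapezoidal rule).
    The no-skill diagonal line has an area of 0.5."""
    order = np.argsort(fpr)
    return np.trapz(tpr[order], fpr[order])
```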
Shannon Information from ROC. To calculate the Shannon information entropy as a function of \(T_{W}\) using (3), we need a probability mass function (pmf). For this purpose, we use the ROC curve as a cumulative distribution function and difference it with respect to the threshold values \(T_{H}\) to obtain the pmf. Because the ROC curve was constructed using 200 values of \(T_{H}\), there are 199 values of the pmf \(p(\omega)\) to be used in equation (3).
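A minimal sketch of this differencing step follows, taking equation (3) to be the standard Shannon entropy \(I_{S}=-\sum_{\omega}p(\omega)\log_{2}p(\omega)\), an assumption consistent with the values quoted below.

```python
def shannon_entropy_from_roc(tpr):
    """Treat the ROC curve (TPR at 200 threshold values T_H) as a CDF,
    difference it to obtain a 199-value pmf p(omega), and compute
    I_S = -sum p(omega) * log2 p(omega) in bits."""
    pmf = np.abs(np.diff(tpr))      # 199 increments of the CDF
    pmf = pmf / pmf.sum()           # normalize to a proper pmf
    pmf = pmf[pmf > 0]              # drop empty bins (0 * log 0 = 0)
    return -np.sum(pmf * np.log2(pmf))
```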
To compare the results with those for the no-skill diagonal line, we note that the diagonal line can also be regarded as a cumulative distribution, but for a uniform pmf whose value is the constant \(p(\omega)=1/N\). For this value of the pmf, it is easy to show that \(I_{S}=7.64\) bits.
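Taking \(N=199\) as above, this value follows directly from the definition of the entropy:

\[
I_{S}=-\sum_{i=1}^{N}\frac{1}{N}\log_{2}\frac{1}{N}=\log_{2}N=\log_{2}199\approx 7.64\ \text{bits}.
\]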
According to the conventional interpretation of Shannon information, one would need to ask, on average, 7.64 yes/no questions to establish the value of a random state variable just prior to the occurrence of a major earthquake during the following \(T_{W}\) years. In other words, this is the number of yes/no questions needed to determine whether a given random threshold state is followed by a window \(T_{W}\) that contains a large earthquake.
By contrast, the actual ROC curve has a lower value of \(I_{S}\), and therefore more information content, and lower entropy, than the random ROC (diagonal line). For the value of \(T_{W}=1.0\) year, we find \(I_{S}=4.29\) bits, corresponding on average to 4.29 yes/no questions.
A selection of these data is also summarized in Table 1, and compared to data from a simple illustrative simulation discussed below. Data for skill, skill index, ROC information, information from the random ROC, Kullback-Leibler divergence [3], and Jensen-Shannon divergence [4] are shown in the table as well. These latter quantities are measures of the difference in information entropy between the data and a random nowcast.
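A sketch of how the two divergence measures might be computed, in bits, between the data pmf obtained from the ROC curve and the uniform pmf of a random nowcast is given below; the function names are illustrative and the definitions are the standard ones.

```python
def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits; assumes q > 0
    wherever p > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (symmetric, bounded by 1 bit)."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Example: data pmf from the ROC curve vs. the uniform random-nowcast pmf.
# p_data = np.abs(np.diff(tpr)); p_data = p_data / p_data.sum()
# p_rand = np.full(p_data.size, 1.0 / p_data.size)
# d_kl, d_js = kl_divergence(p_data, p_rand), js_divergence(p_data, p_rand)
```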
Shannon Information from Precision (PPV). More insight into the information content/entropy of the state variable \(\Theta(t)\) can be gained using the positive predictive value (PPV) probability, or precision. Figure 3a shows the optimized state variable as a function of time, an enlarged version of Figure 1d.
Note in particular that the top area of the state variable curve corresponds to enhanced quiescence prior to the occurrence of a large earthquake, as explained previously and in Rundle et al. (2022). Conversely, the bottom area of the curve corresponds to enhanced activation, for example aftershock occurrence following a large event.
Figure 3b shows the precision, and Figure 3c shows the corresponding self-information \(I_{\text{self}}\) of equation (2); both quantities are plotted on the horizontal axis as functions of the threshold value \(T_{H}\) on the vertical axis. These are the magenta curves in those figures. Figure 3 allows one to read horizontally and associate a value of PPV and self-information \(I_{\text{self}}\) with a given value of \(\Theta(t)\).
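A minimal sketch of the precision and self-information calculation follows, taking equation (2) to be \(I_{\text{self}}=-\log_{2}(\mathrm{PPV})\), which is consistent with the numbers quoted below; the variable names are illustrative.

```python
def ppv_and_self_information(theta, event_flags, thresholds):
    """Precision PPV(T_H) = TP / (TP + FP) for the alarm rule
    theta >= T_H, and the self-information I_self = -log2(PPV)."""
    ppv, i_self = [], []
    for th in thresholds:
        alarm = theta >= th
        tp = np.sum(alarm & event_flags)
        fp = np.sum(alarm & ~event_flags)
        p = tp / (tp + fp) if (tp + fp) > 0 else np.nan
        ppv.append(p)
        i_self.append(-np.log2(p) if p > 0 else np.nan)
    return np.array(ppv), np.array(i_self)
```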
Also shown in Figures 3b,c are the PPV and \(I_{\text{self}}\) for an ensemble of 50 random time series; these are the cyan curves. The mean of the cyan curves is shown as a solid black line, and the 1\(\sigma\) confidence limits are shown as dashed lines. Each random time series in the ensemble is again computed by sampling the time series \(\Theta(t)\) with replacement, and then calculating the PPV and \(I_{\text{self}}\) for that curve.
A main finding from Figure 3 is that the statistics of future time windows \(T_{W}\) for the ensemble of random time series do not depend on the value of the threshold \(T_{H}\). The random (uniform) probability of a future window \(T_{W}\) containing a large earthquake is about 10%, for example. By contrast, the probability of a future time window containing a large earthquake increases dramatically as the time series \(\Theta(t)\) increases from the bottom of the chart (activation phase) to the top (quiescence phase).
We also see in Figure 3c that the information entropy is essentially the same for the ensemble of random curves as for \(\Theta(t)\) in the activation condition. Conversely, as quiescence becomes more dominant and the time of a large earthquake approaches, the entropy decreases and the information content correspondingly increases.
We can also understand why the self-information \(I_{\text{self}}\) for the random time series is approximately 3.35 bits. In the figure, we considered a series of \(T_{W}\) = 1 year windows from 1970 to early 2022. There are thus a little more than 51 non-overlapping, independent time windows.
During this time period, there are 5 major earthquakes having magnitudes \(M\geq 6.75\): the M6.9 Loma Prieta; M7.3 Landers; M7.1 Hector Mine; M7.2 El Mayor-Cucapah; and M7.1 Ridgecrest earthquakes. If the earthquakes were distributed randomly in time, there would be a probability of \(p(\omega)=5/51=0.098\) of finding a large earthquake in any of these time windows.
Thus we calculate a self-information entropy for the mean of the random ensemble curves of \(I_{\text{self}}=-\log_{2}(5/51)=3.35\) bits. Therefore it would take, on average, 3.35 yes/no questions to determine whether one of these future time windows \(T_{W}\) contains a large earthquake. Conversely, it is apparent that the self-information entropy of the PPV of \(\Theta(t)\) approaches 0 as the seismic quiescence phase becomes fully developed.
The primary conclusion from these calculations is that the information content is higher in the quiescence phase of seismicity than in the activation phase. Alternatively stated, the activation phase has higher entropy than the quiescence phase.