3.1 Statistical Criteria for Functional Evaluation
To introduce appropriate bias–variance criteria for xc functionals, well-established model validation techniques from
statistical analysis must be used. Several criteria are available in
statistics for model selection and validation, mostly belonging to three
main classes:
- Methods based on information criteria \cite{akaike_new_1974}.
- Methods obtained from Vapnik–Chervonenkis theory \cite{vapnik_uniform_1971}.
- Resampling methods \cite{geisser_predictive_1993,devijver_pattern_1982}.
In general, the first two classes include analytic methods that evaluate
the overall uncertainty (risk) of the model by inflating the error of
the fitted model, calculated on the training set (or some other appropriate
data set), by a penalty factor that depends on the degrees of freedom (DoF) of
the model and on the number of data points in the set. These methods usually have
to rely on assumptions about both the type of function that is estimated
and the statistical distribution of the data.
The third class of methods requires external data sets for validation and is usually more
computationally demanding; however, it has the advantage of not relying on
any assumption about the distribution of either the errors or the training data.
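As an illustration of this third class, the short sketch below shows how a resampling estimate of the prediction error could be obtained with k-fold cross-validation for a simple least-squares fit; the one-dimensional data, the polynomial model, and the function name kfold_cv_mue are hypothetical and serve only to make the procedure concrete.

```python
import numpy as np

def kfold_cv_mue(x, y, degree, k=5, seed=0):
    """k-fold cross-validation estimate of the mean unsigned error (MUE)
    for a polynomial fit of the given degree (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))            # shuffle the data once
    folds = np.array_split(idx, k)           # split the indices into k folds
    errors = []
    for i in range(k):
        test = folds[i]                                     # held-out (validation) fold
        train = np.concatenate(folds[:i] + folds[i + 1:])   # remaining folds
        coeff = np.polyfit(x[train], y[train], degree)      # fit on the training folds
        pred = np.polyval(coeff, x[test])                   # predict on the held-out fold
        errors.append(np.mean(np.abs(pred - y[test])))      # MUE on the held-out fold
    return np.mean(errors)   # averaged held-out error estimates the prediction error

# Hypothetical data set: noisy reference values for a one-dimensional model problem
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + np.random.default_rng(1).normal(0.0, 0.1, x.size)
print(kfold_cv_mue(x, y, degree=3))
```

Note that the held-out folds play the role of the external validation data mentioned above, and no assumption is made about the functional form of the data or the distribution of the errors.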
Among the first class, the Akaike information criterion
(AIC) \cite{akaike_new_1974} is the most widely used estimator of prediction error.
This criterion is constructed from maximum-likelihood arguments,
and it uses an additive formula to evaluate the overall risk, \(R\),
as:
\(\begin{equation}\label{eqn:9}R=R_{\text{emp}}+f\left(n,p\right), \end{equation}\)
where the empirical risk, \(R_{\text{emp}}\), represents the error of
the fitted model calculated on the training set, and should not be
confused with the error associated with the comparison of DFT data and
empirical (experimental) results in the chemical sense.
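As a minimal sketch of how such an additive risk could be evaluated in practice, the code below computes an AIC-style penalized risk for a simple least-squares fit, using the standard Gaussian-likelihood form \(\text{AIC}=n\ln(\text{RSS}/n)+2p\) (additive constants dropped); the data, the model, and the function name penalized_risk are hypothetical.

```python
import numpy as np

def penalized_risk(x, y, degree):
    """Illustrative AIC-style risk: the empirical (training) risk plus a penalty
    f(n, p) that depends on the number of data points n and the number of
    fitted parameters p (Gaussian-likelihood AIC, additive constants dropped)."""
    n = len(y)
    p = degree + 1                               # degrees of freedom of the polynomial model
    coeff = np.polyfit(x, y, degree)             # fit on the full training set
    rss = np.sum((y - np.polyval(coeff, x))**2)  # residual sum of squares
    r_emp = n * np.log(rss / n)                  # empirical-risk term
    penalty = 2 * p                              # f(n, p): AIC penalty
    return r_emp + penalty

# Hypothetical data: compare the penalized risk across increasingly flexible models
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + np.random.default_rng(1).normal(0.0, 0.1, x.size)
for d in (1, 3, 9):
    print(d, penalized_risk(x, y, d))
```

The degree with the lowest penalized risk balances training error against model flexibility, which reflects the bias–variance trade-off introduced above.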
To evaluate \(R_{\text{emp}}\), the recent ASCDB database can be used,
since it was specifically created to evaluate the performance of DFT
functionals. To account for the large differences in the average of the
absolute reference energies of each subset of ASCDB, it is convenient to
introduce here an overall weighted mean unsigned error (wMUE),
calculated from the mean unsigned errors of the individual subsets,
\(\text{MUE}_{i}\), using:
\(\begin{equation}\label{eqn:10}w\text{MUE}=\sum_{i=1}^{16}{w_{i}\,\text{MUE}_{i}}, \end{equation}\)
where the individual weights are calculated from the ratio between the
average of the absolute reference energies of each subset, \(\left| \overline{\Delta \text{E}}\right|_{i}\), and that of the
overall database (6.988 kcal/mol for ASCDB; the weights are provided in the Jupyter notebook that accompanies the electronic version of this article, as well as on the author's GitHub page):