3.1 Statistical Criteria for Functional Evaluation

To introduce appropriate bias–variance criteria for xc functionals, well-established model validation techniques from statistical analysis must be used. Statistics offers several criteria for model selection and validation, and most of them belong to three main classes.
In general, the first two classes include analytic methods that evaluate the overall uncertainty (risk) of the model by inflating the error of the fitted model, calculated on the training set (or on some other appropriate data set), by a penalty factor that depends on the degrees of freedom (DoF) of the model and on the number of data points in the set. These methods usually rely on assumptions about both the type of function being estimated and the statistical distribution of the data. The third class of methods requires external data sets for validation and is usually more computationally demanding; however, it has the advantage of not relying on any assumptions about the distribution of the errors or of the training data. Within the first class, the Akaike information criterion (AIC) \cite{akaike_new_1974} is the most widely used estimator of prediction error. This criterion is derived from maximum-likelihood arguments and uses an additive formula to evaluate the overall risk, \(R\), as:
\begin{equation}\label{eqn:9}
R = R_{\text{emp}} + f\left(n,p\right),
\end{equation}
where the empirical risk, \(R_{\text{emp}}\), is the error of the fitted model calculated on the training set, and \(f\left(n,p\right)\) is the penalty term, which depends on the number of data points, \(n\), and on the DoF of the model, \(p\). The empirical risk should not be confused with the error that arises, in a chemical sense, when DFT results are compared with empirical (experimental) data. To evaluate \(R_{\text{emp}}\), the recent ASCDB database can be used, since it was specifically created to assess the performance of DFT functionals. To account for the large differences among the averages of the absolute reference energies of the ASCDB subsets, it is convenient to introduce an overall weighted mean unsigned error (wMUE), calculated from the mean unsigned errors of the individual subsets, \(\text{MUE}_{i}\), as:
\begin{equation}\label{eqn:10}
w\text{MUE} = \sum_{i=1}^{16} w_{i}\,\text{MUE}_{i},
\end{equation}
where the individual weights, \(w_{i}\), are calculated from the ratio between the average of the absolute reference energies of each subset, \(\left| \overline{\Delta \text{E}}\right|_{i}\), and that of the overall database (6.988 kcal/mol for ASCDB; the weights for this database are provided in the Jupyter notebook that accompanies the electronic version of this article and on the Author's GitHub page):