This study is primarily intended to establish the
reliability of the statistical criteria for evaluating the performance
and transferability of functionals. While the reported rankings can be
used to establish some trends, the list of functionals is not
comprehensive enough to provide reliable final suggestions on which functional
to pick among the more than 300 available in the literature. Some
conclusions on the performance and transferability of the considered
functionals are nevertheless worth reporting:
- Best double-hybrid: DSD-PBEP86-D3(BJ) \cite{kozuch_dsd-pbep86:_2011}, alternate: PWPB95-D3(BJ) \cite{goerigk_general_2010,goerigk_thorough_2011}.
- Best hybrid meta-GGA: $\omega$B97M-V \cite{mardirossian__2016}, alternate: PW6B95-D3(BJ) \cite{zhao_design_2005,grimme_effect_2011}.
- Best hybrid GGA: PBE0-D3(BJ) \cite{adamo_toward_1999,ernzerhof_assessment_1999,grimme_effect_2011}, alternate: B3LYP-D3(BJ) \cite{vosko_accurate_1980,becke_density-functional_1988,lee_development_1988,becke_density-functional_1993,stephens_ab-initio_1994,grimme_effect_2011}.
- Best local meta-GGA: B97M-rV \cite{mardirossian_use_2016,sabatini_nonlocal_2013}, alternate: SCAN-D3(BJ) \cite{sun_strongly_2015,grimme_effect_2011}.
- Best local GGA: PBE \cite{perdew_generalized_1996}, alternate: PW91 \cite{perdew_atoms_1992}.
These results are strengthened by the fact that the
majority of the highlighted functionals overlap with the top
performers suggested in recent reviews by the groups of Head-Gordon \cite{mardirossian_thirty_2017}, Goerigk \cite{goerigk_trip_2019}, and Grimme \cite{goerigk_look_2017}, which were obtained with larger databases and a broader spectrum of functionals.

Finally, connecting the transferability results to the issue of counting the number of parameters presented in Section \ref{816226}, the summary of the results plotted in Fig. \ref{898028} shows a clear lack of correlation between the average ranking of each functional and its number of degrees of freedom. This lack of correlation supports the main message of this work: the number of fitted parameters is not an effective measure of the transferability of a functional. More reliable statistical criteria, such as those developed in this work or, alternatively, the probabilistic performance estimator recently introduced by Pernot and Savin \cite{pernot_probabilistic_2018,pernot_probabilistic_2020}, should be used to evaluate the reliability of new and existing xc functionals.