Two philosophies are at war in the world of functional development: the
first one originated mainly within the chemistry community from the pioneering work of Becke [xxx] and Handy [xxx], which took the approach of using
flexible mathematical forms that can be fitted using parameters to
chemical data, exact constraints, or a mix of both. The second
philosophy originated primarily within the physics community from the
ground-breaking work of Perdew [xxx], which pushed for DFT to remain a purely ab initio method, expanding the knowledge of exact conditions,
and avoiding fitting to chemical data at all costs. Within this latter
school of thought, fitting xc functionals has acquired a somewhat
bad reputation, because parameters have been associated with overfitting
and poor transferability. A famous quote attributed by Enrico Fermi to
John von Neumann reads: “with four parameters I can fit an elephant,
and with five I can make him wiggle his trunk”.[xxx] A
corresponding frequently asked question in the DFT development community
is: “how many parameters does this functional have?”, implying that
functionals with more than four or five fitted parameters are barely
useful elephants in the DFT zoo. This question, however, rests on the
more fundamental assumption that the number of parameters is a truly unambiguous criterion with a direct connection to the transferability of the results. But is it really? As pointed out on several occasions [xxx chan,
quest, yu], counting the number of
parameters is not always as straightforward as it might initially appear, especially for functionals that are not directly fitted to data. In fact,
there is no such thing as a “parameter-free” or “zero-parameter” xc functional approximation, since even functionals that are usually included in these categories have mathematical forms that contain parameters that are then determined based on
theoretical arguments. Since the true functional is still unknown, and potentially unknowable,[xxx] it seems clear that every xc functional approximation must contain an empirical
element.[xxx]
Instead of playing the game of counting fitted parameters in “parametrized functionals” and comparing them to hidden parameters in
“zero-parameter” functionals, the first
portion of this article explores the somewhat opposite scenario where every
functional—regardless of its development philosophy—is represented using a
simple function containing one single parameter. This new representation is a direct adaptation of the recent works of Piantadosi[xxx] and Boué[xxx], where any distribution of points in any dimension is represented by a well-behaved scalar function with a single real-valued parameter. In other words, quoting Piantadosi’s paper title: “One parameter is always enough”, even for xc functionals. The result of this procedure is that every
single functional on the first three rungs of Perdew’s Jacob’s ladder
[xxx] (corresponding to LDA, GGA, and meta-GGA approximations) can
be represented by just one single parameter.
Famous “zero-parameter” functionals, such as PBE [xxx] and SCAN [xxx], as well as
popular “parametrized functionals”, such as the Minnesota
family [xxx], are all defined by one number. This proof-of-principle exercise illustrates the
futility of the “number of parameters” as a measure of
transferability of xc functionals.
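
The one-parameter idea can be illustrated with a minimal sketch, assuming the construction takes the form f_θ(x) = sin²(2^(xτ) arcsin √θ), where τ is the number of binary digits kept per data point. To stay within the Python standard library, the sketch stores the parameter in its exact "angle" form α, related to θ by θ = sin²(2πα), instead of θ itself, which would require arbitrary-precision arithmetic. The names `encode`, `decode`, `TAU`, and the data values are hypothetical and chosen only for illustration.

```python
import math
from fractions import Fraction

TAU = 12  # binary digits kept per data point (assumed precision)

def encode(values, tau=TAU):
    """Pack values in [0, 1] into one exact rational number alpha."""
    bits = 0
    for y in values:
        # Angle a in [0, 1/4] such that y = sin^2(2*pi*a)
        a = math.asin(math.sqrt(y)) / (2 * math.pi)
        # Keep the first tau binary digits of a and append them to the string
        bits = (bits << tau) | int(a * 2**tau)
    return Fraction(bits, 2 ** (tau * len(values)))

def decode(alpha, index, tau=TAU):
    """Evaluate sin^2(2*pi * 2^(index*tau) * alpha) for an integer index."""
    # Multiplying by 2^(index*tau) shifts the binary digits to the left;
    # whole turns do not change sin^2, so only the fractional part matters.
    frac = (alpha * 2 ** (index * tau)) % 1
    return math.sin(2 * math.pi * float(frac)) ** 2

data = [0.12, 0.50, 0.90, 0.33, 0.75]  # hypothetical values, not benchmark data
alpha = encode(data)                   # the single parameter
recovered = [decode(alpha, i) for i in range(len(data))]
```

Each recovered value matches its original to within about 2π/2^τ, and increasing `TAU` tightens the agreement at the cost of a longer digit string in the single parameter. The number of parameters stays at one, while the information content of that parameter grows with the data.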
The main purpose of this article is to juxtapose the number of parameters with a set of criteria that are suitable to evaluate the transferability of the results obtained with
different functionals.
Since the exact functional is still unknown, these criteria must rely
on statistical analysis of data across as many different chemical and physical properties as possible. Luckily, several benchmark results with hundreds of functionals are already available in the literature,[xxx] but their analysis is not unequivocal, and might even produce conflicting recommendations. This is because the large
amount of data in these studies can be sliced
and grouped into any number of ad hoc subsets, which can then be
used to statistically validate almost any hypothesis.
Recent work from the Author’s lab has introduced a new unbiased subdivision of some
of the most popular DFT databases generated without human intervention by means of data-science
algorithms [xxx ASCDB]. Interestingly enough, concepts that can be
derived using simple chemical intuition have also been recovered by a posteriori analysis of the machine-generated groups. This is a
reassuring fact, validating the chemical-intuition–based approach that
is commonly used by DFT developers to group and analyze the data. This new subdivision can be used as the basis for three new statistical criteria obtained by adapting the Akaike information criterion (AIC), the Vapnik–Chervonenkis criterion (VCC), and a new cross-validation criterion (CVC) to the DFT results. Preliminary rankings of 53 popular xc functionals are also presented and briefly discussed.
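
For reference, the textbook form of the Akaike information criterion is recalled below. This is the standard statistical definition, not the adapted criterion developed in this article, and the symbols k and \hat{L} are the conventional ones rather than notation taken from the text.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Here k is the number of estimated parameters and \hat{L} is the maximized value of the likelihood. Lower values indicate a better balance between goodness of fit and model complexity.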