INTRODUCTION
Species inventories are the universal currency of community ecology:
counting individuals that belong to each of the species in a place is a
routine and fundamental practice. These counts are the basis of key
community assembly theories (Fisher et al., 1943; MacArthur, 1957;
Bulmer, 1974; Caswell, 1976; Hubbell, 2001). It is thought that count
distributions usually tail off with an array of rare species (McGill et
al., 2007). When this is true, estimating species richness is difficult
and dangerous because inventories are likely to be quite incomplete
(Colwell & Coddington, 1994). Biodiversity is of deep concern
throughout science and society, making it imperative to solve this
problem. I focus here on the richness estimation strategy of fitting
mathematical distributions to inventories, where each distribution implies
a fixed number of missing species. In doing so, I show not only that this
idea is feasible but that fundamental processes of community assembly
can be distinguished using basic inventory data.
Population dynamical models going back to Kendall (1948) have been used
to predict the shapes of species abundance distributions (SADs), but
the SADs have generally involved multi-parameter equations (Volkov et
al., 2005; Jabot & Chave, 2011). The three models considered here all
require a single parameter. I put aside other single-parameter models
such as the broken stick (MacArthur, 1957), the geometric series as
applied to rank-abundance distributions (Motomura, 1932), the logistic-J
(Dewdney, 2000), and the Zipf (see Newman, 2005) because they have
received little support in comprehensive assessments of distributions
(Alroy, 2015; Baldridge et al., 2016) and have not been considered in
many studies that have treated two or three distributions at a time
(Hughes, 1986; Dewdney, 2000; Connolly et al., 2005; Ulrich et al.,
2010; Antão et al., 2021). I do not consider the gambin model (Ugland et
al., 2007; Matthews et al., 2014) because it appears only to describe
distributions of counts binned into octaves on a log scale (Preston,
1948). Gray et al. (2006) are among several authors to have pointed out
problems with this binning approach. Like other studies, including Antão
et al. (2021), this study is therefore concerned with models such as the
log series (Fisher et al., 1943) that predict counts of singletons,
doubletons, and so on, i.e., SADs in a restricted sense.
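To make the phrase "SADs in a restricted sense" concrete, the following minimal sketch computes the expected number of singletons, doubletons, and so on under the classic log series of Fisher et al. (1943), in which the expected number of species with n individuals is alpha * x^n / n. The function names and the values of alpha and x are illustrative assumptions, and the parameterisation shown is the textbook one, which is not necessarily the one used later in this paper.

```python
import math


def log_series_expected_counts(alpha, x, max_n):
    """Expected number of species with exactly n individuals (n = 1..max_n)
    under the classic log series: alpha * x**n / n."""
    if alpha <= 0 or not (0 < x < 1):
        raise ValueError("require alpha > 0 and 0 < x < 1")
    return [alpha * x**n / n for n in range(1, max_n + 1)]


def log_series_totals(alpha, x):
    """Expected total species S and total individuals N implied by alpha and x."""
    total_species = -alpha * math.log(1 - x)
    total_individuals = alpha * x / (1 - x)
    return total_species, total_individuals


# Illustrative (hypothetical) parameter values, not estimates from any dataset.
alpha, x = 10.0, 0.99
counts = log_series_expected_counts(alpha, x, 5)
for n, expected in enumerate(counts, start=1):
    print(f"expected species with {n} individual(s): {expected:.2f}")

S, N = log_series_totals(alpha, x)
print(f"expected species: {S:.1f}, expected individuals: {N:.1f}")
```

The list returned by the first function is the kind of prediction at issue here: one expected value per abundance class, rather than counts pooled into log-scale octaves.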
I also do not consider two-parameter distributions such as the classic
Poisson log normal (PLN: Bulmer, 1974) and the truncated negative
binomial (Connolly et al., 2009; Connolly & Thibaut, 2012). These
models have gained much traction: for example, the PLN has been argued to fit
extensive datasets of trees, birds, fishes, and benthic organisms (Antão
et al., 2021), not to mention all GBIF occurrence records in the world
combined (Callaghan et al., 2023). Meanwhile, the negative binomial has
been fitted to a vast dataset for Amazonian trees (ter Steege et al.,
2020). There are two major reasons not to consider these models for the
moment. First, they overfit the data, reducing the chances of predicting
related patterns. Second, one-parameter distributions are often so good
that they cannot be rejected in comparisons with a saturated model. A
saturated model is a highly resolved function that closely mirrors the raw
counts instead of following a proper parameterised distribution. This paper shows how to
construct a saturated model and how to assess its fit to the data. The
upshot is that the three substantive models under consideration perform
so well there is little left to explain.