In the character matrix shown in Table 2, there are 9 taxa and only 3 scored binary characters. Taxa 7 and 7.1 have identical sequences for these three characters, so they cannot be distinguished unless other characters are observed. If we combine taxa 7 and 7.1 into a single OTU, each of the three characters has an information entropy of 1 bit. Although 3 binary characters are sufficient to distinguish 8 OTUs (\(2^3 = 8\)), they are far from enough to produce a resolved evolutionary cladogram. In practice, the number of characters in a character matrix is usually much larger than the number of taxa, and ever larger character matrices seem to be a trend in paleontological systematic studies (O’Leary et al., 2013; Laing et al., 2017; Baron et al., 2017a). The construction of characters in Table 2 is therefore not only insufficient to represent the source information entropy, since taxa 7 and 7.1 are indistinguishable, but also vulnerable in systematic analysis. There are no redundant scored characters to resist noise or loss of data: if taxa 7 and 7.1 are combined into a single OTU, the mutual information between each pair of characters is 0 bits. For example, if the digits of the fossils are not preserved, taxa 1 and 2, taxa 3 and 5, taxa 4 and 6, and taxa 7, 7.1 and 8 become indistinguishable, because each group shares the same states in the other two characters. Likewise, if fungal remains in fossils of taxon 4 are misidentified as feathers, taxon 4 can be confused with taxa 7 and 7.1. From this simplified example we conclude that, to construct a comprehensive and robust character matrix, the sequences of character states should represent the source information entropy completely, and enough redundancy, measured by mutual information, should be incorporated to minimize the influence of incomplete fossils and misidentified character states.
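The properties claimed for the merged Table 2 matrix (1 bit per character, 0 bits of pairwise mutual information, no redundancy) can be checked numerically. The sketch below uses a hypothetical 8-OTU by 3-character binary matrix with those properties; it is not the published Table 2, and the helper names (`entropy`, `column`, `mutual_information`) are illustrative. Mutual information is computed with the standard identity \(I(X;Y)=H(X)+H(Y)-H(X,Y)\).

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical matrix with the properties described for Table 2 once taxa 7
# and 7.1 are merged: 8 OTUs (rows) x 3 binary characters (columns).
# It is NOT the published Table 2.
MATRIX = ["000", "001", "010", "011", "100", "101", "110", "111"]


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def column(matrix, j):
    """States of character j across all OTUs."""
    return [row[j] for row in matrix]


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


n_chars = len(MATRIX[0])
char_entropy = [entropy(column(MATRIX, j)) for j in range(n_chars)]
pair_mi = {
    (a, b): mutual_information(column(MATRIX, a), column(MATRIX, b))
    for a, b in combinations(range(n_chars), 2)
}
joint = entropy(MATRIX)  # each row is one full character-state sequence

# Simulate losing character 0 (e.g. an unpreserved digit character).
groups = Counter(row[1:] for row in MATRIX)

print(char_entropy)             # [1.0, 1.0, 1.0]
print(pair_mi)                  # every pair: 0.0
print(joint)                    # 3.0 bits, so all 8 OTUs are distinct
print(sorted(groups.values()))  # [2, 2, 2, 2]: OTUs collapse into pairs
```

Dropping any one character collapses the eight OTUs into four indistinguishable pairs, the same kind of loss described above for unpreserved digits.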
A study by Baron et al. (2017a) proposed a significantly different
dinosaur phylogeny, in which Theropoda and Ornithischia are sister
groups, forming the Ornithoscelida, and Sauropodomorpha and
Herrerasauridae form Saurischia as sister group to Ornithoscelida. In a
comment on Baron et al. (2017a), Langer et al. (2017) recovered the “traditional” topology of dinosaur phylogeny, with a dichotomy between Ornithischia and Saurischia, the latter including Theropoda, Sauropodomorpha, and “Herrerasauridae”. The subsequent reply by Baron et al. (2017b) noted that “Langer et al., identify numerous disagreements in terms of character scoring and suggest changing approximately 2,500 scorings, around 10% of the character data”. Given that the differences between the analytical methods were minor (Langer et al., 2017, supplementary information), the contrasting results evidently arose from incongruence in the original data rather than from the algorithm used to reconstruct the phylogeny. Both groups of authors tried to score the vast number of morphological characters in the matrix (“457 anatomical features scored for 74 early dinosaurs and close relatives”) as accurately as possible, yet rescoring a single character of a single taxon, Pisanosaurus mertii, led to a considerably different result (Baron et al., 2017b, Fig. 1). This vulnerability shows that this morphological character matrix cannot provide robust results, even though these studies include more taxa and characters than many previous studies. By analogy, even before Shannon proposed information theory, communication engineers had designed codes, such as Morse code, and identified factors influencing transmission quality in noisy channels (Nyquist 1924, 1928). They had also recognized a general problem: in a noisy channel, blindly increasing the signal power cannot improve communication quality beyond a certain threshold.
In typical digital communication systems, all messages are encoded as 0s and 1s for transmission. The frequency of the transmitter, measured in hertz (Hz), is the number of signal changes it can make per second. As the frequency increases, more signals can be sent within a given time span, so in an ideal situation more information can be transmitted. Given the similarity between communication systems and paleontological systematic studies discussed above, the concept of frequency in communication systems (spatial domain) can be transplanted to paleontological systematic studies (temporal domain) as the number of characters, namely the bandwidth. Intuitively, if every fossil specimen is complete and undeformed, increasing the number of morphological characters describes its morphology better, which is consistent with the current trend toward giant matrices. However, this positive relationship is challenged under noisy conditions, because noise also increases with bandwidth. Channel capacity, the maximum rate of reliable communication, can be limited by the presence of noise even with arbitrarily large bandwidth.
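The bandwidth limit can be illustrated with the Shannon-Hartley form of the AWGN capacity, assuming white noise whose total power grows in proportion to bandwidth (\(N = N_0 B\), with \(N_0\) the noise power per unit bandwidth). The values of \(S\) and \(N_0\) below are arbitrary illustrative choices; the sketch only shows that capacity rises with \(B\) but approaches the finite ceiling \(S/(N_0 \ln 2)\).

```python
import math


def awgn_capacity(bandwidth, signal_power, noise_density):
    """Shannon-Hartley capacity C = B log2(1 + S / (N0 * B)).

    White noise has constant power spectral density N0, so the total
    noise power N = N0 * B grows with the bandwidth B.
    """
    snr = signal_power / (noise_density * bandwidth)
    # log1p keeps precision when the SNR per unit bandwidth is tiny
    return bandwidth * math.log1p(snr) / math.log(2)


S, N0 = 1.0, 1.0  # arbitrary illustrative values
limit = S / (N0 * math.log(2))  # capacity as B -> infinity
bandwidths = [1, 10, 100, 1_000, 1_000_000]
capacities = [awgn_capacity(b, S, N0) for b in bandwidths]

for b, c in zip(bandwidths, capacities):
    print(f"B = {b:>9}: C = {c:.4f}")
print(f"limit S/(N0 ln 2) = {limit:.4f}")
```

Each tenfold increase in bandwidth buys less additional capacity, and no bandwidth pushes capacity past the limit.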
In this study, we ran parsimony-based phylogenetic analyses on character matrices from 6 different vertebrate groups: Ornithischia (Han et al., 2017), Ceratopsia (Yu et al., 2020), Diplodocidae (Tschopp & Mateus 2017), Multituberculata (Wang et al., 2019), Carnivoramorpha (Spaulding & Flynn 2012), and lizards (Tschopp et al., 2018). We first quantified the information entropy of each character in the six matrices. To assess the differences between source coding and channel coding, we then calculated the joint information entropy of the first \(n\) characters (\(n\leq\) total number of characters). To investigate the mutuality among characters, we calculated the mutual information in each character matrix. Finally, we used the additive white Gaussian noise (AWGN) discrete channel model to estimate the channel capacity of fossil preservation environments.
Material and methods
Information entropy
\(H=-\sum_{i}P_i\log_2\left(P_i\right)\) (1)
where \(P_i\) is the probability of the \(i\)-th possible value of the source variable, here the putative states of a morphological character in paleontological studies. For characters with missing data in the character matrices, we assume that the missing entries are distributed equally among the different states. For example, if a binary character is scored 0 in 20% of taxa and 1 in 40% of taxa (the remaining 40% being missing), the estimated distribution is 0 in 20% + 20% = 40% and 1 in 40% + 20% = 60%. We also calculate these values without the estimation of missing data as a reference.
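Equation (1) under both treatments of missing data can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes missing entries are coded as `?`, that the list of possible states is supplied by the caller, and it ignores polymorphic scorings. The function names are hypothetical. It reproduces the worked 20%/40%/40% example above.

```python
from collections import Counter
from math import log2

MISSING = "?"  # assumed missing-data symbol (conventions vary between matrices)


def state_distribution(scores, states, impute_missing=True):
    """Return {state: probability} for one character.

    With impute_missing=True the share of missing scores is spread equally
    over all possible states. Otherwise missing scores are dropped and the
    observed frequencies are renormalised.
    """
    counts = Counter(scores)
    n_missing = counts.pop(MISSING, 0)
    if impute_missing:
        total = len(scores)
        share = n_missing / len(states)
        return {s: (counts.get(s, 0) + share) / total for s in states}
    observed = sum(counts.values())
    return {s: counts.get(s, 0) / observed for s in states}


def entropy(dist):
    """Equation (1): H = -sum P_i log2 P_i, skipping zero-probability states."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


# Worked example: 20% scored 0, 40% scored 1, 40% missing (10 taxa).
scores = ["0"] * 2 + ["1"] * 4 + [MISSING] * 4
with_est = state_distribution(scores, states=["0", "1"])
without_est = state_distribution(scores, states=["0", "1"], impute_missing=False)
print(with_est, entropy(with_est))
print(without_est, entropy(without_est))
```

With the estimation the character carries about 0.971 bits (probabilities 0.4 and 0.6); dropping the missing entries gives probabilities of 1/3 and 2/3 and about 0.918 bits.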
For a binary variable with state probabilities \(p\) and \(1-p\), the information entropy is
\(H=-p\log_2\left(p\right)-\left(1-p\right)\log_2\left(1-p\right)\) (2)
The relationship between information entropy and probability is illustrated in Fig. 2a.
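Equation (2) is the two-state case of Equation (1). A short sketch, using the usual convention \(0\log_2 0 = 0\), samples the curve of \(H\) against \(p\) (the relationship plotted in Fig. 2a) and confirms that it is symmetric in \(p\) and peaks at 1 bit when \(p = 0.5\). The name `binary_entropy` is illustrative.

```python
from math import log2


def binary_entropy(p):
    """Equation (2), with the usual convention 0 * log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


# Sample H(p) on a grid of probabilities from 0 to 1.
grid = [i / 10 for i in range(11)]
curve = [(p, round(binary_entropy(p), 3)) for p in grid]
print(curve)
```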
Joint information entropy of the first \(n\) characters:
\(H_{\text{accu}}(n)=-\sum_{i}S_i\log_2\left(S_i\right)\) (3)
where \(S_i\) is the probability of the \(i\)-th distinct character sequence formed by the first \(n\) characters.
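Equation (3) can be computed by counting the distinct state sequences formed by the first \(n\) columns of the matrix. The sketch below assumes a fully scored matrix stored as one string per taxon (a hypothetical layout, with single-character states); the toy matrix is invented so that its second character duplicates the first.

```python
from collections import Counter
from math import log2


def accumulated_entropy(matrix):
    """Equation (3) for n = 1..total: entropy of the distinct row prefixes.

    `matrix` is a list of equal-length strings, one per taxon; assumes no
    missing data.
    """
    n_taxa = len(matrix)
    result = []
    for n in range(1, len(matrix[0]) + 1):
        # S_i: relative frequency of each distinct sequence of the first n states
        counts = Counter(row[:n] for row in matrix)
        h = -sum(c / n_taxa * log2(c / n_taxa) for c in counts.values())
        result.append(h)
    return result


# Hypothetical 4 taxa x 3 characters; character 2 repeats character 1.
toy = ["000", "001", "110", "111"]
print(accumulated_entropy(toy))
```

The accumulated entropy stays at 1 bit when the redundant second character is added and reaches 2 bits, the maximum \(\log_2 4\) for four taxa, only with the third character.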
Channel capacity of the AWGN discrete channel:
\(C=B\log_2\left(1+\frac{S}{N}\right)\) (4)
where \(C\) is the channel capacity, \(B\) is the bandwidth (number of characters), and \(S\) and \(N\) are the power of the signal (scored characters) and of the noise (unscored characters), respectively. Character matrices are from published studies (Spaulding & Flynn 2012; Tschopp & Mateus 2017; Han et al., 2018; Tschopp et al., 2018; Wang et al., 2019; Yu et al., 2020). Calculations were performed with custom Python 3.7 scripts.
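A minimal sketch of Equation (4) applied to a matrix, under assumptions the text leaves open: \(S\) and \(N\) are taken as the counts of scored and unscored cells, both `?` and `-` are treated as unscored, and the matrix is stored as one string per taxon with single-character states. The function name `matrix_capacity` is hypothetical. Because only the ratio \(S/N\) enters Equation (4), using counts or proportions gives the same result.

```python
from math import log2

UNSCORED = {"?", "-"}  # assumption: missing and inapplicable both count as noise


def matrix_capacity(matrix):
    """Equation (4): C = B log2(1 + S/N) for a matrix given as one string per taxon.

    B is the number of characters; S and N are the numbers of scored and
    unscored cells (an assumed operationalisation of signal and noise power).
    """
    bandwidth = len(matrix[0])
    cells = [state for row in matrix for state in row]
    noise = sum(state in UNSCORED for state in cells)
    signal = len(cells) - noise
    if noise == 0:
        return float("inf")  # no unscored cells: S/N, and hence C, is unbounded
    return bandwidth * log2(1 + signal / noise)


toy = ["01?1", "1?00", "001?"]  # hypothetical 3 taxa x 4 characters
print(matrix_capacity(toy))
```

In the toy matrix, 3 of 12 cells are unscored, so \(S/N = 3\) and \(C = 4\log_2 4 = 8\) bits.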
Phylogenetic analyses were performed in TNT 1.5 using traditional search (Goloboff & Catalano 2016). For each method, the strict consensus tree was appended to the end of the tree list. The consistency index (CI) and retention index (RI) were calculated only for the strict consensus trees.
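TNT reports CI and RI directly; the sketch below only illustrates the standard definitions, \(\mathrm{CI}=m/s\) and \(\mathrm{RI}=(g-s)/(g-m)\), where \(m\) is the minimum possible number of steps, \(g\) the maximum possible number of steps (on a star tree) and \(s\) the observed tree length, each summed over characters. It assumes unordered characters without missing data; the toy matrix, its tree length and the function names are hypothetical.

```python
from collections import Counter


def min_max_steps(matrix):
    """Sum of minimum (m) and maximum (g) possible steps over all characters.

    Assumes unordered characters and no missing data: a character with k
    observed states needs at least k - 1 steps, and on a star tree it needs
    n_taxa minus the count of its most frequent state.
    """
    n_taxa = len(matrix)
    m = g = 0
    for j in range(len(matrix[0])):
        counts = Counter(row[j] for row in matrix)
        m += len(counts) - 1
        g += n_taxa - max(counts.values())
    return m, g


def ci_ri(m, s, g):
    """Ensemble consistency index CI = m / s and retention index RI = (g - s) / (g - m)."""
    ci = m / s
    ri = (g - s) / (g - m) if g != m else float("nan")  # RI undefined when g == m
    return ci, ri


# Hypothetical taxa A-D, one string of character states per taxon.
toy = ["000", "011", "101", "110"]
m, g = min_max_steps(toy)
s = 5  # Fitch length of the tree ((A,B),(C,D)) for this toy matrix
ci, ri = ci_ri(m, s, g)
print(m, g, ci, ri)
```

For this toy matrix \(m = 3\) and \(g = 6\), so a tree of length 5 gives CI = 0.6 and RI of about 0.333.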