In the character matrix shown in Table 2, there are 9 taxa and only 3 scored binary characters. Taxa 7 and 7.1 have identical sequences for these three characters, so they cannot be distinguished unless other characters are observed. If taxa 7 and 7.1 are combined into a single OTU, each of the three characters has an information entropy of 1 bit. Although 3 such binary characters are sufficient to distinguish 8 OTUs, they are far from enough to produce a resolved evolutionary cladogram. In practice, the number of characters in a character matrix is usually much larger than the number of taxa, and larger character matrices seem to be a trend in paleontological systematic studies (O’Leary et al., 2013; Laing et al., 2017; Baron et al., 2017a). In Table 2, the set of characters is not only insufficient to represent the source information entropy, because taxa 7 and 7.1 are indistinguishable, but also vulnerable in systematic analysis. There are no redundant scored characters to resist noise or loss of data, as the mutual information between each pair of characters is 0 bit if 7 and 7.1 are combined into a single OTU. For example, if the digits of the fossils are not preserved, taxa 1&2, taxa 3&5, taxa 4&6, and taxa 7&7.1&8 become indistinguishable because they are scored identically in the other two characters. Likewise, if some fungus fossils in taxon 4 are misidentified as feathers, taxon 4 can be confused with taxa 7 and 7.1. From this simplified example we can conclude that, to construct a comprehensive and robust character matrix, the sequences of character states should represent the source information entropy completely, and enough redundancy based on mutual information should be incorporated to minimize the influence of incomplete fossils and misidentification of character states.
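The entropy and mutual information values quoted in this example can be checked numerically. The following is a minimal sketch, not the scripts used in this study. It assumes that the eight OTUs (taxa 7 and 7.1 merged) take all eight combinations of three binary states, which follows if the eight OTUs are mutually distinguishable; Table 2 itself is not reproduced, so the row order is arbitrary and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long character columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def indistinguishable_groups(matrix, dropped):
    """Groups of row indices that coincide once character `dropped` is lost."""
    groups = {}
    for idx, row in enumerate(matrix):
        key = tuple(s for k, s in enumerate(row) if k != dropped)
        groups.setdefault(key, []).append(idx)
    return [g for g in groups.values() if len(g) > 1]


# Assumed matrix: 8 OTUs x 3 binary characters, every combination used once.
matrix = list(product((0, 1), repeat=3))
columns = list(zip(*matrix))

char_entropy = [entropy(col) for col in columns]
pair_mi = {
    (i, j): mutual_information(columns[i], columns[j])
    for i, j in combinations(range(len(columns)), 2)
}
lost_last = indistinguishable_groups(matrix, dropped=2)

print("entropy per character:", char_entropy)
print("pairwise mutual information:", pair_mi)
print("groups merged when the last character is lost:", lost_last)
```

Under this assumption every character carries 1 bit, every pair of characters shares 0 bit, and losing any single character collapses the OTUs into indistinguishable pairs, which is the lack of redundancy described above.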
A study by Baron et al. (2017a) proposed a significantly different dinosaur phylogeny, in which Theropoda and Ornithischia are sister groups forming Ornithoscelida, and Sauropodomorpha and Herrerasauridae form Saurischia, the sister group to Ornithoscelida. In a comment on Baron et al. (2017a), Langer et al. (2017) recovered the “traditional” topology of dinosaur phylogeny, with a dichotomy between Ornithischia and a Saurischia that includes Theropoda, Sauropodomorpha, and “Herrerasauridae”. The subsequent reply by Baron et al. (2017b) mentioned that, “Langer et al., identify numerous disagreements in terms of character scoring and suggest changing approximately 2,500 scorings, around 10% of the character data”. Given that there are only tiny differences between the methods (Langer et al., 2017 supplementary information), it is clear that the incongruence in the original data, rather than the algorithm used to reconstruct the phylogeny, led to the contrasting results. Both groups of authors tried to score the vast number of morphological characters in the matrix (“457 anatomical features scored for 74 early dinosaurs and close relatives”) as accurately as possible, yet rescoring a single character of a single taxon, Pisanosaurus mertii, led to a considerably different result (Baron et al., 2017b Fig. 1). This vulnerability shows that this morphological character matrix cannot provide robust results, although the numbers of taxa and characters in these studies are larger than in many previous studies. Similarly, even before Shannon proposed information theory, communication engineers had designed codes, for example Morse code, and had identified factors influencing transmission quality in noisy channels (Nyquist 1924, 1928). A general problem had also been recognized: blindly increasing the signal power cannot improve communication quality beyond a certain threshold in noisy channels.
In typical digital communication systems, all messages are coded as 0s and 1s for transmission. The frequency of the transmitter is defined as the number of signal changes that can be made per second, measured in Hz. As the frequency increases, more signals can be sent within a given time span, so more information can be transmitted in an ideal situation. Following the similarity between communication systems and paleontological systematic studies discussed above, the concept of frequency in communication systems, which operate in the spatial domain, can be transplanted to paleontological systematic studies, which operate in the temporal domain, as the number of characters, namely the bandwidth. Intuitively, if every fossil specimen is complete and undeformed, increasing the number of morphological characters describes their morphology better, which is consistent with the current trend of using giant matrices. However, this positive correlation is challenged in noisy situations, because noise also increases with bandwidth. Channel capacity, the maximum rate of reliable communication, can be limited by the presence of noise even with arbitrarily large bandwidth.
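The saturation of capacity with bandwidth can be illustrated with the standard continuous AWGN model, in which the signal power \(S\) is fixed and the noise power grows linearly with bandwidth as \(N_0B\). The sketch below is only an illustration of that textbook model, not of the fossil data: the function name `awgn_capacity` and the values chosen for `signal_power` and `noise_density` are arbitrary assumptions.

```python
from math import log, log1p


def awgn_capacity(bandwidth, signal_power, noise_density):
    """Capacity B*log2(1 + S/(N0*B)) when noise power grows as N0*B."""
    snr = signal_power / (noise_density * bandwidth)
    return bandwidth * log1p(snr) / log(2)


signal_power = 1.0   # arbitrary illustrative value
noise_density = 1.0  # arbitrary illustrative value (noise power per unit bandwidth)
bandwidths = [1, 10, 100, 1000, 10000]

capacities = [awgn_capacity(b, signal_power, noise_density) for b in bandwidths]
# Limit of the capacity as the bandwidth tends to infinity: S / (N0 * ln 2).
capacity_limit = signal_power / (noise_density * log(2))

for b, c in zip(bandwidths, capacities):
    print(f"B={b:>6}  C={c:.4f}  (limit {capacity_limit:.4f})")
```

In this model the capacity keeps rising with bandwidth but never exceeds \(S/(N_0\ln 2)\), so adding bandwidth alone yields diminishing returns.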
In this study, we ran parsimony-based phylogenetic analyses on character matrices from six different vertebrate groups: Ornithischia (Han et al., 2017), Ceratopsia (Yu et al., 2020), Diplodocidae (Tschopp & Mateus 2017), Multituberculata (Wang et al., 2019), Carnivoramorpha (Spaulding & Flynn 2012), and lizards (Tschopp et al., 2018). We first quantified the information entropy of each character in the six matrices. To assess the differences between source coding and channel coding, we then calculated the joint information entropy of the first \(n\) characters (\(n\leq\) total number of characters). To investigate the mutuality among characters, we calculated the mutual information in each character matrix. Last, we used the additive white Gaussian noise (AWGN) discrete channel model to estimate the channel capacity of fossil preservation environments.
Material and methods
Information entropy
\(H=-\sum_{i}P_i\log_2\left(P_i\right)\) (1)
where \(P_i\) represents the probability of the \(i\)-th possible value of the source variable, here the putatively possible states of a morphological character. For characters with missing data in the character matrices, we assume that the missing scores are distributed equally among the different states. For example, if a binary character is scored 0 in 20% of taxa, 1 in 40% of taxa, and is missing in the remaining 40%, the estimated distribution is 0 in 20%+20%=40% and 1 in 40%+20%=60% of taxa. We also calculated these values without the estimation of missing data as a reference.
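The treatment of missing data described above can be sketched as follows. This is a minimal illustration rather than the scripts used in this study. It assumes that unscored cells are marked with `?`, that the possible states of a character are the states actually observed in it, and that the reference value "without estimation" is obtained by dropping unscored taxa and renormalizing; the last point is an assumption, since the text does not specify it. The names `state_probabilities` and `entropy` are hypothetical.

```python
from collections import Counter
from math import log2

MISSING = "?"  # assumed symbol for unscored cells


def state_probabilities(scores, redistribute=True):
    """State probabilities for one character.

    redistribute=True : spread the missing fraction equally over observed states.
    redistribute=False: ignore missing scores and renormalize (assumed reference).
    """
    observed = [s for s in scores if s != MISSING]
    if not observed:
        return {}
    counts = Counter(observed)
    if redistribute:
        missing_fraction = (len(scores) - len(observed)) / len(scores)
        share = missing_fraction / len(counts)
        return {s: c / len(scores) + share for s, c in counts.items()}
    return {s: c / len(observed) for s, c in counts.items()}


def entropy(probabilities):
    """Shannon entropy (bits) of a {state: probability} mapping, as in Eq. (1)."""
    return -sum(p * log2(p) for p in probabilities.values() if p > 0)


# The example from the text: 20% scored 0, 40% scored 1, 40% missing.
scores = ["0"] * 2 + ["1"] * 4 + [MISSING] * 4
estimated = state_probabilities(scores)
reference = state_probabilities(scores, redistribute=False)

print("estimated distribution:", estimated, "H =", entropy(estimated))
print("reference distribution:", reference, "H =", entropy(reference))
```

With the example scores, the estimated distribution is approximately 0.4 and 0.6 for states 0 and 1, matching the percentages given in the text.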
For a binary variable \(P\) whose two states have probabilities \(p\) and \((1-p)\), the information entropy is:
\(H=-p\log_2\left(p\right)-\left(1-p\right)\log_2\left(1-p\right)\) (2)
The relationship between information entropy and probability is illustrated in Fig. 2a.
Joint information entropy of the first \(n\) characters:
\(H_{\text{accu}}(n)=-\sum_{i}S_i\log_2\left(S_i\right)\) (3)
where \(S_{i}\) is the probability of the \(i\)-th distinct character sequence formed by the first \(n\) characters.
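Equation (3) can be sketched in a few lines. The example below is a hedged illustration on a small, fully scored, hypothetical matrix (one string of states per taxon); it does not address how missing scores enter the sequences, and the name `accumulated_entropy` is an assumption.

```python
from collections import Counter
from math import log2


def accumulated_entropy(matrix, n):
    """Joint entropy (bits) of the first n characters, as in Eq. (3).

    Each taxon contributes one sequence made of its first n states; the
    probability of a sequence is its relative frequency among the taxa.
    """
    prefixes = [tuple(row[:n]) for row in matrix]
    total = len(prefixes)
    return -sum((c / total) * log2(c / total) for c in Counter(prefixes).values())


# Hypothetical matrix: 4 taxa x 3 binary characters (the last two taxa are identical).
matrix = ["000", "001", "010", "010"]

curve = [accumulated_entropy(matrix, n) for n in range(1, len(matrix[0]) + 1)]
upper_bound = log2(len(matrix))  # reached only if all taxa are distinguishable

print("accumulated entropy:", curve)
print("upper bound log2(number of taxa):", upper_bound)
```

The accumulated entropy cannot decrease as characters are added and is bounded by \(\log_2\) of the number of taxa; in this hypothetical matrix the bound is not reached because two taxa share the same sequence.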
Channel capacity of AWGN discrete channel:
\(C=B\log_2\left(1+\frac{S}{N}\right)\) (4)
where \(C\) is the channel capacity, \(B\) is the bandwidth (number of characters), and \(S\) and \(N\) are the powers of the signal (scored characters) and the noise (unscored characters), respectively. Character matrices are from published studies (Spaulding & Flynn 2012; Tschopp & Mateus 2017; Han et al., 2018; Tschopp et al. 2018; Wang et al., 2019; Yu et al., 2020). Calculations were done with custom Python 3.7 scripts.
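Equation (4) can be evaluated for a character matrix as sketched below. This is a minimal illustration, not the scripts used in this study: it assumes that the signal and noise powers are measured as the counts of scored and unscored cells, that unscored cells are marked with `?`, and that a matrix with no unscored cells is treated as a noiseless channel with unbounded capacity. The function name `channel_capacity` and the example matrix are hypothetical.

```python
from math import inf, log2

MISSING = "?"  # assumed symbol for unscored cells


def channel_capacity(matrix):
    """C = B * log2(1 + S/N), as in Eq. (4).

    B: number of characters (columns).
    S: number of scored cells; N: number of unscored cells (assumed measure).
    """
    bandwidth = len(matrix[0])
    cells = [state for row in matrix for state in row]
    noise = sum(1 for state in cells if state == MISSING)
    signal = len(cells) - noise
    if noise == 0:
        return inf  # no unscored cells: the ratio S/N is unbounded
    return bandwidth * log2(1 + signal / noise)


# Hypothetical matrix: 4 taxa x 3 characters with 3 unscored cells out of 12.
matrix = ["01?", "1?0", "00?", "110"]
capacity = channel_capacity(matrix)

print("channel capacity:", capacity)
```

Here \(S=9\) and \(N=3\), so \(S/N=3\) and the capacity is \(3\log_2(4)=6\) in the units of Eq. (4).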
Phylogenetic analysis was done in TNT 1.5 using the traditional search (Goloboff & Catalano 2016). In each method, the strict consensus tree was appended to the end of the tree list. The consistency index (CI) and retention index (RI) were calculated only for the strict consensus trees.