Character matrix construction and weighting
The construction of morphological character matrices is central to systematic studies and has been discussed extensively. In this study, we make a first attempt to quantify the information in existing morphological character matrices. Many of our results are consistent with the common understanding of morphological characters: different characters carry different amounts of information, characters are not independent of one another (mutual information exists among them), and more characters usually carry more information. We also propose that the information entropy of each character can be used as its weight in phylogenetic analysis.
Because information entropy represents how informative a character is, it is a candidate for character weighting in phylogenetic analysis. Most researchers agree that some kind of weighting should be applied in systematic analysis, and equal weighting is itself one weighting method (Farris 1969, Sereno 2007). Building on the successive weighting proposed by Farris (1969), Goloboff proposed implied weighting (Goloboff 1993) and later extended implied weighting (Goloboff 2014). These methods adjust the weights of different characters to reduce the effect of homoplasy. However, Congreve & Lamsdell (2016) indicated that implied weighting is not consistent with the idea of parsimony and that, with simulated datasets, it increases the number of both correctly and incorrectly resolved nodes. The wide use of implied weighting and its variants suggests that they probably offer a route to better-resolved trees, but neither their theoretical basis nor their application answers the core question of how much information each character contains, and they may fail when a character matrix contains too many homoplastic characters.
Birds and modern mammals are both endothermic, are covered with filaments rather than scales, have four-chambered hearts, and so on. If we deliberately sampled too many characters describing these features, the analysis could easily be forced to conclude that birds are mammals, and many synapomorphies between birds and other reptiles, for example the presence of sclerotic rings, could be recovered as homoplasies. Fortunately, many other lines of evidence, which amount to more information, show that birds are more closely related to modern reptiles than to modern mammals. The morphology and physiology of birds, genetic data, and the fossil record all indicate that the features shared by birds and mammals are the result of convergent evolution. It is not reasonable to reject the conclusion that birds are dinosaurs on the basis of considerably fewer features, against the overwhelming evidence from fossils, molecular biology, anatomy, and many other sources. However, such biased character sampling can be hard to recognize in extinct groups known only from limited fossil material, and implied weighting may even strengthen the bias. Information theory may help to detect such biased sampling. In a character matrix built from biased sampling, the mutual information among characters would be high and the channel capacity might not be saturated by the number of characters, because the biased characters represent only a little information.
Successive weighting, implied weighting, and their variants require an initial weight or an existing tree topology, whereas information entropy weighting depends only on the information entropy of each character. In character construction, the selection of characters is in fact extremely biased: most morphological characters in vertebrate paleontological studies come from the cranial region (Fig. 2d). In the six datasets analyzed here, the proportion of cranial characters ranges from 40.7% to 100%, with an average of 63.2%, which immediately shows that some body parts supply more morphological information (or are “more important”) than others in systematic studies. In practice, any multi-state character can be split into several binary characters, and the character matrices examined in this study all contain multi-state characters. Under equal weighting, multi-state characters and binary characters therefore receive different weights. Based on information theory, however, since \(H\left(A+B\right)=H\left(A\right)+H\left(B\right)-I\left(A,B\right)\), a multi-state character can be split into independent binary characters (with zero mutual information) without losing or adding any information.
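The identity above can be checked numerically on a toy example. The following Python sketch is illustrative only and is not taken from the analyses in this study: the function names `entropy` and `mutual_information` are hypothetical, and the eight scorings are invented so that the four states are evenly distributed. Under that assumption, the four-state character splits into two binary characters with zero mutual information; for other state distributions the mutual-information term need not vanish, although the identity itself still holds.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def mutual_information(a, b):
    """Empirical mutual information I(A,B) in bits for two paired sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


# Invented toy data: one four-state character scored for eight OTUs,
# with every state equally frequent (an assumption of this example).
multistate = [0, 1, 2, 3, 0, 1, 2, 3]

# Split each state into two binary characters (high bit and low bit).
char_a = [s // 2 for s in multistate]
char_b = [s % 2 for s in multistate]

h_joint = entropy(list(zip(char_a, char_b)))  # entropy of the pair (A, B)
h_a, h_b = entropy(char_a), entropy(char_b)
i_ab = mutual_information(char_a, char_b)

print(entropy(multistate), h_joint, h_a, h_b, i_ab)
```

In this example the original character and the pair of binary characters both carry 2 bits, each binary character carries 1 bit, and their mutual information is 0, so the joint entropy equals \(H\left(A\right)+H\left(B\right)-I\left(A,B\right)\).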
Intuitively, using information entropy as a measure of character weight also agrees with our understanding of what a character is. A character can be treated as a random variable whose possible values are its states. For simplicity, we consider only a discrete memoryless information source, which emits discrete signals and in which earlier signals do not influence later ones. If a character is coded identically across the entire group of organisms, \(P(Character\ A\ =\ 0)\ =\ 1\), its information entropy is 0, so it should be given a weight of 0, that is, excluded. The character with the highest information entropy has the most even distribution of character states (Fig. 2a). Characters with very uneven distributions, for example those in which only a few OTUs are scored as 0 and the majority as 1, should be down-weighted because they distinguish very few OTUs, contribute little to the source coding, and are easily influenced by environmental noise. Information entropy weighting gives higher weight to characters that contribute more to the information entropy of the source. Kälersjö et al. (1999) studied plant nucleotide data and showed that fast-evolving and highly homoplastic third codon positions, contrary to traditional thinking, carry the strongest phylogenetic information; they also suggested that the frequency of change should be used in character weighting and selection. Although these authors tried to quantify the information in different nucleotide sites, i.e., molecular characters, they did not explain how they defined information or informative sites.
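The weighting described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function names `character_entropy` and `entropy_weights`, the toy matrix, and the treatment of missing or inapplicable scores (`?` and `-` are simply skipped) are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def character_entropy(states, missing=("?", "-")):
    """Shannon entropy (in bits) of one character's observed state frequencies.

    Assumption of this sketch: missing/inapplicable scores are ignored.
    """
    observed = [s for s in states if s not in missing]
    n = len(observed)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(observed).values())


def entropy_weights(matrix):
    """Return one entropy-based weight per character (column).

    `matrix` holds one row per OTU; each row is a sequence of state symbols.
    """
    return [character_entropy(column) for column in zip(*matrix)]


# Invented toy matrix: four OTUs (rows) scored for four binary characters.
matrix = [
    "0001",
    "0011",
    "0101",
    "0110",
]

weights = entropy_weights(matrix)
print(weights)
```

Here the first character is invariant and receives a weight of 0, the second and third have evenly distributed states and receive the maximum weight for a binary character (1 bit), and the fourth, with a 3:1 split, receives about 0.81 bits.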
We compared the results of equal weighting, implied weighting (k = 3 and k = 12), and information entropy weighting for the six matrices analyzed above. The results for Ceratopsia are illustrated in Figure 4. To save space and show the differences among trees, colored columns replace the OTU names on the right side of the trees, and the color gradients correspond to the order of taxa in the character matrix. Detailed phylogenetic results for the six groups are available on Dryad (doi:10.5061/dryad.8sf7m0cnc). In general, the results are unexpectedly consistent across weighting methods, although slight differences are common. The CI (consistency index) and RI (retention index) of the strict consensus tree of each group are given in Table 3. The CI under entropy weighting is generally slightly lower than under the other methods and the RI is slightly higher, suggesting that more characters are recovered as homologous and that the trees fit the entropy-weighted characters better.
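For reference, the two indices follow the standard ensemble definitions, CI = M/S and RI = (G - S)/(G - M), where M, S, and G are the minimum possible, observed, and maximum possible numbers of steps summed over all characters. The Python sketch below only illustrates these formulas: the function name `ensemble_ci_ri` and the step counts are invented, and it assumes unweighted step counts, since how character weights enter the values in Table 3 is not specified here.

```python
def ensemble_ci_ri(min_steps, obs_steps, max_steps):
    """Ensemble consistency index and retention index.

    Each argument is a list with one entry per character:
    minimum possible, observed (on the tree), and maximum possible steps.
    """
    m, s, g = sum(min_steps), sum(obs_steps), sum(max_steps)
    ci = m / s
    # RI is undefined when no character can show homoplasy (g == m).
    ri = (g - s) / (g - m) if g != m else float("nan")
    return ci, ri


# Invented step counts for three characters (not data from this study).
ci, ri = ensemble_ci_ri(
    min_steps=[1, 1, 2],
    obs_steps=[1, 3, 4],
    max_steps=[2, 4, 5],
)
print(ci, ri)
```

With these invented counts, M = 4, S = 8, and G = 11, giving CI = 0.5 and RI = 3/7.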
Table 3. CI and RI of different morphological character matrices