Character matrix construction and weighting
The construction of (morphological) character matrices is central to
systematic studies and has been discussed extensively. In this study, we
make a first attempt to quantify the information in existing
morphological character matrices. Many of the results are
consistent with the common understanding of morphological characters:
different characters carry different amounts of information,
mutual information exists among characters, and more characters usually carry
more information. We also propose that the information
entropy of each character can be used as its weight in phylogenetic
analysis.
As the information entropy represents how informative a character is, it
may be a candidate for character weighting in phylogenetic analysis. Most
researchers agree that some kind of weighting should be applied in
systematic analysis, and equal weighting is itself one of the weighting methods
(Farris 1969, Sereno 2007). Building on the successive weighting proposed
by Farris (1969), Goloboff (1993) proposed implied weighting and later
extended implied weighting (Goloboff 2014). These weighting methods
refine the weights of different characters to reduce the influence of homoplasy. However,
Congreve & Lamsdell (2016) reported that implied weighting is not
consistent with the idea of parsimony and that, with simulated datasets, it increases the numbers of both correctly and
incorrectly resolved nodes. Its wide use suggests
that implied weighting and its variants probably provide a direction for
reconstructing better-resolved trees, but neither their theoretical basis
nor their application answers the core question of how much information is
in each character, and they may fail when working with character matrices with
too many homoplastic characters.
Birds and modern mammals are both endothermic, are covered with filaments
rather than scales, have four-chambered hearts, etc. If we
deliberately sampled too many characters describing these features, the
analysis could easily be forced into concluding that birds are mammals, and many
synapomorphies between birds and other reptiles, for example the
presence of sclerotic rings, could be recovered as homoplasies. Fortunately,
there are many other lines of evidence, which mean more information,
showing that birds are more closely related to modern reptiles than to
modern mammals. The morphology and physiology of birds, the genetic
data, and the fossil record all indicate that these similar features
of birds and mammals are the results of convergent evolution. It is not
reasonable to refute that birds are dinosaurs on the basis of considerably fewer
features against the overwhelming evidence from fossils, molecular
biology, anatomy, and many other aspects. However, such biased sampling
of characters can be hard to recognize for extinct groups with only
limited fossil material, and implied weighting may even strengthen such
bias. Information theory, however, may detect such biased sampling. If such
a character matrix exists, then because of its biased sampling the mutual
information among characters would be high and the channel capacity may
not be saturated by the number of characters, because the
biased sample of characters represents only a little information.
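The contrast described above can be illustrated with a minimal Python sketch. It assumes that state probabilities are estimated from the observed state frequencies and that every OTU is scored for every character; the two toy matrices and all function names are hypothetical and are not taken from the datasets analysed here.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy (bits) estimated from observed frequencies."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two characters scored on the same OTUs."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def mean_pairwise_mi(characters):
    """Average mutual information over all pairs of characters."""
    pairs = list(combinations(characters, 2))
    return sum(mutual_information(a, b) for a, b in pairs) / len(pairs)


def joint_entropy(characters):
    """Entropy of the whole matrix, treating each OTU's row as one signal."""
    return entropy(list(zip(*characters)))


# Hypothetical toy matrices: each inner list is one character scored for 8 OTUs.
# "biased" samples the same split of the OTUs three times.
biased = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
# "varied" samples three different, mutually independent splits.
varied = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

biased_mi, varied_mi = mean_pairwise_mi(biased), mean_pairwise_mi(varied)
biased_h, varied_h = joint_entropy(biased), joint_entropy(varied)
print(f"biased: mean MI = {biased_mi:.2f} bits, joint entropy = {biased_h:.2f} bits")
print(f"varied: mean MI = {varied_mi:.2f} bits, joint entropy = {varied_h:.2f} bits")
```

In this toy case both matrices have three characters, but the redundant one has high mutual information among characters and carries only one bit in total, whereas the varied one has zero mutual information and carries three bits.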
Successive weighting, implied weighting and their variants require an
initial weight or an existing tree topology, whereas information entropy
weighting depends only on the information entropy of each character. In
character construction, the selection of characters
is in fact extremely biased, as most morphological characters come from the cranial
region in vertebrate paleontological studies (Fig. 2d). In the six
datasets we analyzed here, the proportion of cranial characters ranges from
40.7% to 100%, with an average of 63.2%, which immediately shows that
some parts carry more morphological information (or are “more important”)
than others in systematic studies. In practice, any multi-state
character can be split into several binary characters, and the character
matrices examined in this study all contain multi-state characters. Under
equal weighting, multi-state characters and binary characters
therefore effectively receive different weights. But based on information theory, since \(H\left(A+B\right)=H\left(A\right)+H\left(B\right)-I\left(A,B\right)\),
multi-state characters can be accurately split into independent binary
characters (with 0 mutual information) without losing or adding any
information.
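As a minimal sketch of this identity, the following Python example recodes a hypothetical four-state character as two binary characters and checks that the joint entropy equals the sum of the individual entropies minus the mutual information. The recoding by binary digits, the toy scores, and the function names are assumptions made for illustration; the mutual information is zero here because the four states are equally frequent, which need not hold for an arbitrary character.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) estimated from observed frequencies."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def mutual_information(a, b):
    """I(A;B) computed directly from the joint and marginal frequencies."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


# Hypothetical four-state character scored for 8 OTUs, states equally frequent.
multistate = [0, 0, 1, 1, 2, 2, 3, 3]

# Assumed recoding: represent each state by its two binary digits.
char_a = [state // 2 for state in multistate]  # high digit
char_b = [state % 2 for state in multistate]   # low digit

h_multi = entropy(multistate)
h_a, h_b = entropy(char_a), entropy(char_b)
h_joint = entropy(list(zip(char_a, char_b)))
mi = mutual_information(char_a, char_b)

print(f"H(multistate) = {h_multi:.2f} bits")
print(f"H(A) + H(B) - I(A,B) = {h_a + h_b - mi:.2f} bits")
```

Both printed values are 2 bits in this toy case, so the split neither loses nor adds information.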
Intuitively, using information entropy as a measure of character weight
also conforms with our understanding of what a character is. A character
can be treated as a random variable and its states are possible values.
For simplicity, we discuss only a discrete memoryless information
source here, which emits discrete signals and in which previous signals do not
influence later ones. If a character is coded the same across the entire
group of organisms, \(P(Character\ A\ =\ 0)\ =\ 1\), then it has an
information entropy of 0 and should therefore be weighted as 0, namely
excluded. The character with the highest information entropy has the
most even distribution of character states (Fig. 2a). Characters
with very uneven distributions, for example those in which only a few OTUs are scored
as 0 and the majority as 1, should be down-weighted because they
contribute little to the source coding, distinguishing very few OTUs,
and are easily influenced by environmental noise. Information entropy
weighting gives characters that contribute more to the source
information entropy higher weight. Källersjö et al. (1999) studied plant
nucleotide data, and their results showed that fast-evolving and highly
homoplastic third codon positions, contrary to traditional thinking, carry
the strongest phylogenetic information; they also
suggested that the frequency of change should be used in character
weighting and selection. Although these authors tried to quantify the
information in different nucleotide sites, i.e., molecular characters,
they did not explain how they defined
information or informative sites.
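The weighting idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the exact procedure used in this study: state probabilities are estimated from observed frequencies, cells scored as "?" or "-" are simply ignored, and the toy matrix and function names are hypothetical. How the resulting weights are rescaled for a particular tree-search program is left out.

```python
from collections import Counter
from math import log2


def character_entropy(scores, missing=("?", "-")):
    """Shannon entropy (bits) of one character.

    Assumption: missing or inapplicable cells are dropped before
    the state frequencies are counted.
    """
    observed = [s for s in scores if s not in missing]
    n = len(observed)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(observed).values())


def entropy_weights(rows):
    """Return one weight per character (column) of the matrix."""
    return [character_entropy(column) for column in zip(*rows)]


# Hypothetical matrix: one string per OTU, one column per character.
toy_matrix = [
    "0000",
    "0000",
    "0001",
    "0001",
    "0102",
    "0102",
    "0103",
    "0113",
]

weights = entropy_weights(toy_matrix)
for index, weight in enumerate(weights, start=1):
    print(f"character {index}: {weight:.3f} bits")
```

In this toy matrix the invariant first character receives a weight of 0, the evenly split binary character receives 1 bit, the character that separates a single OTU receives about 0.54 bits, and the evenly distributed four-state character receives 2 bits.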
We compared the results from equal weighting, implied weighting (k = 3 and
k = 12), and information entropy weighting for the six matrices analysed
earlier. The results for Ceratopsia are illustrated in Figure 4. To save space and show
the differences among trees, colored columns replace the OTU names on
the right side of the trees, and the color gradients correspond to the taxon order
in the character matrix. Detailed phylogenetic results for the six groups are
provided in Dryad doi:10.5061/dryad.8sf7m0cnc. Generally, the entropy-weighted results show
unexpected consistency with those of both equal weighting and implied
weighting, but slight differences are common. The CI (consistency index)
and RI (retention index) were also calculated for the strict consensus
tree of each group (Table 3). The CI under entropy weighting is generally
slightly lower than under the other methods and the RI is slightly higher, suggesting
that more homologous characters are inferred and that the trees fit the
entropy-weighted characters better.
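For reference, the two indices can be sketched with their standard ensemble definitions, CI = M/S and RI = (G - S)/(G - M), where M, S and G are the minimum possible, observed, and maximum possible numbers of steps summed over characters. The per-character step counts below are hypothetical, and the sketch assumes they have already been obtained from the tree being evaluated; it does not reproduce the values in Table 3.

```python
def ensemble_ci_ri(min_steps, observed_steps, max_steps):
    """Ensemble consistency and retention indices from per-character step counts."""
    m, s, g = sum(min_steps), sum(observed_steps), sum(max_steps)
    if s == 0 or g == m:
        raise ValueError("indices are undefined for these step counts")
    ci = m / s
    ri = (g - s) / (g - m)
    return ci, ri


# Hypothetical step counts for three characters on one tree.
min_steps = [1, 1, 2]       # fewest steps each character could require
observed_steps = [2, 1, 4]  # steps each character requires on the tree
max_steps = [4, 3, 6]       # most steps each character could require

ci, ri = ensemble_ci_ri(min_steps, observed_steps, max_steps)
print(f"CI = {ci:.3f}, RI = {ri:.3f}")
```

With these toy counts CI is 4/7 and RI is 6/9; a lower CI paired with a higher RI, as reported here, is possible because the two indices are normalized differently.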
Table 3. CI and RI of different morphological character matrices