Congyu Yu

and 4 more

The construction of morphological character matrices is central to paleontological systematics, which extracts paleontological information from fossils. Although the word "information" is repeatedly invoked in a wide array of paleontological systematic studies, its meaning has rarely been clarified, and there has been no standard for measuring paleontological information, owing to the incompleteness of fossils, the difficulty of recognizing homologous and homoplastic structures, and other factors. Here, based on information theory, we show the deep connections between paleontological systematic study and communication system engineering. It is the information in morphological characters, that is, the decrease of uncertainty, that distinguishes operational taxonomic units (OTUs) and allows evolutionary history to be reconstructed. We propose that source coding and channel coding, two concepts from communication system engineering, correspond in paleontological studies to the construction of diagnostic features and of entire character matrices, respectively. These two steps should be kept distinct, as they are when typical communication systems are engineered, because they serve distinct purposes. Using character matrices from six different vertebrate groups, we analyzed their information properties, including source entropy, mutual information, and channel capacity. Estimates of channel capacity reveal an upper limit on the paleontological information that each matrix can transmit, indicating that, because of noise, too many characters not only increase the burden of character scoring but may also decrease matrix quality. The information entropy of each character, which measures how informative a variable is, is tested as a weighting criterion in parsimony-based systematic studies; the results show high consistency with existing knowledge, with both good resolution and interpretability.
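To make the idea of per-character information entropy concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the Shannon entropy of each column of a small character matrix and rescales the entropies into relative weights. The toy matrix, the function names `character_entropy` and `entropy_weights`, the decision to ignore cells scored `?` or `-`, and the normalization of entropies so that they sum to one are all assumptions made for illustration; the abstract does not specify how missing data or weight scaling are handled.

```python
import math
from collections import Counter

# Assumed codes for missing or inapplicable scores; these cells are ignored.
MISSING = {"?", "-"}


def character_entropy(states):
    """Shannon entropy (in bits) of the observed states of one character."""
    observed = [s for s in states if s not in MISSING]
    if not observed:
        return 0.0
    counts = Counter(observed)
    n = len(observed)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(matrix):
    """Relative weight of each character, proportional to its entropy.

    `matrix` maps an OTU name to a string of single-symbol character states.
    Normalizing the weights to sum to 1 is an illustrative choice only.
    """
    columns = list(zip(*matrix.values()))  # one tuple of states per character
    entropies = [character_entropy(col) for col in columns]
    total = sum(entropies)
    if total == 0:
        return [0.0] * len(entropies)
    return [h / total for h in entropies]


# Hypothetical toy matrix: four OTUs scored for four characters.
toy_matrix = {
    "taxon_A": "0010",
    "taxon_B": "0111",
    "taxon_C": "01?2",
    "taxon_D": "0103",
}

if __name__ == "__main__":
    for index, weight in enumerate(entropy_weights(toy_matrix), start=1):
        print(f"character {index}: weight = {weight:.3f}")
```

In this toy matrix the first character is invariant across all OTUs, so its entropy and weight are zero, while the fourth character takes a different state in every OTU and receives the largest weight. High entropy alone does not separate phylogenetic signal from noise, which is why the sketch should be read only as an illustration of the quantity being measured.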