Information source
Whatever algorithm is used in a systematic study, the common element is the use of sequences (DNA, amino acids, or morphological characters) to characterize organisms and to interpret their evolutionary history. With their fixed alphabets, DNA and protein sequences resemble digital signals in modern communication systems, whereas the morphology of fossils is more like an analog signal. Character construction is therefore akin to sampling a digital signal from an analog one: the possibly infinite information entropy of the original fossil morphology is reduced to a finite entropy, represented by hundreds to thousands of morphological characters, that can be compared. More morphological characters usually describe organisms more completely, but it is extremely difficult to measure how completely a character matrix characterizes the overall morphology of a group of organisms. There is no standard guidance on character selection, and many characters in matrices are selected because researchers believe they carry morphological information. The interrelationships among morphological characters, and how they connect to the overall morphology, remain uncertain. At least from the results for mutual information and for channel capacity against bandwidth (that is, the number of characters), we show that the dependence between characters and different anatomical structures is complex, and that current morphological character matrices already seem to have reached character saturation. Shannon (1949) proposed the Sampling Theorem (also known as the Nyquist-Shannon Sampling Theorem), which bridges continuous and discrete signals. For a continuous signal source of finite bandwidth, the Sampling Theorem gives the lowest sampling rate that captures all of the information, which is twice the highest frequency present in the original signal. 
Given the connection, discussed earlier, between bandwidth in typical communication systems and the number of characters in paleontological systematic studies, the Sampling Theorem may provide a bridge between raw morphology and morphological characters.
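The sampling-rate condition can be illustrated with a one-dimensional signal. The following is a minimal sketch, not part of the analyses in this study: the function names (`sample_cosine`, `nyquist_rate`, `alias_frequency`) and the frequencies used are hypothetical choices for illustration. It shows that a 7 Hz cosine sampled at 10 Hz, below its Nyquist rate of 14 Hz, yields exactly the same samples as a 3 Hz cosine, so the original signal cannot be recovered from them.

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, n_samples):
    """Sample cos(2*pi*freq*t) at t = n / sample_rate for n = 0..n_samples-1."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def nyquist_rate(max_freq_hz):
    """Lowest sampling rate given by the Sampling Theorem: twice the highest frequency."""
    return 2 * max_freq_hz


def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency, folded into [0, sample_rate/2], of a sampled sinusoid."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Illustrative values only (assumptions, not data from this study).
undersampled = sample_cosine(7, 10, 50)   # 7 Hz signal, 10 Hz sampling
low_frequency = sample_cosine(3, 10, 50)  # 3 Hz signal, 10 Hz sampling

# The two sample sequences coincide: the 7 Hz signal aliases to 3 Hz.
indistinguishable = all(
    abs(a - b) < 1e-9 for a, b in zip(undersampled, low_frequency)
)
```

By analogy only, a character set that samples morphology too sparsely could make distinct morphologies indistinguishable in the matrix; how "frequency" should be defined for morphology is left open here.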
However, the saturation of channel capacity (Fig. 3c) does not necessarily mean that these morphological character matrices fully represent the entire morphology of the fossil specimens. Rather, it indicates that the matrices cannot fully convey the sampled morphological information, while other information may be omitted because character sampling is strongly biased. For example, the morphological matrix of Multituberculata comprises only characters from the cranial region, although the postcrania of those organisms also carry information.
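To make the idea of redundant characters concrete, the sketch below computes Shannon entropy and pairwise mutual information for discrete character columns. It is a toy illustration under stated assumptions, not the procedure used to produce the results in this study: the three character vectors are invented, missing or inapplicable states are not handled, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(states):
    """Shannon entropy, in bits, of a sequence of discrete character states."""
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, from empirical frequencies."""
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)


# Hypothetical toy data: each list holds one character scored across 8 taxa.
char_a = [0, 0, 1, 1, 0, 0, 1, 1]
char_b = [0, 0, 1, 1, 0, 0, 1, 1]  # scored identically to char_a
char_c = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of char_a in this sample

mi_redundant = mutual_information(char_a, char_b)    # all of char_b is shared
mi_independent = mutual_information(char_a, char_c)  # nothing is shared
```

In this toy case, adding `char_b` to a matrix that already contains `char_a` contributes no new information, whereas `char_c` contributes a full bit. A matrix can therefore grow in character number without growing in information, and such a measure says nothing about regions, such as the postcrania, that were never sampled.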
With the wide application of advanced imaging techniques such as computed tomography (CT) scanning, it is now feasible to capture the complete morphology of fossil specimens non-destructively. The unprecedented amount of data may be a stepping stone toward establishing the connection between analog morphological data and digital character data. With the help of information theory and high-resolution imaging, a standard workflow for morphological studies may become possible.