Information source
Whatever algorithm is used in systematic studies, the common element is
the use of sequences (DNA, amino acids, and morphological characters) to
characterize organisms and to interpret their evolutionary history. With
fixed alphabets, DNA and protein sequences resemble digital signals in
modern communication systems, while the morphology of fossils is more
like an analog signal. The process of character construction is
therefore analogous to sampling a digital signal from an analog signal:
the possibly infinite information entropy of fossil morphology is
reduced to a finite entropy, represented by hundreds to thousands of
morphological characters that can be compared.
More morphological characters usually describe organisms more
completely, but it is extremely difficult to measure how completely a
character matrix characterizes the overall morphology of a group of
organisms. There is no standard guidance on character selection, and
many characters in matrices are selected because researchers believe
they carry morphological information. The interrelationship among
morphological characters and how they connect to the overall morphology
remain uncertain. Our results for mutual information and channel
capacity against bandwidth (here, the number of characters) at least
show that the dependence between characters and different anatomical
structures is complex, and that current morphological character matrices
seem to have reached character saturation already. Shannon (1949)
proposed the Sampling Theorem (also known as the Nyquist-Shannon
Sampling Theorem), which bridges continuous and discrete signals. For a
continuous signal source of finite bandwidth, the Sampling Theorem gives
the lowest sampling rate that captures all information: twice the
highest frequency present in the original signal. Given the connection
discussed above between bandwidth in typical communication systems and
the number of characters in paleontological systematic studies, the
Sampling Theorem may be a bridge between raw morphology and
morphological characters.
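
For reference, the standard form of the theorem can be written as
follows. The symbols are introduced here for illustration and are not
used elsewhere in the text: B is the highest frequency present in the
signal x(t), f_s is the sampling rate, and sinc(u) = sin(pi u)/(pi u).

```latex
f_s \ge 2B, \qquad
x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\frac{n}{2B}\right)
       \operatorname{sinc}\!\left(2Bt - n\right)
```

The second expression is the interpolation formula for samples spaced
1/(2B) apart: a band-limited signal is recovered exactly from its
samples. How a "highest frequency" should be defined for raw morphology
is left open here.
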
However, the saturation of channel capacity (Fig. 3c) does not
necessarily mean that those morphological character matrices fully
represent the entire morphology of fossil specimens. Rather, the
matrices may be unable to convey more of the sampled morphological
information, while other information is left out because the sampling of
characters is strongly biased. For example, the morphological matrix of
Multituberculata comprises only characters from the cranial region,
although the postcrania of those organisms also carry morphological
information.
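
The idea of redundancy between characters can be illustrated with a
minimal sketch. It uses a simple plug-in estimate of entropy and mutual
information, which is not necessarily the estimator used for the results
discussed here. The matrix `toy_matrix` and the function names are
hypothetical and contain no data from this study.

```python
import math
from collections import Counter


def entropy(states):
    """Shannon entropy (in bits) of the observed states of one character."""
    n = len(states)
    return -sum((c / n) * math.log2(c / n) for c in Counter(states).values())


def mutual_information(a, b):
    """Plug-in mutual information in bits: I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Hypothetical toy matrix: rows are taxa, columns are characters.
toy_matrix = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
columns = list(zip(*toy_matrix))

h0 = entropy(columns[0])                             # entropy of character 0
mi_01 = mutual_information(columns[0], columns[1])   # characters 0 and 1
mi_02 = mutual_information(columns[0], columns[2])   # characters 0 and 2
```

In this toy case character 1 duplicates character 0, so their mutual
information equals the entropy of character 0 and adding character 1
contributes nothing new. Character 2 varies independently of character 0
across these taxa, so their mutual information is zero.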
With the wide application of advanced imaging techniques such as
computed tomography (CT) scanning, it is feasible to capture the
complete morphology of fossil specimens without destroying them. The
unprecedented amount of data may be a stepping stone toward establishing
the connection between analog morphological data and digital character
data. With the help of information theory and high-resolution imaging, a
standard workflow for morphological studies may become possible.