Introduction

Most extinct organisms are preserved as fossils that retain only their morphology, not macro-biomolecules such as DNA and proteins. Researchers therefore need to translate fossil morphology into sequences, for example a series of scored morphological characters, and analyse those sequences to identify each operational taxonomic unit (OTU; classification) and to reconstruct evolutionary history (systematics). However, unlike DNA and protein sequences, which are coded in fixed alphabets (4 nucleotides and 20 amino acids), extinct organisms have no universal morphological alphabet. The most common practical way to translate morphology into sequences is to construct a morphological character matrix. The difficulties of constructing morphological characters were recognized early (Wilkinson 1995), and many early attempts to propose methods or guidance for character construction are far from satisfactory (Estabrook et al., 1975; Hawkins et al., 1997; Sereno 2007). The definition of “character” (in cladistic analysis) has also been discussed extensively (see review by Sereno 2007), but no definition is universally applied.
Beyond the basic question of what a character is, debates are ongoing about whether to use giant matrices (Laing et al., 2017) or not (Simoes et al., 2016), which anatomical structures characters should come from (Brocklehurst & Benevento 2020), and whether to combine morphological characters with molecular data and shape data (Nylander et al., 2004; Catalano et al., 2010), among other questions. Moreover, because fossils are incomplete and are distorted by their preservation environments, most morphological character matrices can only be partially scored. If morphological characters are the most basic units in morphology-based systematic studies, analogous to nucleotides in DNA sequences and amino acids in proteins, then analysing character matrices within the framework of information theory may help to better understand these debates.
The word “information” is used repeatedly in systematic studies (Cracraft 1974; Farris 1979; Mickevich & Platnick 1989; Wilkinson et al., 2004; Sereno 2007; Simoes et al., 2016; Laing et al., 2017), but it often seems to be confused with data, signal, or embedded semantic meaning, and few studies have connected information theory with systematics, especially fossil-based systematics. A similar situation existed in the early development of telecommunication systems: even after the telegraph, telephone, and broadcasting had come into widespread use by the 1940s, no formal theory of communication system engineering existed until Shannon (1948) proposed information theory. Until then, transmitted signals, for example the “·” and “-” of Morse code, were not properly distinguished from their semantic meaning, for example “we found a dinosaur skull”. This lack of distinction made it difficult to improve the quality of communication, because there was no guidance on how to maximize the efficiency of coding an information source or how to minimize the influence of noise in communication channels.