Properties of the channel (bandwidth, channel capacity, noise)
In this study we use one of the most basic models, the additive white Gaussian noise (AWGN) channel, to mimic preservation environments for which limited explanation is available. The AWGN channel assumes that the noise has uniform power across the frequency domain and a Gaussian distribution in the time domain. We treat the number of characters as the bandwidth, so that characters correspond roughly to the frequency domain of a typical communication system and OTUs to the time domain. This mapping follows naturally from the model in Fig. 1: every organism that ever lived on Earth is a signal sent, and fossils are the small fraction received. However, in the character matrices analyzed here, many OTUs are scored from multiple specimens, which results in the aggregation of scored characters in the first few columns in Fig. 3a. For the time domain (OTUs), the noise derives from natural preservation and is controlled by many factors, so it is probably fair to use the AWGN channel model for both simplification and convenience.
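The two AWGN assumptions named above can be illustrated with a minimal sketch. It is a generic illustration, not the procedure used in this study: the function names, the all-zero test signal, the noise level, and the seed are all assumptions made for the example. Independent Gaussian samples have a Gaussian distribution in the time domain, and their near-zero correlation between neighbouring samples is the time-domain counterpart of a flat power spectrum.

```python
import random
import statistics

def awgn(signal, noise_std, rng):
    """Return the signal plus independent zero-mean Gaussian noise samples."""
    return [s + rng.gauss(0.0, noise_std) for s in signal]

def lag1_autocorrelation(xs):
    """Sample autocorrelation at lag 1; close to 0 for white noise."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

rng = random.Random(0)        # fixed seed (assumption) so the sketch is repeatable
sent = [0.0] * 10_000         # an all-zero "signal" isolates the noise term
received = awgn(sent, noise_std=1.0, rng=rng)

noise_mean = statistics.fmean(received)      # expected to be near 0
noise_var = statistics.pvariance(received)   # expected to be near noise_std ** 2
noise_r1 = lag1_autocorrelation(received)    # expected to be near 0 (whiteness)
```

In this sketch each sample stands in for one position along the time axis (OTUs in the mapping above); how well real preservation noise matches these properties is an empirical question that the code does not address.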
From the estimation based on the AWGN channel model (Fig. 3c), all character matrices show saturation of characters. The basic explanation for this saturation is that the noise increases along with the bandwidth. Incompleteness, deformation, and misidentification are common among fossil specimens. If the paleontological information channel is inherently noisy, we cannot expect to transmit paleontological information efficiently without channel coding. Moreover, the time cost of both encoding and decoding has to be considered when dealing with extremely large character matrices.
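The saturation argument can be made concrete with the textbook Shannon-Hartley capacity of an AWGN channel, C = B log2(1 + P / (N0 B)), where B is the bandwidth, P the signal power, and N0 the noise power per unit bandwidth. The sketch below is an illustration under stated assumptions: it is not confirmed to be the estimator behind Fig. 3c, and the values of P and N0 and the function names are hypothetical. Because the total noise N0 B grows with B, the capacity approaches the finite limit P / (N0 ln 2) instead of growing without bound.

```python
import math

def awgn_capacity(bandwidth, signal_power, noise_density):
    """Shannon-Hartley capacity (bits per unit time) of an AWGN channel."""
    snr = signal_power / (noise_density * bandwidth)  # total noise grows with bandwidth
    return bandwidth * math.log1p(snr) / math.log(2.0)

def capacity_limit(signal_power, noise_density):
    """Capacity as bandwidth tends to infinity: P / (N0 * ln 2)."""
    return signal_power / (noise_density * math.log(2.0))

P, N0 = 1.0, 1.0                   # hypothetical signal power and noise density
bandwidths = [1, 10, 100, 1000]    # e.g. the number of characters in this mapping
capacities = [awgn_capacity(b, P, N0) for b in bandwidths]
limit = capacity_limit(P, N0)

for b, c in zip(bandwidths, capacities):
    print(f"B = {b:5d}  C = {c:.4f}  fraction of limit = {c / limit:.3f}")
```

Under these assumed values, each tenfold increase in bandwidth adds less capacity than the one before, which is the qualitative behaviour described as saturation of characters.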