Properties of the channel (bandwidth, channel capacity, noise)
In this study we use one of the most basic models, the additive white
Gaussian noise (AWGN) channel, to mimic preservation environments with
limited explanation. The AWGN channel assumes that the noise has uniform
power in the frequency domain and a Gaussian distribution in the time
domain. We treat the number of characters as the bandwidth, so the
characters probably correspond to the frequency domain of a typical
communication system and the OTUs to the time domain. This mapping seems
natural given the model in Fig. 1, as every organism that ever lived on
Earth was a signal sent, and fossils are the small fraction received.
However, in the character matrices analyzed here, many OTUs are scored
from multiple specimens, which results in the aggregation of scored
characters in the first few columns in Fig. 3a.
For the time domain (the OTUs), the noise derives from natural
preservation and is controlled by many factors, so it is probably fair
to use the AWGN channel model for both simplicity and convenience.
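
The two AWGN assumptions named above (additive Gaussian noise, uniform
power across frequencies) can be illustrated with a minimal sketch. It
is a generic illustration of the channel model only, not the procedure
applied to the character matrices. The function names, the alternating
test signal, the noise level, and the sample count are all hypothetical.
Whiteness is checked indirectly: uncorrelated noise samples have a flat
power spectrum, so the lag-1 autocorrelation should be close to zero.

```python
import random
import statistics


def awgn_channel(signal, noise_std, rng):
    """Add independent zero-mean Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def lag1_autocorrelation(samples):
    """Normalized correlation between each sample and the next one."""
    mean = statistics.fmean(samples)
    centered = [s - mean for s in samples]
    numerator = sum(a * b for a, b in zip(centered, centered[1:]))
    denominator = sum(c * c for c in centered)
    return numerator / denominator


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical transmitted signal: alternating +1 / -1 samples.
sent = [1.0 if i % 2 == 0 else -1.0 for i in range(20000)]
received = awgn_channel(sent, noise_std=2.0, rng=rng)

# Recover the noise term to inspect its statistics.
noise = [r - s for r, s in zip(received, sent)]
print("noise mean:", round(statistics.fmean(noise), 3))
print("noise std:", round(statistics.pstdev(noise), 3))
print("lag-1 autocorrelation:", round(lag1_autocorrelation(noise), 3))
```

With enough samples, the noise mean should be near zero, the standard
deviation near the chosen noise level, and the lag-1 autocorrelation
near zero.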
The estimates based on the AWGN channel model (Fig. 3c) show saturation
of characters in all character matrices. The basic explanation for this
saturation is that the noise increases along with the bandwidth.
Incompleteness, deformation, and misidentification are common among
fossil specimens. If the paleontological information channel is
inherently noisy, we cannot expect to transmit paleontological
information efficiently without channel coding. Moreover, the time cost
of both encoding and decoding has to be considered when facing extremely
large character matrices.
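
The saturation argument can be made concrete with the standard
Shannon-Hartley capacity of an AWGN channel, C = B log2(1 + P / (N0 B)),
where the noise power N0 B grows linearly with the bandwidth B. The
sketch below assumes this textbook formula; it is not necessarily the
estimation procedure behind Fig. 3c. The signal power P, the noise
density N0, and the bandwidth values (standing in for character counts)
are hypothetical.

```python
import math


def awgn_capacity(bandwidth, signal_power, noise_density):
    """Shannon-Hartley capacity: C = B * log2(1 + P / (N0 * B))."""
    # White noise: total noise power grows linearly with bandwidth.
    noise_power = noise_density * bandwidth
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


def capacity_limit(signal_power, noise_density):
    """Limit of the capacity as bandwidth grows without bound."""
    return (signal_power / noise_density) * math.log2(math.e)


# Hypothetical values, chosen only to show the shape of the curve.
P, N0 = 100.0, 1.0
for B in (10, 100, 1000, 10000, 100000):
    print(B, round(awgn_capacity(B, P, N0), 3))
print("limit:", round(capacity_limit(P, N0), 3))
```

Under these assumptions the capacity keeps rising with bandwidth but
levels off toward (P / N0) log2(e), so adding more characters yields
progressively smaller gains.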