3.2 | Analysis of CRISPR repeat-spacer arrays
The CRISPR array consisted of direct repeat (DR) sequences interspersed with
spacer sequences. As presented in Figure 2A, the length and nucleotide
sequence of the repeats were relatively conserved within each CRISPR type.
The repeats were 29 nucleotides long in the type I-E and IV-A systems and 28
nucleotides long in the I-E* system (Figure 2D). In addition, RNA secondary
structure prediction showed that the repeats of all types formed stable
stem-loop structures in their central region (Figure 2C), with predicted
stem lengths of 9, 7, and 6 bp for types I-E, I-E*, and IV-A, respectively.
The conservation and stability of these secondary structures can be assessed
from the structure diagrams and the minimum free energy (MFE) values: a lower
(more negative) MFE indicates higher structural stability, and a longer stem
generally corresponds to a more stable structure. As shown in Figure 2C, the
type I-E repeat had the lowest MFE value and the longest stem (shown in red),
suggesting that its secondary structure was the most stable of the three.
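Stem length is read off the predicted structures in Figure 2C, but the idea of a hairpin stem can be illustrated without a folding package. The sketch below is a deliberately simplified stand-in and not the MFE prediction behind Figure 2C: it scans a repeat for the longest run of contiguous base pairs (Watson-Crick plus G-U wobble) that closes a loop of at least `min_loop` unpaired nucleotides. The function name `longest_stem`, the 3-nt minimum loop, and the toy sequence are illustrative assumptions. A real MFE calculation, such as the one performed by RNAfold, also accounts for stacking energies, loop penalties, and internal mismatches.

```python
# Allowed RNA base pairs: Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_stem(seq: str, min_loop: int = 3) -> tuple[int, int, int]:
    """Return (stem_length, i, j) for the longest contiguous hairpin stem.

    The stem pairs seq[i], seq[i+1], ... with seq[j], seq[j-1], ...
    and must enclose at least `min_loop` unpaired bases.
    Simplified heuristic only; this is NOT a minimum-free-energy fold.
    """
    rna = seq.upper().replace("T", "U")  # treat DNA input as RNA
    n = len(rna)
    best = (0, -1, -1)
    for i in range(n):
        for j in range(n - 1, i, -1):
            k = 0
            # Extend the stem inward while the bases pair and the loop stays large enough.
            while (j - k) - (i + k) - 1 >= min_loop and (rna[i + k], rna[j - k]) in PAIRS:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


# Hypothetical toy hairpin (not a repeat from this study): GGGG stem, AAAA loop.
print(longest_stem("GGGGAAAACCCC"))  # (4, 0, 11)
```

Because the scan only counts uninterrupted pairs, it can rank repeats by stem length but cannot reproduce the MFE values themselves.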
Spacer sequences are acquired into the CRISPR array with the aid of Cas
proteins, and the number of spacers is commonly taken to reflect the activity
of the CRISPR/Cas system.
As seen in Figure 2E, spacer numbers varied widely. Among the 105 strains
carrying a CRISPR/Cas system, strain WUSM_KV_47 had the most spacers (41 type
I-E spacers), and strain EuSCAPE_TR218 had the fewest (3 type IV-A spacers).
Type I-E systems (26.5, 17-37) carried more spacers than type I-E* systems
(13, 8.5-15; p < 0.001) and type IV-A systems (16, 13-20; p < 0.001), whereas
spacer numbers did not differ significantly between type I-E* and type IV-A
systems (13, 8.5-15 vs 16, 13-20; p = 0.128).
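This section does not name the statistical test behind these p values. The parenthesized values look like medians with interquartile ranges, which is typical of non-parametric reporting, and a two-sided Mann-Whitney U test is one common way to compare spacer counts between two systems. Purely as an illustration of that assumption, the sketch below implements the test using only the standard library, with the normal approximation and a tie correction. The function names and the spacer counts in the usage lines are hypothetical and are not data from this study. Quartile conventions differ between packages, so `summarize` may not reproduce the ranges in Figure 2E, and for small samples an exact test (for example `scipy.stats.mannwhitneyu`) would be preferable.

```python
import math
import statistics


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Find the run of tied values starting at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample a, two-sided p value).
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u <= 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return u1, p


def summarize(counts):
    """Median and quartiles (statistics.quantiles, default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(counts, n=4)
    return statistics.median(counts), q1, q3


# Hypothetical spacer counts, for illustration only.
type_ie = [26, 31, 17, 37, 22, 29]
type_ie_star = [13, 8, 15, 9, 14]
print(summarize(type_ie))
print(mann_whitney_u(type_ie, type_ie_star))
```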
The protospacer adjacent motif (PAM) plays an important role in spacer
acquisition. As shown in Figure 2B, the PAM sequences of the type I-E and I-E*
systems were inferred to be 5′-AAG-3′ and 5′-(C)GAA-3′, respectively. Because
PAMs are essential for Cas proteins to recognize and degrade foreign DNA,
distinct PAMs suggest distinct Cas protein variants. The difference in PAM
between the type I-E and I-E* systems therefore further supports their
evolutionary and functional divergence. Notably, the PAM predicted for the
type IV-A system (5′-AAG-3′) was identical to that predicted for type I-E.
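PAM inference of this kind usually starts from protospacers, that is, matches of spacers to phage or plasmid sequences. The flanking bases of those protospacers are then aligned and summarized, for example as a sequence logo. The sketch below covers only that final summarizing step. It assumes that equal-length flanks have already been extracted in a consistent orientation, builds per-position base counts, and reports a base only where it exceeds a frequency threshold. The function name `pam_consensus`, the 0.5 threshold, and the example flanks are hypothetical. The tool and flank length used for Figure 2B are not stated in this section.

```python
from collections import Counter


def pam_consensus(flanks, min_freq=0.5):
    """Per-position consensus of aligned, equal-length flanking sequences.

    A base is reported only if its frequency at that position exceeds
    `min_freq`; otherwise the position is reported as 'N'.
    """
    if not flanks or len({len(f) for f in flanks}) != 1:
        raise ValueError("flanks must be non-empty and of equal length")
    consensus = []
    for column in zip(*(f.upper() for f in flanks)):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) > min_freq else "N")
    return "".join(consensus)


# Hypothetical 3-nt protospacer flanks (illustration only, not study data).
flanks = ["AAG", "AAG", "AGG", "AAG", "TAG"]
print(pam_consensus(flanks))  # AAG
```

Positions that fall below the threshold come out as `N`. This loosely mirrors how a weakly conserved position, such as the parenthesized (C) in the I-E* motif, appears as a short letter in a logo.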