Intradomain insertions
Loops also occur within domains, usually on the surface between two secondary structure elements. For protein engineers seeking to insert protease cut sites, purification tags, or domains, loops known to tolerate significant changes in composition and/or length were catalogued; repetitive elements that alter secondary structure and regions of undefined structure were therefore not considered (Figure 4, Table II).
AT region [including FSD, n=949]: Most of the observed insertions (19/34) were located before the first β-strand or after the last β-strand of the AT, in the FSD [especially α1-α2 (8) and α3-β3 (10), but also β3-β4 (5) and the loop following η4 (4)]. Within the AT domain itself, the loops that tolerated the most insertions were α10-β7 (3) and α15-β12 (3).
DH [n=599]: The longest loop, β9-β10, is also the most frequently inserted (16 instances, including a GT dipeptide repeated 20 times in the cyclizidine PKS). The next most frequently inserted loop is η5-β14 (6), which lies adjacent to the active site.
KR region of β-modules [including the dimerization element (DE), n=279]: None of the 3-helix DEs, which are present in 77% of β-modules, contain insertions [27]. Although KRs and KRc are similar in size, KRs contains substantially more insertions (19 vs. 2), with β3-α5 (5) and η1-β6 (5) being the most frequently inserted loops.
KR region of γ- and δ-modules [n=599]: Even though the region of KRs upstream of α2 was not analyzed because of low sequence conservation, KRs contains more insertions than KRc (18 vs. 4). The most frequently inserted loop of KRc is α6-β6 (2).
ER [n=158]: All 4 observed insertions are located in the N-terminal portion of the substrate-binding subdomain.
ACP [n=949]: No insertions were observed within this 100-residue helical domain. This analysis includes α1 (often referred to as “helix 0”), which is rarely absent [26].
DDs [n=388]: Most of the docking domain motifs, both CDDs and NDDs, could be grouped into Class 1a (n=226), Class 1b (n=63), or Class 2 (n=70) (Supplementary Figures 9-11) [7]. No insertions were observed in Class 2 docking motifs. Most insertions in Class 1a and Class 1b CDDs were immediately upstream of the terminal helix (8/12 and 4/6, respectively). The only NDDs with insertions at their upstream end belong to Class 1a (3).
KS [n=949]: Two insertions were observed, both in β13-β14, the most downstream loop.