INTRODUCTION
Modular polyketide synthases (PKSs) are among the most powerful molecular machines and synthesize some of the most important medicines (e.g., the antibiotic erythromycin and the anticancer agent epothilone), yet they are also some of the most difficult to study both in vivo and in vitro [1, 2]. Fortunately, the availability of genomic sequences provides an additional avenue for investigation in silico.
Megadalton PKS assembly lines are composed of several large polypeptides that house sets of domains termed modules (Figure 1) [1, 3]. The modules of cis-acyltransferase (cis-AT) assembly lines minimally contain an AT domain that selects the carbon building blocks (usually from malonyl- or methylmalonyl-CoA), a ketosynthase (KS) domain that fuses the building block with a growing polyketide chain, and an acyl carrier protein (ACP) domain that relays both building blocks and growing chains between the enzymatic domains. Modules may also contain processing enzymes such as a ketoreductase (KR) that reduces the β-keto group formed by KS, a dehydratase (DH) that dehydrates the resulting β-hydroxyacyl chain, and an enoylreductase (ER) that reduces the resulting α,β-unsaturated chain. Other enzymes often appear in the first and last modules to help load primer units and offload mature polyketide chains. Recently, a new module boundary was proposed that matches how sets of domains cooperate and evolutionarily co-migrate [4-6]. Under the traditional definition, KS was the most upstream domain of a module; under the updated definition it is the most downstream, where it can play the role of a gatekeeper that ensures only properly processed polyketides are passed to the next module. The common module types in cis-AT assembly lines are α-modules, which do not contain processing domains; β-modules, which possess a KR; γ-modules, which contain a KR and DH; and δ-modules, which contain a KR, DH, and ER [1]. Modules are commonly split by C-terminal and N-terminal docking domains (CDD and NDD) located between the ACP and KS [7]. The domains of PKS modules have each been characterized at atomic resolution, and structural biologists are endeavoring to determine their orientation relative to one another within an intact synthase [3, 8]. Less attention has been paid to the loops both between and within domains. Although these loops are frequently not observed by X-ray crystallography, they play key roles, such as helping to position and dock domains.
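The module types described above can be expressed as a simple lookup on the processing domains a module contains. The following is a minimal sketch, not part of the analysis reported here; the function name `classify_module`, the domain labels as plain strings, and the "other" fallback for unlisted combinations are illustrative assumptions.

```python
# Illustrative sketch: classify a cis-AT module by its processing domains.
# The mapping follows the alpha/beta/gamma/delta definitions in the text;
# the names used here are hypothetical and not taken from any published tool.

PROCESSING_DOMAINS = {"KR", "DH", "ER"}

MODULE_TYPES = {
    frozenset(): "alpha",                    # no processing domains
    frozenset({"KR"}): "beta",               # KR only
    frozenset({"KR", "DH"}): "gamma",        # KR and DH
    frozenset({"KR", "DH", "ER"}): "delta",  # KR, DH, and ER
}


def classify_module(domains):
    """Return the module type for an iterable of domain labels.

    Non-processing domains (e.g., AT, ACP, KS) are ignored. Combinations
    not named in the text are reported as "other" (an assumption).
    """
    processing = frozenset(d for d in domains if d in PROCESSING_DOMAINS)
    return MODULE_TYPES.get(processing, "other")


if __name__ == "__main__":
    # Domain order follows the updated boundary, with KS most downstream.
    print(classify_module(["AT", "ACP", "KS"]))
    print(classify_module(["AT", "DH", "ER", "KR", "ACP", "KS"]))
```

Keying the lookup on a set rather than an ordered list keeps the classification independent of where the module boundary is drawn, since the traditional and updated definitions differ in domain order but not in which processing domains a module type contains.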
Amino acid repeats are present in most PKSs (Figure 2). A comparison of the ~17,000-residue MlsA1 components of mycolactone PKSs harbored by various mycobacteria reveals sequence identities of 99% [9]. The differences include indels that encode variable numbers of amino acid repeats. For example, at the updated module boundary between the sixth and seventh modules, the sequence GSDPAV is repeated 2 times in Mycobacterium ulcerans Agy99 MlsA1, 3 times in M. ulcerans subsp. shinshuense MlsA1, and 4 times in Mycobacterium liflandii 128FXT MlsA1 (Figure 2a). These repetitive sequences are encoded by tandem repeats of the highly conserved 18-mer, 5’-GGTTCTGATCCCGCAGTG-3’. As another example, the loop at the boundary between the second and third modules of the nannocystin PKS from the myxobacterium Nannocystis sp. MB1016 contains two repetitive sequences, one from a 46-mer repeated 3.5 times and the other from a 36-mer repeated 4.9 times (Figure 2b). This module also harbors a remarkable 152-residue insertion between the structural and catalytic subdomains of the KR (KRs and KRc) resulting from a 48-mer repeated 9.7 times [3].
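The correspondence between the 18-mer and the GSDPAV repeat can be checked with a few lines of code. The sketch below is only illustrative: it translates the 18-mer with the six standard-genetic-code codons it contains and counts back-to-back exact copies of a unit in a sequence. The function names and the flanking nucleotides in the usage example are hypothetical, and exact string matching is a simplification that would not detect the partial or diverged copies implied by fractional repeat counts such as 3.5.

```python
# Illustrative sketch: translate the conserved 18-mer and count tandem copies.
# Only the six codons present in the 18-mer are listed (standard genetic code).

CODON_TABLE = {
    "GGT": "G", "TCT": "S", "GAT": "D",
    "CCC": "P", "GCA": "A", "GTG": "V",
}

UNIT_DNA = "GGTTCTGATCCCGCAGTG"  # 18-mer quoted in the text


def translate(dna):
    """Translate DNA in frame 0; codons outside the small table become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def count_tandem_copies(seq, unit):
    """Return the longest run of back-to-back exact copies of unit in seq."""
    best = 0
    start = seq.find(unit)
    while start != -1:
        copies = 0
        pos = start
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
        start = seq.find(unit, start + 1)
    return best


if __name__ == "__main__":
    print(translate(UNIT_DNA))
    # Hypothetical flanking nucleotides around three copies of the unit.
    toy_gene = "ACGT" + UNIT_DNA * 3 + "TTAA"
    print(count_tandem_copies(toy_gene, UNIT_DNA))
```

Dedicated tandem-repeat finders tolerate mismatches and report fractional copy numbers; the exact-match counter here is meant only to make the relationship between the nucleotide repeat and the amino acid repeat concrete.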
Each of these repeats is in a loop where it most likely does not impact PKS function, and their origin appears to be genetic, resulting from events such as slipped-strand mispairing [10, 11]. They provide an opportunity to identify the loops of PKSs tolerant to change, which could help elucidate the dynamics of assembly line domains, design better experiments, and engineer hybrid synthases that produce new molecules. Here, the tandem repeats in the genes encoding 949 modules within 129 cis-acyltransferase PKSs are catalogued, and the locations of the corresponding amino acids within the module are identified. The most frequently inserted interdomain loop corresponds with the updated module boundary immediately downstream of the ketosynthase (KS). An analysis of the loops bordering ACP reveals that they are relatively short and that ACP relies on AT accessing a conformation such as that observed by electron microscopy of the pikromycin PKS [8]. The scarcity of repeat insertions within the ACP and KS domains suggests that these domains are sensitive to alteration.