INTRODUCTION
Modular polyketide synthases (PKSs) are among the most powerful
molecular machines and synthesize some of the most important medicines
(e.g., the antibiotic erythromycin and the anticancer agent epothilone),
yet they are also some of the most difficult to study both in
vivo and in vitro [1, 2]. Fortunately, the availability of
genomic sequences provides an additional avenue for investigation in silico.
Megadalton PKS assembly lines are composed of several large polypeptides
that house sets of domains termed modules (Figure 1) [1, 3]. The
modules of cis-acyltransferase (cis-AT) assembly lines
minimally contain an AT domain that selects the carbon building blocks
(usually from malonyl- or methylmalonyl-CoA), a ketosynthase (KS) domain
that fuses the building block with a growing polyketide chain, and an
acyl carrier protein (ACP) domain that relays both building blocks and
growing chains between the enzymatic domains. Modules may also contain
processing enzymes such as a ketoreductase (KR) that reduces the β-keto
group formed by KS, a dehydratase (DH) that dehydrates the resulting
β-hydroxyacyl chain, and an enoylreductase (ER) that reduces the
resulting α,β-unsaturated chain. Other enzymes often appear in the first
and last modules to help load primer units and offload mature polyketide
chains. Recently, a new module boundary was proposed that matches how
sets of domains cooperate and evolutionarily co-migrate [4-6]. Under the
traditional definition, KS was the most upstream domain of a module;
under the updated definition, it is the most downstream, where it
can play the role of a gatekeeper that ensures only properly processed
polyketides are passed to the next module. The common module types in
cis-AT assembly lines are α-modules, which do not contain
processing domains, β-modules, which possess a KR, γ-modules, which
contain a KR and DH, and δ-modules, which contain a KR, DH, and
ER [1]. Modules are commonly split by C-terminal and N-terminal
docking domains (CDD and NDD) located between the ACP and KS [7].
The domains of PKS modules have
each been characterized at atomic resolution, and structural biologists
are endeavoring to determine their orientation relative to one another
within an intact synthase [3, 8]. Less attention has been paid to the
loops both between and within domains. Although frequently not observed by
X-ray crystallography, these loops play key roles, such as helping to
position and dock domains.
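The module nomenclature above can be sketched as a simple lookup on the set of processing domains a module contains. This is only an illustrative sketch: the function name `classify_module`, the label strings, and the list-of-strings representation of a module are assumptions made here and are not taken from the cited work.

```python
# Illustrative sketch (hypothetical helper, not from the cited work):
# label a cis-AT module by the processing domains it contains.
PROCESSING_DOMAINS = {"KR", "DH", "ER"}

MODULE_TYPES = {
    frozenset(): "alpha",                    # no processing domains
    frozenset({"KR"}): "beta",               # KR only
    frozenset({"KR", "DH"}): "gamma",        # KR and DH
    frozenset({"KR", "DH", "ER"}): "delta",  # KR, DH, and ER
}


def classify_module(domains):
    """Return the module type for an iterable of domain names.

    Non-processing domains (AT, ACP, KS, ...) are ignored. Combinations
    other than the four common types are reported as "other".
    """
    processing = frozenset(d for d in domains if d in PROCESSING_DOMAINS)
    return MODULE_TYPES.get(processing, "other")


if __name__ == "__main__":
    # Domain order follows the updated boundary, with KS most downstream.
    print(classify_module(["AT", "DH", "KR", "ACP", "KS"]))
```

Using a set rather than an ordered list keeps the lookup independent of how the domains happen to be listed for a given module.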
Amino acid repeats are present in most PKSs (Figure 2). A comparison of
the ~17,000-residue MlsA1 components of mycolactone PKSs
harbored by various mycobacteria reveals sequence identities of
99% [9]. The differences include indels that encode variable numbers
of amino acid repeats. For example, at the updated module boundary
between the sixth and seventh modules, the sequence GSDPAV is repeated 2
times in Mycobacterium ulcerans Agy99 MlsA1, 3 times in M.
ulcerans subsp. shinshuense MlsA1, and 4 times in
Mycobacterium liflandii 128FXT MlsA1 (Figure 2a). These
repetitive sequences are encoded by tandem repeats of the highly
conserved 18-mer, 5’-GGTTCTGATCCCGCAGTG-3’. As another example, the loop
at the boundary between the second and third modules of the nannocystin
PKS from the myxobacterium Nannocystis sp. MB1016 contains
two repetitive sequences, one from a 46-mer repeated 3.5 times and the
other from a 36-mer repeated 4.9 times (Figure 2b). This module also
harbors a remarkable 152-residue insertion between the structural and
catalytic subdomains of the KR (KRs and KRc) resulting from a 48-mer
repeated 9.7 times [3].
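The relationship between a nucleotide tandem repeat and the amino acid repeat it encodes can be checked with a short script. The sketch below translates the 18-mer quoted above with the standard genetic code and counts consecutive copies of the resulting hexapeptide. The flanking sequences and the helpers `translate`, `max_tandem_copies`, and `toy_gene` are hypothetical and serve only as illustration; they are not MlsA1 sequence.

```python
# Standard genetic code, indexed with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

REPEAT_UNIT = "GGTTCTGATCCCGCAGTG"  # 18-mer quoted in the text


def translate(dna):
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def max_tandem_copies(sequence, unit):
    """Return the largest number of back-to-back exact copies of unit."""
    best = 0
    i = sequence.find(unit)
    while i != -1:
        copies, j = 0, i
        while sequence.startswith(unit, j):
            copies += 1
            j += len(unit)
        best = max(best, copies)
        i = sequence.find(unit, i + 1)
    return best


def toy_gene(n_copies):
    """Build a toy sequence: hypothetical flanks around n repeat units."""
    return "ATGGCC" + REPEAT_UNIT * n_copies + "GCCTAA"


if __name__ == "__main__":
    print(translate(REPEAT_UNIT))  # GSDPAV
    for n in (2, 3, 4):
        protein = translate(toy_gene(n))
        print(n, max_tandem_copies(protein, "GSDPAV"))
```

Because the 18-mer is a whole number of codons, each added copy inserts exactly six residues and preserves the reading frame.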
Each of these repeats is in a loop where it most likely does not impact
PKS function, and their origin appears to be genetic, resulting from
events such as slipped-strand mispairing [10, 11]. They provide an
opportunity to identify the loops of PKSs tolerant to change, which
could help elucidate the dynamics of assembly line domains, design
better experiments, and engineer hybrid synthases that produce new
molecules. Here, the tandem repeats in the genes encoding 949 modules
within 129 cis-acyltransferase PKSs are catalogued, and the
locations of the corresponding amino acids within the module are
identified. The interdomain loop that most frequently contains
insertions coincides with the updated module boundary immediately
downstream of the KS. An analysis of the loops bordering ACP reveals they
are relatively short and that ACP relies on AT accessing a conformation
such as that observed by electron microscopy of the pikromycin
PKS [8]. The scarcity of repeat insertions within the ACP and KS domains
suggests that these domains are sensitive to alteration.
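Fractional copy numbers such as "repeated 3.5 times" arise when a repeat tract ends partway through a unit. A minimal sketch of how such a value can be computed for a known period is given below. It assumes exact matches only, and the function name `longest_tandem_tract` and the toy sequence are hypothetical. The text does not state how the repeats in this study were detected, and dedicated repeat-finding software generally tolerates mismatches and indels.

```python
def longest_tandem_tract(seq, period):
    """Find the longest exact tandem tract with the given period.

    A position i "matches" when seq[i] == seq[i + period]. A run of r
    consecutive matches spans r + period bases, and at least `period`
    matches are required so that the tract holds two full copies.
    Returns (start, length, copies), or None if no tract is found.
    """
    best = None
    run = 0
    for i in range(len(seq) - period):
        if seq[i] == seq[i + period]:
            run += 1
            length = run + period
            if run >= period and (best is None or length > best[1]):
                best = (i - run + 1, length)
        else:
            run = 0
    if best is None:
        return None
    start, length = best
    return start, length, length / period


if __name__ == "__main__":
    unit = "ACGTTGCA"  # hypothetical 8-mer, not from any PKS gene
    toy = "CCC" + unit * 3 + unit[:4] + "GGG"
    print(longest_tandem_tract(toy, len(unit)))  # (3, 28, 3.5)
```

Dividing the tract length by the period gives the copy number directly, so three full units plus half a unit are reported as 3.5 copies.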