Collecting and analyzing sequences
Sequences of the DNA encoding the abyssomicin (Aby, Verrucosispora
maris AB-18-032, JF752342.1), ajudazol (Aju, Chondromyces crocatus,
AM946600.1), akaeolide (Aka, Streptomyces sp. NBRC 109706,
BBOM01000011.1), althiomycin (Alm, Myxococcus xanthus, FR831800.1),
ambruticin (Amb, Sorangium cellulosum, DQ897667.1), amphotericin (Amp,
Streptomyces nodosus, AF357202), anatoxin (Ana, Oscillatoria sp. PCC
6506, FJ477836.1), annimycin (Ann, Streptomyces calvus, KF683117.1),
ansamitocin (Asm, Actinosynnema pretiosum subsp. pretiosum,
KY4899977.1), apoptolidin (Apo, Nocardiopsis sp. FU40, JF819834.1),
aurafuron (Auf, Stigmatella aurantiaca DW4/3-1, AM850130.1), aureothin
(Aur, Streptomyces thioluteus, AJ575648.1), avermectin (Ave,
Streptomyces avermitilis, AB032367.1), bafilomycin (Baf, Streptomyces
lohii, GU390405.1), BE-14106 (Bec, Streptomyces sp. DSM 21069,
FJ872523.1), bengamide (Ben, Myxococcus virescens, KP143770.1),
borrelidin (Bor, Streptomyces parvulus, AJ580915.1),
calcimycin (Cal, Streptomyces chartreusis NRRL 3882, HM452329.1),
candicidin (Fsc, Streptomyces sp. FR-008, AY310323.2), chalcomycin
(Chm, Streptomyces bikiniensis, AY509120.1), chaxamycin (Cxm,
Streptomyces leeuwenhoekii, LN831790.1), chlorizidine (Clz,
Streptomyces sp. CNH-287, KF585133.1), chlorothricin (Chl,
Streptomyces antibioticus, DQ116941.2), chondramide (Cmd, Chondromyces
crocatus, AM179409.1), chondrochloren (Cnd, Chondromyces crocatus,
AM988861.1), coelimycin (Cpk, Streptomyces coelicolor A3(2),
AL645882.2), concanamycin (Con, Streptomyces neyagawaensis,
DQ149987.1), conglobatin (Cng, Streptomyces conglobatus, LN849060.1),
cremimycin (Cmi, Streptomyces sp. MJ635-86F5, AB818354.1), crocacin
(Cro, Chondromyces crocatus, FN547928.1), cryptophycin (Crp, Nostoc
sp. ATCC 53789, ER159954.1), curacin (Cur, Moorea producens 3L,
HQ696500.1), cyclizidine (Cyc, Streptomyces sp. NCIB 11649,
KT327068.1), cylindrospermopsin (Cyr, Cylindrospermopsis raciborskii
AWT205, EU140798.1), cystothiazole (Cta, Cystobacter fuscus,
AY834753.1), divergolide (Div, Streptomyces sp. HKI0576, HF563079.1),
DKxanthene (Dkx, Stigmatella aurantiaca, BN001209.1), E-837 (E837,
Streptomyces aculeolatus, DQ292520.1), ebelactone (Ebe, Streptomyces
aburaviensis, KC894072.1), ECO-02301 (Eco, Streptomyces aizunensis,
AY899214.1), elaiophylin (Ela, Streptomyces sp. ICBB 9297,
GP697151.1), epothilone (Epo, Sorangium cellulosum, GU063811.1),
erythromycin (Ery, Saccharopolyspora erythraea, AM420283.1), FD-891
(Gfs, Streptomyces graminofaciens, AB469193.1), filipin (Pte,
Streptomyces avermitilis MA-4680, BA000030.3), FK520 (Fkb,
Streptomyces hygroscopicus subsp. ascomyceticus,
AF235504.1), fostriecin (Fos, Streptomyces pulveraceus, HQ434551.1),
geldanamycin (Gdm, Streptomyces hygroscopicus NRRL 3602, AY179507.1),
gephyronic acid (Gph, Cystobacter violaceum Cb vi76, KF479198.1),
guadinomine (Gdn, Streptomyces sp. K01-0509, JX545234.1), gulmirecin
(Gul, Pyxidicoccus fallax, KM361622.1), halstoctacosanolide (Hls,
Streptomyces halstedii, AB241068.1), hectochlorin (Hct, Lyngbya
majuscula, AY974560.1), herbimycin (Hbm, Streptomyces hygroscopicus,
AY947889.1), herboxidiene (Her, Streptomyces chromofuscus,
JN671974.1), hitachimycin (Hit, Streptomyces scabrisporus,
LC008143.1), hygrocin (Hgc, Streptomyces sp. LZ35, JX504844.1),
incednine (Idn, Streptomyces sp. ML694-90F3, AB767280.1), indanomycin
(Idm, Streptomyces antibioticus, FJ545274.1), jamaicamide (Jam,
Lyngbya majuscula, AY522504.1), jerangolid (Jer, Polyangium
cellulosum, DQ897668.1), kendomycin (Ken, Streptomyces violaceoruber,
AM992894.1), kijanimicin (Kij, Actinomadura kijaniata, EU301739.1),
lankamycin (Lkm, Streptomyces rochei, AB088224.2), lasalocid (Lsd,
Streptomyces lasaliensis, AB449340.1), leupyrrin (Leu, Sorangium
cellulosum, HM639990.1), lipomycin (Lip, Streptomyces aureofaciens,
DQ176871.1), lobophorin (Lbp, Streptomyces sp. SCSIO 01127,
KC013978.1), lobosamide
(Lob, Micromonospora sp. RL09-050-HVF-A, KT209587.1), lorneic acid
(Lor, Streptomyces sp. NBRC 109706, BBOM01000004.1), macbecin (Mbc,
Actinosynnema pretiosum, EU827593.1), maklamicin (Mak, Micromonospora
sp. GMKU326, LC021382.1), meilingmycin (Mei, Streptomyces
nanchangensis, FJ952082.1), melithiazol (Mel, Melittangium
lichenicola, AJ557546.1), meridamycin (Mer, Streptomyces sp. NRRL
30748, DQ351275.1), microcystin (Mcy, Planktothrix agardhii NIVA-CYA
126/8, AJ441056.1), microsclerodermin (Msc, Jahnella sp. MSr9139,
KF657739.1), ML-449 (Mla, Streptomyces sp. MP39-85, FJ372525.1),
monensin (Mon, Streptomyces cinnamonensis, AF440781.1), mycinamicin
(Myc, Micromonospora griseorubida, AB089954.2), mycolactone (Mls,
Mycobacterium ulcerans Agy99, BX649209.1), myxalamid (Mxa, Stigmatella
aurantiaca, AF319998.1), myxothiazol (Mta, Stigmatella aurantiaca
DW4/3-1, AF188287.1), nanchangmycin (Nan, Streptomyces nanchangensis,
AF521085.1), nannocystin (Ncy, Nannocystis sp. MB1016, KT067736.1),
naphthomycin (Nat, Streptomyces sp. CS, GQ452266.1), neoaureothin
(Nor, Streptomyces orinoci, AM778535.1), niddamycin (Nid, Streptomyces
caelestis, AF016585.1), nigericin (Nig, Streptomyces violaceusniger,
DQ354110.1), nocardiopsin (Nsn, Nocardiopsis sp. CMB-M0232,
KP339942.1), oligomycin (Olm, Streptomyces avermitilis, AB070940.1),
nystatin (Nys, Streptomyces noursei ATCC 11455, AF263912.1),
pellasoren (Pel, Sorangium cellulosum,
HE616533.1), phenylnannolone (Phn, Nannocystis pusilla, KF739396.1),
phoslactomycin (Plm, Streptomyces sp. HK803, AY354515.1), piericidin
(Pie, Streptomyces piomogenus, HQ840721.1), pimaricin (Pim,
Streptomyces natalensis, AJ278573.1), pikromycin (Pik, Streptomyces
venezuelae, AF079138.1), pladienolide (Pld, Streptomyces platensis,
AB435553.1), puwainaphycin (Puw, Cylindrospermum alatosporum CCALA
988, KM078884.1), pyoluteorin (Plt, Pseudomonas protegens Pf-5,
AF081920.3), quartromicin (Qmn, Amycolatopsis orientalis, JF970188.1),
rapamycin (Rap, Streptomyces rapamycinicus NRRL 5491, X86780.1),
reveromycin (Rev, Streptomyces sp. SN-593, AB568601.1), rifamycin
(Rif, Amycolatopsis mediterranei S699, AF040570.3), rubradirin (Rub,
Streptomyces achromogenes subsp. rubradiris, AJ871581.1), salinilactam
(Slm, Salinispora tropica CNB-440, CP000667.1), salinomycin (Sln,
Streptomyces albus, JN033543.1), sanglifehrin (Sfa, Streptomyces
flaveolus, FJ809786.1), soraphen (Sor, Sorangium cellulosum,
U24241.2), spinosyn (Spn, Saccharopolyspora spinosa, AY007564.1),
spirangien (Spi, Sorangium cellulosum, AM407731.1), stambomycin (Sta,
Streptomyces ambofaciens ATCC 23877, AM238664.2), stigmatellin
(Sti, Stigmatella aurantiaca Sg a15, AJ421825.1), streptazone (Stz,
Streptomyces sp. MSC090213JE08, LC051217.1), streptolydigin (Slg,
Streptomyces lydicus, FN433113.1), tautomycetin (Ttn, Streptomyces sp.
MSC090213JE08, LC061217.1), tautomycin (Ttm, Streptomyces
spiroverticillatus, EF990140.1), tetrocarcin (Tca, Micromonospora
chalcea, EU443633.1), tetronasin (Tsn, Streptomyces longisporoflavus,
FJ462704.1), tetronomycin (Tmn, Streptomyces sp. NRRL 11266,
AB193609.1), thuggacin (Tga, Sorangium cellulosum, GQ981380.1),
tiacumicin (Tia, Dactylosporangium aurantiacum subsp. hamdenensis,
HQ011923.1), tirandamycin (Tam, Streptomyces sp. 307-9, GU385216.1),
tubulysin (Tub, Cystobacter sp. SBCb004, GU002154.1), tylosin (Tyl,
Streptomyces fradiae, U78289.1), versipelostatin (Vst, Streptomyces
versipellis, LC006086.1), vicenistatin (Vin, Streptomyces halstedii,
AB086653.1), and zwittermicin (Zma, Bacillus cereus, FJ430564.1)
assembly lines were primarily
obtained from MIBiG[12]. A FASTA file of the DNA encoding extension
modules (the “GTNAH” motif near the end of the KS domain was used as
the boundary) as well as the corresponding amino acid sequences was
generated. Each module was named for the polypeptide containing its AT,
its position within that polypeptide, and its type (e.g., the module
EryA1_3b has its AT in the first polypeptide of the erythromycin PKS,
occupies the third position, and is a β-module). Modules were divided
into eight
categories: α-modules without DD (n=62), α-modules with DD (n=9),
β-modules without DD (n=168), β-modules with DD (n=111), γ-modules
without DD (n=258), γ-modules with DD (n=183), δ-modules without DD
(n=73), and δ-modules with DD (n=85). Those encoded on two polypeptides
connected through a DD were treated as one sequence (5 C’s represent the
C-terminus of the first polypeptide and 5 N’s represent the N-terminus
of the second polypeptide). The biosynthetic model for each PKS helped
determine which polypeptides contain the upstream and downstream
portions of these split modules[1] (Supplementary Data Files 1-9).
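The naming and joining conventions described above can be illustrated with a minimal Python sketch. It is not the procedure used to build the Supplementary Data Files; the function names, the regular expression, and the choice to report only where the "GTNAH" motif starts (rather than deciding on which side of the motif to cut) are assumptions made for illustration. Only the suffix "b" for β-modules is confirmed by the example in the text, so the type letter is returned unmapped.

```python
import re

# Hypothetical pattern for names such as "EryA1_3b":
# polypeptide label, underscore, position in the polypeptide, one-letter type.
MODULE_NAME = re.compile(r"^(?P<polypeptide>[A-Za-z0-9-]+)_(?P<position>\d+)(?P<type>[a-z])$")


def parse_module_name(name):
    """Split a module name into polypeptide, position, and type letter."""
    match = MODULE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized module name: {name!r}")
    return {
        "polypeptide": match.group("polypeptide"),
        "position": int(match.group("position")),
        "type": match.group("type"),  # raw letter; only "b" = beta is given in the text
    }


def find_motif_positions(protein, motif="GTNAH"):
    """Return 0-based start indices of the boundary motif in a protein sequence."""
    return [m.start() for m in re.finditer(motif, protein)]


def join_split_module(upstream, downstream):
    """Join the two halves of a module split across polypeptides.

    Five C's mark the C-terminus of the first polypeptide and five N's mark
    the N-terminus of the second, following the convention in the text.
    """
    return upstream + "CCCCC" + "NNNNN" + downstream
```

For example, `parse_module_name("EryA1_3b")` yields the polypeptide label `EryA1`, position 3, and type letter `b`.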
The Tandem Repeats Finder server was employed to detect tandem repeats
at the DNA level [advanced mode with alignment parameters (match,
mismatch, indels): 2, 5, 7; minimum alignment score: 50; maximum period
size: 50; maximum tandem repeat array size (bp, millions): 2][13].
These DNA repeats as well as the corresponding amino acids were made
lowercase and sequentially highlighted yellow, green, and cyan
(Supplementary Data File 1).
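The lowercase marking of repeats can be sketched as follows, assuming that each detected repeat is available as a pair of 1-based, inclusive DNA coordinates (the convention Tandem Repeats Finder uses in its reports). The function names are hypothetical, and the rule for carrying a DNA interval over to the protein (every codon touched by the repeat, with the reading frame starting at base 1) is an assumption rather than something stated in the text.

```python
def lowercase_intervals(seq, intervals):
    """Return seq in uppercase except for the given 1-based inclusive intervals."""
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(start - 1, end):
            chars[i] = chars[i].lower()
    return "".join(chars)


def dna_to_protein_interval(start, end):
    """Map a 1-based inclusive DNA interval to the codons it touches.

    Assumes the reading frame begins at base 1 of the DNA sequence.
    """
    return ((start - 1) // 3 + 1, (end - 1) // 3 + 1)
```

The same `lowercase_intervals` call can then be applied to the translated sequence with the interval returned by `dna_to_protein_interval`.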
While this was sufficient for repetitive sequences located between
domains, curation was necessary within domains to catalog repetitive
sequences that alter the composition and/or length of known loops. Thus,
repetitive sequences located in regions that correspond to structured
elements were made uppercase. Likewise, when it was unclear whether a
repetitive sequence was located in a loop (e.g., in the poorly conserved
regions of KRs from γ/δ-modules), it was made uppercase.
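The curation itself was done by inspection, but its outcome can be expressed as a simple rule, sketched here under stated assumptions: repeats and permissive regions (interdomain linkers and known loops) are given as 1-based inclusive intervals, and a repeat stays lowercase only if it lies entirely within one permissive region. The full-containment criterion and all names are illustrative choices, not the authors' stated procedure.

```python
def keep_lowercase(repeat, permissive_regions):
    """True if the repeat lies entirely inside a linker or known loop."""
    r_start, r_end = repeat
    return any(start <= r_start and r_end <= end for start, end in permissive_regions)


def apply_curation(seq, repeats, permissive_regions):
    """Lowercase only the repeats that pass keep_lowercase; all else is uppercase."""
    chars = list(seq.upper())
    for repeat in repeats:
        if keep_lowercase(repeat, permissive_regions):
            for i in range(repeat[0] - 1, repeat[1]):
                chars[i] = chars[i].lower()
    return "".join(chars)
```

Repeats in structured elements, and repeats whose location is uncertain, are simply left out of `permissive_regions` and therefore remain uppercase.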
To help determine whether a repetitive sequence is in a known loop,
multiple sequence alignments were generated with the program SEAVIEW
(using the Clustal Omega algorithm) and ESPript with the aid of known
structures [AT region (EryAT4, PDB 2QO3), KS (EryKS3, PDB 2QO3), KR
region of β-modules (SpnKR4, PDB 4IMP), KR of γ/δ-modules (SpnKR3, PDB
3SLK), DH of γ/δ-modules, ER of δ-modules (SpnER3, PDB 3SLK), and ACP
(MycACP8 from MlsB, PDB 6H0Q)] (Supplementary Figures 1-11)
[14-16]. All lowercase repetitive sequences, along with their period
size and repeat number, were tabulated (Tables I-II).
Inverted repeats were detected using the EMBOSS palindrome server
(minimum length of palindrome: 10; maximum length of palindrome: 100;
gap between repeated regions: 50; mismatches allowed: 0)[17].
Inverted repeats were highlighted in magenta or red. Those separated by
more than 10 bases were italicized to indicate a lower likelihood of
being biologically significant (Supplementary Data File 1).
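To make the search parameters concrete, the following is a brute-force Python sketch of a perfect inverted-repeat search with the same limits (arm length 10 to 100, gap of at most 50, no mismatches). It is not the EMBOSS palindrome algorithm and may report hits differently from the server; the function names and the `distant` flag, which marks arms separated by more than 10 bases, are introduced here for illustration only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown bases become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def pairs(a, b):
    """True if bases a and b are Watson-Crick complements."""
    return COMPLEMENT.get(a) == b


def find_inverted_repeats(seq, min_len=10, max_len=100, max_gap=50):
    """Find perfect inverted repeats; coordinates are 0-based, end-exclusive."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for left_end in range(min_len, n + 1):
        for gap in range(max_gap + 1):
            right_start = left_end + gap
            if right_start + min_len > n:
                break
            # Skip pairs that could be extended inward: a hit with a smaller
            # gap already covers them.
            if gap >= 2 and pairs(seq[left_end], seq[right_start - 1]):
                continue
            # Extend the arms outward while the bases stay complementary.
            k = 0
            while (k < max_len and left_end - 1 - k >= 0 and right_start + k < n
                   and pairs(seq[left_end - 1 - k], seq[right_start + k])):
                k += 1
            if k >= min_len:
                hits.append({
                    "left_start": left_end - k, "left_end": left_end,
                    "right_start": right_start, "right_end": right_start + k,
                    "arm_length": k, "gap": gap,
                    "distant": gap > 10,  # italicized in the data file
                })
    return hits
```

The quadratic scan is adequate for module-sized sequences; for whole clusters the EMBOSS tool itself is the more practical choice.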