Structures of MERS-CoV Macro Domain in Aqueous Solution with Dynamics:
Coupling Replica Exchange Molecular Dynamics and Deep Learning at the
Nano Level
Ibrahim Yagiz Akbayrak1, Burak
Ulver2, Havvanur Dervisoğlu2, Mehmet
Haklidir2, Sule Irem Caglayan3,
Lukasz Kurgan4*, Vladimir N.
Uversky5,6*, Orkun Hasekioglu2*,
Orkid Coskuner-Weber3*
1Materials Sciences and Technologies, College of Sciences, Turkish-German
University, Sahinkaya Caddesi, No. 106, Beykoz, Istanbul 34820, Turkey;
2TUBITAK, Turkish Scientific and Technological Research Council, BİLGEM,
Gebze, Istanbul 41470, Turkey;
3Molecular Biotechnology, College of Sciences, Turkish-German University,
Sahinkaya Caddesi, No. 106, Beykoz, Istanbul 34820, Turkey;
4Department of Computer Science, Virginia Commonwealth University,
Richmond, VA 23284, USA;
5Department of Molecular Medicine, USF Health Byrd Alzheimer’s Research
Institute, Morsani College of Medicine, University of South Florida,
Tampa, FL 33612, USA;
6Laboratory of New Methods in Biology, Institute for Biological
Instrumentation of the Russian Academy of Sciences, Federal Research
Center “Pushchino Scientific Center for Biological Research of the
Russian Academy of Sciences”, Pushchino 142290, Russia.
Corresponding Author Emails:
vuversky@health.usf.edu,
lkurgan@vcu.edu,
orkun.hasekioglu@tubitak.gov.tr
and weber@tau.edu.tr
Keywords: MERS-CoV, REMD simulations, deep learning, bioinformatics.
Running Title: Aqueous MERS-CoV Macro Domain Structures.
ABSTRACT: A novel virus, severe acute respiratory syndrome
Coronavirus 2 (SARS-CoV-2), causing coronavirus disease 2019 (COVID-19)
worldwide appeared in 2019. Currently, we do not have a medicament that
treats the disease. One of the reasons for the absence of treatment is
related to the scarcity of detailed scientific knowledge of the members
of the Coronaviridae family, including the Middle East Respiratory
Syndrome Coronavirus (MERS-CoV). Structural studies of the MERS-CoV
proteins in the current literature are extremely limited. We present
here detailed characterization of the structural properties of MERS-CoV
macro domain in aqueous solution at the atomic level with dynamics. For
this study, we conducted extensive replica exchange molecular dynamics
simulations linked to a generative neural network and we use the
resulting trajectories for structural analysis. We perform structural
clustering based on the radius of gyration and end-to-end distance of
MERS-CoV macro domain in aqueous solution with dynamics at the atomic
level. We also report and analyze the residue-level intrinsic disorder
features, flexibility and secondary structure. Furthermore, we study the
propensities of this macro domain for protein-protein interactions and
for the RNA and DNA binding. Results are in agreement with available
nuclear magnetic resonance spectroscopy findings and present more
detailed insights into the structural properties of MERS CoV macro
domain. Overall, this work further shows that neural networks can be
used as an exploratory tool for the studies of CoV family molecular
conformational space at the nano level.
INTRODUCTION
After the first outbreak of severe acute respiratory syndrome (SARS)
in 2003, another fatal viral disease causing pneumonia and death was first
reported in Saudi Arabia in 2012. The causative virus was named Middle East
Respiratory Syndrome Coronavirus (MERS-CoV).1 The
current SARS-CoV-2 infection has drawn the attention of the entire world
to coronaviruses and coronavirus-related infections. The history of the
SARS-CoV outbreak justifies this high level of attention. By the time
the global SARS-CoV outbreak was contained, the virus spread to 26
countries, infected over 8,000 people worldwide and killed almost 800.
Similarly, even though MERS-CoV appeared initially in Saudi Arabia, the
virus, which was new to humans, spread to several other countries in
or near the Arabian Peninsula, Asia, Europe, and the United States of
America.2 The mortality of MERS was reported to be
4-fold higher than that of SARS.3 In fact, at the end of
2019, there were a total of 2,494 laboratory-confirmed cases of MERS-CoV
infection worldwide, with a mortality rate of 34.4%. The current
coronavirus, namely SARS-CoV-2, is infecting and killing more people per
day than SARS and MERS combined did during their outbreaks.
Despite their history of posing threats to human health, current
knowledge of coronaviruses is rather limited. It is clear that gaining
insights into the structural properties of various proteins from
MERS-CoV, including the conserved macro domain within the non-structural
protein 3 (NSP3), can help to better understand the Coronaviridae
family.4 Since the structural
properties of the MERS-CoV macro domain in solution with dynamics are still
poorly understood, a comparison to the SARS-CoV-2 macro domain in solution
with dynamics cannot yet be provided either.
MERS-CoV belongs to the lineage C of β-coronaviruses (β-CoVs) that
includes CoVs isolated from bats and hedgehogs. CoVs use the RNA genome
to encode several structural proteins, including the spike glycoprotein
(S), membrane protein (M) and nucleocapsid protein (N), and various
non-structural proteins (NSPs) to facilitate their fast replication
processes.5 A single large replicase gene encodes the
proteins that play a role in viral replication.4 This
gene contains two open reading frames, ORF1a and ORF1b, encoding the
polyproteins pp1a and pp1ab, with the production of pp1ab requiring a −1
ribosomal frameshift at the 3′ end of ORF1a.6 ORF1a
encodes viral proteases: main protease (Mpro) and
papain-like protease (PLpro). These viral proteases
play a central role in the cleavage of ORF1a and ORF1b gene products in
order to produce functional NSPs.7
The largest NSP member of the MERS-CoV genome is the ORF1a-encoded,
multifunctional and multidomain protein NSP3 that serves as a major
evolutionary selection target in β-CoVs.8 NSP3
includes N-terminal acidic domain, macro domain, SARS-unique domain,
PLpro, nucleic acid-binding domain, marker domain
(G2M), transmembrane domain, and Y-domain. The macro domain received its
name based on the non-histone motif of the histone variant macroH2A,
which is a crucial protein module found in eukaryotes, bacteria, and
archaea. Macro domain-containing proteins and enzymes play central
roles in the regulation of various cellular processes. For instance, the
SARS-CoV and MERS-CoV macro domains were shown to possess
poly(ADP-ribose) binding affinity, which suggested that this domain
regulates cellular proteins that are important for an apoptotic pathway via
poly(ADP-ribosyl)ation to mediate the host response to
infection.4
Even though an X-ray structure is available for the MERS-CoV macro domain,
such a structure does not capture the impact of the bulk solvent
environment on protein structure and dynamics and provides a rather
limited view of the underlying structural and functional residue-level
characteristics. A detailed understanding of the structural properties
of the MERS-CoV macro domain in solution at the atomic level with dynamics,
linked to deep learning and combined with residue-level characterization, will
provide the lacking structural information on CoVs and may be used for
comparison with the SARS-CoV-2 macro domain. In the long run, the
information gleaned from such structural studies could help to design
more efficient treatments including vaccines and small molecule drugs.
Therefore, we present the characterization of the structural properties
of MERS-CoV macro domain in aqueous solution at body temperature with
dynamics at the atomic level via linking replica exchange molecular
dynamics simulations to deep learning (generative neural networks). We
combine these results with several residue-level analyses that focus on
the structural flexibility, presence of intrinsically disordered
regions, and functional features related to the predisposition for
protein-protein and protein-nucleic acid interactions.
Proteins and long peptides, such as NSP3 and its macro domain, which
we investigate in this study, represent very high
dimensional data with a large number of degrees of freedom. One way to
compress high dimensional data and obtain a lower dimensional
representation is the use of generative neural network models and
auto-encoders.9 In particular, the generative neural
networks (NN) and auto-encoders have been used to encode and model large
polypeptides.10, 11 Such generative NN models are also
useful for in silico drug design and drug
repurposing.32
MATERIALS and METHODS
Many molecular simulation scenarios require ergodic sampling of
conformations. Their energy landscapes may feature many minima and
barriers between minima that can be difficult to cross at ambient
temperatures over reachable simulation time scales. This means that the
corresponding findings are confounded by the choice of initial
conditions because such conditions determine the space region that is
explored by a simulation.12 On the other hand, replica
exchange simulations seek to enhance the conformational sampling by
running numerous independent replicas in different conditions, and
periodically exchanging the coordinates of different ensembles
(replicas).12 In this study, we conduct all-atom
replica exchange molecular dynamics (REMD) simulations of MERS-CoV macro
domain in water at temperatures ranging from 280 K to 320 K
using 32 replicas distributed exponentially between these temperatures.
We use the CHARMM36 parameters for the MERS-CoV macro domain and the
explicit TIP3P model for water.13, 14 We apply a water
layer of 10 Å with 11,156 water molecules to solvate the macro domain
using a cubic box. We perform the simulations with the GROMACS 5.1.4
package.15 We isolated the initial structure for the
MERS-CoV macro domain from the publicly available crystal structure (PDB
ID: 5zu7). After solvating the macro domain in water, we first conduct
equilibration simulations for 20 ns (per replica) using the canonical
ensemble and then for an additional 20 ns (per replica) using the
isothermal-isobaric ensemble. We run REMD simulations for a total
simulation time of 6.4 µs. We perform exchanges between replicas every 5
ps with a time step of 2 fs. We save trajectories every 500 steps.
Following our recent studies, we use Langevin dynamics to maintain
the temperature of each replica with a collision frequency of 2
ps⁻¹.16 Also, following our recent
studies,17, 18 we utilize the particle mesh Ewald
(PME) method to treat the long-range interactions. We apply
the SHAKE algorithm to constrain the bonds to hydrogen atoms and we use
counterions to neutralize the charges.
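The replica temperature ladder described above can be sketched as follows. This is a minimal illustration that assumes "distributed exponentially" means a geometric progression between 280 K and 320 K; the function name `exponential_ladder` is hypothetical and is not part of GROMACS. The sketch also derives the step counts implied by the stated 2 fs time step, 5 ps exchange interval, and 500-step save interval.

```python
def exponential_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures in geometric progression from t_min to t_max.

    Assumption: 'exponential distribution' of replicas is read here as a
    constant ratio between neighbouring temperatures.
    """
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


# Values taken from the text: 32 replicas between 280 K and 320 K.
temps = exponential_ladder(280.0, 320.0, 32)

# Index of the replica whose temperature is nearest to 310 K.
closest = min(range(len(temps)), key=lambda i: abs(temps[i] - 310.0))

# Bookkeeping implied by the stated settings.
time_step_ps = 0.002                              # 2 fs
steps_per_exchange = round(5.0 / time_step_ps)    # exchange attempt every 5 ps
save_interval_ps = 500 * time_step_ps             # frame saved every 500 steps

print(f"replica {closest}: {temps[closest]:.2f} K")
print(f"{steps_per_exchange} steps per exchange, frame every {save_interval_ps} ps")
```

With a geometric ladder the ratio between neighbouring temperatures is constant, which is a common choice for keeping exchange acceptance roughly uniform across replicas; the actual ladder used in the simulations may differ.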
We calculate the structural properties of the MERS-CoV macro domain from
the structures obtained after convergence from the replica closest to
physiological temperature (310 K, see Supporting Information section).
Also, we used the trajectories obtained from deep learning that was
linked to our REMD simulations for structural analysis. We compute the
content of the secondary structure components per residue for the
aqueous MERS-CoV macro domain utilizing the DSSP program both for data
obtained from REMD simulations and deep learning.19 Additionally, we
determine the end-to-end distances (REE) and radius of gyration (Rg) of
the MERS-CoV macro domain in water using all converged trajectories
from REMD simulations as well as from deep learning. Based on the
relationship between the Rg and REE values, we apply the k-means
clustering method to perform vector quantization and consequently to
partition the structural observations into 5 clusters. We assign each
observation to the cluster with the nearest cluster centroid that
serves as a prototype of the cluster.20 This way the structural data
space is partitioned into Voronoi cells and the k-means clustering
minimizes within-cluster variances using squared Euclidean distances.
Finally, we compute the root mean square fluctuations for each residue
of the MERS-CoV macro domain in water. We compare these results to
findings secured by using disorder predictors, which we describe next.
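The radius of gyration, end-to-end distance, and k-means steps described above can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the analysis code used in this study: the coordinate array is synthetic, the end-to-end distance is taken between the first and last atoms of the array, and the function names are hypothetical. Only the number of clusters (5) and the squared Euclidean metric come from the text.

```python
import numpy as np


def radius_of_gyration(xyz, masses=None):
    """Rg per frame for xyz of shape (n_frames, n_atoms, 3); equal masses if none given."""
    n_atoms = xyz.shape[1]
    w = np.ones(n_atoms) if masses is None else np.asarray(masses, dtype=float)
    w = w / w.sum()
    com = np.einsum("a,fad->fd", w, xyz)              # centre of mass per frame
    sq = ((xyz - com[:, None, :]) ** 2).sum(axis=2)   # squared distance to centre
    return np.sqrt(sq @ w)


def end_to_end_distance(xyz, first=0, last=-1):
    """Distance between two terminal atoms (indices are an assumption) per frame."""
    return np.linalg.norm(xyz[:, last] - xyz[:, first], axis=1)


def nearest_centroid(points, centroids):
    """Label each point with the index of the closest centroid (squared Euclidean)."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: assign to nearest centroid, then recompute the means."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)
        updated = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:          # keep the old centroid if a cluster empties
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest_centroid(points, centroids), centroids


# Synthetic stand-in for a trajectory: 200 frames of 30 atoms.
rng = np.random.default_rng(1)
xyz = rng.normal(size=(200, 30, 3))
features = np.column_stack([radius_of_gyration(xyz), end_to_end_distance(xyz)])
labels, centroids = kmeans(features, k=5)
```

Each row of `features` is one (Rg, REE) observation, and the final assignment of every observation to its nearest centroid is what partitions the plane into Voronoi cells.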
In addition, we perform residue-level analysis of the intrinsic disorder
predisposition of the MERS-CoV macro domain and selected functional
features related to its protein and nucleic acid binding potential. We
evaluate the intrinsic disorder predisposition using a set of commonly
utilized and publicly available computational tools, such as
PONDR® VLXT,21 PONDR® VSL2,22 PONDR® FIT,23 and IUPred, which is capable of
predicting long and short disordered regions.24, 25, 26 Residue-level
predisposition of this domain to interact with
proteins was evaluated with the state-of-the-art SCRIBER (SeleCtive
pRoteIn-Binding rEsidue pRedictor) method.27 SCRIBER
is currently the most accurate method that predicts protein-binding
residues (PBRs), and the only tool that eliminates the recently
described issue of the cross-prediction of residues that interact with
nucleic acids (RNA and DNA) as PBRs.28 This allows us
to accurately predict PBRs and maintain high specificity of our analysis
by limiting contamination of the results by the cross-predictions. We
also evaluate the nucleic acid binding potential of the MERS-CoV macro
domain with the DRNApred predictor.29 DRNApred is
currently the only method that provides accurate results and
successfully eliminates the cross-predictions.29, 30
Generative networks and auto-encoders can conveniently generate new
conformations, replacing computationally expensive molecular dynamics
computations. We use conformations produced by REMD simulations to train
a NN configured as an auto-encoder, as depicted in Scheme 1. When
training such a NN model, for high-fidelity realizations of new
conformations, it is crucial that the conformations that determine
the weights of the neurons in the NN are very close to plausible,
typical conformations. In this respect, the results of the REMD
simulations are well suited to this purpose.
Auto-encoders are trained to encode (compress) high dimensional
conformational data to a lower dimensional space, that we refer to as
the latent space, which is a two-dimensional space in this particular
study. The two-dimensional data points in the latent space generated by
the encoder NN are then decoded (decompressed) again to the high
dimensional space to generate new conformations. The training procedure
minimizes the loss function, comprised of the average spatial distance
between the residue locations in the training conformation at the input
of the encoder and the conformation obtained at the output of the
decoder.
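The encode, decode, and loss steps described above can be sketched as a forward pass in NumPy. This is a minimal sketch under stated assumptions: the hidden-layer widths (300 and 50) and the two-dimensional latent space follow Scheme 1, whereas the mirrored decoder, the tanh activations, the random untrained weights, and the toy system of 20 atoms are illustrative choices. Training by gradient descent is omitted.

```python
import numpy as np


def init_layers(sizes, rng):
    """Random weights and zero biases for consecutive layer sizes (untrained)."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, layers):
    """Apply each layer; tanh on hidden layers, linear output (assumed activations)."""
    for i, (weights, bias) in enumerate(layers):
        x = x @ weights + bias
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x


def mean_atom_distance(x_in, x_out):
    """Loss: average Euclidean distance between matching atoms of input and output."""
    a = x_in.reshape(len(x_in), -1, 3)
    b = x_out.reshape(len(x_out), -1, 3)
    return np.linalg.norm(a - b, axis=2).mean()


rng = np.random.default_rng(0)
n_atoms = 20                                            # toy size, not the macro domain
encoder = init_layers([3 * n_atoms, 300, 50, 2], rng)   # 3N -> 300 -> 50 -> 2
decoder = init_layers([2, 50, 300, 3 * n_atoms], rng)   # assumed mirror of the encoder

batch = rng.normal(size=(8, 3 * n_atoms))   # 8 flattened conformations (synthetic)
latent = forward(batch, encoder)            # points in the 2D latent space
recon = forward(latent, decoder)            # decoded conformations
loss = mean_atom_distance(batch, recon)     # quantity that training would minimize
```

New conformations would be generated by passing chosen latent-space points through `forward(points, decoder)` once the weights have been optimized.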
The conformations obtained from REMD simulations of the macro domain
constitute our training set of data. The training and test data sets
consist of converged conformations; 90% of the trajectories were used
for training and the remaining 10% for testing. The encoder NN,
consisting of hidden layers with a decreasing number of neurons, is then
trained to project the REMD-generated conformations to the 2D latent
space. The second NN, the decoder, consisting of hidden layers with an
increasing number of neurons, decodes (decompresses) the points in the
latent space back to the original conformational space of the N atoms
with 3×N spatial coordinates. During the training, the weights in both
the encoding and decoding NN layers are optimized such that the loss
function is minimized. On the other hand, the latent vectors generated
in the 2D latent space through the training of the encoder do not
necessarily correspond to feasible conformations. The conformations that
are physically implausible need to be eliminated. For proper selection
among the latent space vectors, following ref 10, we
adopted a classifier based on the random forest
method.31 The random forest model is trained to
classify two separate types of data: conformations extracted from the
REMD data and conformations with random fluctuations added to the nuclear
coordinates. We use the atomic coordinates of the residues for training
purposes. Among the configurations generated by the decoder, feasible
configurations are first selected through the random forest classifier
and further the configurations above a 7 Å threshold with respect to the
RMSF distance are eliminated.
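The data preparation described above can be sketched as follows: a 90/10 split of the conformations and the construction of the two-class data set (REMD conformations versus randomly perturbed copies) on which a feasibility classifier is trained. This is a hedged sketch only. The random shuffling, the Gaussian form of the perturbation, the `noise_sigma` value, and the function names are assumptions; the classifier itself is not reproduced here.

```python
import numpy as np


def split_train_test(confs, train_fraction=0.9, seed=0):
    """Shuffle conformations and split them; random shuffling is an assumption."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(confs))
    n_train = int(round(train_fraction * len(confs)))
    return confs[order[:n_train]], confs[order[n_train:]]


def make_classifier_dataset(confs, noise_sigma=1.0, seed=0):
    """Label conformations 1 and their noise-perturbed copies 0.

    noise_sigma (in the coordinate units) is a hypothetical parameter.
    """
    rng = np.random.default_rng(seed)
    decoys = confs + rng.normal(0.0, noise_sigma, size=confs.shape)
    features = np.concatenate([confs, decoys]).reshape(2 * len(confs), -1)
    labels = np.concatenate(
        [np.ones(len(confs), dtype=int), np.zeros(len(confs), dtype=int)]
    )
    return features, labels


# Synthetic stand-in: 100 conformations of 12 atoms.
rng = np.random.default_rng(2)
confs = rng.normal(size=(100, 12, 3))
train, test = split_train_test(confs)
X, y = make_classifier_dataset(train)
```

The arrays `X` and `y` have the shape expected by common random forest implementations (for example, scikit-learn's `RandomForestClassifier`), which could then score decoder outputs before the subsequent distance-threshold filter is applied.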
RESULTS and DISCUSSION
Figure 1 presents a set of selected structures of the MERS-CoV
macro domain in aqueous media that we obtained from the 310 K replica of
the all-atom REMD simulations and from deep learning. While the presence and
proportions of specific secondary structures are similar across these
conformations, our results reveal a remarkable structural pliability of
this protein.
The grey line in Figure 2 presents the calculated RMSF values for each
residue of the MERS-CoV macro domain in aqueous media at 310 K with
dynamics at the atomic level. Based on these values, we notice more
significant fluctuations (higher flexibility) in the C-terminal region
of the domain even with such an extensive REMD simulation. The average
RMSF value for the macro domain (all residues) is 1.09 ± 0.48 Å.
However, the most flexible residues are characterized by RMSF values
of up to 3.62 Å.
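For reference, per-residue RMSF values and their summary statistics of the kind reported above can be computed as in the following sketch. It assumes that the frames have already been superposed on a common reference and that one representative atom per residue is used; both are assumptions about the workflow, and the small input array is synthetic.

```python
import numpy as np


def rmsf(xyz):
    """Root mean square fluctuation per atom for xyz of shape (n_frames, n_atoms, 3).

    Frames are assumed to be superposed on a common reference beforehand.
    """
    mean_pos = xyz.mean(axis=0)                       # time-averaged position
    sq_dev = ((xyz - mean_pos) ** 2).sum(axis=2)      # squared deviation per frame
    return np.sqrt(sq_dev.mean(axis=0))


# Two frames, two atoms: the first atom is fixed, the second moves between x = +1 and -1.
frames = np.array(
    [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
     [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]]
)
values = rmsf(frames)
print(values, values.mean(), values.std(), values.max())
```

The mean, standard deviation, and maximum of `values` correspond to the kind of summary quoted in the text (average ± deviation, and the largest fluctuation).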
Figure 2A shows that some of the structural dynamic features observed in
our MD simulations are correlated with the residue-level intrinsic
disorder predisposition of the MERS-CoV macro domain. This is reflected
in the fact that several peaks in disorder profile serve as envelopes
that enclose the local RMSF peaks. However, there are also some regions
(e.g. residues 18-38), which are predicted as ordered but which show
noticeable structural fluctuations. This indicates that part of the
structural fluctuations of the MERS-CoV macro domain in aqueous medium
can be rooted in the intrinsic disorder predisposition of this domain,
whereas other structural fluctuations are independent of the intrinsic
disorder predisposition.
We also assess the propensity of this protein to interact with other
proteins and with nucleic acids. As for the disorder analysis,
we annotate these interactions at the level of individual amino acids.
Figure 2B illustrates that the MERS-CoV macro domain is expected to have
several protein binding regions, such as residues 1-12, 32, 43-47, 51,
86-88, 133-134, 137-144, 147, and 162-168. The predicted likelihood of
the protein-protein interactions for these residues exceeds the 0.5
threshold. Some of these protein-binding residues are located within the
disordered or flexible regions; i.e., regions characterized by the
predicted disorder score exceeding 0.5 or ranging from 0.2 to 0.5,
respectively. Curiously, although all highly flexible residues coincided
with or were located in close proximity to the protein-binding
regions/residues, not all regions with the highest protein binding
potential were characterized by the highest RMSF values. Moreover, our
residue-level analysis did not find any DNA- or RNA-binding regions in
the MERS-CoV macro domain (see Figure 2B).
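The thresholding used above to turn per-residue scores into residue ranges can be illustrated with a short sketch. The 0.5 cutoff for binding propensity and the 0.2 to 0.5 and above-0.5 bands for the disorder score are taken from the text; the function names and the example scores are hypothetical and are not output from SCRIBER, DRNApred, or the disorder predictors.

```python
def contiguous_regions(scores, threshold=0.5):
    """Return 1-based inclusive (start, end) runs of residues with score > threshold."""
    regions, start = [], None
    for i, score in enumerate(scores, start=1):
        if score > threshold and start is None:
            start = i                         # a new run begins
        elif score <= threshold and start is not None:
            regions.append((start, i - 1))    # the run ended at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # run extends to the last residue
    return regions


def disorder_class(score):
    """Classify a disorder score using the bands stated in the text."""
    if score > 0.5:
        return "disordered"
    if score >= 0.2:
        return "flexible"
    return "ordered"


# Made-up scores for eight residues.
example = [0.9, 0.8, 0.1, 0.2, 0.7, 0.3, 0.6, 0.6]
print(contiguous_regions(example))
print([disorder_class(s) for s in example])
```

Single residues above the cutoff appear as runs of length one, which matches how isolated positions such as residue 32 or 51 are reported alongside longer stretches.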
Dashed purple lines denote the RMSF values obtained through the
generative auto-encoder NN model. We note that they follow the REMD
RMSF values quite closely on a per-residue basis. This indicates that
the NN model is a viable means of generating plausible conformations.
Figure 3 presents the results that we generate with the k-means
clustering of the structures of the MERS-CoV macro domain in water with
dynamics from REMD simulations and from deep learning. We base this
calculation on the radius of gyration (Rg) and
end-to-end distance (REE) values. Based on these
results, the global compactness of the MERS-CoV macro domain structures
(Rg) found in our experiments varies only by 1.0
Å, whereas the REE values fluctuate between 15 Å
and 30 Å. The average Rg value of the MERS-CoV
macro domain is 15.27 ± 0.08 Å from our REMD simulations and this value
becomes 15.11 ± 0.45 Å using deep learning. The average REE value is
22.35 ± 1.72 Å in water from our
REMD simulations. This value is 21.89 ± 2.29 Å using the trajectories
from deep learning. Experimental structural studies on the MERS-CoV macro
domain in solution are extremely limited and therefore we could not
compare these results to data generated by the experiments. However, we
use a set of independently computed residue-level predictions and the
secondary structure analysis based on an NMR structure to contextualize
and compare with our all-atom results.
Figures 4 and 5 display the residue-level secondary structure
probabilities that we predicted from the MERS-CoV macro domain in water
at 310 K using the trajectories from REMD simulations and those from
deep learning. Based on these calculations, we found six α-helices
(Figure 4A and Figure 5A) in the MERS-CoV macro domain structures. They
are located at residues Ala25-Cys31, Gly50-Ser59, Ala62-Lys74,
Ser109-Met118, Pro138-Glu148, and Gln160-Leu166. Some of these helices,
especially those located within the C-terminal half of the domain, were
predicted to include the protein binding residues. While the six
α-helices were also observed in the NMR measurements, those measurements
also annotate adjacent residues as helical, but this might be related to
the buffer used in these experiments.32 The abundance of the
helical structures formed in water in our REMD simulations and in the
NN-generated conformations is, in general, higher than that reported in
the NMR measurements.32
The 3₁₀-helices, displayed in Figures
4B and 5B, are formed at the Ala102-Ala104 and Val108-Ser112 regions of
the MERS-CoV macro domain in water at 310 K. The NMR experiments did not
measure the 3₁₀-helix content for this protein.
We also note that only a narrow region of the sequence
of this domain adopts the 3₁₀-helical structure in
water. One of these 3₁₀-helices (residues
Val108-Ser112) overlaps with one of the α-helices (residues
Ser109-Met118).
Figures 4C and 5C summarize the calculated β-sheet content for the macro
domain in water. We note prominent β-strand formation in seven regions.
Namely, these regions are His11-Thr15, Val18-Leu22, Val36-Ala41,
Asp81-Leu85, Asn93-Val98, Leu123-Thr126, and Arg152-Val157. The NMR
measurements assigned seven similar regions for β-strand
formation. However, as for the helices, residues adjacent to these
regions were also found to adopt β-strand structure to some extent based
on these measurements. The TALOS index measurements support our
findings.32
Finally, we calculated the turn structure content per residue (Figures
4D and 5D). This structural element was not analyzed in the NMR
experiments. As per our analysis, there are ten regions that adopt turn
structure. Specifically, turn structure was detected at Met4-Phe9,
Glu16-Cys17, Tyr32-Ser35, Asn42-Leu45, Lys60-Gly61, Gln78-Gly80,
Gly87-Lys92, Asp101-Lys105, Asn119-Pro122, and Leu128-Gly135 of the
MERS-CoV macro domain in water. Many of the protein binding
residues/regions (e.g., residues 1-12, 32, 43-47, 86-88, and 133-134)
overlap with the regions with turn structure.
The estimated secondary structure propensities per residue using the
trajectories obtained from deep learning (Figure 5) are in excellent
accord with our REMD results (Figure 4, see above).
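Per-residue secondary structure propensities of the kind compared above can be derived from DSSP assignments as in the following sketch. It assumes one DSSP string per frame, using the standard DSSP one-letter codes (H for α-helix, G for 3₁₀-helix, E for extended strand, T for turn); the function name and the three short example strings are hypothetical.

```python
import numpy as np


def propensity(dssp_frames, codes):
    """Fraction of frames in which each residue carries one of the given DSSP codes."""
    table = np.array([list(frame) for frame in dssp_frames])   # (n_frames, n_residues)
    return np.isin(table, list(codes)).mean(axis=0)


# Made-up assignments for a five-residue fragment over three frames.
remd_frames = ["HHHT-", "HHTT-", "HHHE-"]
nn_frames = ["HHHT-", "HHHT-", "HHTE-"]

helix_remd = propensity(remd_frames, "H")
turn_remd = propensity(remd_frames, "T")
helix_nn = propensity(nn_frames, "H")

# One simple way to quantify agreement between the two ensembles.
mean_abs_diff = np.abs(helix_remd - helix_nn).mean()
print(helix_remd, turn_remd, mean_abs_diff)
```

Applying the same function with the codes "G" and "E" would give the 3₁₀-helix and β-strand profiles, so the four panels of Figures 4 and 5 correspond to four calls with different code sets.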
CONCLUSION
We conduct REMD simulations linked to deep learning at the nano level
for the MERS-CoV macro domain in water and present here the results for
the 310 K replica. We cover several structural properties including RMSF
values with dynamics, secondary structure, and the k-means
clustering based on radius of gyration (Rg) and
end-to-end distance (REE) of the structures of the
MERS-CoV macro domain in water with dynamics. Our findings, which rely
on the RMSF values, REE values and deviations,
show that some of the residues are flexible. Furthermore, the global
structure is compact, not very flexible, and varies only by 1.0 Å in
water (in terms of the scale of Rg fluctuations).
We detected six α-helical regions and seven β-strand regions, which are
in good agreement with the available NMR measurements. In addition, we
notice ten regions with turn structure in the conformations computed
here for the MERS-CoV macro domain in water with dynamics.
Based on the results of the comparison of the independently generated
intrinsic disorder analysis of the MERS-CoV macro domain with the REMD
and deep learning analyses, we also show that only part of the
structural fluctuations of this protein in aqueous medium can be
attributed to the local intrinsic disorder predisposition. The other
structural fluctuations are independent of the local propensity of the
MERS-CoV macro domain to the intrinsic disorder.
Our residue-level analysis provides some functional clues. Based on
putative propensities for protein and nucleic acid interactions, we
suggest that while the MERS-CoV macro domain appears not to show DNA- or
RNA-binding potential, it contains several protein binding regions. Many
of the corresponding PBRs are located within the disordered or flexible
regions. Also, some PBRs overlap with the regions with the turn
structure. Furthermore, some of the α-helices found in the MERS-CoV
macro domain, especially located within the C-terminal half of the
protein, were predicted to contain PBRs.
We studied the structures of the MERS-CoV macro domain in water using
generative networks and auto-encoders linked to REMD simulations. The
trajectories obtained from generative networks and auto-encoders yield
structural results for the MERS-CoV macro domain in full agreement with
our extensive special sampling simulations.
The results reported here can be used to support activities associated
with the development of new MERS-CoV treatments including vaccines and
drug molecules. Currently, we are studying the structural dynamics of
various regions of different proteins from the CoV family ranging from
SARS-CoV to SARS-CoV-2 with MERS-CoV in between.
SUPPORTING MATERIALS SECTION AVAILABLE: Time dependent RMSD values for
MERS CoV macro domain in water from our REMD simulations at 310 K.
Acknowledgements: The authors acknowledge TRUBA resources because the
numerical calculations reported in this study were performed at TUBITAK
ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).
REFERENCES
(1) Schwarz, K.; Groneberg, D. A. [Current overview on MERS-CoV]. Zentralblatt Arbeitsmedizin Arbeitsschutz Ergon. 2015, 65 (6), 353–354. https://doi.org/10.1007/s40664-015-0062-8.
(2) Subbaram, K.; Kannan, H.; Khalil Gatasheh, M. Emerging Developments
on Pathogenicity, Molecular Virulence, Epidemiology and Clinical
Symptoms of Current Middle East Respiratory Syndrome Coronavirus
(MERS-CoV). Hayati J. Biosci. 2017, 24 (2), 53–56.
https://doi.org/10.1016/j.hjb.2017.08.001.
(3) Ford, N.; Vitoria, M.; Rangaraj, A.; Norris, S. L.; Calmy, A.;
Doherty, M. Systematic Review of the Efficacy and Safety of
Antiretroviral Drugs against SARS, MERS or COVID‐19: Initial Assessment. J. Int. AIDS Soc. 2020, 23 (4).
https://doi.org/10.1002/jia2.25489.
(4) Cho, C.-C.; Lin, M.-H.; Chuang, C.-Y.; Hsu, C.-H. Macro Domain from
Middle East Respiratory Syndrome Coronavirus (MERS-CoV) Is an Efficient
ADP-Ribose Binding Module: Crystal Structure and Biochemical Studies. J. Biol. Chem. 2016, 291 (10), 4894–4902.
https://doi.org/10.1074/jbc.M115.700542.
(5) Chafekar, A.; Fielding, B. MERS-CoV: Understanding the
Latest Human Coronavirus Threat. Viruses 2018, 10 (2), 93.
https://doi.org/10.3390/v10020093.
(6) Snijder, E. J.; Bredenbeek, P. J.; Dobbe, J. C.; Thiel, V.; Ziebuhr,
J.; Poon, L. L. M.; Guan, Y.; Rozanov, M.; Spaan, W. J. M.; Gorbalenya,
A. E. Unique and Conserved Features of Genome and Proteome of
SARS-Coronavirus, an Early Split-off from the Coronavirus Group 2
Lineage. J. Mol. Biol. 2003, 331 (5), 991–1004.
https://doi.org/10.1016/s0022-2836(03)00865-9.
(7) Neuman, B. W.; Buchmeier, M. J. Supramolecular Architecture of the
Coronavirus Particle. In Advances in Virus Research ; Elsevier,
2016; Vol. 96, pp 1–27. https://doi.org/10.1016/bs.aivir.2016.08.005.
(8) Forni, D.; Cagliani, R.; Mozzi, A.; Pozzoli, U.; Al-Daghri, N.;
Clerici, M.; Sironi, M. Extensive Positive Selection Drives the
Evolution of Nonstructural Proteins in Lineage C Betacoronaviruses. J. Virol. 2016, 90 (7), 3627–3639.
https://doi.org/10.1128/JVI.02988-15.
(9) Hinton, G. E. Reducing the Dimensionality of Data with Neural
Networks. Science 2006, 313 (5786), 504–507.
https://doi.org/10.1126/science.1127647.
(10) Degiacomi, M. T. Coupling Molecular Dynamics and Deep Learning to
Mine Protein Conformational Space. Structure 2019, 27 (6),
1034-1040.e3. https://doi.org/10.1016/j.str.2019.03.018.
(11) Ma, H.; Bhowmik, D.; Lee, H.; Turilli, M.; Young, M. T.; Jha, S.;
Ramanathan, A. Deep Generative Model Driven Protein Folding Simulation. arXiv:1908.00496 [q-bio], 2019.
(12) Geng, H.; Chen, F.; Ye, J.; Jiang, F. Applications of Molecular
Dynamics Simulation in Structure Prediction of Peptides and Proteins. Comput. Struct. Biotechnol. J. 2019, 17, 1162–1170.
https://doi.org/10.1016/j.csbj.2019.07.010.
(13) Huang, J.; MacKerell, A. D. CHARMM36 All-Atom Additive Protein
Force Field: Validation Based on Comparison to NMR Data. J.
Comput. Chem. 2013, 34 (25), 2135–2145.
https://doi.org/10.1002/jcc.23354.
(14) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.;
Klein, M. L. Comparison of Simple Potential Functions for Simulating
Liquid Water. J. Chem. Phys. 1983, 79 (2), 926–935.
https://doi.org/10.1063/1.445869.
(15) Berendsen, H. J. C.; van der Spoel, D.; van Drunen, R. GROMACS: A
Message-Passing Parallel Molecular Dynamics Implementation. Comput. Phys. Commun. 1995, 91 (1–3), 43–56.
https://doi.org/10.1016/0010-4655(95)00042-E.
(16) Wise-Scira, O.; Aloglu, A. K.; Dunn, A.; Sakallioglu, I. T.;
Coskuner, O. Structures and Free Energy Landscapes of the Wild-Type and
A30P Mutant-Type α-Synuclein Proteins with Dynamics. ACS Chem.
Neurosci. 2013, 4 (3), 486–497.
https://doi.org/10.1021/cn300198q.
(17) Coskuner, O.; Wise-Scira, O. Arginine and Disordered Amyloid-β
Peptide Structures: Molecular Level Insights into the Toxicity in
Alzheimer’s Disease. ACS Chem. Neurosci. 2013, 4 (12),
1549–1558. https://doi.org/10.1021/cn4001389.
(18) Coskuner, O.; Uversky, V. N. Tyrosine Regulates β-Sheet Structure
Formation in Amyloid-β42: A New Clustering Algorithm
for Disordered Proteins. J. Chem. Inf. Model. 2017, 57 (6), 1342–1358. https://doi.org/10.1021/acs.jcim.6b00761.
(19) Kabsch, W.; Sander, C. Dictionary of Protein Secondary Structure:
Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 1983, 22 (12), 2577–2637.
https://doi.org/10.1002/bip.360221211.
(20) Likas, A.; Vlassis, N.; Verbeek, J. J. The Global K-Means
Clustering Algorithm. Pattern Recognit. 2003, 36 (2),
451–461. https://doi.org/10.1016/S0031-3203(02)00060-2.
(21) Romero, P.; Obradovic, Z.; Li, X.; Garner, E. C.; Brown, C. J.;
Dunker, A. K. Sequence Complexity of Disordered Protein. Proteins 2001, 42 (1), 38–48.
https://doi.org/10.1002/1097-0134(20010101)42:1<38::aid-prot50>3.0.co;2-3.
(22) Peng, K.; Radivojac, P.; Vucetic, S.; Dunker, A. K.; Obradovic, Z.
Length-Dependent Prediction of Protein Intrinsic Disorder. BMC
Bioinformatics 2006, 7 , 208.
https://doi.org/10.1186/1471-2105-7-208.
(23) Xue, B.; Dunbrack, R. L.; Williams, R. W.; Dunker, A. K.; Uversky,
V. N. PONDR-FIT: A Meta-Predictor of Intrinsically Disordered Amino
Acids. Biochim. Biophys. Acta 2010, 1804 (4), 996–1010.
https://doi.org/10.1016/j.bbapap.2010.01.011.
(24) Dosztányi, Z.; Csizmok, V.; Tompa, P.; Simon, I. IUPred: Web Server
for the Prediction of Intrinsically Unstructured Regions of Proteins
Based on Estimated Energy Content. Bioinforma. Oxf. Engl. 2005, 21 (16), 3433–3434.
https://doi.org/10.1093/bioinformatics/bti541.
(25) Dosztányi, Z.; Csizmók, V.; Tompa, P.; Simon, I. The Pairwise
Energy Content Estimated from Amino Acid Composition Discriminates
between Folded and Intrinsically Unstructured Proteins. J. Mol.
Biol. 2005, 347 (4), 827–839.
https://doi.org/10.1016/j.jmb.2005.01.071.
(26) Mészáros, B.; Erdos, G.; Dosztányi, Z. IUPred2A: Context-Dependent
Prediction of Protein Disorder as a Function of Redox State and Protein
Binding. Nucleic Acids Res. 2018, 46 (W1), W329–W337.
https://doi.org/10.1093/nar/gky384.
(27) Zhang, J.; Kurgan, L. SCRIBER: Accurate and Partner Type-Specific
Prediction of Protein-Binding Residues from Proteins Sequences. Bioinforma. Oxf. Engl. 2019, 35 (14), i343–i353.
https://doi.org/10.1093/bioinformatics/btz324.
(28) Zhang, J.; Kurgan, L. Review and Comparative Assessment of
Sequence-Based Predictors of Protein-Binding Residues. Brief.
Bioinform. 2018, 19 (5), 821–837.
https://doi.org/10.1093/bib/bbx022.
(29) Yan, J.; Kurgan, L. DRNApred, Fast Sequence-Based Method That
Accurately Predicts and Discriminates DNA- and RNA-Binding Residues. Nucleic Acids Res. 2017, 45 (10), e84.
https://doi.org/10.1093/nar/gkx059.
(30) Su, H.; Liu, M.; Sun, S.; Peng, Z.; Yang, J. Improving the
Prediction of Protein-Nucleic Acids Binding Residues via Multiple
Sequence Profiles and the Consensus of Complementary Methods. Bioinforma. Oxf. Engl. 2019, 35 (6), 930–936.
https://doi.org/10.1093/bioinformatics/bty756.
(31) Zhavoronkov, A.; Aladinskiy, V. A.; Zhebrak, A.; Zagribelnyy, B.
A.; Terentiev, V. A.; Bezrukov, D.; Polykovskiy, D.; Shayakhmetov, R.;
Filimonov, A.; Orekhov, P.; Yan, Y.; Popova, O.; Vanhaelen, Q.;
Aliper, A.; Ivanenkov, Y. A. Potential 2019-nCoV 3C-like Protease
Inhibitors Designed Using Generative Deep Learning Approaches. 2020.
https://doi.org/10.13140/RG.2.2.29899.54569.
(32) Huang, Y.-P.; Cho, C.-C.; Chang, C.-F.; Hsu, C.-H. NMR Assignments
of the Macro Domain from Middle East Respiratory Syndrome Coronavirus
(MERS-CoV). Biomol. NMR Assign. 2016, 10 (2), 245–248.
https://doi.org/10.1007/s12104-016-9676-9.
Figure Legends
Scheme 1. Generative auto-encoder neural network model. The
encoder input layer consists of 3N neurons, corresponding to the number
of coordinates in the training data. The two hidden layers contain 300
and 50 neurons. The output layer, consisting of two neurons, encodes
and compresses the original conformation to two real numbers in the
two-dimensional latent space. The decoder, in reverse order, maps
points in the latent space back to the conformation space.
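The layer layout described in Scheme 1 can be illustrated with an untrained forward pass. This is only a minimal sketch under stated assumptions: the tanh activations, the linear final layers, the random weight initialization, and the toy system of N = 10 atoms are not specified in the legend, and the function names (make_layer, dense, build_autoencoder, encode, decode) are hypothetical.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """One dense layer: random weights, zero biases (assumed initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, activate=True):
    """Apply a dense layer, optionally followed by tanh (assumed activation)."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [math.tanh(v) for v in out] if activate else out

def build_autoencoder(n_atoms, seed=0):
    """Encoder 3N -> 300 -> 50 -> 2; decoder mirrors it: 2 -> 50 -> 300 -> 3N."""
    rng = random.Random(seed)
    sizes = [3 * n_atoms, 300, 50, 2]
    encoder = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    reversed_sizes = sizes[::-1]
    decoder = [make_layer(a, b, rng)
               for a, b in zip(reversed_sizes, reversed_sizes[1:])]
    return encoder, decoder

def encode(x, encoder):
    """Map 3N coordinates to a point in the two-dimensional latent space."""
    for i, layer in enumerate(encoder):
        x = dense(x, layer, activate=(i < len(encoder) - 1))
    return x

def decode(z, decoder):
    """Map a latent point back to 3N coordinates."""
    for i, layer in enumerate(decoder):
        z = dense(z, layer, activate=(i < len(decoder) - 1))
    return z

n_atoms = 10  # toy value; the real N is the number of atoms in the training data
encoder, decoder = build_autoencoder(n_atoms)
rng = random.Random(1)
conformation = [rng.uniform(-1.0, 1.0) for _ in range(3 * n_atoms)]
latent = encode(conformation, encoder)
reconstructed = decode(latent, decoder)
print(len(conformation), len(latent), len(reconstructed))
```

With trained weights, sampling new points in the latent space and passing them through decode would be the generative step; the sketch above only checks that the dimensions are consistent.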
Figure 1. Selected structures from our simulations and deep learning
representing conformations of the MERS-CoV macro domain in aqueous
media.
Figure 2. Comparison of the structural flexibility of MERS-CoV macro
domain in aqueous media with its intrinsic disorder predisposition (A)
and propensity for protein and nucleic acid binding (B). Structural
flexibility in the aqueous media is reflected in root mean square
fluctuations (RMSF) of the protein backbone as a function of the
MERS-CoV macro domain residue number. Intrinsic disorder predisposition
was evaluated using PONDR® VLXT,
PONDR® VSL2, PONDR® FIT,
IUPred_short, and IUPred_long (A). Predisposition of this domain to
interact with proteins and nucleic acids was evaluated by SCRIBER and
DRNApred, respectively (B).
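For reference, the per-residue RMSF used in Figure 2 follows the standard definition: the square root of the time-averaged squared deviation of a residue position from its mean position. The sketch below is illustrative only; it assumes that the frames have already been superimposed on a common reference and that each residue is represented by a single backbone position, and the function name rmsf and the toy coordinates are hypothetical.

```python
import math

def rmsf(frames):
    """Per-residue RMSF from pre-aligned frames.

    frames: list of frames, each a list of (x, y, z) tuples, one per residue.
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    values = []
    for i in range(n_res):
        # Mean position of residue i over all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values

# Toy data: residue 0 is static, residue 1 moves by 1 Å around its mean.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf(frames)
print(values)
```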
Figure 3. A) Rg vs. Ree values of the MERS-CoV macro domain in
solution from REMD simulations, processed with k-means
clustering. Five clusters (k = 5) were used, and the centroids are located at
Rg = 15.24 Å, Ree = 17.69 Å (Centroid 1), Rg = 15.26 Å, Ree = 24.10 Å (Centroid 2),
Rg = 15.26 Å, Ree = 21.21 Å (Centroid 3), Rg = 15.26 Å, Ree = 22.17 Å
(Centroid 4), and Rg = 15.26 Å, Ree = 22.95 Å (Centroid 5).
B) Rg vs. Ree values of the MERS-CoV macro domain in
solution from deep learning, processed with k-means
clustering. Five clusters (k = 5) were used, and the centroids are located at
Rg = 15.12 Å, Ree = 23.36 Å (Centroid 1), Rg = 15.19 Å, Ree = 21.87 Å (Centroid 2),
Rg = 14.92 Å, Ree = 19.24 Å (Centroid 3), Rg = 15.18 Å, Ree = 24.98 Å
(Centroid 4), and Rg = 15.10 Å, Ree = 17.80 Å (Centroid 5).
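The quantities plotted in Figure 3 can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used for the figure: the radius of gyration is computed without mass weighting, the end-to-end distance is taken between the first and last positions of the chain, and the clustering is plain Lloyd-style k-means with a deterministic initialization (not the global k-means variant of reference 20). All function names and the toy data are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Unweighted Rg: RMS distance of the positions from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)

def end_to_end_distance(coords):
    """Ree: distance between the first and last positions of the chain."""
    return math.dist(coords[0], coords[-1])

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations on (Rg, Ree) points; returns k centroids."""
    # Deterministic initialization: evenly spaced points from the input.
    centroids = [tuple(points[i * len(points) // k]) for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid is the cluster mean; an empty cluster keeps its centroid.
        updated = [
            tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Toy chain of three positions along x (Å).
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(chain)
ree = end_to_end_distance(chain)

# Toy (Rg, Ree) pairs forming two well-separated groups.
points = [(15.0, 17.0), (15.2, 17.4), (15.1, 24.0), (15.3, 24.2)]
centroids = sorted(kmeans(points, k=2))
print(rg, ree, centroids)
```

For the figure itself, each point would be the (Rg, Ree) pair of one conformation and k would be set to 5.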
Figure 4. Secondary structure elements and their residue-level
probabilities recovered from the MERS-CoV macro domain structures
calculated with dynamics at the atomic level in aqueous media from REMD
simulations.
Figure 5. Secondary structure elements and their residue-level
probabilities recovered from the MERS-CoV macro domain structures
calculated with dynamics at the atomic level in aqueous media from deep
learning.
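Residue-level probabilities such as those in Figures 4 and 5 can be obtained by counting, for each residue, how often each secondary structure element occurs over the ensemble. The sketch below assumes that every conformation has already been assigned a one-letter secondary structure string (for example with the DSSP definitions of reference 19); the function name residue_probabilities and the toy strings are hypothetical.

```python
from collections import Counter

def residue_probabilities(assignments):
    """Per-residue probability of each secondary structure code.

    assignments: list of equal-length strings, one per conformation,
    with one secondary structure letter per residue.
    """
    n_frames = len(assignments)
    n_res = len(assignments[0])
    probs = []
    for i in range(n_res):
        # Count how often each code appears at residue i across conformations.
        counts = Counter(frame[i] for frame in assignments)
        probs.append({code: c / n_frames for code, c in counts.items()})
    return probs

# Toy ensemble: four conformations of a four-residue fragment.
assignments = ["HHEC", "HHCC", "HECC", "HHEC"]
probs = residue_probabilities(assignments)
for index, p in enumerate(probs, start=1):
    print(index, p)
```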
FIGURES
Scheme 1.