Predicting Peptide-MHC Binding Affinity With Imputed Training Data and Recurrent Neural Networks

Predicting the binding affinity between peptides (short amino acid sequences) and MHC proteins has emerged as a central problem in computational immunology because it helps determine the targets of T-cell-mediated immune activity. An individual’s polyclonal collection of T cells is able to kill infected and cancerous cells while sparing healthy ones. This feat is achieved through the winnowing and expansion of T-cell sub-populations possessing highly specific T-cell receptors (TCRs) (Blackman 1990). A distinct T-cell receptor recognizes a small number of similar peptides bound to an MHC molecule on the surface of a cell (Huseby 2005). Peptide-MHC binding is one of the most restrictive steps in “antigen processing” (Cresswell 2005) and is thus essential for determining which amino acid sequences can potentially trigger a T-cell response.

Early approaches to peptide-MHC binding prediction focused on “sequence motifs” (Sette 1989), followed by regularized linear models and linear models with interaction terms, such as SMM with pairwise features (Peters 2003). More recently, methods based on ensembles of shallow neural networks (Lundegaard 2008, Nielsen 2007) have become common tools in computational virology (Lund 2011), tumor immunology (Gubin 2015), and autoimmunity (Abreu 2012). Existing predictors work by encoding amino acid sequences as fixed-length vectors using predefined amino acid features. In this poster we delineate several flavors of the peptide-MHC binding problem (i.e. allele-specific vs. pan-allele) and present the following improvements over the current generation of peptide-MHC binding predictors:

  • Learning vector embeddings for amino acids as part of training instead of using predefined features.

  • Generating synthetic data using imputation to train models for alleles with few training samples.

  • Replacing fixed-length vector encodings with recurrent neural networks to make better predictions across a broader range of sequence lengths.
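
As a rough illustration of the first and third items above, the following sketch passes peptides of different lengths through a per-residue embedding table and a simple recurrent cell, so that no padding or fixed-length encoding is needed. It is not the model presented in this poster: the class name TinyPeptideRNN, the dimensions, the plain tanh recurrence, and the sigmoid output are all assumptions chosen for brevity, and the weights are random rather than trained, so the scores carry no biological meaning. In a real model the embedding table would be updated by gradient descent along with the other weights.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

EMBED_DIM = 4   # hypothetical embedding size
HIDDEN_DIM = 8  # hypothetical recurrent state size


def random_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyPeptideRNN:
    """Untrained toy model: embedding lookup followed by a tanh recurrence."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One vector per amino acid; trainable in a real model,
        # in contrast to a table of predefined amino acid features.
        self.embedding = random_matrix(len(AMINO_ACIDS), EMBED_DIM, rng)
        self.w_in = random_matrix(HIDDEN_DIM, EMBED_DIM, rng)
        self.w_rec = random_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)]

    def predict(self, peptide):
        """Map a peptide of any length to a score in (0, 1)."""
        hidden = [0.0] * HIDDEN_DIM
        for aa in peptide:
            x = self.embedding[AA_INDEX[aa]]
            # The same weights are applied at every position, so the
            # number of steps simply follows the peptide length.
            hidden = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], hidden))
                )
                for j in range(HIDDEN_DIM)
            ]
        logit = sum(w * h for w, h in zip(self.w_out, hidden))
        return 1.0 / (1.0 + math.exp(-logit))


model = TinyPeptideRNN(seed=0)
# Peptides of length 8, 9 and 10 go through the same model unchanged.
scores = {p: model.predict(p) for p in ("SIINFEKL", "GILGFVFTL", "ELAGIGILTV")}
```

Because the recurrence consumes one residue at a time, the same parameters serve every peptide length, which is the property the third item relies on. The imputation step in the second item is not shown here.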


  1. M. K. Anderson, R. Pant, A. L. Miracle, X. Sun, C. A. Luer, C. J. Walsh, J. C. Telfer, G. W. Litman, E. V. Rothenberg. Evolutionary Origins of Lymphocytes: Ensembles of T Cell and B Cell Transcriptional Regulators in a Cartilaginous Fish. The Journal of Immunology 172, 5851–5860 The American Association of Immunologists, 2004.

  2. M. Blackman, J. Kappler, P. Marrack. The role of the T cell receptor in positive and negative selection of developing T cells. Science 248, 1335–1341 American Association for the Advancement of Science (AAAS), 1990.

  3. Eric S. Huseby, Janice White, Frances Crawford, Tibor Vass, Dean Becker, Clemencia Pinilla, Philippa Marrack, John W. Kappler. How the T Cell Repertoire Becomes Peptide and MHC Specific. Cell 122, 247–260 Elsevier BV, 2005.

  4. Peter Cresswell, Anne L. Ackerman, Alessandra Giodini, David R. Peaper, Pamela A. Wearsch. Mechanisms of MHC class I-restricted antigen processing and cross-presentation. Immunol Rev 207, 145–157 Wiley-Blackwell, 2005.

  5. A. Sette, S. Buus, E. Appella, J. A. Smith, R. Chesnut, C. Miles, S. M. Colon, H. M. Grey. Prediction of major histocompatibility complex binding regions of protein antigens by sequence pattern analysis. Proceedings of the National Academy of Sciences 86, 3296–3300 Proceedings of the National Academy of Sciences, 1989.

  6. B. Peters, W. Tong, J. Sidney, A. Sette, Z. Weng. Examining the independent binding assumption for binding of peptide epitopes to MHC-I molecules. Bioinformatics 19, 1765–1772 Oxford University Press (OUP), 2003.

  7. C. Lundegaard, K. Lamberth, M. Harndahl, S. Buus, O. Lund, M. Nielsen. NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Nucleic Acids Research 36, W509–W512 Oxford University Press (OUP), 2008.

  8. Morten Nielsen, Claus Lundegaard, Thomas Blicher, Kasper Lamberth, Mikkel Harndahl, Sune Justesen, Gustav Røder, Bjoern Peters, Alessandro Sette, Ole Lund, Søren Buus. NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence. PLoS ONE 2, e796 Public Library of Science (PLoS), 2007.

  9. Ole Lund, Eduardo J. M. Nascimento, Milton Maciel, Morten Nielsen, Mette Voldby Larsen, Claus Lundegaard, Mikkel Harndahl, Kasper Lamberth, Søren Buus, Jérôme Salmon, Thomas J. August, Ernesto T. A. Marques. Human Leukocyte Antigen (HLA) Class I Restricted Epitope Discovery in Yellow Fever and Dengue Viruses: Importance of HLA Binding Strength. PLoS ONE 6, e26494 Public Library of Science (PLoS), 2011.

  10. Matthew M. Gubin, Maxim N. Artyomov, Elaine R. Mardis, Robert D. Schreiber. Tumor neoantigens: building a framework for personalized cancer immunotherapy. Journal of Clinical Investigation 125, 3413–3421 American Society for Clinical Investigation, 2015.

  11. J. R. F. Abreu, S. Martina, A. A. Verrijn Stuart, Y. E. Fillié, K. L. M. C. Franken, J. W. Drijfhout, B. O. Roep. CD8 T cell autoreactivity to preproinsulin epitopes with very low human leucocyte antigen class I binding affinity. Clinical & Experimental Immunology 170, 57–65 Wiley-Blackwell, 2012.
