Statement of Significance
LC-MS/MS-based proteomics is continuously advancing, allowing disease to
be redefined at the molecular scale and shifting medicine from curative
toward preventive and personalized care. While there are
numerous large, annotated spectral databases available for the
development of new bioinformatic tools focused on MS2 data, the same
cannot be said for research focused on MS1 data. In the MS1 setting, each
spectrum contains multiple peptides, and the primary interest often lies
in their isotope distributions. However, extracting this information is
not a straightforward task. Therefore, we propose a method that uses
peptide-spectrum match (PSM) data to extract these isotope distributions,
together with other important MS1 features, and summarizes them in a
standardized format, creating an MS1 isotope
distribution benchmark dataset. We applied this workflow to a proteomics
standard and found a high similarity between
the extracted and theoretical isotope distributions. The workflow can be
applied in the future to further extend the benchmark dataset. The
dataset itself can act as the foundation to develop new bioinformatic
tools. The availability of an extensive MS1 isotope
distribution benchmark dataset will foster the development of innovative
bioinformatic tools, enabling researchers to unlock new insights and
further advance our understanding of the molecular underpinnings of disease
pathology.