Peptide spectral library
A peptide spectral library is a curated, annotated, and non-redundant collection of LC-MS/MS peptide spectra. Its essential use is to provide consensus templates for identifying peptides and proteins, based on the correlation between the templates and experimental spectra. This identification process is called spectral library searching. Compared with sequence database searching, the traditional approach to peptide spectrum identification, spectral library searching offers several benefits.
Spectral libraries have been used to identify small-molecule mass spectra since the 1980s.[1] In the early years of shotgun proteomics, pioneering investigations suggested that a similar approach might be applicable to peptide and protein identification.[2] Only in recent years, however, with millions of confidently identified MS/MS spectra available, have peptide spectral libraries shown practical value.
Shotgun proteomics
Modern tandem MS instruments combine a fast duty cycle, exquisite sensitivity, and unprecedented mass accuracy, which makes tandem mass spectrometry an ideal match for large-scale protein identification and quantification in complex biological systems. In a shotgun proteomics approach, proteins in a complex mixture are digested by proteolytic enzymes such as trypsin. Subsequently, one or more chromatographic separations are applied to resolve the resulting peptides, which are then ionized and analyzed in a mass spectrometer. To acquire a tandem mass spectrum, a particular peptide precursor is isolated and fragmented in the mass spectrometer, and the mass spectrum of the resulting fragments is recorded. Tandem mass spectra contain specific information about the sequence of the peptide precursor, which can aid the identification of the peptide and its parent protein.
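As an in silico illustration of the digestion step, the following Python sketch cleaves a protein sequence using the commonly applied trypsin rule (cleave after K or R unless the next residue is P). The function name `tryptic_digest`, the `missed_cleavages` parameter, and the example sequence are assumptions made for illustration; they are not taken from any particular software package.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from a simplified in silico trypsin digest.

    Rule assumed here: cleave C-terminal to K or R, except before P.
    """
    if not protein:
        return []

    # Collect cleavage positions, including both ends of the protein.
    sites = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    # Join up to `missed_cleavages` adjacent fragments into one peptide.
    peptides = []
    for start in range(len(sites) - 1):
        for skipped in range(missed_cleavages + 1):
            end = start + 1 + skipped
            if end >= len(sites):
                break
            peptides.append(protein[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R at position 6 is followed by P, so it is not cleaved.
    for peptide in tryptic_digest("AAKCCRPDDKEE", missed_cleavages=1):
        print(peptide)
```

Allowing missed cleavages enlarges the list of candidate peptides, which is one reason the search space discussed later grows quickly.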
Protein identification via sequence database searching
Sequence database searching is currently widely used for protein identification based on mass spectra. In this approach, a protein sequence database is used to calculate all putative peptide candidates for the given settings (proteolytic enzymes, missed cleavages, post-translational modifications). Sequence search engines use various heuristics to predict the fragmentation pattern of each peptide candidate. These predicted patterns are used as templates to find a sufficiently close match among the experimental mass spectra, which serves as the basis for peptide and protein identification. Many tools have been developed for this practice, e.g. SEQUEST[3] and Mascot,[4] and they have enabled many past discoveries.
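The simplest form of fragmentation prediction can be sketched as follows: for a candidate peptide, compute the m/z values of its singly charged b and y fragment ions from monoisotopic residue masses. This is a minimal illustration under stated assumptions (unmodified peptides, charge 1+, b and y ions only); it is not the method of SEQUEST, Mascot, or any other engine, and the name `fragment_ions` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b ions: N-terminal prefixes of length 1 .. n-1, plus one proton.
    b_ions = []
    running = 0.0
    for mass in masses[:-1]:
        running += mass
        b_ions.append(running + PROTON)

    # y ions: C-terminal suffixes of length 1 .. n-1, plus water and one proton.
    y_ions = []
    running = 0.0
    for mass in reversed(masses[1:]):
        running += mass
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b_ions, y_ions = fragment_ions("PEPTIDE")
    for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
        print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Note that this sketch predicts only fragment positions. It assigns no intensities, which is exactly the information that a library of observed spectra retains.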
Shortcomings of the sequence database searching workflow
Due to the complex nature of peptide fragmentation in a mass spectrometer, predicted fragmentation patterns fall short of reproducing experimental mass spectra, especially the relative intensities of distinct fragments. Thus, sequence database searching faces a bottleneck of limited specificity. It also demands a vast search space, which still cannot cover all possibilities of peptide dynamics (for example, post-translational modifications), so its efficiency is limited. The search process is sometimes slow and requires costly high-performance computers. In addition, sequence database searching by its nature does not connect the research discoveries made by different groups or at different times.
Advantages and limitations
Spectral library searching has two main advantages over sequence database searching. First, a greatly reduced search space decreases the search time. Second, by taking full advantage of all spectral features, including relative fragment intensities, neutral losses from fragments, and various additional specific fragments, spectral searching is more specific and generally provides better discrimination between true and false matches.
Spectral library searching is not applicable when the goal is to discover novel peptides or proteins, since a library contains only previously identified spectra. Fortunately, the scientific community is collectively acquiring more and more high-quality mass spectra, which will continuously expand the coverage of peptide spectral libraries.
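To make the role of fragment intensities concrete, the sketch below scores a query spectrum against library entries with a normalized dot product (cosine similarity) over binned peaks. This is one common way to measure spectral correlation, shown here only as an illustration. The function names, the 1 Da bin width, and the toy spectra are assumptions, and real library search tools use more refined preprocessing and scoring.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product of two binned spectra, between 0 and 1."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query, library, bin_width=1.0):
    """Return (score, name) of the best-matching entry in a non-empty library."""
    return max(
        (cosine_similarity(query, spectrum, bin_width), name)
        for name, spectrum in library.items()
    )


if __name__ == "__main__":
    # Toy library with made-up intensities; entry names are hypothetical.
    library = {
        "PEPTIDE": [(98.06, 10.0), (148.06, 100.0), (263.09, 50.0)],
        "OTHER": [(120.0, 100.0), (300.0, 20.0)],
    }
    query = [(98.1, 12.0), (148.0, 90.0), (263.2, 55.0)]
    score, name = search_library(query, library)
    print(f"best match: {name} (score {score:.3f})")
```

Because the score depends on intensities as well as peak positions, two spectra with the same fragments but different intensity patterns score below 1, which is the extra discrimination that predicted templates without intensities cannot offer.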
Research-community-focused peptide spectral libraries
Reaching maximal coverage is a long-term goal for a peptide spectral library, even with the support of the scientific community and ever-improving proteomic technologies. Optimizing a particular module of the library, such as the proteins in a particular organelle or those relevant to a particular biological phenotype, is a more manageable goal. For example, a researcher studying the mitochondrial proteome will likely focus their analyses on the protein modules of the mitochondria. A research-community-focused peptide spectral library supports targeted research comprehensively for one such community.
Cardiac-Specific Organellar Protein Atlas Knowledgebase
The Cardiac-Specific Organellar Protein Atlas Knowledgebase (COPaKB) is being developed by an international consortium, which aims to provide specialized, comprehensive, and interactive support to the cardiovascular community in protein biology at a systems scale.[5] The knowledgebase integrates orthogonal sets of proteomic knowledge, including mass spectra datasets, image datasets, gene-based datasets, and clinical datasets. Within a data federation framework, users can access all of these datasets from a single web server. Its spectral library component is built on a modular infrastructure, which offers flexibility for users and for future updates. The first release, launched in July 2011,[6] included a human mitochondrial module, a mouse mitochondrial module, a human proteasome module, and a mouse proteasome module.[7] In partnership with the scientific community, these modules will be continuously updated and new modules will be integrated in upcoming releases.
References
- Domokos, L., Henneberg, D., and Weimann, B. 1984. Computer-aided identification of compounds by comparison of mass spectra. Anal. Chim. Acta 165:61-74.
- Yates, J.R., 3rd, Morgan, S.F., Gatlin, C.L., Griffin, P.R., and Eng, J.K. 1998. Method to compare collision-induced dissociation spectra of peptides: Potential for library searching and subtractive analysis. Anal. Chem. 70:3557-3565.
- Eng, J.K. et al. 1994. An approach to correlate tandem mass-spectral data of peptides with amino-acid-sequences in a protein database. J. Am. Soc. Mass Spectrom. 5:976-989.
- Perkins, D.N. et al. 1999. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551-3567.
- "Peptide Spectral Library". Neo Bio Lab Research. Retrieved 3 August 2015.
The Cardiac Organellar Protein Atlas Knowledgebase (COPaKB) has worked to create a comprehensive database of cardiovascular instigators. This project was initially founded by the NHLBI Proteomics Centers, the European Bioinformatics Institute, the Scripps Research Institute, the Royal Institute of Technology, the University of California, Los Angeles, Zhejiang University, and the Beijing Genomics Institute. The goal of this project is to create a comprehensive listing of cardiac proteome dynamics that can encourage collaborative efforts between individual research entities worldwide. These goals are addressed by creating a guide to the results of previous studies, by following up on results from ongoing analyses that can be combined with previous research, and by enabling a Wiki component that gives researchers from a variety of backgrounds access to these materials as well as the opportunity to participate in the growth of the project.
- "News: UCLA Proteomics Center Launched COPa Knowledgebase (COPaKB) Version 1.0". National Heart Lung and Blood Institute. Retrieved 3 August 2015.
- "The Modular Structure of COPaKB". Retrieved 3 August 2015.