Re-analysis of ProteomicsDB using an accurate, sensitive and scalable false discovery rate estimation approach for protein groups

The, Matthew; Samaras, Patroklos; Kuster, Bernhard; Wilhelm, Mathias

doi:10.1016/j.mcpro.2022.100437

Title:: Re-analysis of ProteomicsDB using an accurate, sensitive and scalable false discovery rate estimation approach for protein groups
Document type:: Journal article
Author(s):: The, Matthew; Samaras, Patroklos; Kuster, Bernhard; Wilhelm, Mathias
Abstract:: Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry-based proteomics, particularly when analyzing very large data sets. One performant method for this purpose is the Picked Protein FDR approach which is based on a target-decoy competition strategy on the protein level that ensures that FDRs scale to large data sets. Here, we present an extension to this method that can also deal with protein groups, i.e. proteins that share common peptides such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas. First, the picked group target-decoy and, second, the rescued subset grouping strategies. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the data set. The validation analysis also uncovered that applying the commonly used Occam's razor principle leads to anti-conservative FDR estimates for large datasets. This is not the case for the Picked Protein Group FDR method. Re-analysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the re-analysis of the entire human section of ProteomicsDB, led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1,250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool including a graphical user interface that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.
Keywords:: BayBioMS; Large-scale proteomics; ProteomicsDB; picked protein FDR; protein false discovery rate estimation; protein inference.
Journal title:: Molecular & Cellular Proteomics
Year:: 2022
Article pages:: 100437
Full text / DOI:: doi:10.1016/j.mcpro.2022.100437
Publisher / Institution:: Elsevier BV
E-ISSN:: 1535-9476
Publication date:: 2022-11-01