We retrospectively analyzed changes in the neXtProt database records on the number of human proteoforms, post-translational modification (PTM) events, alternative splicing (AS) events, and single-amino acid polymorphisms (SAPs) associated with protein-coding genes. In 2016, our group proposed three mathematical models for predicting the number of different proteins (proteoforms) in the human proteome. Eight years later, we revisited the original data from the information resources and their contribution to the prediction results, and related the differences to new approaches to the experimental and bioinformatic analysis of protein modifications. The aim of this work is to update the status of database records on identified proteoforms since 2016 and to identify trends in the number of these records. Depending on the information model, modern experimental methods may identify from 5 to 125 million different proteoforms, that is, proteins arising from alternative splicing, from single nucleotide polymorphisms realized at the protein level, and from post-translational modifications, in various combinations. This result corresponds to an increase of 20-fold or more in the estimated size of the human proteome over the past 8 years.
Sarygina E.V., Kozlova A.S., Ponomarenko E.A., Ilgisonis E.V. (2024) The human proteome size as a technological development function. Biomeditsinskaya Khimiya, 70(5), 364-373.