Many biological functions are mediated by various types of protein-peptide binding. Prime examples include the phosphorylation of peptide substrates by protein kinases and the binding of major histocompatibility complex (MHC) molecules to neoantigens in the immune system. Understanding the specificity of protein-peptide interactions is critical for unraveling the architecture of functional pathways and the mechanisms of cellular processes in human cells. Although mass-spectrometric techniques have been developed to identify protein-peptide interactions, our understanding of the binding preferences of proteins for their peptide partners is still rudimentary. As a complementary direction, several computational methods have recently been proposed to predict protein-peptide binding, efficiently providing rich functional annotations on a large scale. To achieve high prediction accuracy, these methods require a sufficient amount of data to build the prediction model. However, the number of experimentally verified protein-peptide bindings is often limited in practice. For example, a majority of protein kinases have very few experimentally verified phosphorylation sites (e.g., fewer than 30) in existing databases. These methods can therefore build accurate prediction models only for well-characterized proteins with many known binding peptides, and cannot be extended to predict new binding peptides for less-studied proteins. In this paper, we introduce a generic framework that addresses this data scarcity in protein binding prediction. We demonstrate its applicability by predicting kinase-specific phosphorylation sites. Our method uses an effective training strategy to build a prediction model with robust transferability: the model can predict the phosphorylation sites of a less-studied kinase even when only a small number of that kinase's phosphorylation sites are known.
To achieve this, we train the model in a meta-learning phase followed by a few-shot learning phase. We demonstrate that our framework has better transferability than state-of-the-art methods and makes effective use of limited data to accurately predict phosphorylation sites for less-characterized kinases. The implementation of our framework is available at https://github.com/luoyunan/MetaKinase.
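To make the two-phase idea concrete, the sketch below shows the episodic setup commonly used in few-shot learning. It is a hypothetical illustration, not the MetaKinase implementation. `sample_episode` mimics the meta-learning phase by drawing one small support/query task from a well-studied kinase. `predict` mimics the few-shot phase by labeling new peptides for a kinase from only a handful of known sites, using a prototypical-network style nearest-mean rule. The amino-acid composition encoder, the function names, the labels, and the peptides are all assumptions. A real model would learn the encoder across many episodes, and that training step is omitted here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(peptide):
    """Hypothetical fixed encoder: amino-acid composition of a peptide window.
    In a meta-learned model this embedding would be trained across episodes."""
    counts = [0.0] * len(AMINO_ACIDS)
    for residue in peptide:
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # ignore non-standard residues
            counts[idx] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def prototype(vectors):
    """Mean embedding of one class's support examples."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(support, query):
    """Few-shot phase: give each query peptide the label of the nearest prototype.
    support maps a label (e.g. "site") to a few known peptides."""
    protos = {label: prototype([encode(p) for p in peps])
              for label, peps in support.items()}
    return [min(protos, key=lambda lab: sq_dist(encode(q), protos[lab]))
            for q in query]


def sample_episode(data, k_shot, n_query, rng):
    """Meta-learning phase: draw one task from a well-studied kinase.
    data maps kinase -> label -> list of peptides."""
    kinase = rng.choice(sorted(data))
    support, query = {}, []
    for label, peps in data[kinase].items():
        picked = rng.sample(peps, k_shot + n_query)
        support[label] = picked[:k_shot]
        query.extend((p, label) for p in picked[k_shot:])
    return kinase, support, query
```

For example, with two known sites and two known non-sites for a hypothetical kinase, `predict({"site": ["RRASV", "RRKSL"], "non-site": ["DEEDE", "EEDSD"]}, ["RRRSA", "EDDEE"])` labels the arginine-rich window as a site and the acidic one as a non-site. The nearest-mean rule is chosen here only because it needs no per-kinase training, which is the property that few-shot prediction relies on.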