
Learning Structural Motif Representations For Efficient Protein Structure Search

By Yang Liu, Qing Ye, Liwei Wang, Jian Peng

Posted 14 May 2017
bioRxiv DOI: 10.1101/137828 (published DOI: 10.1093/bioinformatics/bty585)

Understanding the relationship between protein structure and function is a fundamental problem in protein science. Given a protein of unknown function, fast identification of similar protein structures in the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, performing pairwise structural alignment against all structures in the PDB is computationally prohibitive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structural features or fingerprints and computing similarity only between vectors. As a notable example, FragBag represents each protein by a “bag of fragments”, a vector of frequencies of contiguous short backbone fragments drawn from a predetermined library. Here we present a new approach to learning effective structural motif representations using deep learning. We develop DeepFold, a deep convolutional neural network model that extracts structural motif features from a protein structure. Like FragBag, DeepFold represents each protein structure or fold by a vector, in this case a vector of learned structural motif features. We demonstrate that DeepFold substantially outperforms FragBag on protein structure search on a non-redundant protein structure database and on a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs. The source code for generating DeepFold representations can be downloaded at https://github.com/largelymfs/DeepFold.
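
The alignment-free idea described in the abstract, in which each structure is reduced to a fixed-length vector and only vectors are compared, can be illustrated with a minimal sketch. This is not the DeepFold or FragBag implementation. It assumes that each backbone segment has already been assigned to an entry in a fragment library (that assignment step is not shown), and it uses cosine similarity as the comparison measure, which is an assumption since the abstract does not name one. The function names `bag_of_fragments`, `cosine_similarity` and `rank_neighbors` are hypothetical.

```python
import math
from collections import Counter


def bag_of_fragments(fragment_ids, library_size):
    """Build a FragBag-style count vector.

    fragment_ids: library indices, one per backbone segment (assumed to be
    precomputed; mapping segments to library fragments is not shown here).
    library_size: number of fragments in the hypothetical library.
    """
    counts = Counter(fragment_ids)
    return [counts.get(i, 0) for i in range(library_size)]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_neighbors(query_vec, database):
    """Rank database entries by similarity to the query, most similar first.

    database: dict mapping a structure name to its feature vector.
    """
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In a learned-representation setting such as the one the abstract describes, the count vector would be replaced by the feature vector produced by the network, while the ranking step could stay the same. Because each comparison is a single vector operation, a search over a large database avoids running a structural alignment for every pair.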

Download data

  • Downloaded 1,231 times
  • Download rankings, all-time:
    • Site-wide: 13,552
    • In bioinformatics: 1,702
  • Year to date:
    • Site-wide: 78,239
  • Since beginning of last month:
    • Site-wide: 78,624
