
Predicting Alignment Distances via Continuous Sequence Matching

By Jian Chen, Le Yang, Lu Li, Yijun Sun

Posted 27 May 2020
bioRxiv DOI: 10.1101/2020.05.24.113852

Sequence comparison is the basis of many applications in bioinformatics. The growing number and length of available sequences now allow us to extract increasingly accurate information from the data. Obtaining such information, however, requires comparing a large number of long sequences both accurately and quickly. Neither traditional dynamic programming-based algorithms nor the alignment-free algorithms proposed in recent years satisfy both requirements. To meet them, researchers have recently proposed a data-dependent approach that learns sequence embeddings, but its capability is limited by the structure of its embedding function. In this paper, we propose a new embedding function, designed specifically for biological sequences, that maps sequences into embedding vectors. Combined with a neural network, this embedding function can be trained to predict the alignment distance between sequences quickly and reliably. We demonstrate the effectiveness and efficiency of the proposed method on various types of amplicon sequences. More importantly, our experiment on full-length 16S rRNA sequences shows that our approach can lead to a general model that quickly and reliably predicts the pairwise alignment distance of any pair of full-length 16S rRNA sequences with high accuracy. We believe such a model can greatly facilitate large-scale sequence analysis.

Competing Interest Statement

The authors have declared no competing interest.
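To make the speed/accuracy trade-off described in the abstract concrete, the sketch below contrasts its two ingredients. `alignment_distance` is a textbook unit-cost edit (Levenshtein) distance computed by dynamic programming, which costs O(mn) time for every pair of sequences; the alignment scoring used in the paper may differ. `kmer_embedding` is a hypothetical stand-in for an embedding function: a fixed, alignment-free k-mer frequency vector, not the learned embedding proposed by the authors. Once each sequence has been embedded, every pairwise distance is a cheap vector operation, which is what makes embedding-based prediction attractive for large collections. All function names and the choice k = 3 are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product


def alignment_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance by dynamic programming, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete a[i-1]
                curr[j - 1] + 1,           # insert b[j-1]
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(b)]


def kmer_embedding(seq: str, k: int = 3) -> list[float]:
    """Hypothetical fixed embedding: normalized k-mer frequencies over the ACGT alphabet.

    This is an alignment-free illustration only, NOT the learned embedding from the paper.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[km] for km in kmers), 1)  # avoid division by zero for short inputs
    return [counts[km] / total for km in kmers]


def embedding_distance(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "ACGTTCGTAC", "TTGCAACGTA"]
    vecs = [kmer_embedding(s) for s in seqs]  # embed each sequence once
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            print(i, j, alignment_distance(seqs[i], seqs[j]),
                  round(embedding_distance(vecs[i], vecs[j]), 3))
```

A fixed k-mer vector like this is fast but only loosely tracks alignment distance; the abstract's point is that a trainable embedding function, fitted with a neural network, can keep the speed of the vector comparison while predicting the alignment distance much more closely.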

Download data

  • Downloaded 250 times
  • Download rankings, all-time:
    • Site-wide: 90,368
    • In bioinformatics: 7,975
  • Year to date:
    • Site-wide: 63,207
  • Since beginning of last month:
    • Site-wide: 63,207
