
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 73,530 bioRxiv papers from 319,996 authors.

Targeted sequence capture outperforms RNA-Seq and degenerate-primer PCR cloning for sequencing the largest mammalian multi-gene family

By Laurel R. Yohe, Kalina T. J. Davies, Nancy B. Simmons, Karen E. Sears, Elizabeth R. Dumont, Stephen J. Rossiter, Liliana M. Dávalos

Posted 13 Apr 2019
bioRxiv DOI: 10.1101/607994

Multigene families evolve from single-copy ancestral genes via duplication, and typically encode proteins critical to key biological processes. Molecular analyses of these gene families require high-confidence sequences, but the high sequence similarity of the members can create challenges for both sequencing and downstream analyses. Focusing on the common vampire bat, Desmodus rotundus, we evaluated how different sequencing approaches performed in recovering the largest mammalian protein-coding multigene family: olfactory receptors (OR). Using the common vampire bat genome as a reference, we determined the proportion of putatively protein-coding receptors recovered by: 1) amplicons from degenerate primers sequenced via Sanger technology, 2) RNA-Seq of the main olfactory epithelium, and 3) targeted sequence capture with probes designed from transcriptomes of closely related species. Our initial re-annotation of the high-quality vampire bat genome yielded >400 intact OR genes, more than double the number from original estimates. Sanger-sequenced amplicons performed worst among the three approaches, detecting <33% of receptors in the genome. In contrast, the transcriptome reliably recovered >50% of the annotated genomic ORs, and targeted sequence capture recovered nearly 75% of annotated genes. Each sequencing approach assembled high-quality sequences, even if it did not recover all putative receptors in the genome. Therefore, variation among assemblies was caused by low coverage of some receptors, rather than high rates of assembly error. Given this variability, we caution against using counts of intact receptors per species to model the birth-death process of multigene families. Instead, our results support the use of orthologous sequences to explore and model the evolutionary processes shaping these genes.
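
The comparison described in the abstract comes down to asking, for each sequencing approach, what fraction of the genome-annotated receptors it recovered. The following is a minimal Python sketch of that calculation, not the authors' pipeline. The function name `recovery_fraction` and all gene identifiers are hypothetical, and it assumes that recovered sequences have already been matched to reference gene IDs.

```python
def recovery_fraction(reference_ids, recovered_ids):
    """Return the fraction of reference genes present in the recovered set.

    Recovered IDs that are absent from the reference are ignored.
    """
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & set(recovered_ids)) / len(reference)


# Hypothetical identifiers for illustration only; these are not data from the study.
reference = {"OR1", "OR2", "OR3", "OR4"}
recovered_by_method = {
    "amplicon": {"OR1"},
    "rna_seq": {"OR1", "OR2", "OR9"},  # OR9 is not in the reference and is ignored
    "capture": {"OR1", "OR2", "OR3"},
}

fractions = {
    method: recovery_fraction(reference, ids)
    for method, ids in recovered_by_method.items()
}

for method, fraction in fractions.items():
    print(f"{method}: {fraction:.0%} of reference receptors recovered")
```

Using sets keeps the comparison independent of how many times a gene was assembled, so only presence or absence relative to the reference annotation counts.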

Download data

  • Downloaded 274 times
  • Download rankings, all-time:
    • Site-wide: 44,291 out of 73,481
    • In genomics: 3,793 out of 4,874
  • Year to date:
    • Site-wide: 32,060 out of 73,481
  • Since beginning of last month:
    • Site-wide: 32,060 out of 73,481


