Dog10K_Boxer_Tasha_1.0: A long-read assembly of the dog reference genome

By Vidhya Jagannathan, Christophe Hitte, Jeffrey M. Kidd, Patrick Materson, Terence D. Murphy, Sarah Emery, Brian W. Davis, Reuben M. Buckley, Yanhu Liu, Xiangquan Zhang, Tosso Leeb, Ya-ping Zhang, Elaine A. Ostrander, Guo-Dong Wang

Posted 05 May 2021
bioRxiv DOI: 10.1101/2021.05.05.442772

Abstract: The domestic dog has become an important biomedical model for studies of the genetic basis of disease, morphology and behavior. Genetic studies in the dog have relied on a draft reference genome of a purebred female boxer named Tasha, first published in 2005. Built from Sanger whole-genome shotgun sequencing combined with limited clone-based sequencing, the initial assembly and its subsequent updates have served as the predominant resource for canine genetics for 15 years. Although the initial assembly was a good-quality draft, it contained, like all assemblies of its time, gaps, assembly errors and missing sequence, particularly in GC-rich regions, which occur at many promoters and in the first exons of protein-coding genes. Here we present Dog10K_Boxer_Tasha_1.0, an improved, highly contiguous, chromosome-level genome assembly of Tasha generated with long-read technologies. Compared with the CanFam3.1 reference assembly, it increases sequence contiguity more than 100-fold, closes more than 23,000 gaps and improves gene annotation by identifying more than 1,200 new protein-coding transcripts. The assembly and annotation are available at NCBI under accession GCF_000002285.5.