In-solution Y-chromosome capture-enrichment on ancient DNA libraries

By Diana I Cruz Dávalos, María A Nieves-Colón, Alexandra Sockell, G. David Poznik, Hannes Schroeder, Anne C. Stone, Carlos D Bustamante, Anna-Sapfo Malaspinas, María C Ávila-Arcos

Posted 22 Nov 2017
bioRxiv DOI: 10.1101/223214 (published DOI: 10.1186/s12864-018-4945-x)

Background: As most ancient biological samples have low levels of endogenous DNA, it is advantageous to enrich for specific genomic regions prior to sequencing. One approach, in-solution capture-enrichment, retrieves sequences of interest and reduces the fraction of microbial DNA. In this work, we implement a capture-enrichment approach targeting informative regions of the Y chromosome in six human archaeological remains excavated in the Caribbean and dated between 200 and 3,000 years BP. We compare the recovery rates of Y-chromosome capture (YCC) alone and of whole-genome capture followed by YCC (WGC+Y) against those of non-enriched (pre-capture) libraries.

Results: We recovered 17-4,152 times more targeted unique Y-chromosome sequences after capture, where 0.01-6.2% (WGC+Y) and 0.01-23.5% (YCC) of the sequence reads were on-target, compared to 0.0002-0.004% pre-capture. In samples with endogenous DNA content greater than 0.1%, we found that WGC+Y yielded lower enrichment due to the loss of complexity in consecutive capture experiments, whereas in samples with lower endogenous content, WGC+Y yielded greater enrichment than YCC alone. Finally, the increased recovery of informative sites enabled us to assign Y-chromosome haplogroups to some of the archaeological remains and to gain insights into their paternal lineages and origins.

Conclusions: To our knowledge, we present the first in-solution capture-enrichment method targeting the human Y chromosome in aDNA sequencing libraries. YCC and WGC+Y enrichments increase the amount of Y-DNA sequences recovered relative to libraries not enriched for the Y chromosome. Our probe design effectively recovers regions of the Y chromosome bearing phylogenetically informative sites, allowing us to identify paternal lineages with less sequencing than is needed for pre-capture libraries. Finally, we recommend considering the endogenous content in the experimental design and avoiding consecutive rounds of capture for low-complexity libraries, as clonality increases considerably with each round.

