Rapid Genotype Refinement for Whole-Genome Sequencing Data using Multi-Variate Normal Distributions

By Rudy Arthur, Jared O’Connell, Ole Schulz-Trieglaff, Anthony J. Cox

Posted 12 Nov 2015
bioRxiv DOI: 10.1101/031484 (published DOI: 10.1093/bioinformatics/btw097)

Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD) based genotype refinement to infer genotypes accurately and cost-effectively in large cohorts of individuals. Most genotype refinement methods are based on hidden Markov models, which are accurate but computationally expensive. We introduce an algorithm that models LD using a simple multivariate Gaussian distribution. The key feature of our algorithm is its speed: it is hundreds of times faster than other methods on the same data set, and its scaling behaviour is linear in the number of samples. We demonstrate the performance of the method on both low-coverage and high-coverage samples.
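
To make the idea of modelling LD with a multivariate Gaussian concrete, here is a minimal sketch in Python. It is not the authors' algorithm, whose details the abstract does not give. It only illustrates standard Gaussian conditioning: noisy per-site dosages are pulled toward a prior whose mean and covariance are estimated from a panel of genotypes. The function name `refine_dosages`, the `noise_var` input, and the `ridge` regulariser are all assumptions made for illustration.

```python
import numpy as np

def refine_dosages(panel, observed, noise_var, ridge=1e-3):
    """Illustrative only: shrink noisy dosages toward an LD-informed Gaussian prior.

    panel:     (n_samples, n_sites) array of genotype dosages in [0, 2], n_sites >= 2
    observed:  (n_sites,) noisy dosages for one target sample
    noise_var: (n_sites,) assumed observation variance at each site
    ridge:     small diagonal term (an assumption) that keeps the covariance invertible
    """
    panel = np.asarray(panel, dtype=float)
    observed = np.asarray(observed, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)

    # Prior mean and between-site covariance; the covariance encodes pairwise LD.
    mu = panel.mean(axis=0)
    sigma = np.cov(panel, rowvar=False) + ridge * np.eye(panel.shape[1])

    # Posterior mean of x given y = x + noise, with x ~ N(mu, sigma)
    # and independent noise ~ N(0, diag(noise_var)).
    residual = np.linalg.solve(sigma + np.diag(noise_var), observed - mu)
    posterior = mu + sigma @ residual

    # Dosages are bounded, so clip the Gaussian estimate back into [0, 2].
    return np.clip(posterior, 0.0, 2.0)

# Two sites in perfect LD; the second site is observed very poorly.
panel = [[0, 0], [1, 1], [2, 2], [0, 0], [2, 2]]
refined = refine_dosages(panel, observed=[2.0, 1.0], noise_var=[0.01, 100.0])
```

In this toy case the poorly observed second site is drawn toward the value implied by the well-observed first site, because the panel shows the two sites to be strongly correlated. Each target sample is handled independently given the panel statistics, which is one plausible reading of why a Gaussian approach could scale linearly in the number of samples.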

Download data

  • Downloaded 749 times
  • Download rankings, all-time:
    • Site-wide: 21,680 out of 103,808
    • In bioinformatics: 3,092 out of 9,474
  • Year to date:
    • Site-wide: 94,545 out of 103,808
  • Since beginning of last month:
    • Site-wide: 84,233 out of 103,808
