Effects of variable mutation rates and epistasis on the distribution of allele frequencies in humans

By Arbel Harpak, Anand Bhaskar, Jonathan K. Pritchard

Posted 13 Apr 2016
bioRxiv DOI: 10.1101/048421 (published DOI: 10.1371/journal.pgen.1006489)

The site frequency spectrum (SFS) has long been used to study demographic history and natural selection. Here, we extend this summary by examining the SFS conditional on the alleles found at the same site in other species. We refer to this extension as the "phylogenetically-conditioned SFS" or cSFS. Using recent large-sample data from the Exome Aggregation Consortium (ExAC), combined with primate genome sequences, we find that human variants that occurred independently in closely related primate lineages are at higher frequencies in humans than variants with parallel substitutions in more distant primates. We show that this effect is largely due to sites with elevated mutation rates causing significant departures from the widely-used infinite sites mutation model. Our analysis also suggests substantial variation in mutation rates even among mutations involving the same nucleotide changes. We additionally find evidence for epistatic effects on the cSFS: namely, that parallel primate substitutions at nonsynonymous sites are more informative about constraint in humans when the parallel substitution occurs in a closely related species. In summary, we show that variable mutation rates and local sequence context are important determinants of the SFS in humans.
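As a rough illustration of the cSFS idea described above, the following Python sketch tabulates a frequency spectrum separately for each conditioning category. It is a minimal toy under stated assumptions, not the authors' pipeline: the function name `conditional_sfs`, the category labels, and the allele counts are all hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict


def conditional_sfs(sites):
    """Tabulate a site frequency spectrum per conditioning label.

    `sites` is an iterable of (derived_allele_count, label) pairs, one per
    segregating site. The label stands in for what is observed at the same
    site in other species (hypothetical categories, chosen for this sketch).
    Returns {label: {allele_count: proportion of that label's sites}}.
    """
    counts_by_label = defaultdict(Counter)
    for count, label in sites:
        counts_by_label[label][count] += 1

    spectra = {}
    for label, counter in counts_by_label.items():
        total = sum(counter.values())
        # Normalize so each label's spectrum sums to 1.
        spectra[label] = {c: k / total for c, k in sorted(counter.items())}
    return spectra


# Made-up toy data: (derived allele count in the sample, conditioning label).
toy_sites = [
    (1, "no_parallel_substitution"),
    (1, "no_parallel_substitution"),
    (2, "no_parallel_substitution"),
    (5, "no_parallel_substitution"),
    (1, "parallel_in_close_primate"),
    (7, "parallel_in_close_primate"),
]

spectra = conditional_sfs(toy_sites)
for label, spectrum in spectra.items():
    print(label, spectrum)
```

Comparing the per-label spectra is the kind of contrast the abstract refers to when it says that variants with parallel substitutions in closely related primates sit at higher frequencies in humans. The toy numbers here carry no empirical meaning.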

Download data

  • Downloaded 1,650 times
  • Download rankings, all-time:
    • Site-wide: 5,335 out of 94,912
    • In genomics: 862 out of 5,955
  • Year to date:
    • Site-wide: 66,449 out of 94,912
  • Since beginning of last month:
    • Site-wide: 76,123 out of 94,912

