Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,747 bioRxiv papers from 278,434 authors.
Most downloaded bioRxiv papers, all time
in category evolutionary biology
4,155 results found.
3,718 downloads evolutionary biology
More than a decade of DNA barcoding encompassing about five million specimens covering 100,000 animal species supports the generalization that mitochondrial DNA clusters largely overlap with species as defined by domain experts. Most barcode clustering reflects synonymous substitutions. What evolutionary mechanisms account for synonymous clusters being largely coincident with species? The answer depends on whether variants are phenotypically neutral. To the degree that variants are selectable, purifying selection limits variation within species and neighboring species have distinct adaptive peaks. Phenotypically neutral variants are only subject to demographic processes: drift, lineage sorting, genetic hitchhiking, and bottlenecks. The evolution of modern humans has been studied from several disciplines with detail unique among animal species. Mitochondrial barcodes provide a commensurable way to compare modern humans to other animal species. Barcode variation in the modern human population is quantitatively similar to that within other animal species. Several convergent lines of evidence show that mitochondrial diversity in modern humans follows from sequence uniformity followed by the accumulation of largely neutral diversity during a population expansion that began approximately 100,000 years ago. A straightforward hypothesis is that the extant populations of almost all animal species have arrived at a similar result consequent to a similar process of expansion from mitochondrial uniformity within the last one to several hundred thousand years.
3,676 downloads evolutionary biology
Genomic information from ancient human remains is beginning to show its full potential for learning about human prehistory. We review the last few years' dramatic finds about European prehistory based on genomic data from humans that lived many millennia ago and relate them to modern-day patterns of genomic variation. The early times, the Upper Palaeolithic, appear to contain several population turn-overs followed by more stable populations after the Last Glacial Maximum and during the Mesolithic. Some 11,000 years ago, the migrations driving the Neolithic transition start from around Anatolia and reach the north and the west of Europe millennia later, followed by major migrations during the Bronze Age. These findings show that culture and lifestyle were major determinants of genomic differentiation and similarity in pre-historic Europe rather than geography as is the case today.
3,565 downloads evolutionary biology
Ecological studies routinely show genotype-genotype interactions between insects and their parasites. The mechanisms behind these interactions are not clearly understood. Using the bumblebee Bombus terrestris / trypanosome Crithidia bombi model system, we have carried out a transcriptome-wide analysis of gene expression and alternative splicing in bees during C. bombi infection. We have performed four analyses: 1) comparing gene expression in infected and non-infected bees 24 hours after infection by Crithidia bombi, 2) comparing expression at 24 and 48 hours after C. bombi infection, 3) searching for differential gene expression associated with the host-parasite genotype-genotype interaction at 24 hours after infection and 4) searching for alternative splicing associated with the host-parasite genotype-genotype interaction at 24 hours post infection. We found a large number of differentially regulated genes related to numerous canonical immune pathways. These genes include receptors, signaling pathways and effectors. We discovered a possible interaction between the peritrophic membrane and the insect immune system in defense against Crithidia. Most interestingly, we found that differential expression and alternative splicing of Dscam-related transcripts and of a novel immunoglobulin-related gene, Twitchin, depend on the genotype-genotype interactions of the given bumblebee colony and Crithidia strain.
3,542 downloads evolutionary biology
Advancements in portable scientific instruments provide promising avenues to expedite field work in order to understand the diverse array of organisms that inhabit our planet. Here we tested the feasibility for in situ molecular analyses of endemic fauna using a portable laboratory fitting within a single backpack, in one of the most imperiled biodiversity hotspots: the Ecuadorian Choco rainforest. We utilized portable equipment, including the MinION DNA sequencer (Oxford Nanopore Technologies) and miniPCR (miniPCR), to perform DNA extraction, PCR amplification and real-time DNA barcode sequencing of reptile specimens in the field. We demonstrate that nanopore sequencing can be implemented in a remote tropical forest to quickly and accurately identify species using DNA barcoding, as we generated consensus sequences for species resolution with an accuracy of >99% in less than 24 hours after collecting specimens. In addition, we generated sequence information at Universidad Tecnologica Indoamerica in Quito for the recently re-discovered Jambato toad Atelopus ignescens, which was thought to be extinct for 28 years, a rare species of blind snake Trilepida guayaquilensis, and two undescribed species of Dipsas snakes. In this study we establish how mobile laboratories and nanopore sequencing can help to accelerate species identification in remote areas (especially for species that are difficult to diagnose based on characters of external morphology), be applied to local research facilities in developing countries, and rapidly generate information for species that are rare, endangered and undescribed, which can potentially aid in conservation efforts.
3,422 downloads evolutionary biology
As a result of the process of descent with modification, closely related species tend to be similar to one another in a myriad different ways. In statistical terms, this means that traits measured on one species will not be independent of traits measured on others. Since their introduction in the 1980s, phylogenetic comparative methods (PCMs) have been framed as a solution to this problem. In this paper, we argue that this way of thinking about PCMs is deeply misleading. Not only has this sowed widespread confusion in the literature about what PCMs are doing, but it has led us to develop methods that are susceptible to the very thing we sought to build defenses against: unreplicated evolutionary events. Through three Case Studies, we demonstrate that the susceptibility to singular events is indeed a recurring problem in comparative biology that links several seemingly unrelated controversies. In each Case Study we propose a potential solution to the problem. While the details of our proposed solutions differ, they share a common theme: unifying hypothesis testing with data-driven approaches (which we term "phylogenetic natural history") to disentangle the impact of singular evolutionary events from that of the factors we are investigating. More broadly, we argue that our field has, at times, been sloppy when weighing evidence in support of causal hypotheses. We suggest that one way to refine our inferences is to re-imagine phylogenies as probabilistic graphical models; adopting this way of thinking will help clarify precisely what we are testing and what evidence supports our claims.
3,346 downloads evolutionary biology
Gut microbiota are shaped by a combination of ecological and evolutionary forces. While the ecological dynamics have been extensively studied, much less is known about how species of gut bacteria evolve over time. Here we introduce a model-based framework for quantifying evolutionary dynamics within and across hosts using a panel of metagenomic samples. We use this approach to study evolution in ~30 prevalent species in the human gut. Although the patterns of between-host diversity are consistent with quasi-sexual evolution and purifying selection on long timescales, we identify new genealogical signatures that challenge standard population genetic models of these processes. On shorter timescales within hosts, we find that genetic differences only rarely arise from the invasion of distantly related strains. Instead, the resident strains more commonly acquire a smaller number of evolutionary changes, in which nucleotide variants or gene gains or losses rapidly sweep to high frequency over ~6 month timescales. By comparing these mutations with the typical between-host differences, we find evidence that sweeps are driven by introgression from other strains, rather than by new mutations. Our results suggest that gut bacteria evolve on human-relevant timescales, and highlight the feedback between short- and long-term evolution across hosts.
3,338 downloads evolutionary biology
Background: After three decades of mtDNA studies on human evolution the only incontrovertible main result is the African origin of all extant modern humans. In addition, a southern coastal route has been relentlessly imposed to explain the Eurasian colonization of these African pioneers. Based on the age of macrohaplogroup L3, from which all maternal Eurasian and the majority of African lineages originated, that out-of-Africa event has been dated around 60-70 kya. On the opposite side, we have proposed a northern route through Central Asia across the Levant for that expansion. Consistent with the fossil record, we have dated it around 125 kya. To help bridge differences between the molecular and fossil record ages, in this article we assess the possibility that mtDNA macrohaplogroup L3 matured in Eurasia and returned to Africa as basic L3 lineages around 70 kya. Results: The coalescence ages of all Eurasian (M,N) and African L3 lineages, both around 71 kya, are not significantly different. The oldest M and N Eurasian clades are found in southeastern Asia instead of near Africa, as expected by the southern route hypothesis. The split of the Y-chromosome composite DE haplogroup is very similar to the age of mtDNA L3. A Eurasian origin and back migration to Africa has been proposed for the African Y-chromosome haplogroup E. Inside Africa, frequency distributions of maternal L3 and paternal E lineages are positively correlated. This correlation is not fully explained by geographic or ethnic affinities. It seems more likely to be the result of a joint and global replacement of the old autochthonous male and female African lineages by the new Eurasian incomers. Conclusions: These results are congruent with a model proposing an out-of-Africa expansion of early anatomically modern humans around 125 kya, a return to Africa of Eurasian fully modern humans around 70 kya, and a second Eurasian global expansion by 60 kya. Climatic conditions and the presence of Neanderthals played key roles in these human movements.
3,338 downloads evolutionary biology
Gaussian processes such as Brownian motion and the Ornstein-Uhlenbeck process have been popular models for the evolution of quantitative traits and are widely used in phylogenetic comparative methods. However, they have drawbacks which limit their utility. Here I describe new, non-Gaussian stochastic differential equation (diffusion) models of quantitative trait evolution. I present general methods for deriving new diffusion models, and discuss possible schemes for fitting non-Gaussian evolutionary models to trait data. The theory of stochastic processes provides a mathematical framework for understanding the properties of current, new and future phylogenetic comparative methods. Attention to the mathematical details of models of trait evolution and diversification may help avoid some pitfalls when using stochastic processes to model macroevolution.
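For readers unfamiliar with the two Gaussian baselines named in this abstract, here is a minimal sketch of how a trait value could be simulated along a single lineage under an Ornstein-Uhlenbeck process, dX = alpha (theta - X) dt + sigma dW, using the Euler-Maruyama scheme. Setting alpha = 0 reduces it to Brownian motion. The function name `simulate_ou` and all parameter values are illustrative assumptions, not taken from the preprint, and the sketch does not implement the non-Gaussian models the preprint proposes.

```python
import math
import random
from typing import List, Optional


def simulate_ou(x0: float, alpha: float, theta: float, sigma: float,
                t_total: float, n_steps: int,
                rng: Optional[random.Random] = None) -> List[float]:
    """Euler-Maruyama simulation of dX = alpha*(theta - X) dt + sigma dW.

    alpha = 0 gives Brownian motion with rate sigma**2.
    Returns the trait value at n_steps + 1 equally spaced time points.
    """
    rng = rng or random.Random()
    dt = t_total / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over a step of length dt has variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + alpha * (theta - x) * dt + sigma * dw
        path.append(x)
    return path


# Hypothetical parameter values, chosen only for illustration.
bm_path = simulate_ou(0.0, alpha=0.0, theta=0.0, sigma=1.0,
                      t_total=1.0, n_steps=100, rng=random.Random(1))
ou_path = simulate_ou(0.0, alpha=2.0, theta=1.0, sigma=1.0,
                      t_total=1.0, n_steps=100, rng=random.Random(1))
```

The Euler-Maruyama step is an approximation that improves as the step size shrinks. Both processes also have exact Gaussian transition distributions that could be sampled instead.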
3,269 downloads evolutionary biology
Geographic patterns of genetic variation within modern populations, produced by complex histories of migration, can be difficult to infer and visually summarize. A general consequence of geographically limited dispersal is that samples from nearby locations tend to be more closely related than samples from distant locations, and so genetic covariance often recapitulates geographic proximity. We use genome-wide polymorphism data to build "geogenetic maps," which, when applied to stationary populations, produce a map of the geographic positions of the populations, but with distances distorted to reflect historical rates of gene flow. In the underlying model, allele frequency covariance is a decreasing function of geogenetic distance, and nonlocal gene flow such as admixture can be identified as anomalously strong covariance over long distances. This admixture is explicitly co-estimated and depicted as arrows, from the source of admixture to the recipient, on the geogenetic map. We demonstrate the utility of this method on a circum-Tibetan sampling of the greenish warbler (Phylloscopus trochiloides), in which we find evidence for gene flow between the adjacent, terminal populations of the ring species. We also analyze a global sampling of human populations, for which we largely recover the geography of the sampling, with support for significant histories of admixture in many samples. This new tool for understanding and visualizing patterns of population structure is implemented in a Bayesian framework in the program SpaceMix.
3,247 downloads evolutionary biology
Although initial studies suggested that Denisovan ancestry was found only in modern human populations from island Southeast Asia and Oceania, more recent studies have suggested that Denisovan ancestry may be more widespread. However, the geographic extent of Denisovan ancestry has not been determined, and moreover the relationship between the Denisovan ancestry in Oceania and that elsewhere has not been studied. Here we analyze genome-wide SNP data from 2493 individuals from 221 worldwide populations, and show that there is a widespread signal of a very low level of Denisovan ancestry across Eastern Eurasian and Native American (EE/NA) populations. We also verify a higher level of Denisovan ancestry in Oceania than that in EE/NA; the Denisovan ancestry in Oceania is correlated with the amount of New Guinea ancestry, but not the amount of Australian ancestry, indicating that recent gene flow from New Guinea likely accounts for signals of Denisovan ancestry across Oceania. However, Denisovan ancestry in EE/NA populations is equally correlated with their New Guinea or their Australian ancestry, suggesting a common source for the Denisovan ancestry in EE/NA and Oceanian populations. Our results suggest that Denisovan ancestry in EE/NA is derived either from common ancestry with, or gene flow from, the common ancestor of New Guineans and Australians, indicating a more complex history involving East Eurasians and Oceanians than previously suspected.
3,242 downloads evolutionary biology
Several methods have been proposed to test for introgression across genomes. One method tests for a genome-wide excess of shared derived alleles between taxa using Patterson's D statistic, but does not establish which loci show such an excess or whether the excess is due to introgression or ancestral population structure. Several recent studies have extended the use of D by applying the statistic to small genomic regions, rather than genome-wide. Here, we use simulations and whole genome data from Heliconius butterflies to investigate the behavior of D in small genomic regions. We find that D is unreliable in this situation as it gives inflated values when effective population size is low, causing D outliers to cluster in genomic regions of reduced diversity. As an alternative, we propose a related statistic f̂d, a modified version of a statistic originally developed to estimate the genome-wide fraction of admixture. f̂d is not subject to the same biases as D, and is better at identifying introgressed loci. Finally, we show that both D and f̂d outliers tend to cluster in regions of low absolute divergence (dXY), which can confound a recently proposed test for differentiating introgression from shared ancestral variation at individual loci.
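As a rough illustration of the genome-wide test described in this abstract, Patterson's D is commonly written as D = (ABBA - BABA) / (ABBA + BABA), where ABBA and BABA count sites at which the derived allele is shared by P2 and P3, or by P1 and P3, respectively. The sketch below assumes four taxa ordered (P1, P2, P3, outgroup), one haploid sequence per taxon, and biallelic sites coded 0 for ancestral and 1 for derived. The function names and the toy data are hypothetical. It does not implement the f̂d statistic proposed in the preprint.

```python
from typing import Sequence, Tuple


def count_abba_baba(p1: Sequence[int], p2: Sequence[int],
                    p3: Sequence[int], outgroup: Sequence[int]) -> Tuple[int, int]:
    """Count ABBA and BABA site patterns (0 = ancestral, 1 = derived)."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if o != 0:
            continue  # the outgroup is assumed to carry the ancestral allele
        if (a, b, c) == (0, 1, 1):
            abba += 1  # derived allele shared by P2 and P3
        elif (a, b, c) == (1, 0, 1):
            baba += 1  # derived allele shared by P1 and P3
    return abba, baba


def patterson_d(p1: Sequence[int], p2: Sequence[int],
                p3: Sequence[int], outgroup: Sequence[int]) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); positive values indicate excess P2-P3 sharing."""
    abba, baba = count_abba_baba(p1, p2, p3, outgroup)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


# Toy alignment, invented purely for illustration.
toy_p1 = [0, 0, 1, 0, 0, 1]
toy_p2 = [1, 1, 0, 1, 0, 1]
toy_p3 = [1, 1, 1, 1, 0, 0]
toy_out = [0, 0, 0, 0, 0, 0]
toy_d = patterson_d(toy_p1, toy_p2, toy_p3, toy_out)
```

With only a handful of informative sites the denominator is small, so the ratio can swing widely from one window to the next. This is one intuitive reason to be cautious about applying the statistic to small genomic regions.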
3,192 downloads evolutionary biology
An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method - which we call PolyGraph - has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different populations during human evolution.
3,182 downloads evolutionary biology
The multi-species coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computation burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the number of loci and the number of samples suggests that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested.
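To make the model-comparison step concrete, here is a minimal sketch of how competing delimitation models could be compared once a log marginal likelihood has been estimated for each. It uses the common 2 ln(BF) scale with the Kass and Raftery (1995) evidence categories. The model names and likelihood values are invented for illustration, and the function names are hypothetical. The marginal likelihood estimation itself, and the correction described in the abstract, are not shown.

```python
from typing import Dict


def two_ln_bayes_factors(log_marginals: Dict[str, float],
                         reference: str) -> Dict[str, float]:
    """Return 2*ln(BF) of every model against the reference model.

    Positive values favor the model over the reference; negative values
    favor the reference.
    """
    ref = log_marginals[reference]
    return {name: 2.0 * (lml - ref) for name, lml in log_marginals.items()}


def kass_raftery_label(two_ln_bf: float) -> str:
    """Evidence category for a non-negative 2*ln(BF) (Kass and Raftery 1995)."""
    if two_ln_bf < 0:
        raise ValueError("pass the absolute value; the sign gives the direction")
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for three delimitation models.
example_models = {
    "current_taxonomy": -1000.0,
    "split_one_lineage": -990.0,
    "lump_two_lineages": -1010.0,
}
example_bf = two_ln_bayes_factors(example_models, reference="current_taxonomy")
```

Working on the log scale avoids numerical underflow, since marginal likelihoods for genome-wide data are far too small to represent directly as probabilities.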
3,177 downloads evolutionary biology
Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome wide association studies (GWAS) now enable the study of genetic adaptation in highly polygenic phenotypes. Here we test for polygenic adaptation among 187 world-wide human populations using polygenic scores constructed from GWAS of 34 complex traits. Comparing these polygenic scores to a null distribution under genetic drift, we identify strong signals of selection for a suite of anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR), as well as type 2 diabetes (T2D). In addition to the known north-south gradient of polygenic height scores within Europe, we find that natural selection has contributed to a gradient of decreasing polygenic height scores from West to East across Eurasia. Analyzing a set of ancient DNA samples from across Eurasia, we show that much of this gradient can be explained by independent selection for increased height in two long diverged hunter-gatherer populations living in western and west-central Eurasia sometime during or shortly after the last glacial maximum. We further find that the signal of selection on hip circumference can largely be explained as a correlated response to selection on height. However, our signals in IHC and WHR cannot, suggesting that these patterns are the result of selection along multiple axes of body shape variation. Our observation that IHC and WHR polygenic scores follow a strong latitudinal cline in Western Eurasia supports the role of natural selection in establishing Bergmann's Rule in humans, and is consistent with thermoregulatory adaptation in response to latitudinal temperature variation.
Author's Note on Failure to Replicate: After this preprint was posted, the UK Biobank dataset was released, providing a new and open GWAS resource. When attempting to replicate the height selection results from this preprint using GWAS data from the UK Biobank, we discovered that we could not. In subsequent analyses, we determined that both the GIANT consortium height GWAS data, as well as another dataset that was used for replication, were impacted by stratification issues that created or at a minimum substantially inflated the height selection signals reported here. The results of this second investigation, written together with additional coauthors, have now been published (https://elifesciences.org/articles/39725), along with another paper by a separate group of authors showing similar issues (https://elifesciences.org/articles/39702). A preliminary investigation shows that the other non-height based results may suffer from similar issues. We stand by the theory and statistical methods reported in this paper, and the paper can be cited for these results. However, we have shown that the data on which the major empirical results were based are not sound, and so should be treated with caution until replicated.
3,116 downloads evolutionary biology
Nathan Nakatsuka, Priya Moorjani, Niraj Rai, Biswanath Sarkar, Arti Tandon, Nick Patterson, Gandham SriLakshmi Bhavani, Katta Mohan Girisha, Mohammed S Mustak, Sudha Srinivasan, Amit Kaushik, Saadi Abdul Vahab, Sujatha M Jagadeesh, Kapaettu Satyamoorthy, Lalji Singh, David Reich, Kumarasamy Thangaraj
The more than 1.5 billion people who live in South Asia are correctly viewed not as a single large population, but as many small endogamous groups. We assembled genome-wide data from over 2,800 individuals from over 260 distinct South Asian groups. We identify 81 unique groups, of which 14 have estimated census sizes of more than a million, that descend from founder events more extreme than those in Ashkenazi Jews and Finns, both of which have high rates of recessive disease due to founder events. We identify multiple examples of recessive diseases in South Asia that are the result of such founder events. This study highlights an under-appreciated opportunity for reducing disease burden among South Asians through the discovery of and testing for recessive disease genes.
3,077 downloads evolutionary biology
Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips: the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: 1) a novel comprehensive global reference taxonomy; and 2) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. While data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
3,075 downloads evolutionary biology
Hybridization between humans and Neanderthals has resulted in a low level of Neanderthal ancestry scattered across the genomes of many modern-day humans. After hybridization, on average, selection appears to have removed Neanderthal alleles from the human population. Quantifying the strength and causes of this selection against Neanderthal ancestry is key to understanding our relationship to Neanderthals and, more broadly, how populations remain distinct after secondary contact. Here, we develop a novel method for estimating the genome-wide average strength of selection and the density of selected sites using estimates of Neanderthal allele frequency along the genomes of modern-day humans. We confirm that East Asians had somewhat higher initial levels of Neanderthal ancestry than Europeans even after accounting for selection. We find that the bulk of purifying selection against Neanderthal ancestry is best understood as acting on many weakly deleterious alleles. We propose that the majority of these alleles were effectively neutral (and segregating at high frequency) in Neanderthals, but became selected against after entering human populations of much larger effective size. While individually of small effect, these alleles potentially imposed a heavy genetic load on the early-generation human-Neanderthal hybrids. This work suggests that differences in effective population size may play a far more important role in shaping levels of introgression than previously thought.
3,072 downloads evolutionary biology
This article describes a Bayesian method for inferring both species delimitations and species trees under the multispecies coalescent model using DNA sequences from multiple loci. The focus here is on species delimitation with no a priori assignment of individuals to species, and no guide tree. The method uses a new model for the population sizes along the branches of the species tree, and three new operators for sampling from the posterior using the Markov chain Monte Carlo (MCMC) algorithm. The correctness of the moves is demonstrated both by proofs and by tests of the implementation. Current practice, using a pipeline approach to species delimitation under the multispecies coalescent, has been shown to have major problems on simulated data (Olave et al., 2014). The same simulated data set is used to demonstrate the accuracy and efficiency of the present method. The method is implemented in a package called STACEY for BEAST2.
2,993 downloads evolutionary biology
The uneven distribution of species in the tree of life is rooted in unequal speciation and extinction among groups. Yet the causes of differential diversification are little known despite their relevance for sustaining biodiversity into the future. Here we investigate rates of species diversification across extant Mammalia, a compelling system that includes our own closest relatives. We develop a new phylogeny of nearly all ~6000 species using a 31-gene supermatrix and fossil node- and tip-dating approaches to establish a robust evolutionary timescale for mammals. Our findings link the causes of uneven modern species richness with ecologically-driven variation in rates of speciation and/or extinction, including 24 detected shifts in net diversification. Speciation rates are a stronger predictor of among-clade richness than clade age, countering claims of clock-like speciation in large phylogenies. Surprisingly, speciation rate heterogeneity in recent radiations shows limited association with latitude, despite the well-known increase in species richness toward the equator. Instead, we find a deeper-time association where clades of high-latitude species have the highest speciation rates, suggesting that species durations are shorter (turnover is higher) outside than inside the tropics. At shallower timescales (i.e., young clades), diurnality and low vagility are both linked to greater speciation rates and extant richness. We suggest that high turnover among small-ranged allopatric species has erased the signal of vagility in older clades, while diurnality has adaptively promoted lineage persistence. These findings highlight the underappreciated joint roles of ephemeral (turnover-based) and adaptive (persistence-based) processes of diversification, which manifest in recent and more ancient evolutionary radiations of mammals to explain modern diversity.
2,974 downloads evolutionary biology
As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning. We review the fundamentals of machine learning, discuss recent applications of supervised machine learning to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised machine learning is an important and underutilized tool that has considerable potential for the world of evolutionary genomics.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!