Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 70,836 bioRxiv papers from 309,131 authors.
Most downloaded bioRxiv papers, all time
in category evolutionary biology
4,576 results found. For more information, click each entry to expand.
3,160 downloads evolutionary biology
This article describes a Bayesian method for inferring both species delimitations and species trees under the multispecies coalescent model using DNA sequences from multiple loci. The focus here is on species delimitation with no a priori assignment of individuals to species, and no guide tree. The method uses a new model for the population sizes along the branches of the species tree, and three new operators for sampling from the posterior using the Markov chain Monte Carlo (MCMC) algorithm. The correctness of the moves is demonstrated both by proofs and by tests of the implementation. Current practice, using a pipeline approach to species delimitation under the multispecies coalescent, has been shown to have major problems on simulated data (Olave et al, 2014). The same simulated data set is used to demonstrate the accuracy and efficiency of the present method. The method is implemented in a package called STACEY for BEAST2.
3,133 downloads evolutionary biology
Background: Characterizations of the dynamics of hybrid zones in space and time can give insights about traits and processes important in population divergence and speciation. We characterized a hybrid zone between tanagers in the genus Ramphocelus (Aves, Thraupidae) located in southwestern Colombia. We tested whether this hybrid zone originated as a result of secondary contact or of primary differentiation, and described its dynamics across time using spatial analyses of molecular, morphological, and coloration data in combination with paleodistribution modeling. Results: Models of potential historical distributions based on climatic data and genetic signatures of demographic expansion suggested that the hybrid zone likely originated following secondary contact between populations that expanded their ranges out of isolated areas in the Quaternary. Concordant patterns of variation in phenotypic characters across the hybrid zone and its narrow extent are suggestive of a tension zone, maintained by a balance between dispersal and selection against hybrids. Estimates of phenotypic cline parameters obtained using specimens collected over nearly a century revealed that, in recent decades, the zone appears to have moved to the east and to higher elevations, and has apparently become narrower. Genetic variation was not clearly structured along the hybrid zone, but comparisons between historical and contemporary specimens suggested that temporal changes in its genetic makeup may also have occurred. Conclusions: Our data suggest that the hybrid zone likey resulted from secondary contact between populations. The observed changes in the hybrid zone may be a result of sexual selection, asymmetric gene flow, or environmental change.
3,124 downloads evolutionary biology
Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips -- the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: 1) a novel comprehensive global reference taxonomy; and 2) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. While data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
3,119 downloads evolutionary biology
Hybridization between humans and Neanderthals has resulted in a low level of Neanderthal ancestry scattered across the genomes of many modern-day humans. After hybridization, on average, selection appears to have removed Neanderthal alleles from the human population. Quantifying the strength and causes of this selection against Neanderthal ancestry is key to understanding our relationship to Neanderthals and, more broadly, how populations remain distinct after secondary contact. Here, we develop a novel method for estimating the genome-wide average strength of selection and the density of selected sites using estimates of Neanderthal allele frequency along the genomes of modern-day humans. We confirm that East Asians had somewhat higher initial levels of Neanderthal ancestry than Europeans even after accounting for selection. We find that the bulk of purifying selection against Neanderthal ancestry is best understood as acting on many weakly deleterious alleles. We propose that the majority of these alleles were effectively neutral---and segregating at high frequency---in Neanderthals, but became selected against after entering human populations of much larger effective size. While individually of small effect, these alleles potentially imposed a heavy genetic load on the early-generation human--Neanderthal hybrids. This work suggests that differences in effective population size may play a far more important role in shaping levels of introgression than previously thought.
3,112 downloads evolutionary biology
The social hymenoptera are emerging as models for epigenetics. DNA methylation, the addition of a methyl group, is a common epigenetic marker. In mammals and flowering plants methylation affects allele specific expression. There is contradictory evidence for the role of methylation on allele specific expression in social insects. The aim of this paper is to investigate allele specific expression and monoallelic methylation in the bumblebee, Bombus terrestris. We found nineteen genes that were both monoallelically methylated and monoallelically expressed in a single bee. Fourteen of these genes express the hypermethylated allele, while the other five express the hypomethylated allele. We also searched for allele specific expression in twenty-nine published RNA-seq libraries. We found 555 loci with allele-specific expression. We discuss our results with reference to the functional role of methylation in gene expression in insects and in the, as yet unquantified, role of genetic cis effects in insect allele specific methylation and expression.
3,020 downloads evolutionary biology
As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning. We review the fundamentals of machine learning, discuss recent applications of supervised machine learning to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised machine learning is an important and underutilized tool that has considerable potential for the world of evolutionary genomics.
2,983 downloads evolutionary biology
The relative importance of different modes of evolution in shaping phenotypic diversity remains a hotly debated question. Fossil data suggest that stasis may be a common mode of evolution, while modern data suggest very fast rates of evolution. One way to reconcile these observations is to imagine that evolution is punctuated, rather than gradual, on geological time scales. To test this hypothesis, we developed a novel maximum likelihood framework for fitting Levy processes to comparative morphological data. This class of stochastic processes includes both a gradual and punctuated component. We found that a plurality of modern vertebrate clades examined are best fit by punctuated processes over models of gradual change, gradual stasis, and adaptive radiation. When we compare our results to theoretical expectations of the rate and speed of regime shifts for models that detail fitness landscape dynamics, we find that our quantitative results are broadly compatible with both microevolutionary models and with observations from the fossil record.
2,957 downloads evolutionary biology
The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed "basal Eurasian" admixture event in human history. We implement and release our method in a new open-source software package momi2.
2,930 downloads evolutionary biology
Population-scale genomic datasets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g. only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNN are capable of outperforming expert-derived statistical methods, and offer a new path forward in cases where no likelihood approach exists.
2,927 downloads evolutionary biology
Species delimitation is problematic in many taxa due to the difficulty of evaluating predictions from species delimitation hypotheses, which chiefly relay on subjective interpretations of morphological observations and/or DNA sequence data. This problem is exacerbated in recalcitrant taxa for which genetic resources are scarce and inadequate to resolve questions regarding evolutionary relationships and uniqueness. In this case study we demonstrate the empirical utility of restriction site associated DNA sequencing (RAD-seq) by unambiguously resolving phylogenetic relationships among recalcitrant octocoral taxa with divergences greater than 80 million years. We objectively infer robust species boundaries in the genus Paragorgia, which contains some of the most important ecosystem engineers in the deep-sea, by testing alternative taxonomy-guided or unguided species delimitation hypotheses using the Bayes factors delimitation method (BFD*) with genome-wide single nucleotide polymorphism data. We present conclusive evidence rejecting the current morphological species delimitation model for the genus Paragorgia and indicating the presence of cryptic species boundaries associated with environmental variables. We argue that the suitability limits of RAD-seq for phylogenetic inferences in divergent taxa cannot be assessed in terms of absolute time, but depend on taxon-specific factors such as mutation rate, generation time and effective population size. We show that classic morphological taxonomy can greatly benefit from integrative approaches that provide objective tests to species delimitation hypothesis. Our results pave the way for addressing further questions in biogeography, species ranges, community ecology, population dynamics, conservation, and evolution in octocorals and other marine taxa.
2,909 downloads evolutionary biology
The hundreds of cichlid fish species in Lake Malawi constitute the most extensive recent vertebrate adaptive radiation. Here we characterize its genomic diversity by sequencing 134 individuals covering 73 species across all major lineages. Average sequence divergence between species pairs is only 0.1-0.25%. These divergence values overlap diversity within species, with 82% of heterozygosity shared between species. Phylogenetic analyses suggest that diversification initially proceeded by serial branching from a generalist Astatotilapia-like ancestor. However, no single species tree adequately represents all species relationships, with evidence for substantial gene flow at multiple times. Common signatures of selection on visual and oxygen transport genes shared by distantly related deep water species point to both adaptive introgression and independent selection. These findings enhance our understanding of genomic processes underlying rapid species diversification, and provide a platform for future genetic analysis of the Malawi radiation.
2,895 downloads evolutionary biology
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce here a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. In contrast to Approximate Bayesian Computation, another likelihood-free approach widely used in population genetics and other fields, deep learning does not require a distance function on summary statistics or a rejection step, and it is robust to the addition of uninformative statistics. To demonstrate that deep learning can be effectively employed to estimate population genetic parameters and learn informative features of data, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme, likely due in part to the unaccounted impact of selection.
2,868 downloads evolutionary biology
Pyricularia oryzae is a species complex that causes blast disease on more than 50 species of poaceous plants. Pyricularia oryzae has a worldwide distribution as a rice (Oryza) pathogen and in the last century emerged as an important wheat (Triticum) pathogen in southern Brazil. Presently, P. oryzae pathotype Oryza is considered the rice blast pathogen, whereas P. oryzae pathotype Triticum is the wheat blast pathogen. In this study we investigated whether the Oryza and Triticum pathotypes of P. oryzae were distinct at the species level. We also describe a new Pyricularia species causing blast on several other poaceous hosts in Brazil, including wheat. We conducted phylogenetic analyses using 10 housekeeping loci from an extensive sample (N = 128) of sympatric populations of P. oryzae adapted to rice, wheat and other poaceous hosts found in or near wheat fields. The Bayesian phylogenetic analysis grouped the isolates into two major monophyletic clusters (I and II) with high Bayesian probabilities (P = 0.99). Cluster I contained isolates obtained from wheat as well as other Poaceae hosts (P = 0.98). Cluster II was divided into three host-associated clades (Clades 1, 2 and 3; P > 0.75). Clade 1 contained isolates obtained from wheat and other poaceous hosts, Clade 2 contained exclusively wheat-derived isolates, and Clade 3 comprised isolates associated only with rice. Our interpretation was that cluster I and cluster II correspond to two distinct species: Pyricularia graminis-tritici sp. nov. (Pgt), newly described in this study, and Pyricularia oryzae (Po). The host-associated clades found in P. oryzae Cluster II correspond to P. oryzae pathotype Triticum (PoT; Clades 1 and 2), and P. oryzae pathotype Oryza (PoO; Clade 3). No morphological or cultural differences were observed among these species, but a distinctive pathogenicity spectrum was observed. Pgt and PoT were pathogenic and highly aggressive on Triticum aestivum (wheat), Hordeum vulgare (barley), Urochloa brizantha (signal grass) and Avena sativa (oats). PoO was highly virulent on the original rice host (Oryza sativa), and also on wheat, barley, and oats, but not on signal grass. We concluded that blast disease on wheat and its associated Poaceae hosts in Brazil is caused by multiple Pyricularia species: the newly described Pyricularia graminis-tritici sp. nov., and the known P. oryzae pathotypes Triticum and Oryza. To our knowledge, P. graminis-tritici sp. nov. is still restricted to Brazil, but obviously represents a serious threat to wheat cultivation globally.
2,859 downloads evolutionary biology
Whole-genome studies have documented that most Native American ancestry stems from a single population that diversified within the continent more than twelve thousand years ago. However, this shared ancestry hides a more complex history whereby at least four distinct streams of Eurasian migration have contributed to present-day and prehistoric Native American populations. Whole genome studies enhanced by technological breakthroughs in ancient DNA now provide evidence of a sequence of events involving initial migration from a structured Northeast Asian source population, followed by a divergence into northern and southern Native American lineages. During the Holocene, new migrations from Asia introduced the Saqqaq/Dorset Paleoeskimo population to the North American Arctic ~4,500 years ago, ancestry that is potentially connected with ancestry found in Athabaskan-speakers today. This was then followed by a major new population turnover in the high Arctic involving Thule-related peoples who are the ancestors of present-day Inuit. We highlight several open questions that could be addressed through future genomic research.
2,851 downloads evolutionary biology
Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Virtually all empirical evolutionary data studies contain a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that can and do arise when displaying branch values on trees after re-rooting. Branch values are typically stored as node labels in the widely-used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when re-rooting trees. This depends on the mostly implicit semantics that tools deploy to interpret node labels. We reviewed 10 tree viewers and 10 bioinformatics toolkits that can display and re-root trees. We found that 14 out of 20 of these tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees inferred by common phylogenetic inference programs. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements and workarounds in 8 of the tested tools. We suggest tools should provide an option that explicitly forces users to define the semantics of node labels.
2,800 downloads evolutionary biology
Psilocybin is a psychoactive compound with clinical applications produced by dozens of mushroom species. There has been a longstanding interest in psilocybin research with regard to treatment for addiction, depression, and end-of-life suffering. However, until recently very little was known about psilocybin biosynthesis and its ecological role. Here we confirm and refine recent findings about the genes underpinning psilocybin biosynthesis, discover that there is more than one psilocybin biosynthesis cluster in mushrooms, and we provide the first data directly addressing psilocybin's ecological role. By analysing independent genome assemblies for the hallucinogenic mushrooms Psilocybe cyanescens and Pluteus salicinus we recapture the recently discovered psilocybin biosynthesis cluster and show that a transcription factor previously implicated in its regulation is actually not part of the cluster. Further, we show that the mushroom Inocybe corydalina produces psilocybin but does not contain the established biosynthetic cluster, and we present an alternative cluster. Finally, a meta-transcriptome analysis of wild-collected mushrooms provides evidence for intra-mushroom insect gene expression of flies whose larvae grow inside Psilocybe cyanescens. These larvae were successfully reared into adults. Our results show that psilocybin does not confer complete protection against insect mycophagy, and the hypothesis that it is produced as an adaptive defense compound may need to be reconsidered.
2,779 downloads evolutionary biology
With Next Generation Sequencing Data (NGS) coming off age and being routinely used, evolutionary biology is transforming into a data-driven science. As a consequence, researchers have to rely on a growing number of increasingly complex software. All widely used tools in our field have grown considerably, in terms of the number of features as well as lines of code. In addition, analysis pipelines now include substantially more components than 5-10 years ago. A topic that has received little attention in this context is the code quality of widely used codes. Unfortunately, the majority of users tend to blindly trust software and the results it produces. To this end, we assessed the code quality of 15 highly cited tools (e.g., MrBayes, MAFFT, SweepFinder etc.) from the broader area of evolutionary biology that are used in current data analysis pipelines. We also discuss widely unknown problems associated with floating point arithmetics for representing real numbers on computer systems. Since, the software quality of the tools we analyzed is rather mediocre, we provide a list of best practices for improving the quality of existing tools, but also list techniques that can be deployed for developing reliable, high quality scientific software from scratch. Finally, we also discuss journal and science policy as well as funding issues that need to be addressed for improving software quality as well as ensuring support for developing new and maintaining existing software. Our intention is to raise the awareness of the community regarding software quality issues and to emphasize the substantial lack of funding for scientific software development.
2,753 downloads evolutionary biology
Neanderthals and modern humans came in contact with each other and interbred at least twice in the past 100,000 years. Such contact and interbreeding likely led both to the transmission of viruses novel to either species and to the exchange of adaptive alleles that provided resistance against the same viruses. Here, we show that viruses were responsible for dozens of adaptive introgressions between Neanderthals and modern humans. We identify RNA viruses, specifically lentiviruses and orthomyxoviruses, as likely drivers of introgressions from Neanderthals to Europeans. Our results imply that many introgressions between Neanderthals and modern humans were adaptive, and that host genetic variation can be used to understand ancient viral epidemics, potentially providing important insights regarding current and future epidemics.
2,730 downloads evolutionary biology
There has been much interest in analyzing genome-scale DNA sequence data to infer population histories, but inference methods developed hitherto are limited in model complexity and computational scalability. Here we present an efficient, flexible statistical method, diCal2, that can utilize whole-genome sequence data from multiple populations to infer complex demographic models involving population size changes, population splits, admixture, and migration. Applying our method to data from Australian, East Asian, European, and Papuan populations, we find that the population ancestral to Australians and Papuans started separating from East Asians and Europeans about 100,000 years ago, and that the separation of East Asians and Europeans started about 50,000 years ago, with pervasive gene flow between all pairs of populations.
2,716 downloads evolutionary biology
Explaining the origin and evolutionary dynamics of the genetic architecture of adaptation is a major research goal of evolutionary genetics. Despite controversy surrounding success of the attempts to accomplish this goal, a full understanding of adaptive genetic variation necessitates knowledge about the genomic location and patterns of dispersion for the genetic components affecting fitness-related phenotypic traits. Even with advances in next generation sequencing technologies, the production of full genome sequences for non-model species is often cost prohibitive, especially for tree species such as pines where genome size often exceeds 20 to 30 Gbp. We address this need by constructing a dense linkage map for fox- tail pine (Pinus balfouriana Grev. & Balf.), with the ultimate goal of uncovering and explaining the origin and evolutionary dynamics of adaptive genetic variation in natural populations of this forest tree species. We utilized megagametophyte arrays (n = 76?95 megagametophytes/tree) from four maternal trees in combination with double-digestion restriction site associated DNA sequencing (ddRADseq) to produce a consensus linkage map covering 98.58% of the foxtail pine genome, which was estimated to be 1276 cM in length (95% CI: 1174cM to 1378cM). A novel bioinformatic approach using iterative rounds of marker ordering and imputation was employed to produce single-tree linkage maps (507?17066 contigs/map; lengths: 1037.40?1572.80 cM). These linkage maps were collinear across maternal trees, with highly correlated marker orderings (Spearman's ρ > 0.95). A consensus linkage map derived from these single-tree linkage maps contained 12 linkage groups along which 20 655 contigs were non-randomly distributed across 901 unique positions (n = 23 contigs/position), with an average spacing of 1.34 cM between adjacent positions. Of the 20 655 contigs positioned on the consensus linkage map, 5627 had enough sequence similarity to contigs contained within the most recent build of the loblolly pine (P. taeda L.) genome to identify them as putative homologs containing both genic and non-genic loci. Importantly, all 901 unique positions on the consensus linkage map had at least one contig with putative homology to loblolly pine. When combined with the other biological signals that predominate in our data (e.g., correlations of recombination fractions across single trees), we show that dense linkage maps for non-model forest tree species can be efficiently constructed using next generation sequencing technologies. We subsequently discuss the usefulness of these maps as community-wide resources and as tools with which to test hypotheses about the genetic architecture of adaptation.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!