Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,745 bioRxiv papers from 278,406 authors.
Most downloaded bioRxiv papers, all time
in category evolutionary biology
4,154 results found.
2,969 downloads evolutionary biology
Background: Characterizations of the dynamics of hybrid zones in space and time can give insights into traits and processes important in population divergence and speciation. We characterized a hybrid zone between tanagers in the genus Ramphocelus (Aves, Thraupidae) located in southwestern Colombia. We tested whether this hybrid zone originated as a result of secondary contact or of primary differentiation, and described its dynamics across time using spatial analyses of molecular, morphological, and coloration data in combination with paleodistribution modeling. Results: Models of potential historical distributions based on climatic data and genetic signatures of demographic expansion suggested that the hybrid zone likely originated following secondary contact between populations that expanded their ranges out of isolated areas in the Quaternary. Concordant patterns of variation in phenotypic characters across the hybrid zone and its narrow extent are suggestive of a tension zone, maintained by a balance between dispersal and selection against hybrids. Estimates of phenotypic cline parameters obtained using specimens collected over nearly a century revealed that, in recent decades, the zone appears to have moved to the east and to higher elevations, and has apparently become narrower. Genetic variation was not clearly structured along the hybrid zone, but comparisons between historical and contemporary specimens suggested that temporal changes in its genetic makeup may also have occurred. Conclusions: Our data suggest that the hybrid zone likely resulted from secondary contact between populations. The observed changes in the hybrid zone may be a result of sexual selection, asymmetric gene flow, or environmental change.
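Cline "parameters" such as position and width are usually estimated by fitting a sigmoid curve to trait scores sampled along a transect. The abstract does not say which cline model was used; the sketch below assumes the common hyperbolic-tangent form, in which the center is where the trait is halfway between its two parental values and the width is the inverse of the maximum slope. The transect positions and parameter values are hypothetical.

```python
import math

def tanh_cline(x, center, width, p_left=0.0, p_right=1.0):
    """Expected trait score (or allele frequency) at position x along a transect.

    center: position where the value is halfway between p_left and p_right.
    width:  (p_right - p_left) divided by the maximum slope, which occurs at the center.
    Assumes the standard tanh cline shape; not necessarily the model used in the study.
    """
    return p_left + (p_right - p_left) * (1 + math.tanh(2 * (x - center) / width)) / 2

# Hypothetical historical vs. contemporary fits: the center shifts east, the zone narrows.
historical = dict(center=50.0, width=40.0)
contemporary = dict(center=58.0, width=25.0)
for km in (30, 50, 70):
    print(km, round(tanh_cline(km, **historical), 3), round(tanh_cline(km, **contemporary), 3))
```

Comparing fitted centers and widths across time periods, as in this toy comparison, is one way a shift in position and a narrowing of the zone would show up.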
2,923 downloads evolutionary biology
The relative importance of different modes of evolution in shaping phenotypic diversity remains a hotly debated question. Fossil data suggest that stasis may be a common mode of evolution, while modern data suggest very fast rates of evolution. One way to reconcile these observations is to imagine that evolution is punctuated, rather than gradual, on geological time scales. To test this hypothesis, we developed a novel maximum likelihood framework for fitting Lévy processes to comparative morphological data. This class of stochastic processes includes both a gradual and a punctuated component. We found that a plurality of the modern vertebrate clades examined are best fit by punctuated processes rather than by models of gradual change, gradual stasis, or adaptive radiation. When we compare our results to theoretical expectations of the rate and speed of regime shifts for models that detail fitness landscape dynamics, we find that our quantitative results are broadly compatible with both microevolutionary models and observations from the fossil record.
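To make "gradual plus punctuated" concrete, the sketch below simulates one simple member of the Lévy class: Brownian motion (the gradual part) plus a compound Poisson process of normally distributed jumps (the punctuated part). This is an illustrative assumption, not the authors' likelihood framework, and all parameter names and values are hypothetical.

```python
import random

def simulate_levy_endpoint(t, sigma2, jump_rate, jump_var, rng=random):
    """Trait change after time t under Brownian motion plus compound-Poisson jumps."""
    # Gradual component: Brownian motion with rate sigma2.
    x = rng.gauss(0.0, (sigma2 * t) ** 0.5)
    if jump_rate > 0:
        # Punctuated component: jumps arrive after exponential waiting times.
        elapsed = rng.expovariate(jump_rate)
        while elapsed < t:
            x += rng.gauss(0.0, jump_var ** 0.5)
            elapsed += rng.expovariate(jump_rate)
    return x

def expected_variance(t, sigma2, jump_rate, jump_var):
    """Endpoint variance: gradual contribution plus punctuated contribution."""
    return (sigma2 + jump_rate * jump_var) * t

rng = random.Random(1)
xs = [simulate_levy_endpoint(1.0, 0.5, 2.0, 1.0, rng) for _ in range(20000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(round(var, 2), expected_variance(1.0, 0.5, 2.0, 1.0))
```

A pure Brownian model with `sigma2=2.5` has the same endpoint variance over one time unit, so variance alone cannot separate the two; the jump version has heavier tails, which is the kind of signal a likelihood comparison between gradual and punctuated models can exploit.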
2,882 downloads evolutionary biology
The social hymenoptera are emerging as models for epigenetics. DNA methylation, the addition of a methyl group to DNA, is a common epigenetic marker. In mammals and flowering plants, methylation affects allele-specific expression. There is contradictory evidence for the role of methylation in allele-specific expression in social insects. The aim of this paper is to investigate allele-specific expression and monoallelic methylation in the bumblebee, Bombus terrestris. We found nineteen genes that were both monoallelically methylated and monoallelically expressed in a single bee. Fourteen of these genes express the hypermethylated allele, while the other five express the hypomethylated allele. We also searched for allele-specific expression in twenty-nine published RNA-seq libraries and found 555 loci with allele-specific expression. We discuss our results with reference to the functional role of methylation in insect gene expression and to the as yet unquantified role of genetic cis effects in insect allele-specific methylation and expression.
2,862 downloads evolutionary biology
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce here a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. In contrast to Approximate Bayesian Computation, another likelihood-free approach widely used in population genetics and other fields, deep learning does not require a distance function on summary statistics or a rejection step, and it is robust to the addition of uninformative statistics. To demonstrate that deep learning can be effectively employed to estimate population genetic parameters and learn informative features of data, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Our motivation for studying demography and selection jointly is Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography and the regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme, likely due in part to the unaccounted impact of selection.
2,858 downloads evolutionary biology
Species delimitation is problematic in many taxa due to the difficulty of evaluating predictions from species delimitation hypotheses, which chiefly rely on subjective interpretations of morphological observations and/or DNA sequence data. This problem is exacerbated in recalcitrant taxa for which genetic resources are scarce and inadequate to resolve questions regarding evolutionary relationships and uniqueness. In this case study we demonstrate the empirical utility of restriction site associated DNA sequencing (RAD-seq) by unambiguously resolving phylogenetic relationships among recalcitrant octocoral taxa with divergences greater than 80 million years. We objectively infer robust species boundaries in the genus Paragorgia, which contains some of the most important ecosystem engineers in the deep sea, by testing alternative taxonomy-guided or unguided species delimitation hypotheses using the Bayes factors delimitation method (BFD*) with genome-wide single nucleotide polymorphism data. We present conclusive evidence rejecting the current morphological species delimitation model for the genus Paragorgia and indicating the presence of cryptic species boundaries associated with environmental variables. We argue that the suitability limits of RAD-seq for phylogenetic inferences in divergent taxa cannot be assessed in terms of absolute time, but depend on taxon-specific factors such as mutation rate, generation time and effective population size. We show that classic morphological taxonomy can greatly benefit from integrative approaches that provide objective tests of species delimitation hypotheses. Our results pave the way for addressing further questions in biogeography, species ranges, community ecology, population dynamics, conservation, and evolution in octocorals and other marine taxa.
2,839 downloads evolutionary biology
The hundreds of cichlid fish species in Lake Malawi constitute the most extensive recent vertebrate adaptive radiation. Here we characterize its genomic diversity by sequencing 134 individuals covering 73 species across all major lineages. Average sequence divergence between species pairs is only 0.1-0.25%. These divergence values overlap diversity within species, with 82% of heterozygosity shared between species. Phylogenetic analyses suggest that diversification initially proceeded by serial branching from a generalist Astatotilapia-like ancestor. However, no single species tree adequately represents all species relationships, with evidence for substantial gene flow at multiple times. Common signatures of selection on visual and oxygen transport genes shared by distantly related deep water species point to both adaptive introgression and independent selection. These findings enhance our understanding of genomic processes underlying rapid species diversification, and provide a platform for future genetic analysis of the Malawi radiation.
2,818 downloads evolutionary biology
The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed "basal Eurasian" admixture event in human history. We implement and release our method in a new open-source software package momi2.
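As a concrete picture of the statistic itself, the sketch below computes the observed single-population (unfolded) SFS from a 0/1 haplotype matrix, where 1 marks the derived allele, and the textbook expectation for a neutral, constant-size population, E[ξ_i] = θ/i. It does not reproduce momi2 or its API; multipopulation expectations under admixture graphs are what that package adds. The function names and the toy data are hypothetical.

```python
def sample_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i counts sites where i of the n haplotypes carry the derived allele."""
    n = len(haplotypes)
    sfs = [0] * (n + 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        sfs[sum(site)] += 1
    return sfs

def neutral_expected_sfs(n, theta):
    """Expected counts for i = 1..n-1 under the standard neutral constant-size model."""
    return [theta / i for i in range(1, n)]

haps = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
print(sample_frequency_spectrum(haps))  # [1, 1, 1, 1, 0]
print(neutral_expected_sfs(4, 1.0))
```

Demographic inference then amounts to asking which model's expected spectrum best matches the observed one; departures from θ/i, such as an excess of rare variants, are what carry the signal of size changes.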
2,816 downloads evolutionary biology
Population-scale genomic datasets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g. only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods, and offer a new path forward in cases where no likelihood approach exists.
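"Representing alignments as images" typically means turning the alignment into a numeric matrix a CNN can read, with one row per sequence and one column per variable site. The sketch below shows one plausible encoding under stated assumptions (biallelic sites only, major allele coded 0, minor allele coded 1, gap or N columns dropped); the paper's exact encoding and any row sorting may differ, and the function name is hypothetical.

```python
from collections import Counter

def alignment_to_image(seqs):
    """Encode aligned sequences as a 0/1 matrix: rows = sequences, columns = biallelic sites."""
    columns = []
    for site in zip(*seqs):
        counts = Counter(site)
        if "-" in counts or "N" in counts:
            continue  # assumption: drop columns with gaps or missing data
        if len(counts) != 2:
            continue  # skip monomorphic and multiallelic sites
        major = counts.most_common(1)[0][0]
        columns.append([0 if base == major else 1 for base in site])
    if not columns:
        return [[] for _ in seqs]
    # Transpose so that each row corresponds to one input sequence.
    return [list(row) for row in zip(*columns)]

print(alignment_to_image(["ACGT", "ACGA", "TCGA"]))  # [[0, 1], [0, 0], [1, 0]]
```

Because the row order of individuals is arbitrary, CNN pipelines often sort rows by similarity or use permutation-invariant layers; that choice is left out of this minimal sketch.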
2,812 downloads evolutionary biology
Pyricularia oryzae is a species complex that causes blast disease on more than 50 species of poaceous plants. Pyricularia oryzae has a worldwide distribution as a rice (Oryza) pathogen and in the last century emerged as an important wheat (Triticum) pathogen in southern Brazil. Presently, P. oryzae pathotype Oryza is considered the rice blast pathogen, whereas P. oryzae pathotype Triticum is the wheat blast pathogen. In this study we investigated whether the Oryza and Triticum pathotypes of P. oryzae were distinct at the species level. We also describe a new Pyricularia species causing blast on several other poaceous hosts in Brazil, including wheat. We conducted phylogenetic analyses using 10 housekeeping loci from an extensive sample (N = 128) of sympatric populations of P. oryzae adapted to rice, wheat and other poaceous hosts found in or near wheat fields. The Bayesian phylogenetic analysis grouped the isolates into two major monophyletic clusters (I and II) with high Bayesian probabilities (P = 0.99). Cluster I contained isolates obtained from wheat as well as other Poaceae hosts (P = 0.98). Cluster II was divided into three host-associated clades (Clades 1, 2 and 3; P > 0.75). Clade 1 contained isolates obtained from wheat and other poaceous hosts, Clade 2 contained exclusively wheat-derived isolates, and Clade 3 comprised isolates associated only with rice. Our interpretation was that cluster I and cluster II correspond to two distinct species: Pyricularia graminis-tritici sp. nov. (Pgt), newly described in this study, and Pyricularia oryzae (Po). The host-associated clades found in P. oryzae Cluster II correspond to P. oryzae pathotype Triticum (PoT; Clades 1 and 2), and P. oryzae pathotype Oryza (PoO; Clade 3). No morphological or cultural differences were observed among these species, but a distinctive pathogenicity spectrum was observed. 
Pgt and PoT were pathogenic and highly aggressive on Triticum aestivum (wheat), Hordeum vulgare (barley), Urochloa brizantha (signal grass) and Avena sativa (oats). PoO was highly virulent on the original rice host (Oryza sativa), and also on wheat, barley, and oats, but not on signal grass. We concluded that blast disease on wheat and its associated Poaceae hosts in Brazil is caused by multiple Pyricularia species: the newly described Pyricularia graminis-tritici sp. nov., and the known P. oryzae pathotypes Triticum and Oryza. To our knowledge, P. graminis-tritici sp. nov. is still restricted to Brazil, but it represents a serious threat to wheat cultivation globally.
2,802 downloads evolutionary biology
Whole-genome studies have documented that most Native American ancestry stems from a single population that diversified within the continent more than twelve thousand years ago. However, this shared ancestry hides a more complex history whereby at least four distinct streams of Eurasian migration have contributed to present-day and prehistoric Native American populations. Whole-genome studies enhanced by technological breakthroughs in ancient DNA now provide evidence of a sequence of events involving initial migration from a structured Northeast Asian source population, followed by a divergence into northern and southern Native American lineages. During the Holocene, new migrations from Asia introduced the Saqqaq/Dorset Paleoeskimo population to the North American Arctic ~4,500 years ago, ancestry that is potentially connected with ancestry found in Athabaskan speakers today. This was then followed by a major new population turnover in the high Arctic involving Thule-related peoples who are the ancestors of present-day Inuit. We highlight several open questions that could be addressed through future genomic research.
2,762 downloads evolutionary biology
With next-generation sequencing (NGS) data coming of age and being routinely used, evolutionary biology is transforming into a data-driven science. As a consequence, researchers have to rely on a growing number of increasingly complex software tools. All widely used tools in our field have grown considerably, both in the number of features and in lines of code. In addition, analysis pipelines now include substantially more components than 5-10 years ago. A topic that has received little attention in this context is the code quality of widely used software. Unfortunately, the majority of users tend to blindly trust software and the results it produces. We therefore assessed the code quality of 15 highly cited tools (e.g., MrBayes, MAFFT, SweepFinder) from the broader area of evolutionary biology that are used in current data analysis pipelines. We also discuss little-known problems associated with floating-point arithmetic for representing real numbers on computer systems. Since the software quality of the tools we analyzed is rather mediocre, we provide a list of best practices for improving the quality of existing tools, and we also list techniques that can be deployed for developing reliable, high-quality scientific software from scratch. Finally, we discuss journal and science policy as well as funding issues that need to be addressed to improve software quality and to ensure support for developing new software and maintaining existing software. Our intention is to raise the community's awareness of software quality issues and to emphasize the substantial lack of funding for scientific software development.
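The floating-point problems mentioned above are easy to reproduce. The sketch below, using only the Python standard library, shows that decimal fractions are not stored exactly, that addition is not associative, and that multiplying many small per-site probabilities (as a naive likelihood calculation would) underflows to zero, whereas summing logarithms does not. The per-site probabilities are hypothetical, and these examples are generic illustrations rather than the specific issues found in the tools reviewed.

```python
import math

def naive_likelihood(site_probs):
    """Multiply per-site probabilities directly (prone to underflow)."""
    result = 1.0
    for p in site_probs:
        result *= p
    return result

def log_likelihood(site_probs):
    """Sum log-probabilities instead; stays finite for long sequences."""
    return math.fsum(math.log(p) for p in site_probs)

# Decimal fractions such as 0.1 have no exact binary representation.
print(0.1 + 0.2 == 0.3)  # False
total = 0.0
for _ in range(10):
    total += 0.1
print(total, math.fsum([0.1] * 10))  # 0.9999999999999999 1.0

# Floating-point addition is not associative: evaluation order changes the result.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c, a + (b + c))  # 1.0 0.0

probs = [1e-3] * 400  # hypothetical per-site probabilities
print(naive_likelihood(probs))  # 0.0 (underflow)
print(log_likelihood(probs))    # about -2763.1
```

Working in log space, and using compensated summation such as `math.fsum` where accuracy matters, are standard defenses; comparing results with exact equality (`==`) is usually a bug in numerical code.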
2,743 downloads evolutionary biology
Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Virtually all empirical evolutionary data studies contain a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that can and do arise when displaying branch values on trees after re-rooting. Branch values are typically stored as node labels in the widely-used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when re-rooting trees. This depends on the mostly implicit semantics that tools deploy to interpret node labels. We reviewed 10 tree viewers and 10 bioinformatics toolkits that can display and re-root trees. We found that 14 out of 20 of these tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees inferred by common phylogenetic inference programs. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements and workarounds in 8 of the tested tools. We suggest tools should provide an option that explicitly forces users to define the semantics of node labels.
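The re-rooting pitfall can be shown on a tiny tree. Take the Newick string `(A,(B,(C,D)70)90);` and, for illustration, call the root R, the inner node labeled 90 X, and the inner node labeled 70 Y (these names are hypothetical). Each label is really the support of the branch between that node and its parent. The sketch below re-roots a tree stored as a child-to-parent map and moves each label so that it stays attached to its branch; a tool that treats labels as node attributes would instead leave 90 on X, where it now labels a different branch, and 70 on the new root, where it labels no branch at all.

```python
def reroot(parent, new_root):
    """Re-root a tree given as a child -> parent map (the old root maps to None)."""
    path = [new_root]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    new_parent = dict(parent)
    new_parent[new_root] = None
    # Reverse every edge on the path from the new root up to the old root.
    for child, par in zip(path, path[1:]):
        new_parent[par] = child
    return new_parent

def move_branch_labels(old_parent, new_parent, labels):
    """Reassign labels read with branch semantics (label on node = branch to its parent)."""
    moved = {}
    for node, value in labels.items():
        par = old_parent[node]
        if par is None:
            continue  # a label on the old root describes no branch
        # After re-rooting, the branch {node, par} belongs to whichever end is now the child.
        child = node if new_parent[node] == par else par
        moved[child] = value
    return moved

parent = {"R": None, "A": "R", "X": "R", "B": "X", "Y": "X", "C": "Y", "D": "Y"}
labels = {"X": 90, "Y": 70}
new_parent = reroot(parent, "Y")
print(move_branch_labels(parent, new_parent, labels))  # {'R': 90, 'X': 70}
```

After re-rooting, R becomes a degree-2 node; real tools usually suppress such nodes and merge their two branches, which this sketch omits.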
2,707 downloads evolutionary biology
Neanderthals and modern humans came in contact with each other and interbred at least twice in the past 100,000 years. Such contact and interbreeding likely led both to the transmission of viruses novel to either species and to the exchange of adaptive alleles that provided resistance against the same viruses. Here, we show that viruses were responsible for dozens of adaptive introgressions between Neanderthals and modern humans. We identify RNA viruses, specifically lentiviruses and orthomyxoviruses, as likely drivers of introgressions from Neanderthals to Europeans. Our results imply that many introgressions between Neanderthals and modern humans were adaptive, and that host genetic variation can be used to understand ancient viral epidemics, potentially providing important insights regarding current and future epidemics.
2,678 downloads evolutionary biology
Explaining the origin and evolutionary dynamics of the genetic architecture of adaptation is a major research goal of evolutionary genetics. Despite controversy surrounding the success of attempts to accomplish this goal, a full understanding of adaptive genetic variation requires knowledge of the genomic location and patterns of dispersion of the genetic components affecting fitness-related phenotypic traits. Even with advances in next-generation sequencing technologies, producing full genome sequences for non-model species is often cost-prohibitive, especially for tree species such as pines, whose genomes often exceed 20 to 30 Gbp. We address this need by constructing a dense linkage map for foxtail pine (Pinus balfouriana Grev. & Balf.), with the ultimate goal of uncovering and explaining the origin and evolutionary dynamics of adaptive genetic variation in natural populations of this forest tree species. We used megagametophyte arrays (n = 76–95 megagametophytes/tree) from four maternal trees in combination with double-digest restriction site associated DNA sequencing (ddRADseq) to produce a consensus linkage map covering 98.58% of the foxtail pine genome, which was estimated to be 1276 cM in length (95% CI: 1174 cM to 1378 cM). A novel bioinformatic approach using iterative rounds of marker ordering and imputation was employed to produce single-tree linkage maps (507–17,066 contigs/map; lengths: 1037.40–1572.80 cM). These linkage maps were collinear across maternal trees, with highly correlated marker orderings (Spearman's ρ > 0.95). A consensus linkage map derived from these single-tree linkage maps contained 12 linkage groups along which 20,655 contigs were non-randomly distributed across 901 unique positions (mean of about 23 contigs/position), with an average spacing of 1.34 cM between adjacent positions.
Of the 20,655 contigs positioned on the consensus linkage map, 5627 had enough sequence similarity to contigs in the most recent build of the loblolly pine (P. taeda L.) genome to be identified as putative homologs, including both genic and non-genic loci. Importantly, all 901 unique positions on the consensus linkage map had at least one contig with putative homology to loblolly pine. Together with the other biological signals that predominate in our data (e.g., correlations of recombination fractions across single trees), these results show that dense linkage maps for non-model forest tree species can be efficiently constructed using next-generation sequencing technologies. We subsequently discuss the usefulness of these maps as community-wide resources and as tools with which to test hypotheses about the genetic architecture of adaptation.
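Because conifer megagametophytes are haploid tissue derived from the maternal meiosis, each one directly reveals which maternal allele it inherited at every marker, so the recombination fraction between two markers can be estimated simply by counting recombinant megagametophytes. The sketch below does that and converts the fraction into map distance in centimorgans with the Haldane and Kosambi mapping functions. The abstract does not state which mapping function the study used, the phase-ambiguity handling shown is an assumption, and the function names and data are hypothetical.

```python
import math

def recombination_fraction(phases_a, phases_b):
    """Fraction of megagametophytes recombinant between two markers of one maternal tree.

    phases_a, phases_b: 0/1 codes for which maternal allele each megagametophyte carries.
    Assumption: with unknown maternal phase, r and 1 - r cannot be told apart,
    so the smaller value is returned.
    """
    n = len(phases_a)
    differ = sum(a != b for a, b in zip(phases_a, phases_b))
    return min(differ, n - differ) / n

def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference); requires r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi map distance in cM (allows some interference); requires r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

r = recombination_fraction([0, 0, 1, 1, 0], [0, 1, 1, 1, 0])
print(r, round(haldane_cm(r), 2), round(kosambi_cm(r), 2))
```

For small r both functions give roughly 100 × r cM; they diverge as r grows, which is one reason total map lengths depend on the mapping function chosen.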
2,670 downloads evolutionary biology
The size, shape and structure of insect wings are intimately linked to their ability to fly. However, there are few systematic studies of the variability of the natural patterns in wing morphology across insects. We assemble a comprehensive dataset of insect wings and analyze their morphology using topological and geometric notions in terms of i) wing size and contour shape, ii) vein geometry and topology, and iii) shape and distribution of wing membrane domains. These morphospaces are a first step in defining the diversity of wing patterns across insect orders and set the stage for investigating their functional consequences.
2,660 downloads evolutionary biology
Higher paternal age at offspring conception increases the number of de novo mutations in the offspring (Kong et al., 2012). Based on evolutionary genetic theory, we predicted that the offspring of older fathers would be less likely to survive and reproduce, i.e. have lower fitness. In a sibling control study, we find clear support for negative paternal age effects on offspring survival, mating and reproductive success across four large populations with an aggregate N > 1.3 million in main analyses. Compared to a sibling born when the father was 10 years younger, individuals had 4-13% fewer surviving children in the four populations. Three populations were pre-industrial (1670-1850) Western populations and showed a pattern of paternal age effects across the offspring's lifespan. In 20th-century Sweden, we found no negative paternal age effects on child survival or marriage odds. Effects survived tests for competing explanations, including maternal age and parental loss. To the extent that we succeeded in isolating a mutation-driven effect of paternal age, our results can be understood to show that de novo mutations reduce offspring fitness across populations and time. We can use this understanding to predict the effect of increasingly delayed reproduction on offspring genetic load, mortality and fertility.
2,629 downloads evolutionary biology
There has been much interest in analyzing genome-scale DNA sequence data to infer population histories, but inference methods developed hitherto are limited in model complexity and computational scalability. Here we present an efficient, flexible statistical method, diCal2, that can utilize whole-genome sequence data from multiple populations to infer complex demographic models involving population size changes, population splits, admixture, and migration. Applying our method to data from Australian, East Asian, European, and Papuan populations, we find that the population ancestral to Australians and Papuans started separating from East Asians and Europeans about 100,000 years ago, and that the separation of East Asians and Europeans started about 50,000 years ago, with pervasive gene flow between all pairs of populations.
2,597 downloads evolutionary biology
Morphological and archaeological studies suggest that the Americas were first occupied by non-Mongoloids with Australo-Melanesian traits (the Paleoamerican model), which was subsequently followed by Southwest Europeans coming in along the pack ice of the North Atlantic Ocean (the Solutrean model) and by East Asians and Siberians arriving by way of the Bering Strait. Past DNA studies, however, have produced different accounts. With a better understanding of genetic diversity, we have now reinterpreted public DNA data. Consistent with our recent finding of a close relationship between South Pacific populations and Denisovans or Neanderthals who were archaic Africans with Eurasian admixtures, the ~9500 year old Kennewick Man skeleton with Australo-Melanesian affinity from North America was about equally related to Europeans and Africans, least related to East Asians among present-day people, and most related to the ~42000 year old Neanderthal Mezmaiskaya-2 from Adygea, Russia among ancient Eurasian DNAs. The ~12700 year old Anzick-1 of the Clovis culture was most related to the ~18720 year old El Miron of the Magdalenian culture in Spain among ancient DNAs. Amerindian mtDNA haplotypes, unlike their Eurasian sister haplotypes, share informative SNPs with Australo-Melanesians, Africans, or Neanderthals. These results suggest a unifying account of informative findings on the settlement of the Americas.
2,575 downloads evolutionary biology
Gene duplication is a fundamental process in genome evolution. However, most young duplicates are degraded into pseudogenes by loss-of-function mutations, and the factors that allow some duplicate pairs to survive long-term remain controversial. One class of models to explain duplicate retention invokes sub- or neofunctionalization, especially through evolution of gene expression, while other models focus on sharing of gene dosage. While studies of whole genome duplications tend to support dosage sharing, the primary mechanisms in mammals, where duplications are small-scale and thus disrupt dosage balance, are unclear. Using RNA-seq data from 46 human and 26 mouse tissues, we find that subfunctionalization of expression evolves slowly, and is rare among duplicates that arose within the placental mammals. A major impediment to subfunctionalization is that tandem duplicates tend to be co-regulated by shared genomic elements, in contrast to the standard assumption of modularity of gene expression. Instead, consistent with the dosage-sharing hypothesis, most young duplicates are down-regulated to match expression of outgroup singleton genes. Our data suggest that dosage sharing of expression is a key factor in the initial survival of mammalian duplicates, followed by slower functional adaptation enabling long-term preservation.
2,541 downloads evolutionary biology
Mashaal Sohail, Robert M. Maier, Andrea Ganna, Alex Bloemendal, Alicia R Martin, Michael C. Turchin, Charleston W. K. Chiang, Joel N Hirschhorn, Mark J. Daly, Nick Patterson, Benjamin M Neale, Iain Mathieson, David Reich, Shamil R. Sunyaev
Genetic predictions of height differ among human populations and these differences are too large to be explained by genetic drift. This observation has been interpreted as evidence of polygenic adaptation. Differences across populations were detected using SNPs genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of sub-significant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection and claims of polygenic adaptation for multiple traits. Polygenic adaptation studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!