Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 66,827 bioRxiv papers from 294,255 authors.
Most downloaded bioRxiv papers, all time
in category evolutionary biology
4,383 results found.
2,655 downloads evolutionary biology
Morphological and archaeological studies suggest that the Americas were first occupied by non-Mongoloids with Australo-Melanesian traits (the Paleoamerican model), which was subsequently followed by Southwest Europeans coming in along the pack ice of the North Atlantic Ocean (the Solutrean model) and by East Asians and Siberians arriving by way of the Bering Strait. Past DNA studies, however, have produced different accounts. With a better understanding of genetic diversity, we have now reinterpreted public DNA data. Consistent with our recent finding of a close relationship between South Pacific populations and Denisovans or Neanderthals who were archaic Africans with Eurasian admixtures, the ~9500 year old Kennewick Man skeleton with Australo-Melanesian affinity from North America was about equally related to Europeans and Africans, least related to East Asians among present-day people, and most related to the ~42000 year old Neanderthal Mezmaiskaya-2 from Adygea Russia among ancient Eurasian DNAs. The ~12700 year old Anzick-1 of the Clovis culture was most related to the ~18720 year old El Miron of the Magdalenian culture in Spain among ancient DNAs. Amerindian mtDNA haplotypes, unlike their Eurasian sister haplotypes, share informative SNPs with Australo-Melanesians, Africans, or Neanderthals. These results suggest a unifying account of informative findings on the settlement of the Americas.
2,616 downloads evolutionary biology
The vast majority of human mutations have minor allele frequencies (MAF) under 1%, with the plurality observed only once (i.e., “singletons”). While Mendelian diseases are predominantly caused by rare alleles, their cumulative contribution to complex phenotypes remains largely unknown. We develop and rigorously validate an approach to jointly estimate the contribution of all alleles, including singletons, to phenotypic variation. We apply our approach to transcriptional regulation, an intermediate between genetic variation and complex disease. Using whole genome DNA and lymphoblastoid cell line RNA sequencing data from 360 European individuals, we conservatively estimate that singletons contribute ~25% of cis-heritability across genes (dwarfing the contributions of other frequencies). Strikingly, the majority (~76%) of singleton heritability derives from ultra-rare variants absent from thousands of additional samples. We develop a novel inference procedure to demonstrate that our results are consistent with rampant purifying selection shaping the regulatory architecture of most human genes.
2,594 downloads evolutionary biology
Gene duplication is a fundamental process in genome evolution. However, most young duplicates are degraded into pseudogenes by loss-of-function mutations, and the factors that allow some duplicate pairs to survive long-term remain controversial. One class of models to explain duplicate retention invokes sub- or neofunctionalization, especially through evolution of gene expression, while other models focus on sharing of gene dosage. While studies of whole genome duplications tend to support dosage sharing, the primary mechanisms in mammals, where duplications are small-scale and thus disrupt dosage balance, are unclear. Using RNA-seq data from 46 human and 26 mouse tissues we find that subfunctionalization of expression evolves slowly, and is rare among duplicates that arose within the placental mammals. A major impediment to subfunctionalization is that tandem duplicates tend to be co-regulated by shared genomic elements, in contrast to the standard assumption of modularity of gene expression. Instead, consistent with the dosage-sharing hypothesis, most young duplicates are down-regulated to match expression of outgroup singleton genes. Our data suggest that dosage sharing of expression is a key factor in the initial survival of mammalian duplicates, followed by slower functional adaptation enabling long-term preservation.
2,594 downloads evolutionary biology
Mashaal Sohail, Robert M. Maier, Andrea Ganna, Alex Bloemendal, Alicia R Martin, Michael C. Turchin, Charleston W. K. Chiang, Joel N Hirschhorn, Mark J. Daly, Nick Patterson, Benjamin M Neale, Iain Mathieson, David Reich, Shamil R. Sunyaev
Genetic predictions of height differ among human populations and these differences are too large to be explained by genetic drift. This observation has been interpreted as evidence of polygenic adaptation. Differences across populations were detected using SNPs genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of sub-significant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection and claims of polygenic adaptation for multiple traits. Polygenic adaptation studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure.
2,583 downloads evolutionary biology
A central problem in evolutionary biology is to infer the full genealogical history of a set of DNA sequences. This history contains rich information about the forces that have influenced a sexually reproducing species. However, existing methods are limited: the most accurate is unable to cope with more than a few dozen samples. With modern genetic data sets rapidly approaching millions of genomes, there is an urgent need for efficient inference methods to exploit such rich resources. We introduce an algorithm to infer whole-genome history which has comparable accuracy to the state-of-the-art but can process around four orders of magnitude more sequences. Additionally, our method results in an "evolutionary encoding" of the original sequence data, enabling efficient access to genealogies and calculation of genetic statistics over the data. We apply this technique to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the genealogies we estimate are both rich in biological signal and efficient to process.
2,543 downloads evolutionary biology
Powerful approaches to inferring recent or current population structure based on nearest neighbour haplotype coancestry have so far been inaccessible to users without high quality genome-wide haplotype data. With a boom in non-model organism genomics, there is a pressing need to bring these methods to communities without access to such data. Here we present RADpainter, a new program designed to infer the coancestry matrix from restriction-site-associated DNA sequencing (RADseq) data. We combine this program together with a previously published MCMC clustering algorithm into fineRADstructure - a complete, easy to use, and fast population inference package for RADseq data (https://github.com/millanek/fineRADstructure). Finally, with two example datasets, we illustrate its use, benefits, and robustness to missing RAD alleles in double digest RAD sequencing.
2,534 downloads evolutionary biology
With the increasing use of massively parallel sequencing approaches in evolutionary biology, the need for fast and accurate methods suitable for investigating genetic structure and evolutionary history is more important than ever. We propose new distance measures for estimating genetic distances between individuals when allelic variation, gene dosage and recombination could compromise standard approaches. We present four distance measures based on single nucleotide polymorphisms (SNP) and evaluate them against previously published measures using coalescent-based simulations. Simulations were used to test (i) whether the measures give unbiased and accurate distance estimates, (ii) whether they can accurately identify the genomic mixture of hybrid individuals and (iii) whether they give precise (low variance) estimates. The effect of rate variation among genes and recombination was also investigated. The results showed that the SNP-based GENPOFAD distance we propose appears to work well in the widest range of circumstances. It was the most accurate and precise method for estimating genetic distances and is also relatively good at estimating the genomic mixture of hybrid individuals. Our simulations provide benchmarks to compare the performance of different methods that estimate genetic distances between organisms.
2,516 downloads evolutionary biology
Learning how complex traits like eyes originate is fundamental for understanding evolution. Here, we first sketch historical perspectives on trait origins and argue that new technologies offer key new insights. Next, we articulate four open questions about trait origins. To address them, we define a research program to break complex traits into components and study the individual evolutionary histories of those parts. By doing so, we can learn when the parts came together and perhaps understand why they stayed together. We apply the approach to five structural innovations critical for complex eyes, reviewing the history of the parts of each of those innovations. Photoreceptors evolved within animals by bricolage, recombining genes that originated far earlier. Multiple genes used in eyes today had ancestral roles in stress responses. We hypothesize that photo-stress could have increased the chance those genes were expressed together in places on animals where light was abundant.
2,485 downloads evolutionary biology
Tzachi Hagai, Xi Chen, Ricardo J Miragaia, Tomás Gomes, Raghd Rostom, Natalia Kunowska, Valentina Proserpio, Giacomo Donati, Lara Bossini-Castillo, Guy Naamati, Guy Emerton, Gosia Trynka, Ivanela Kondova, Mike Denis, Sarah A Teichmann
As the first line of defence against pathogens, cells mount an innate immune response, which is highly variable from cell to cell. The response must be potent yet carefully controlled to avoid self-damage. How these constraints have shaped the evolution of innate immunity remains poorly understood. Here, we characterise this programme's transcriptional divergence between species and expression variability across cells. Using bulk and single-cell transcriptomics in primate and rodent fibroblasts challenged with an immune stimulus, we reveal a striking architecture of the innate immune response. Transcriptionally diverging genes, including cytokines and chemokines, vary across cells and have distinct promoter structures. Conversely, genes involved in response regulation, such as transcription factors and kinases, are conserved between species and display low cell-to-cell variability. We suggest that this unique expression pattern, observed across species and conditions, has evolved as a mechanism for fine-tuned regulation, to achieve an effective but balanced response.
2,483 downloads evolutionary biology
A number of open questions in human evolutionary genetics would become tractable if we were able to directly measure evolutionary fitness. As a step towards this goal, we developed a method to examine whether individual genetic variants, or sets of genetic variants, currently influence viability. The approach consists in testing whether the frequency of an allele varies across ages, accounting for variation in ancestry. We applied it to the Genetic Epidemiology Research on Aging (GERA) cohort and to the parents of participants in the UK Biobank. Across the genome, we find only a few common variants with large effects on age-specific mortality: tagging the APOE ε4 allele and near CHRNA3. These results suggest that when large, even late onset effects are kept at low frequency by purifying selection. Testing viability effects of sets of genetic variants that jointly influence one of 42 traits, we detect a number of strong signals. In participants of the UK Biobank study of British ancestry, we find that variants that delay puberty timing are enriched in longer-lived parents (P~6×10⁻⁶ for fathers and P~2×10⁻³ for mothers), consistent with epidemiological studies. Similarly, in mothers, variants associated with later age at first birth are associated with a longer lifespan (P~1×10⁻³). Signals are also observed for variants influencing cholesterol levels, risk of coronary artery disease, body mass index, as well as risk of asthma. These signals exhibit consistent effects in the GERA cohort and among participants of the UK Biobank of non-British ancestry. Moreover, we see marked differences between males and females, most notably at the CHRNA3 locus, and variants associated with risk of coronary artery disease and cholesterol levels. Beyond our findings, the analysis serves as a proof of principle for how upcoming biomedical datasets can be used to learn about selection effects in contemporary humans.
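The core idea in the abstract above, testing whether an allele's frequency changes with age, can be sketched as a least-squares slope of allele frequency against age bin. This is a deliberately simplified illustration: the genotype data are made up, the helper names `allele_frequency` and `frequency_trend` are hypothetical, and the sketch ignores the ancestry adjustment and the statistical model used in the preprint.

```python
def allele_frequency(genotypes):
    """Frequency of the tracked allele, given genotypes coded as 0, 1, or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def frequency_trend(bins):
    """Least-squares slope of allele frequency against the age of each bin.

    `bins` maps a representative age to the list of genotypes observed at that age.
    """
    ages = sorted(bins)
    freqs = [allele_frequency(bins[a]) for a in ages]
    mean_age = sum(ages) / len(ages)
    mean_freq = sum(freqs) / len(freqs)
    cov = sum((a - mean_age) * (f - mean_freq) for a, f in zip(ages, freqs))
    var = sum((a - mean_age) ** 2 for a in ages)
    return cov / var


# Made-up genotypes (assumption): the allele becomes rarer among older individuals.
toy_bins = {
    50: [1, 1, 0, 2, 0, 1, 0, 1],
    65: [1, 0, 0, 1, 0, 1, 0, 1],
    80: [0, 0, 0, 1, 0, 1, 0, 0],
}
slope = frequency_trend(toy_bins)
```

In this toy data the slope is negative, which is the kind of pattern one would expect if carriers were under-represented at older ages. A real analysis would also need a significance test and a correction for ancestry.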
2,468 downloads evolutionary biology
A powerful way to detect selection in a population is by modeling local allele frequency changes in a particular region of the genome under scenarios of selection and neutrality, and finding which model is most compatible with the data. Chen et al. (2010) developed a composite likelihood method called XP-CLR that uses an outgroup population to detect departures from neutrality which could be compatible with hard or soft sweeps, at linked sites near a beneficial allele. However, this method is most sensitive to recent selection and may miss selective events that happened a long time ago. To overcome this, we developed an extension of XP-CLR that jointly models the behavior of a selected allele in a three-population tree. Our method - called 3P-CLR - outperforms XP-CLR when testing for selection that occurred before two populations split from each other, and can distinguish between those events and events that occurred specifically in each of the populations after the split. We applied our new test to population genomic data from the 1000 Genomes Project, to search for selective sweeps that occurred before the split of Yoruba and Eurasians, but after their split from Neanderthals, and that could have led to the spread of modern-human-specific phenotypes. We also searched for sweep events that occurred in East Asians, Europeans and the ancestors of both populations, after their split from Yoruba. In both cases, we are able to confirm a number of regions identified by previous methods, and find several new candidates for selection in recent and ancient times. For some of these, we also find suggestive functional mutations that may have driven the selective events.
2,446 downloads evolutionary biology
Lehti Saag, Liivi Varul, Christiana Lyn Scheib, Jesper Stenderup, Morten E. Allentoft, Lauri Saag, Luca Pagani, Maere Reidla, Kristiina Tambets, Ene Metspalu, Aivar Kriiska, Eske Willerslev, Toomas Kivisild, Mait Metspalu
Farming-based economies appear relatively late in Northeast Europe and the extent to which they involve genetic ancestry change is still poorly understood. Here we present the analyses of low coverage whole genome sequence data from five hunter-gatherers and five farmers of Estonia dated to 4,500 to 6,300 years before present. We find evidence of significant differences between the two groups in the composition of autosomal as well as mtDNA, X and Y chromosome ancestries. We find that Estonian hunter-gatherers of Comb Ceramic Culture are closest to Eastern hunter-gatherers. The Estonian first farmers of Corded Ware Culture show high similarity in their autosomes with Steppe Belt Late Neolithic/Bronze Age individuals, Caucasus hunter-gatherers and Iranian farmers while their X chromosomes are most closely related with the European Early Farmers of Anatolian descent. These findings suggest that the shift to intensive cultivation and animal husbandry in Estonia was triggered by the arrival of new people with predominantly Steppe ancestry, but whose ancestors had undergone sex-specific admixture with early farmers with Anatolian ancestry.
2,445 downloads evolutionary biology
The testis expresses the largest number of genes of any mammalian organ, a finding that has long puzzled molecular biologists. Analyzing our single-cell transcriptomic maps of human and mouse spermatogenesis, we provide evidence that this widespread transcription serves to maintain DNA sequence integrity in the male germline by correcting DNA damage through 'transcriptional scanning'. Supporting this model, we find that genes expressed during spermatogenesis display lower mutation rates on the transcribed strand and have low diversity in the population. Moreover, this effect is fine-tuned by the level of gene expression during spermatogenesis. The unexpressed genes, which in our model do not benefit from transcriptional scanning, diverge faster over evolutionary time-scales and are enriched for sensory and immune-defense functions. Collectively, we propose that transcriptional scanning modulates germline mutation rates in a gene-specific manner, maintaining DNA sequence integrity for the bulk of genes but allowing for fast evolution in a specific subset.
2,445 downloads evolutionary biology
Making meaningful inferences from phylogenetic comparative data requires a meaningful model of trait evolution. It is thus important to determine whether the model is appropriate for the data and the question being addressed. One way to assess this is to ask whether the model provides a good statistical explanation for the variation in the data. To date, researchers have focused primarily on the explanatory power of a model relative to alternative models. Methods have been developed to assess the adequacy, or absolute explanatory power, of phylogenetic trait models but these have been restricted to specific models or questions. Here we present a general statistical framework for assessing the adequacy of phylogenetic trait models. We use our approach to evaluate the statistical performance of commonly used trait models on 337 comparative datasets covering three key Angiosperm functional traits. In general, the models we tested often provided poor statistical explanations for the evolution of these traits. This was true for many different groups and at many different scales. Whether such statistical inadequacy will qualitatively alter inferences drawn from comparative datasets will depend on the context. Regardless, assessing model adequacy can provide interesting biological insights: how and why a model fails to describe variation in a dataset gives us clues about what evolutionary processes may have driven trait evolution across time.
2,425 downloads evolutionary biology
Nathaniel B Edelman, Paul Frandsen, Michael Miyagi, Bernardo J. Clavijo, John Davey, Rebecca Dikow, Gonzalo Garcia Accinelli, Steven Van Belleghem, Nick Patterson, Daniel E. Neafsey, Richard Challis, Sujai Kumar, Gilson Moreira, Camilo Salazar, Mathieu Chouteau, Brian Counterman, Riccardo Papa, Mark Blaxter, Robert Reed, Kanchon Dasmahapatra, Marcus Kronforst, Mathieu Joron, Chris D Jiggins, W. Owen McMillan, Federica Di-Palma, Andrew J. Blumberg, John Wakeley, David Jaffe, James Mallet
We here pioneer a low-cost assembly strategy for 20 Heliconiini genomes to characterize the evolutionary history of the rapidly radiating genus Heliconius. A bifurcating tree provides a poor fit to the data, and we therefore explore a reticulate phylogeny for Heliconius. We probe the genomic architecture of gene flow, and develop a new method to distinguish incomplete lineage sorting from introgression. We find that most loci with non-canonical histories arose through introgression, and are strongly underrepresented in regions of low recombination and high gene density. This is expected if introgressed alleles are more likely to be purged in such regions due to tighter linkage with incompatibility loci. Finally, we identify a hitherto unrecognized inversion, and show it is a convergent structural rearrangement that captures a known color pattern switch locus within the genus. Our multi-genome assembly approach enables an improved understanding of adaptive radiation.
2,408 downloads evolutionary biology
The relatively narrow range of genetic polymorphism levels across species has been a major source of debate since the inception of molecular population genetics. Recently, Corbett-Detig et al. found evidence that linked selection strongly constrains levels of polymorphism in species with large census sizes. Here I reexamine this claim and find weak support for this conclusion. While linked selection is an important determinant of polymorphism levels along the genome in many species, we currently lack compelling evidence that it is a major determinant of polymorphism levels among obligately sexual species.
2,401 downloads evolutionary biology
Life inside ant colonies is orchestrated with a diverse set of pheromones, but it is not clear how ants perceive these social cues. It has been proposed that pheromone perception in ants evolved via expansions in the numbers of odorant receptors (ORs) and antennal lobe glomeruli. Here we generate the first mutant lines in ants by disrupting orco, a gene required for the function of all ORs. We find that orco mutants exhibit severe deficiencies in social behavior and fitness, suggesting that they are unable to perceive pheromones. Surprisingly, unlike in Drosophila melanogaster, orco mutant ants also lack most of the approximately 500 antennal lobe glomeruli found in wild-types. These results illustrate that ORs are essential for ant social organization, and raise the possibility that, similar to mammals, receptor function is required for the development and/or maintenance of the highly complex olfactory processing areas in the ant brain.
2,386 downloads evolutionary biology
Genome size evolution is a fundamental problem in molecular evolution. Statistical analysis of genome sizes brings new insight into the evolution of genome size. Although the variation in genome sizes is complicated, the analysis indicates that genome size evolution can be explained more clearly at the taxon level than at the species level. I find that the genome size distribution for species in a taxon fits a log-normal distribution. I also find a relationship between the phylogeny of life and the statistical features of genome size distributions among taxa, and I observe different statistical features of genome size distributions between animal taxa and plant taxa. A log-normal stochastic process model is developed to simulate genome size evolution. The simulated log-normal distributions of genome sizes and their statistical features agree with the observations.
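As a rough illustration of how a log-normal genome size distribution can arise, the sketch below simulates a multiplicative random walk, in which each lineage's genome size is multiplied by a random factor at every step. The abstract does not specify the model, so the function name `simulate_genome_sizes`, the starting size, the step count, and the value of `sigma` are all assumptions made for illustration, not the author's method.

```python
import math
import random
import statistics


def simulate_genome_sizes(n_species, n_steps, start_size=1e8, sigma=0.05, seed=0):
    """Simulate genome sizes under a multiplicative random walk (illustrative only).

    Each step multiplies the size by exp(N(0, sigma)), so log(size) performs an
    additive Gaussian random walk and the final sizes are log-normally distributed.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_species):
        log_size = math.log(start_size)
        for _ in range(n_steps):
            log_size += rng.gauss(0.0, sigma)
        sizes.append(math.exp(log_size))
    return sizes


# Hypothetical parameter values, chosen only to keep the example fast.
sizes = simulate_genome_sizes(n_species=2000, n_steps=100)
log_sizes = [math.log(s) for s in sizes]
log_mean = statistics.mean(log_sizes)
log_sd = statistics.stdev(log_sizes)
```

Under these assumptions the logarithms of the simulated sizes should be centred near log(1e8), with a standard deviation near sigma times the square root of the number of steps (0.5 here). Comparing such summary statistics across taxa is one plausible way to relate a simulation to observed genome size distributions.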
2,368 downloads evolutionary biology
Although homologous recombination is accepted to be common in bacteria, so far it has been challenging to accurately quantify its impact on genome evolution within bacterial species. We here introduce methods that use the statistics of single-nucleotide polymorphism (SNP) splits in the core genome alignment of a set of strains to show that, for many bacterial species, recombination dominates genome evolution. Each genomic locus has been overwritten so many times by recombination that it is impossible to reconstruct the clonal phylogeny and, instead of a consensus phylogeny, the phylogeny typically changes many thousands of times along the core genome alignment. We also show how SNP splits can be used to quantify the relative rates with which different subsets of strains have recombined in the past. We find that virtually every strain has a unique pattern of recombination frequencies with other strains and that the relative rates with which different subsets of strains share SNPs follow long-tailed distributions. Our findings show that bacterial populations are neither clonal nor freely recombining, but structured such that recombination rates between different lineages vary along a continuum spanning several orders of magnitude, with a unique pattern of rates for each lineage. Thus, rather than reflecting clonal ancestry, whole genome phylogenies reflect these long-tailed distributions of recombination rates.
2,343 downloads evolutionary biology
Meiosis is a key event of sexual life cycles in eukaryotes. Its mechanistic details have been uncovered in several model organisms, and most of its essential features have received various and often contradictory evolutionary interpretations. In this perspective, we present an overview of these often "weird" features. We discuss the origin of meiosis (origin of ploidy reduction and recombination, two-step meiosis), its secondary modifications (in polyploids or asexuals, inverted meiosis), its importance in punctuating life cycles (meiotic arrests, epigenetic resetting, meiotic asymmetry, meiotic fairness) and features associated with recombination (disjunction constraints, heterochiasmy, crossover interference and hotspots). We present the various evolutionary scenarios and selective pressures that have been proposed to account for these features, and we highlight that their evolutionary significance often remains largely mysterious. Resolving these mysteries will likely provide decisive steps towards understanding why sex and recombination are found in the majority of eukaryotes.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!