Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 89,328 bioRxiv papers from 382,778 authors.
Most downloaded bioRxiv papers, since beginning of last month
in category evolutionary biology
5,356 results found.
13,138 downloads evolutionary biology
Bette T. Korber, WM Fischer, S. Gnanakaran, H Yoon, J Theiler, W Abfalterer, B Foley, EE Giorgi, Tanmoy Bhattacharya, MD Parker, DG Partridge, CM Evans, TM Freeman, Thushan I. de Silva, on behalf of the Sheffield COVID-19 Genomics Group, CC LaBranche, David Montefiori
We have developed an analysis pipeline to facilitate real-time mutation tracking in SARS-CoV-2, focusing initially on the Spike (S) protein because it mediates infection of human cells and is the target of most vaccine strategies and antibody-based therapeutics. To date we have identified fourteen mutations in Spike that are accumulating. Mutations are considered in a broader phylogenetic context, geographically, and over time, to provide an early warning system to reveal mutations that may confer selective advantages in transmission or resistance to interventions. Each one is evaluated for evidence of positive selection, and the implications of the mutation are explored through structural modeling. The mutation Spike D614G is of urgent concern; after beginning to spread in Europe in early February, when introduced to new regions it repeatedly and rapidly becomes the dominant form. Also, we present evidence of recombination between locally circulating strains, indicative of multiple strain infections. These findings have important implications for SARS-CoV-2 transmission, pathogenesis and immune interventions. ### Competing Interest Statement The authors have declared no competing interest.
12,700 downloads evolutionary biology
This paper has been withdrawn by its authors. They intend to revise it in response to comments received from the research community on their technical approach and their interpretation of the results. If you have any questions, please contact the corresponding author.
5,303 downloads evolutionary biology
Alice Latinne, Ben Hu, Kevin J Olival, Guangjian Zhu, Libiao Zhang, Hongying Li, Aleksei A Chmura, Hume E Field, Carlos Zambrana-Torrelio, Jonathan H Epstein, Bei Li, Wei Zhang, Lin-Fa Wang, Zheng-Li Shi, Peter Daszak
Bats are presumed reservoirs of diverse coronaviruses (CoVs) including progenitors of Severe Acute Respiratory Syndrome (SARS)-CoV and SARS-CoV-2, the causative agent of COVID-19. However, the evolution and diversification of these coronaviruses remain poorly understood. We used a Bayesian statistical framework and sequence data from all known bat-CoVs (including 630 novel CoV sequences) to study their macroevolution, cross-species transmission, and dispersal in China. We find that host-switching was more frequent and across more distantly related host taxa in alpha- than beta-CoVs, and more highly constrained by phylogenetic distance for beta-CoVs. We show that inter-family and -genus switching is most common in Rhinolophidae and the genus Rhinolophus. Our analyses identify the host taxa and geographic regions that define hotspots of CoV evolutionary diversity in China that could help target bat-CoV discovery for proactive zoonotic disease surveillance. Finally, we present a phylogenetic analysis suggesting a likely origin for SARS-CoV-2 in Rhinolophus spp. bats. ### Competing Interest Statement The authors have declared no competing interest.
2,295 downloads evolutionary biology
Identifying genomic regions with unusually high local haplotype homozygosity represents a powerful strategy to characterize candidate genes responding to natural or artificial positive selection. To that end, statistics measuring the extent of haplotype homozygosity within (e.g., EHH, IHS) and between (Rsb or XP-EHH) populations have been proposed in the literature. The rehh package for R was previously developed to facilitate genome-wide scans of selection, based on the analysis of long-range haplotypes. However, its performance wasn't sufficient to cope with the growing size of available data sets. Here we propose a major upgrade of the rehh package, which includes an improved processing of the input files, a faster algorithm to enumerate haplotypes, as well as multi-threading. As illustrated with the analysis of large human haplotype data sets, these improvements decrease the computation time by more than an order of magnitude. This new version of rehh will thus allow performing iHS-, Rsb- or XP-EHH-based scans on large data sets. The package rehh 2.0 is available from the CRAN repository (http://cran.r-project.org/web/packages/rehh/index.html) together with help files and a detailed manual.
2,103 downloads evolutionary biology
In a side-by-side comparison of evolutionary dynamics between the 2019/2020 SARS-CoV-2 and the 2003 SARS-CoV, we were surprised to find that SARS-CoV-2 resembles SARS-CoV in the late phase of the 2003 epidemic after SARS-CoV had developed several advantageous adaptations for human transmission. Our observations suggest that by the time SARS-CoV-2 was first detected in late 2019, it was already pre-adapted to human transmission to an extent similar to late epidemic SARS-CoV. However, no precursors or parallel branches of evolution stemming from a less human-adapted SARS-CoV-2-like virus have been detected. The sudden appearance of a highly infectious SARS-CoV-2 presents a major cause for concern that should motivate stronger international efforts to identify the source and prevent near future re-emergence. Any existing pools of SARS-CoV-2 progenitors would be particularly dangerous if similarly well adapted for human transmission. To look for clues regarding intermediate hosts, we analyze recent key findings relating to how SARS-CoV-2 could have evolved and adapted for human transmission, and examine the environmental samples from the Wuhan Huanan seafood market. Importantly, the market samples are genetically identical to human SARS-CoV-2 isolates and were therefore most likely from human sources. We conclude by describing and advocating for measured and effective approaches implemented in the 2002-2004 SARS outbreaks to identify lingering population(s) of progenitor virus. ### Competing Interest Statement Shing Hei Zhan is a Co-founder and lead bioinformatics scientist at Fusion Genomics Corporation, which develops molecular diagnostic assays for infectious diseases.
1,738 downloads evolutionary biology
Monitoring the mutation dynamics of SARS-CoV-2 is critical for the development of effective approaches to contain the pathogen. By analyzing 106 SARS-CoV-2 and 39 SARS genome sequences, we provided direct genetic evidence that SARS-CoV-2 has a much lower mutation rate than SARS. Minimum Evolution phylogeny analysis revealed the putative original status of SARS-CoV-2 and the early-stage spread history. The discrepant phylogenies for the spike protein and its receptor binding domain proved a previously reported structural rearrangement prior to the emergence of SARS-CoV-2. Although we found the spike glycoprotein of SARS-CoV-2 to be particularly conserved, we identified a mutation that leads to weaker receptor binding capability, which concerns a SARS-CoV-2 sample collected on 27th January 2020 from India. This represents the first report of a significant SARS-CoV-2 mutant, and raises the alarm that the ongoing vaccine development may become futile in future epidemics if more mutations are identified. ### Competing Interest Statement The authors have declared no competing interest.
1,427 downloads evolutionary biology
There are outstanding evolutionary questions on the recent emergence of coronavirus SARS-CoV-2/hCoV-19 in Hubei province that caused the COVID-19 pandemic, including (1) the relationship of the new virus to the SARS-related coronaviruses, (2) the role of bats as a reservoir species, (3) the potential role of other mammals in the emergence event, and (4) the role of recombination in viral emergence. Here, we address these questions and find that the sarbecoviruses -- the viral subgenus responsible for the emergence of SARS-CoV and SARS-CoV-2 -- exhibit frequent recombination, but the SARS-CoV-2 lineage itself is not a recombinant of any viruses detected to date. In order to employ phylogenetic methods to date the divergence events between SARS-CoV-2 and the bat sarbecovirus reservoir, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. Bayesian evolutionary rate and divergence date estimates were consistent for all three recombination-free alignments and robust to two different prior specifications based on HCoV-OC43 and MERS-CoV evolutionary rates. Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% HPD: 1879-1999), 1969 (95% HPD: 1930-2000), and 1982 (95% HPD: 1948-2009). Despite intensified characterization of sarbecoviruses since SARS, the lineage giving rise to SARS-CoV-2 has been circulating unnoticed for decades in bats and been transmitted to other hosts such as pangolins. The occurrence of a third significant coronavirus emergence in 17 years together with the high prevalence and virus diversity in bats implies that these viruses are likely to cross species boundaries again.
1,188 downloads evolutionary biology
Mammals progress through similar physiological stages during life, from early development to puberty, aging, and death. Yet, the extent to which this conserved physiology reflects conserved molecular events is unclear. Here, we map common epigenetic changes experienced by mammalian genomes as they age, focusing on evolutionary comparisons of humans to dogs, an emerging model of aging. Using targeted sequencing, we characterize the methylomes of 104 Labrador retrievers spanning a 16 year age range, achieving >150X coverage within mammalian syntenic blocks. Comparison with human methylomes reveals a nonlinear relationship which translates dog to human years, aligns the timing of major physiological milestones between the two species, and extends to mice. Conserved changes center on specific developmental gene networks which are sufficient to capture the effects of anti-aging interventions in multiple mammals. These results establish methylation not only as a diagnostic age readout but as a cross-species translator of physiological aging milestones.
950 downloads evolutionary biology
Background: Nearly all Eurasians have ~2% Neanderthal ancestry due to several events of interbreeding between anatomically modern humans and archaic hominins. Previous studies characterizing the legacy of Neanderthal ancestry in modern Eurasians have identified examples of both adaptive and deleterious effects of admixture. However, we lack a comprehensive understanding of the genome-wide influence of Neanderthal introgression on modern human diseases and traits. Results: We integrate recent maps of Neanderthal ancestry with well-powered association studies for more than 400 diverse traits to estimate heritability enrichment patterns in regions of the human genome tolerant of Neanderthal ancestry and in introgressed Neanderthal variants themselves. First, we find that variants in regions tolerant of Neanderthal ancestry are depleted of heritability for all traits considered, except skin and hair-related traits. Second, the introgressed variants remaining in modern Europeans are depleted of heritability for most traits; however, we discover that they are enriched for heritability of several traits with potential relevance to human adaptation to non-African environments, including hair and skin traits, autoimmunity, chronotype, bone density, lung capacity, and menopause age. To better understand the phenotypic consequences of these enrichments, we adapt recent methods to test for consistent directional effects of introgressed alleles, and we find directionality for several traits. Finally, we use a direction-of-effect-aware approach to highlight novel candidate introgressed variants that influence risk for disease. Conclusion: Our results demonstrate that genomic regions retaining Neanderthal ancestry are not only less functional at the molecular level, but are also depleted for variation influencing a diverse array of complex traits in modern humans. In spite of this depletion, we identify traits where introgression has an outsized effect. Integrating our results, we propose a framework for using quantification of trait heritability and direction of effect in introgressed regions to understand how Neanderthals were different from modern humans, how selection acted on different traits, and how introgression may have facilitated adaptation to non-African environments. ### Competing Interest Statement The authors have declared no competing interest.
906 downloads evolutionary biology
Classical evolutionary theory maintains that mutation rate variation between genes should be random with respect to fitness, and evolutionary optimization of genic mutation rates remains controversial. However, it has now become known that cytogenetic (DNA sequence + epigenomic) features influence local mutation probabilities, which is predicted by more recent theory to be a prerequisite for beneficial mutation rates between different classes of genes to readily evolve. To test this possibility, we used de novo mutations in Arabidopsis thaliana to create a high resolution predictive model of mutation rates as a function of cytogenetic features across the genome. As expected, mutation rates are significantly predicted by features such as GC content, histone modifications, and chromatin accessibility. Deeper analyses of predicted mutation rates reveal effects of introns and untranslated exon regions in distancing coding sequences from mutational hotspots at the start and end of transcribed regions in A. thaliana. Finally, predicted coding region mutation rates are significantly lower in genes where mutations are more likely to be deleterious, supported by numerous estimates of evolutionary and functional constraint. These findings contradict neutral expectations that mutation probabilities are independent of fitness consequences. Instead, they are consistent with the evolution of lower mutation rates in functionally constrained loci due to cytogenetic features, with important implications for evolutionary biology. ### Competing Interest Statement The authors have declared no competing interest.
883 downloads evolutionary biology
Diyendo Massilani, Laurits Skov, Mateja Hajdinjak, Byambaa Gunchinsuren, Damdinsuren Tseveendorj, Seonbok Yi, Jungeun Lee, Sarah Nagel, Birgit Nickel, Thibaut Devièse, Tom Higham, Matthias Meyer, Janet Kelso, Benjamin M Peter, Svante Pääbo
We present analyses of the genome of a ~34,000-year-old hominin skull cap discovered in the Salkhit Valley in North East Mongolia. We show that this individual was a female member of a modern human population that, following the split between East and West Eurasians, experienced substantial gene flow from West Eurasians. Both she and a 40,000-year-old individual from Tianyuan outside Beijing carried genomic segments of Denisovan ancestry. These segments derive from the same Denisovan admixture event(s) that contributed to present-day mainland Asians but are distinct from the Denisovan DNA segments in present-day Papuans and Aboriginal Australians. ### Competing Interest Statement The authors have declared no competing interest.
840 downloads evolutionary biology
Spatiotemporal bias in genome sequence sampling can severely confound phylogeographic inference based on discrete trait ancestral reconstruction. This has impeded our ability to accurately track the emergence and spread of SARS-CoV-2, which is the virus responsible for the COVID-19 pandemic. Despite the availability of staggering numbers of genomes on a global scale, evolutionary reconstructions of SARS-CoV-2 are hindered by the slow accumulation of sequence divergence over its relatively short transmission history. When confronted with these issues, incorporating additional contextual data may critically inform phylodynamic reconstructions. Here, we present a new approach to integrate individual travel history data in Bayesian phylogeographic inference and apply it to the early spread of SARS-CoV-2, while also including global air transportation data. We demonstrate that including travel history data for each SARS-CoV-2 genome yields more realistic reconstructions of virus spread, particularly when travelers from undersampled locations are included to mitigate sampling bias. We further explore the impact of sampling bias by incorporating unsampled sequences from undersampled locations in the analyses. Our reconstructions reinforce specific transmission hypotheses suggested by the inclusion of travel history data, but also suggest alternative routes of virus migration that are plausible within the epidemiological context but are not apparent with current sampling efforts. Although further research is needed to fully examine the performance of our new data integration approaches and to further improve them, they represent multiple new avenues for directly addressing the colossal issue of sample bias in phylogeographic inference. ### Competing Interest Statement The authors have declared no competing interest.
655 downloads evolutionary biology
Joana I. Meier, Patricio A. Salazar, Marek Kučka, Robert William Davies, Andreea Dréau, Ismael Aldás, Olivia Box Power, Nicola J. Nadeau, Jon R. Bridle, Campbell Rolian, Nicholas H. Barton, W Owen McMillan, Chris D. Jiggins, Yingguang Frank Chan
Genetic variation segregates as linked sets of variants, or haplotypes. Haplotypes and linkage are central to genetics and underpin virtually all genetic and selection analysis. And yet, genomic data often lack haplotype information, due to constraints in sequencing technologies. Here we present “haplotagging”, a simple, low-cost linked-read sequencing technique that allows sequencing of hundreds of individuals while retaining linkage information. We apply haplotagging to construct megabase-size haplotypes for over 600 individual butterflies (Heliconius erato and H. melpomene), which form overlapping hybrid zones across an elevational gradient in Ecuador. Haplotagging identifies loci controlling distinctive high- and lowland wing color patterns. Divergent haplotypes are found at the same major loci in both species, while chromosome rearrangements show no parallelism. Remarkably, in both species the geographic clines for the major wing pattern loci are displaced by 18 km, leading to the rise of a novel hybrid morph in the centre of the hybrid zone. We propose that shared warning signalling (Müllerian mimicry) may couple the cline shifts seen in both species, and facilitate the parallel co-emergence of a novel hybrid morph in both co-mimetic species. Our results show the power of efficient haplotyping methods when combined with large-scale sequencing data from natural populations. One-sentence summary: Haplotagging, a novel linked-read sequencing technique that enables whole genome haplotyping in large populations, reveals the formation of a novel hybrid race in parallel hybrid zones of two co-mimicking Heliconius butterfly species through strikingly parallel divergences in their genomes. ### Competing Interest Statement The authors declare competing financial interests in the form of patent and employment by the Max Planck Society. The European Research Council provides funding for the research but no other competing interests.
649 downloads evolutionary biology
The hand of molecular mimicry in shaping SARS-CoV-2 evolution and immune evasion remains to be deciphered. Here, we identify 33 distinct 8-mer/9-mer peptides that are identical between SARS-CoV-2 and human proteomes, including 20 novel peptides not observed in any previous human coronavirus (HCoV) strains. Four of these mimicked 8-mers/9-mers map onto HLA-B*40:01, HLA-B*40:02, and HLA-B*35:01 binding peptides from human PAM, ANXA7, PGD, and ALOX5AP proteins. This striking mimicry of multiple human proteins by SARS-CoV-2 is made more salient by the targeted genes being focally expressed in arteries, lungs, esophagus, pancreas, and macrophages. Further, HLA-A*03 restricted 8-mer peptides are shared broadly by human and coronaviridae helicases with primary expression of the mimicked human proteins in the neurons and immune cells. These findings highlight molecular mimicry as a shared strategy adopted by evolutionary titans -- the virus in its quest for escaping herd immune surveillance, and the host immune systems that are constantly learning the patterns of 'self' and 'non-self'. ### Competing Interest Statement AJV, NK, PA and VS are employees of nference and have financial interests in the company. ADB is a consultant for Abbvie, is on scientific advisory boards for Nference and Zentalis, and is founder and President of Splissen therapeutics. One or more of the investigators associated with this project and Mayo Clinic have a Financial Conflict of Interest in technology used in the research and that the investigator(s) and Mayo Clinic may stand to gain financially from the successful outcome of the research. This research has been reviewed by the Mayo Clinic Conflict of Interest Review Board and is being conducted in compliance with Mayo Clinic Conflict of Interest policies.
551 downloads evolutionary biology
The recent outbreak of a new coronavirus (SARS-CoV-2) in Wuhan, China, underscores the need for understanding the evolutionary processes that drive the emergence and adaptation of zoonotic viruses in humans. Here, we show that recombination in betacoronaviruses, including human-infecting viruses like SARS-CoV and MERS-CoV, frequently encompasses the Receptor Binding Domain (RBD) in the Spike gene. We find that this common process likely led to a recombination event at least 11 years ago in an ancestor of the SARS-CoV-2 involving the RBD. As a result of this recombination event, SARS-CoV and SARS-CoV-2 share a similar genotype in RBD, including two insertions (positions 432-436 and 460-472), and alleles 427N and 436Y. Both 427N and 436Y belong to a helix that interacts with the human ACE2 receptor. Ancestral state analyses revealed that SARS-CoV-2 differentiated from its most recent common ancestor with RaTG13 by accumulating a significant number of amino acid changes in the RBD. In sum, we propose a two-hit scenario in the emergence of the SARS-CoV-2 virus whereby the SARS-CoV-2 ancestors in bats first acquired genetic characteristics of SARS-CoV by incorporation of a SARS-like RBD through recombination before 2009, and subsequently, the lineage that led to SARS-CoV-2 accumulated further, unique changes specifically in the RBD.
538 downloads evolutionary biology
RNA viruses are proficient at switching to novel host species due to their fast mutation rates. Implicit in this assumption is the need to evolve adaptations in the new host species to exploit their cells efficiently. However, SARS-CoV-2 has required no significant adaptation to humans since the pandemic began, with no observed selective sweeps to date. Here we contrast the role of positive selection and recombination in the Sarbecoviruses in horseshoe bats to SARS-CoV-2 evolution in humans. While methods can detect some evidence for positive selection in SARS-CoV-2, we demonstrate these are mostly due to recombination and sequencing artefacts. Purifying selection is also substantially weaker in SARS-CoV-2 than in the related bat Sarbecoviruses. In comparison, our results show evidence for positive, specifically episodic selection, acting on the bat virus lineage SARS-CoV-2 emerged from. This signature of selection can also be observed among synonymous substitutions, for example, linked to ancestral CpG depletion on this bat lineage. We show the bat virus RmYN02 has recombinant CpG content in Spike pointing to coinfection and evolution in bats without involvement of other species. Our results suggest the non-human progenitor of SARS-CoV-2 was capable of human-human transmission as a consequence of its natural evolution in bats. ### Competing Interest Statement The authors have declared no competing interest.
462 downloads evolutionary biology
Booming and busting populations modulate the accumulation of genetic diversity, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum: the relative mutation rates in different local nucleotide contexts. Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We describe mushi: a method to perform fast, nonparametric joint inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and produce more accurate time calibration for a previously-reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be productively incorporated in a well-studied theoretical setting, and rigorously inferred from genomic variation data like other features of evolutionary history. ### Competing Interest Statement The authors have declared no competing interest.
435 downloads evolutionary biology
Building a genotype-phenotype-fitness map of adaptation is a central goal in evolutionary biology. It is notoriously difficult even when the adaptive mutations are known because it is hard to enumerate which phenotypes make these mutations adaptive. We address this problem by first quantifying how the fitness of hundreds of adaptive yeast mutants responds to subtle environmental shifts and then modeling the number of phenotypes they must collectively influence by decomposing these patterns of fitness variation. We find that a small number of phenotypes predicts fitness of the adaptive mutations near their original glucose-limited evolution condition. Importantly, phenotypes that matter little to fitness at or near the evolution condition can matter strongly in distant environments. This suggests that adaptive mutations are locally modular - affecting a small number of phenotypes that matter to fitness in the environment where they evolved - yet globally pleiotropic - affecting additional phenotypes that may reduce or improve fitness in new environments. ### Competing Interest Statement The authors have declared no competing interest.
422 downloads evolutionary biology
Actin is a major component of the eukaryotic cytoskeleton. Many related actin homologues can be found in eukaryotes, some of them being present in most or all eukaryotic lineages. The gene repertoire of the Last Eukaryotic Common Ancestor (LECA) therefore would have harbored both actin and various actin-related proteins (ARPs). A current hypothesis is that the different ARPs originated by gene duplication in the proto-eukaryotic lineage from an actin gene that was inherited from Asgard archaea. Here, we report the first detection of actin-related genes in viruses (viractins), encoded by 19 genomes belonging to the Imitervirales, a viral order encompassing the giant Mimiviridae. Most viractins were closely related to the actin, contrasting with actin-related genes of Asgard archaea and Bathyarchaea (a newly discovered clade). Our phylogenetic analysis suggests viractins could have been acquired from proto-eukaryotes and possibly gave rise to the conventional eukaryotic actin after being reintroduced into the pre-LECA eukaryotic lineage. ### Competing Interest Statement The authors have declared no competing interest.
417 downloads evolutionary biology
The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We use several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for all methods. We show that methods based on the molecular clock tend to place the root in the B clade, while methods based on outgroup rooting tend to place the root in the A clade. The results from the two approaches are statistically incompatible, possibly as a consequence of deviations from a molecular clock or excess back-mutations. We also show that none of the methods provide strong statistical support for the placement of the root in any particular edge of the tree. Our results suggest that inferences on the origin and early spread of SARS-CoV-2 based on rooted trees should be interpreted with caution. ### Competing Interest Statement The authors have declared no competing interest.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!