21: The Genetic Cost of Neanderthal Introgression
Posted to bioRxiv 30 Oct 2015

3,952 downloads evolutionary biology

Kelley Harris, Rasmus Nielsen

Approximately 2-4% of genetic material in human populations outside Africa is derived from Neanderthals who interbred with anatomically modern humans. Recent studies have shown that this Neanderthal DNA is depleted around functional genomic regions; this has been suggested to be a consequence of harmful epistatic interactions between human and Neanderthal alleles. However, using published estimates of Neanderthal inbreeding and the distribution of mutational fitness effects, we infer that Neanderthals had at least 40% lower fitness than humans on average; this increased load predicts the reduction in Neanderthal introgression around genes without the need to invoke epistasis. We also predict a residual Neanderthal mutational load in non-Africans, leading to a fitness reduction of at least 0.5%. This effect of Neanderthal admixture has been left out of previous debate on mutation load differences between Africans and non-Africans. We also show that if many deleterious mutations are recessive, the Neanderthal admixture fraction could increase over time due to the protective effect of Neanderthal haplotypes against deleterious alleles that arose recently in the human population. This might partially explain why so many organisms retain gene flow from other species and appear to derive adaptive benefits from introgression.

22: Unravelling the diversity behind Ophiocordyceps unilateralis complex: Three new species of Zombie-Ant fungus from Brazilian Amazon
Posted to bioRxiv 03 Apr 2014

3,844 downloads evolutionary biology

João P. M. Araújo, Harry C Evans, David M Geiser, William P Mackay, David P. Hughes

In tropical forests, one of the most common relationships between parasites and insects is that between the fungus Ophiocordyceps (Ophiocordycipitaceae, Hypocreales, Ascomycota) and ants, especially within the tribe Camponotini. These fungi have the ability to penetrate the exoskeleton of the ant and to manipulate the behavior of the host, making it leave the nest and ascend understorey shrubs, to die biting onto the vegetation: hence, the term zombie-ant fungi to describe these behavioral changes in the host. It is posited that this behavioral change aids spore dispersal and thus increases the chances of infection. Despite their undoubted importance for ecosystem functioning, these fungal pathogens are still poorly documented, especially regarding their diversity, ecology and evolutionary relationships. Here, we describe three new and host-specific species of the genus Ophiocordyceps on Camponotus ants from the central Amazonian region of Brazil which can readily be separated using classic taxonomic criteria, in particular ascospore morphology. In addition, we employed molecular techniques to show for the first time the phylogenetic relationships between these taxa and closely related species within the Ophiocordyceps unilateralis complex, as well as with other members of the family Ophiocordycipitaceae.

23: Trees, Population Structure, F-statistics!
Posted to bioRxiv 09 Oct 2015

3,785 downloads evolutionary biology

Benjamin M Peter

Many questions about human genetic history can be addressed by examining the patterns of shared genetic variation between sets of populations. A useful methodological framework for this purpose is provided by F-statistics, which measure shared genetic drift between sets of two, three and four populations, and can be used to test simple and complex hypotheses about admixture between populations. Here, we put these statistics in the context of phylogenetic and population genetic theory. We show how measures of genetic drift can be interpreted as branch lengths, as paths through an admixture graph, or in terms of the internal branches in coalescent trees. We show that the admixture tests can be interpreted as testing general properties of phylogenies, allowing us to generalize applications for arbitrary phylogenetic trees. Furthermore, we derive novel expressions for the F-statistics, which enable us to explore the behavior of F-statistics under population structure models. In particular, we show that population substructure may complicate inference.
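As a rough illustration of the statistics named in this abstract, the sketch below computes F2, F3 and F4 from per-site population allele frequencies using their standard definitions. It is not the paper's code: the function names and toy frequencies are assumptions made for the example, and the finite-sample bias corrections used in practice are omitted.

```python
from statistics import fmean

def f2(p_a, p_b):
    """F2(A, B): mean squared allele-frequency difference between A and B."""
    return fmean([(a - b) ** 2 for a, b in zip(p_a, p_b)])

def f3(p_c, p_a, p_b):
    """F3(C; A, B): a clearly negative value suggests C is admixed between A and B."""
    return fmean([(c - a) * (c - b) for c, a, b in zip(p_c, p_a, p_b)])

def f4(p_a, p_b, p_c, p_d):
    """F4(A, B; C, D): covariance of frequency differences between two pairs."""
    return fmean([(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)])

# Hypothetical allele frequencies at four sites (illustration only).
pop_a = [0.10, 0.80, 0.35, 0.60]
pop_b = [0.70, 0.20, 0.55, 0.10]
# Population C is constructed as an even mixture of A and B.
pop_c = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]

print("F2(A,B)   =", f2(pop_a, pop_b))
print("F3(C;A,B) =", f3(pop_c, pop_a, pop_b))
```

Because C is built as an equal mixture of A and B, F3(C; A, B) is negative here, which is the signature the three-population admixture test looks for. The same values also satisfy the identity F3(C; A, B) = (F2(C, A) + F2(C, B) - F2(A, B)) / 2, which is one way to read F3 in terms of branch lengths.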

24: Genes mirror migrations and cultures in prehistoric Europe - a population genomic perspective
Posted to bioRxiv 01 Sep 2016

3,751 downloads evolutionary biology

Torsten Günther, Mattias Jakobsson

Genomic information from ancient human remains is beginning to show its full potential for learning about human prehistory. We review the last few years' dramatic finds about European prehistory based on genomic data from humans who lived many millennia ago and relate them to modern-day patterns of genomic variation. The earliest period, the Upper Palaeolithic, appears to contain several population turnovers, followed by more stable populations after the Last Glacial Maximum and during the Mesolithic. Some 11,000 years ago the migrations driving the Neolithic transition started from around Anatolia and reached the north and the west of Europe millennia later, followed by major migrations during the Bronze Age. These findings show that culture and lifestyle, rather than geography as is the case today, were major determinants of genomic differentiation and similarity in prehistoric Europe.

25: Differential gene expression and alternative splicing in insect immune specificity
Posted to bioRxiv 14 Feb 2014

3,741 downloads evolutionary biology

Carolyn E. Riddell, Juan D. Lobaton Garces, Sally Adams, Seth M. Barribeau, David Twell, Eamonn B. Mallon

Ecological studies routinely show genotype-genotype interactions between insects and their parasites. The mechanisms behind these interactions are not clearly understood. Using the bumblebee Bombus terrestris / trypanosome Crithidia bombi model system, we have carried out a transcriptome-wide analysis of gene expression and alternative splicing in bees during C. bombi infection. We have performed four analyses: 1) comparing gene expression in infected and non-infected bees 24 hours after infection by Crithidia bombi, 2) comparing expression at 24 and 48 hours after C. bombi infection, 3) searching for differential gene expression associated with the host-parasite genotype-genotype interaction at 24 hours after infection and 4) searching for alternative splicing associated with the host-parasite genotype-genotype interaction at 24 hours post infection. We found a large number of differentially regulated genes related to numerous canonical immune pathways. These genes include receptors, signaling pathways and effectors. We discovered a possible interaction between the peritrophic membrane and the insect immune system in defense against Crithidia. Most interestingly, we found that differential expression and alternative splicing of Dscam-related transcripts and of a novel immunoglobulin-related gene, Twitchin, depend on the genotype-genotype interactions of the given bumblebee colony and Crithidia strain.

26: Real-time DNA barcoding in a remote rainforest using nanopore sequencing
Posted to bioRxiv 15 Sep 2017

3,605 downloads evolutionary biology

Aaron F. Pomerantz, Nicolás Peñafiel, Alejandro Arteaga, Lucas Bustamante, Frank Pichardo, Luis A. Coloma, César L. Barrio-Amorós, David Salazar-Valenzuela, Stefan Prost

Advancements in portable scientific instruments provide promising avenues to expedite field work in order to understand the diverse array of organisms that inhabit our planet. Here we tested the feasibility of in situ molecular analyses of endemic fauna using a portable laboratory fitting within a single backpack, in one of the most imperiled biodiversity hotspots: the Ecuadorian Choco rainforest. We utilized portable equipment, including the MinION DNA sequencer (Oxford Nanopore Technologies) and miniPCR (miniPCR), to perform DNA extraction, PCR amplification and real-time DNA barcode sequencing of reptile specimens in the field. We demonstrate that nanopore sequencing can be implemented in a remote tropical forest to quickly and accurately identify species using DNA barcoding, as we generated consensus sequences for species resolution with an accuracy of >99% in less than 24 hours after collecting specimens. In addition, we generated sequence information at Universidad Tecnologica Indoamerica in Quito for the recently re-discovered Jambato toad Atelopus ignescens, which was thought to be extinct for 28 years, a rare species of blind snake Trilepida guayaquilensis, and two undescribed species of Dipsas snakes. In this study we establish how mobile laboratories and nanopore sequencing can help to accelerate species identification in remote areas (especially for species that are difficult to diagnose based on characters of external morphology), be applied to local research facilities in developing countries, and rapidly generate information for species that are rare, endangered and undescribed, which can potentially aid in conservation efforts.

27: Beyond Brownian motion and the Ornstein-Uhlenbeck process: Stochastic diffusion models for the evolution of quantitative characters.
Posted to bioRxiv 02 Aug 2016

3,594 downloads evolutionary biology

Simon Phillip Blomberg

Gaussian processes such as Brownian motion and the Ornstein-Uhlenbeck process have been popular models for the evolution of quantitative traits and are widely used in phylogenetic comparative methods. However, they have drawbacks which limit their utility. Here I describe new, non-Gaussian stochastic differential equation (diffusion) models of quantitative trait evolution. I present general methods for deriving new diffusion models, and discuss possible schemes for fitting non-Gaussian evolutionary models to trait data. The theory of stochastic processes provides a mathematical framework for understanding the properties of current, new and future phylogenetic comparative methods. Attention to the mathematical details of models of trait evolution and diversification may help avoid some pitfalls when using stochastic processes to model macroevolution.
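For readers unfamiliar with the two Gaussian baselines this abstract starts from, the following sketch simulates a trait along a single lineage under Brownian motion (dX = sigma dW) and under the Ornstein-Uhlenbeck process (dX = alpha (theta - X) dt + sigma dW), using a simple Euler-Maruyama discretisation. It is an illustration only and does not implement the paper's new non-Gaussian models. The function names and parameter values are assumptions chosen for the example.

```python
import math
import random

def simulate_bm(x0, sigma, t_total, n_steps, rng):
    """Euler-Maruyama path of Brownian motion: dX = sigma dW."""
    dt = t_total / n_steps
    x, path = x0, [x0]
    for _ in range(n_steps):
        x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def simulate_ou(x0, alpha, theta, sigma, t_total, n_steps, rng):
    """Euler-Maruyama path of the OU process: dX = alpha (theta - X) dt + sigma dW."""
    dt = t_total / n_steps
    x, path = x0, [x0]
    for _ in range(n_steps):
        drift = alpha * (theta - x) * dt               # pull toward the optimum theta
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += drift + noise
        path.append(x)
    return path

rng = random.Random(42)  # fixed seed so the illustration is repeatable
bm_path = simulate_bm(x0=0.0, sigma=1.0, t_total=10.0, n_steps=1000, rng=rng)
ou_path = simulate_ou(x0=0.0, alpha=1.0, theta=5.0, sigma=1.0,
                      t_total=10.0, n_steps=1000, rng=rng)
print("BM end value:", bm_path[-1])
print("OU end value:", ou_path[-1])
```

Under Brownian motion the variance of the trait grows linearly with time, whereas the OU process is pulled back toward theta and settles to a stationary variance of sigma^2 / (2 alpha). Both remain Gaussian, which is the shared restriction that the non-Gaussian diffusion models described in the abstract set out to relax.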

28: Carriers of mitochondrial DNA macrohaplogroup L3 basic lineages migrated back to Africa from Asia around 70,000 years ago.
Posted to bioRxiv 13 Dec 2017

3,554 downloads evolutionary biology

Vicente M. Cabrera, Patricia Marrero, Khaled K. Abu-Amero, Jose M Larruga

Background: After three decades of mtDNA studies on human evolution the only incontrovertible main result is the African origin of all extant modern humans. In addition, a southern coastal route has been relentlessly imposed to explain the Eurasian colonization of these African pioneers. Based on the age of macrohaplogroup L3, from which all maternal Eurasian and the majority of African lineages originated, that out-of-Africa event has been dated around 60-70 kya. On the opposite side, we have proposed a northern route through Central Asia across the Levant for that expansion. Consistent with the fossil record, we have dated it around 125 kya. To help bridge differences between the molecular and fossil record ages, in this article we assess the possibility that mtDNA macrohaplogroup L3 matured in Eurasia and returned to Africa as basic L3 lineages around 70 kya. Results: The coalescence ages of all Eurasian (M,N) and African L3 lineages, both around 71 kya, are not significantly different. The oldest M and N Eurasian clades are found in southeastern Asia rather than near Africa, as would be expected under the southern route hypothesis. The age of the split of the Y-chromosome composite DE haplogroup is very similar to the age of mtDNA L3. A Eurasian origin and back migration to Africa has been proposed for the African Y-chromosome haplogroup E. Inside Africa, frequency distributions of maternal L3 and paternal E lineages are positively correlated. This correlation is not fully explained by geographic or ethnic affinities. It is better explained as the result of a joint and global replacement of the old autochthonous male and female African lineages by the new Eurasian incomers. Conclusions: These results are congruent with a model proposing an out-of-Africa migration of early anatomically modern humans around 125 kya, a return to Africa of Eurasian fully modern humans around 70 kya, and a second Eurasian global expansion by 60 kya. Climatic conditions and the presence of Neanderthals played key roles in these human movements.

29: Ecological causes of speciation and species richness in the mammal tree of life
Posted to bioRxiv 04 Jan 2019

3,523 downloads evolutionary biology

Nathan S. Upham, Jacob A Esselstyn, Walter Jetz

Biodiversity is distributed unevenly from the poles to the equator, and among branches of the tree of life, yet how those patterns are related is unclear. We investigated global speciation-rate variation across crown Mammalia using a novel time-scaled phylogeny (N = 5,911 species, ~70% with DNA), finding that trait- and latitude-associated speciation has caused uneven species richness among groups. We identify 24 branch-specific shifts in net diversification rates linked to ecological traits. Using time-slices to define clades, we show that speciation rates are a stronger predictor of clade richness than age. Mammals that are low dispersal or diurnal diversify the fastest, indicating roles for geographic and ecological speciation, respectively. Speciation is slower in tropical than extra-tropical lineages, consistent with evidence that longer tropical species durations underpin the latitudinal diversity gradient. These findings juxtapose modes of lineage diversification that are alternatively turnover-based, and thus non-adaptive, or persistence-based as associated with resource adaptations.

30: Rethinking phylogenetic comparative methods
Posted to bioRxiv 21 Nov 2017

3,479 downloads evolutionary biology

Josef C. Uyeda, Rosana Zenil-Ferguson, Matthew W. Pennell

As a result of the process of descent with modification, closely related species tend to be similar to one another in a myriad different ways. In statistical terms, this means that traits measured on one species will not be independent of traits measured on others. Since their introduction in the 1980s, phylogenetic comparative methods (PCMs) have been framed as a solution to this problem. In this paper, we argue that this way of thinking about PCMs is deeply misleading. Not only has this sown widespread confusion in the literature about what PCMs are doing, but it has also led us to develop methods that are susceptible to the very thing we sought to build defenses against: unreplicated evolutionary events. Through three Case Studies, we demonstrate that the susceptibility to singular events is indeed a recurring problem in comparative biology that links several seemingly unrelated controversies. In each Case Study we propose a potential solution to the problem. While the details of our proposed solutions differ, they share a common theme: unifying hypothesis testing with data-driven approaches (which we term "phylogenetic natural history") to disentangle the impact of singular evolutionary events from that of the factors we are investigating. More broadly, we argue that our field has, at times, been sloppy when weighing evidence in support of causal hypotheses. We suggest that one way to refine our inferences is to re-imagine phylogenies as probabilistic graphical models; adopting this way of thinking will help clarify precisely what we are testing and what evidence supports our claims.

31: Evolutionary dynamics of bacteria in the gut microbiome within and across hosts
Posted to bioRxiv 30 Oct 2017

3,410 downloads evolutionary biology

Nandita R. Garud, Benjamin H. Good, Oskar Hallatschek, Katherine S. Pollard

Gut microbiota are shaped by a combination of ecological and evolutionary forces. While the ecological dynamics have been extensively studied, much less is known about how species of gut bacteria evolve over time. Here we introduce a model-based framework for quantifying evolutionary dynamics within and across hosts using a panel of metagenomic samples. We use this approach to study evolution in ~30 prevalent species in the human gut. Although the patterns of between-host diversity are consistent with quasi-sexual evolution and purifying selection on long timescales, we identify new genealogical signatures that challenge standard population genetic models of these processes. On shorter timescales within hosts, we find that genetic differences only rarely arise from the invasion of distantly related strains. Instead, the resident strains more commonly acquire a smaller number of evolutionary changes, in which nucleotide variants or gene gains or losses rapidly sweep to high frequency over ~6 month timescales. By comparing these mutations with the typical between-host differences, we find evidence that sweeps are driven by introgression from other strains, rather than by new mutations. Our results suggest that gut bacteria evolve on human-relevant timescales, and highlight the feedback between short- and long-term evolution across hosts.

32: Polygenic Adaptation has Impacted Multiple Anthropometric Traits
Posted to bioRxiv 23 Jul 2017

3,371 downloads evolutionary biology

Jeremy J. Berg, Xinjun Zhang, Graham Coop

Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome wide association studies (GWAS) now enable the study of genetic adaptation in highly polygenic phenotypes. Here we test for polygenic adaptation among 187 world-wide human populations using polygenic scores constructed from GWAS of 34 complex traits. Comparing these polygenic scores to a null distribution under genetic drift, we identify strong signals of selection for a suite of anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR), as well as type 2 diabetes (T2D). In addition to the known north-south gradient of polygenic height scores within Europe, we find that natural selection has contributed to a gradient of decreasing polygenic height scores from West to East across Eurasia. Analyzing a set of ancient DNA samples from across Eurasia, we show that much of this gradient can be explained by independent selection for increased height in two long diverged hunter-gatherer populations living in western and west-central Eurasia sometime during or shortly after the last glacial maximum. We further find that the signal of selection on hip circumference can largely be explained as a correlated response to selection on height. However, our signals in IHC and WHR cannot, suggesting that these patterns are the result of selection along multiple axes of body shape variation. Our observation that IHC and WHR polygenic scores follow a strong latitudinal cline in Western Eurasia supports the role of natural selection in establishing Bergmann's Rule in humans, and is consistent with thermoregulatory adaptation in response to latitudinal temperature variation.
Author's Note on Failure to Replicate: After this preprint was posted, the UK Biobank dataset was released, providing a new and open GWAS resource. When attempting to replicate the height selection results from this preprint using GWAS data from the UK Biobank, we discovered that we could not. In subsequent analyses, we determined that both the GIANT consortium height GWAS data and another dataset that was used for replication were impacted by stratification issues that created, or at a minimum substantially inflated, the height selection signals reported here. The results of this second investigation, written together with additional coauthors, have now been published (https://elifesciences.org/articles/39725, along with another paper by a separate group of authors showing similar issues, https://elifesciences.org/articles/39702). A preliminary investigation shows that the other non-height based results may suffer from similar issues. We stand by the theory and statistical methods reported in this paper, and the paper can be cited for these results. However, we have shown that the data on which the major empirical results were based are not sound, and so these results should be treated with caution until replicated.

33: Evaluating the use of ABBA-BABA statistics to locate introgressed loci
Posted to bioRxiv 11 Dec 2013

3,318 downloads evolutionary biology

Simon H. Martin, John W. Davey, Chris D Jiggins

Several methods have been proposed to test for introgression across genomes. One method tests for a genome-wide excess of shared derived alleles between taxa using Patterson's D statistic, but does not establish which loci show such an excess or whether the excess is due to introgression or ancestral population structure. Several recent studies have extended the use of D by applying the statistic to small genomic regions, rather than genome-wide. Here, we use simulations and whole genome data from Heliconius butterflies to investigate the behavior of D in small genomic regions. We find that D is unreliable in this situation as it gives inflated values when effective population size is low, causing D outliers to cluster in genomic regions of reduced diversity. As an alternative, we propose a related statistic f̂d, a modified version of a statistic originally developed to estimate the genome-wide fraction of admixture. f̂d is not subject to the same biases as D, and is better at identifying introgressed loci. Finally, we show that both D and f̂d outliers tend to cluster in regions of low absolute divergence (dXY), which can confound a recently proposed test for differentiating introgression from shared ancestral variation at individual loci.
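The two statistics compared in this abstract can be sketched from derived-allele frequencies in populations P1, P2, P3 and an outgroup O, using the usual frequency-based ABBA and BABA counts. This is a minimal illustration under assumptions, not the authors' implementation: the function names and toy frequencies are hypothetical, and the f̂d denominator follows the published definition in which, at each site, the donor frequency is the larger of the P2 and P3 frequencies.

```python
def abba_baba(p1, p2, p3, po):
    """Per-window sums of the frequency-based ABBA and BABA site patterns."""
    abba = sum((1 - a) * b * c * (1 - o) for a, b, c, o in zip(p1, p2, p3, po))
    baba = sum(a * (1 - b) * c * (1 - o) for a, b, c, o in zip(p1, p2, p3, po))
    return abba, baba

def patterson_d(p1, p2, p3, po):
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA) over the window."""
    abba, baba = abba_baba(p1, p2, p3, po)
    return (abba - baba) / (abba + baba)

def f_d(p1, p2, p3, po):
    """f_d: observed ABBA-BABA excess relative to the excess expected if
    P2 and P3 were replaced by the higher-frequency 'donor' at each site."""
    abba, baba = abba_baba(p1, p2, p3, po)
    donor = [max(b, c) for b, c in zip(p2, p3)]
    abba_max, baba_max = abba_baba(p1, donor, donor, po)
    return (abba - baba) / (abba_max - baba_max)

# Hypothetical derived-allele frequencies at three sites in one window.
p1 = [0.0, 0.1, 0.0]
p2 = [0.5, 0.6, 0.2]
p3 = [1.0, 0.9, 0.8]
po = [0.0, 0.0, 0.0]   # outgroup assumed fixed for the ancestral allele

print("D   =", patterson_d(p1, p2, p3, po))
print("f_d =", f_d(p1, p2, p3, po))
```

D is a ratio of sums and so depends on how many informative sites a window contains, which is one reason it behaves erratically in small, low-diversity regions. The f̂d denominator instead scales the excess by what complete introgression would produce, so a window where P2 matches P3 at every site yields a value of 1. A production version would also need to handle windows where a denominator is zero.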

34: A Spatial Framework for Understanding Population Structure and Admixture.
Posted to bioRxiv 07 Jan 2015

3,301 downloads evolutionary biology

Gideon S. Bradburd, Graham Coop

Geographic patterns of genetic variation within modern populations, produced by complex histories of migration, can be difficult to infer and visually summarize. A general consequence of geographically limited dispersal is that samples from nearby locations tend to be more closely related than samples from distant locations, and so genetic covariance often recapitulates geographic proximity. We use genome-wide polymorphism data to build "geogenetic maps," which, when applied to stationary populations, produce a map of the geographic positions of the populations, but with distances distorted to reflect historical rates of gene flow. In the underlying model, allele frequency covariance is a decreasing function of geogenetic distance, and nonlocal gene flow such as admixture can be identified as anomalously strong covariance over long distances. This admixture is explicitly co-estimated and depicted as arrows, from the source of admixture to the recipient, on the geogenetic map. We demonstrate the utility of this method on a circum-Tibetan sampling of the greenish warbler (Phylloscopus trochiloides), in which we find evidence for gene flow between the adjacent, terminal populations of the ring species. We also analyze a global sampling of human populations, for which we largely recover the geography of the sampling, with support for significant histories of admixture in many samples. This new tool for understanding and visualizing patterns of population structure is implemented in a Bayesian framework in the program SpaceMix.

35: Denisovan Ancestry in East Eurasian and Native American Populations.
Posted to bioRxiv 03 Apr 2015

3,275 downloads evolutionary biology

Pengfei Qin, Mark Stoneking

Although initial studies suggested that Denisovan ancestry was found only in modern human populations from island Southeast Asia and Oceania, more recent studies have suggested that Denisovan ancestry may be more widespread. However, the geographic extent of Denisovan ancestry has not been determined, and moreover the relationship between the Denisovan ancestry in Oceania and that elsewhere has not been studied. Here we analyze genome-wide SNP data from 2493 individuals from 221 worldwide populations, and show that there is a widespread signal of a very low level of Denisovan ancestry across Eastern Eurasian and Native American (EE/NA) populations. We also verify a higher level of Denisovan ancestry in Oceania than that in EE/NA; the Denisovan ancestry in Oceania is correlated with the amount of New Guinea ancestry, but not the amount of Australian ancestry, indicating that recent gene flow from New Guinea likely accounts for signals of Denisovan ancestry across Oceania. However, Denisovan ancestry in EE/NA populations is equally correlated with their New Guinea or their Australian ancestry, suggesting a common source for the Denisovan ancestry in EE/NA and Oceanian populations. Our results suggest that Denisovan ancestry in EE/NA is derived either from common ancestry with, or gene flow from, the common ancestor of New Guineans and Australians, indicating a more complex history involving East Eurasians and Oceanians than previously suspected.

36: Detecting polygenic adaptation in admixture graphs
Posted to bioRxiv 04 Jun 2017

3,258 downloads evolutionary biology

Fernando Racimo, Jeremy J. Berg, Joseph K. Pickrell

An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method - which we call PolyGraph - has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different populations during human evolution.

37: Size, shape and structure of insect wings
Posted to bioRxiv 26 Nov 2018

3,232 downloads evolutionary biology

Mary K. Salcedo, Jordan Hoffmann, Seth Donoughe, L. Mahadevan

The size, shape and structure of insect wings are intimately linked to their ability to fly. However, there are few systematic studies of the variability of the natural patterns in wing morphology across insects. We assemble a comprehensive dataset of insect wings and analyze their morphology using topological and geometric notions in terms of i) wing size and contour shape, ii) vein geometry and topology, and iii) shape and distribution of wing membrane domains. These morphospaces are a first step in defining the diversity of wing patterns across insect orders and set the stage for investigating their functional consequences.

38: Species Delimitation using Genome-Wide SNP Data
Posted to bioRxiv 05 Dec 2013

3,208 downloads evolutionary biology

Adam D Leaché, Matthew K. Fujita, Vladimir N. Minin, Remco R. Bouckaert

The multi-species coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computational burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the number of loci and the number of samples suggests that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested.
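The model-comparison step this abstract describes can be illustrated with a small sketch that ranks competing delimitation models by Bayes factor, computed on the common 2 ln BF scale as twice the difference in log marginal likelihood. The model names and log marginal likelihood values below are invented for the example, and the marginal likelihood estimation itself (the computationally hard part) is assumed to have been done elsewhere.

```python
def bayes_factor(log_ml_reference, log_ml_alternative):
    """2 ln BF in favour of the alternative model over the reference model."""
    return 2.0 * (log_ml_alternative - log_ml_reference)

def rank_models(log_mls, reference):
    """Rank models by log marginal likelihood and report 2 ln BF
    of each model against the named reference model."""
    ref = log_mls[reference]
    ranked = sorted(log_mls.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, log_ml, bayes_factor(ref, log_ml)) for name, log_ml in ranked]

# Hypothetical log marginal likelihoods for three delimitation models.
log_marginal_likelihoods = {
    "current_taxonomy_4_species": -15230.4,
    "lump_two_lineages_3_species": -15265.9,
    "split_one_lineage_5_species": -15212.7,
}

ranking = rank_models(log_marginal_likelihoods, "current_taxonomy_4_species")
for name, log_ml, bf in ranking:
    print(f"{name:32s} logML = {log_ml:10.1f}   2lnBF = {bf:7.1f}")
```

On the widely used Kass and Raftery scale, a 2 ln BF above 10 is read as very strong support for the alternative, while a negative value favours the reference model. The comparison is only meaningful when the marginal likelihoods are on a common scale, which is the purpose of the correction the abstract mentions.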

39: The promise of disease gene discovery in South Asia
Posted to bioRxiv 06 Apr 2016

3,206 downloads evolutionary biology

Nathan Nakatsuka, Priya Moorjani, Niraj Rai, Biswanath Sarkar, Arti Tandon, Nick Patterson, Gandham SriLakshmi Bhavani, Katta Mohan Girisha, Mohammed S Mustak, Sudha Srinivasan, Amit Kaushik, Saadi Abdul Vahab, Sujatha M Jagadeesh, Kapaettu Satyamoorthy, Lalji Singh, David Reich, Kumarasamy Thangaraj

The more than 1.5 billion people who live in South Asia are correctly viewed not as a single large population, but as many small endogamous groups. We assembled genome-wide data from over 2,800 individuals from over 260 distinct South Asian groups. We identify 81 unique groups, of which 14 have estimated census sizes of more than a million, that descend from founder events more extreme than those in Ashkenazi Jews and Finns, both of which have high rates of recessive disease due to founder events. We identify multiple examples of recessive diseases in South Asia that are the result of such founder events. This study highlights an under-appreciated opportunity for reducing disease burden among South Asians through the discovery of and testing for recessive disease genes.

40: Inferring the landscape of recombination using recurrent neural networks
Posted to bioRxiv 06 Jun 2019

3,175 downloads evolutionary biology

Jeffrey R. Adrion, Jared G. Galloway, Andrew D. Kern

Accurately inferring the genome-wide landscape of recombination rates in natural populations is a central aim in genomics, as patterns of linkage influence everything from genetic mapping to understanding evolutionary history. Here we describe ReLERNN, a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes. Rather than use summaries of linkage disequilibrium as its input, ReLERNN takes columns from a genotype alignment, which are then modeled as a sequence across the genome using a recurrent neural network. We demonstrate that ReLERNN improves accuracy and reduces bias relative to existing methods and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility. We apply ReLERNN to natural populations of African Drosophila melanogaster and show that genome-wide recombination landscapes, while largely correlated among populations, exhibit important population-specific differences. Lastly, we connect the inferred patterns of recombination with the frequencies of major inversions segregating in natural Drosophila populations.

