Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 65,445 bioRxiv papers from 289,895 authors.
Most downloaded bioRxiv papers, all time
in category genetics
3,644 results found. For more information, click each entry to expand.
3,027 downloads genetics
Neuroticism is a personality trait of fundamental importance for psychological wellbeing and public health. It is strongly associated with major depressive disorder (MDD) and several other psychiatric conditions. Although neuroticism is heritable, attempts to identify the alleles involved in previous studies have been limited by relatively small sample sizes and heterogeneity in the measurement of neuroticism. Here we report a genome-wide association study of neuroticism in 91,370 participants of the UK Biobank cohort and a combined meta-analysis which includes a further 6,659 participants from the Generation Scotland Scottish Family Health Study (GS:SFHS) and 8,687 participants from a QIMR Berghofer Medical Research Institute (QIMR) cohort. All participants were assessed using the same neuroticism instrument, the Eysenck Personality Questionnaire-Revised (EPQ-R-S) Short Form Neuroticism scale. We found a SNP-based heritability estimate for neuroticism of approximately 15% (SE = 0.7%). Meta-analysis identified 9 novel loci associated with neuroticism. The strongest evidence for association was at a locus on chromosome 8 (p = 1.5 × 10^-15) spanning 4 Mb and containing at least 36 genes. Other associated loci included interesting candidate genes on chromosome 1 (GRIK3, glutamate receptor ionotropic kainate 3), chromosome 4 (KLHL2, Kelch-like protein 2), chromosome 17 (CRHR1, corticotropin-releasing hormone receptor 1 and MAPT, microtubule-associated protein Tau), and on chromosome 18 (CELF4, CUGBP elav-like family member 4). We found no evidence for genetic differences in the common allelic architecture of neuroticism by sex. By comparing our findings with those of the Psychiatric Genetics Consortia, we identified a strong genetic correlation between neuroticism and MDD (0.64) and a less strong but significant genetic correlation with schizophrenia (0.22), although not with bipolar disorder. Polygenic risk scores derived from the primary UK Biobank sample captured about 1% of the variance in neuroticism in independent samples. Overall, our findings confirm a polygenic basis for neuroticism and substantial shared genetic architecture between neuroticism and MDD. The identification of 9 new neuroticism-associated loci will drive forward future work on the neurobiology of neuroticism and related phenotypes.
3,022 downloads genetics
In order to infer that a single-nucleotide polymorphism (SNP) either affects a phenotype or is in linkage disequilibrium with a causal site, we must have some assurance that any SNP-phenotype correlation is not the result of confounding with environmental variables that also affect the trait. In this work we study the properties of LD Score regression, a recently developed method for using summary statistics from genome-wide association studies (GWAS) to ensure that confounding does not inflate the number of false positives. We do not treat the effects of genetic variation as a random variable and thus are able to obtain results about the unbiasedness of this method. We demonstrate that LD Score regression can produce estimates of confounding at null SNPs that are unbiased or conservative under fairly general conditions. This robustness holds in the case of the parent genotype affecting the offspring phenotype through some environmental mechanism, despite the resulting correlation over SNPs between LD Scores and the degree of confounding. Additionally, we demonstrate that LD Score regression can produce reasonably robust estimates of the genetic correlation, even when its estimates of the genetic covariance and the two univariate heritabilities are substantially biased.
3,007 downloads genetics
Wesley A. Wierson, Jordan M. Welker, Maira P. Almeida, Carla M. Mann, Dennis A. Webster, Melanie E. Torrie, Trevor J. Weiss, Macy K. Vollbrecht, Merrina Lan, Kenna C. McKeighan, Jacklyn Levey, Zhitao Ming, Alec Wehmeier, Christopher S. Mikelson, Jeffrey A. Haltom, Kristen M. Kwan, Chi-Bin Chien, Darius Balciunas, Stephen C Ekker, Karl J Clark, Beau R. Webber, Branden Moriarity, Staci L. Solin, Daniel F. Carlson, Drena L. Dobbs, Maura McGrail, Jeffrey J Essner
Choices for genome engineering and integration involve high efficiency with little or no target specificity or high specificity with low activity. Here, we describe a targeted integration strategy, called GeneWeld, and a vector series for gene tagging, pGTag (plasmids for Gene Tagging), which promote highly efficient and precise targeted integration in zebrafish embryos, pig fibroblasts, and human cells utilizing the CRISPR/Cas9 system. Our work demonstrates that in vivo targeting of a genomic locus of interest with CRISPR/Cas9 and a donor vector containing as little as 24 to 48 base pairs of homology directs precise and efficient knock-in when the homology arms are exposed with a double strand break in vivo. Our results suggest that the length of homology is not important in the design of knock-in vectors but rather how the homology is presented to a double strand break in the genome. Given our results targeting multiple loci in different species, we expect the accompanying protocols, vectors, and web interface for homology arm design to help streamline gene targeting and applications in CRISPR and TALEN compatible systems.
3,002 downloads genetics
Aude SAINT PIERRE, Joanna Giemza, Matilde Karakachoff, Isabel Alves, Philippe Amouyel, Jean-François Dartigues, Christophe Tzourio, Martial Monteil, Pilar Galan, Serge Hercberg, Richard Redon, Emmanuelle Génin, Christian Dina
The study of the genetic structure of different countries within Europe has provided significant insights into their demographic history and their actual stratification. Although France occupies a particular location at the end of the European peninsula and at the crossroads of migration routes, few population genetic studies have been conducted so far with genome-wide data. In this study, we analyzed SNP-chip genetic data from 2,184 individuals born in France who were enrolled in two independent population cohorts. Using FineStructure, six different genetic clusters of individuals were found that were very consistent between the two cohorts. These clusters match the geography extremely well and overlap with historical and linguistic divisions of France. By modeling the relationship between genetics and geography using EEMS software, we were able to detect gene flow barriers that are similar in the two cohorts and correspond to major French rivers or mountains. Estimations of effective population sizes using the IBDNe program also revealed very similar patterns in both cohorts, with a rapid increase of effective population sizes over the last 150 generations similar to what was observed in other European countries. A marked bottleneck is also consistently seen in the two datasets starting in the fourteenth century when the Black Death raged in Europe. In conclusion, by performing the first exhaustive study of the genetic structure of France, we fill a gap in the genetic studies in Europe that would be useful to medical geneticists but also historians and archeologists.
3,001 downloads genetics
Linkage and association studies have mapped thousands of genomic regions that contribute to phenotypic variation, but narrowing these regions to the underlying causal genes and variants has proven much more challenging. Resolution of genetic mapping is limited by the recombination rate. We developed a method that uses CRISPR to build mapping panels with targeted recombination events. We tested the method by generating a panel with recombination events spaced along a yeast chromosome arm, mapping trait variation, and then targeting a high density of recombination events to the region of interest. Using this approach, we fine-mapped manganese sensitivity to a single polymorphism in the transporter Pmr1. Targeting recombination events to regions of interest allows us to rapidly and systematically identify causal variants underlying trait differences.
2,994 downloads genetics
Dramatic events in human prehistory, such as the spread of agriculture to Europe from Anatolia and the Late Neolithic/Bronze Age (LNBA) migration from the Pontic-Caspian steppe, can be investigated using patterns of genetic variation among the people that lived in those times. In particular, studies of differing female and male demographic histories on the basis of ancient genomes can provide information about complexities of social structures and cultural interactions in prehistoric populations. We use a mechanistic admixture model to compare the sex-specifically-inherited X chromosome to the autosomes in 20 early Neolithic and 16 LNBA human remains. Contrary to previous hypotheses suggested by the patrilocality of many agricultural populations, we find no evidence of sex-biased admixture during the migration that spread farming across Europe during the early Neolithic. For later migrations from the Pontic steppe during the LNBA, however, we estimate a dramatic male bias, with ~5-14 migrating males for every migrating female. We find evidence of ongoing, primarily male, migration from the steppe to central Europe over a period of multiple generations, with a level of sex bias that excludes a pulse migration during a single generation. The contrasting patterns of sex-specific migration during these two migrations suggest a view of differing cultural histories in which the Neolithic transition was driven by mass migration of both males and females in roughly equal numbers, perhaps whole families, whereas the later Bronze Age migration and cultural shift were instead driven by male migration, potentially connected to new technology and conquest.
2,979 downloads genetics
Background: Lentiviral vectors (LVs) allowing efficient establishment of stable transgene overexpression in mammalian and human cell lines are invaluable tools for genetic research. Currently, although LV transductions are broadly adopted, they are often limited by titers too low for efficient transduction. Results: Here, we describe a set of optimized, efficient techniques that could produce sufficiently high LV titers and provide efficient transduction of cells. With these optimizations, most mammalian and human cells, both primary cells and cell lines, could be transduced successfully with high levels of stable transgene expression, including both constitutive and induced expression. Conclusions: Our data demonstrate the high usefulness of our optimized methods. Therefore, this study provides an efficient method for most LV transduction experiments in vitro.
2,972 downloads genetics
We combined de novo mutation (DNM) data from 10,927 cases of developmental delay and autism to identify 301 candidate neurodevelopmental disease genes showing an excess of missense and/or likely gene-disruptive (LGD) mutations. 164 genes were predicted by two different DNM models, including 116 genes with an excess of LGD mutations. Among the 301 genes, 76% show DNM in both autism and intellectual disability/developmental delay cohorts where they occur in 10.3% and 28.4% of the cases, respectively. Intersecting these results with copy number variation (CNV) morbidity data identifies a significant enrichment for the intersection of our gene set and genomic disorder regions (36/301, LR+ 2.53, p=0.0005). This analysis confirms many recurrent LGD genes and CNV deletion syndromes (e.g., KANSL1, PAFAH1B1, RA1, etc.), consistent with a model of haploinsufficiency. We also identify genes with an excess of missense DNMs overlapping deletion syndromes (e.g., KIF1A and the 2q37 deletion) as well as duplication syndromes, such as recurrent MAPK3 missense mutations within the chromosome 16p11.2 duplication, recurrent CHD4 missense DNMs in the 12p13 duplication region, and recurrent WDFY4 missense DNMs in the 10q11.23 duplication region. Finally, we also identify pathogenic CNVs overlapping more than one recurrently mutated gene (e.g., Sotos and Kleefstra syndromes) raising the possibility that multiple gene-dosage imbalances may contribute to phenotypic complexity of these disorders. Network analyses of genes showing an excess of DNMs confirm previous well-known enrichments but also highlight new functional networks, including cell-specific enrichments in the D1+ and D2+ spiny neurons of the striatum for both recurrently mutated genes and genes where missense mutations cluster.
2,962 downloads genetics
The mosquito Aedes aegypti is a potent vector of the Chikungunya, yellow fever, and Dengue viruses, which result in hundreds of millions of infections and over 50,000 human deaths per year. Loss-of-function mutagenesis in Ae. aegypti has been established with TALENs, ZFNs, and homing endonucleases, which require the engineering of DNA-binding protein domains to generate target specificity for a particular stretch of genomic DNA. Here, we describe the first use of the CRISPR-Cas9 system to generate targeted, site-specific mutations in Ae. aegypti. CRISPR-Cas9 relies on RNA-DNA base-pairing to generate targeting specificity, resulting in cheaper, faster, and more flexible genome-editing reagents. We investigate the efficiency of reagent concentrations and compositions, demonstrate the ability of CRISPR-Cas9 to generate several different types of mutations via disparate repair mechanisms, and show that stable germ-line mutations can be readily generated at the vast majority of genomic loci tested. This work offers a detailed exploration into the optimal use of CRISPR-Cas9 in Ae. aegypti that should be applicable to non-model organisms previously out of reach of genetic modification.
2,957 downloads genetics
Po-Ru Loh, Gaurav Bhatia, Alexander Gusev, Hilary K Finucane, Brendan K Bulik-Sullivan, Samuela J Pollack, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Teresa R de Candia, Sang Hong Lee, Naomi R. Wray, Kenneth S. Kendler, Michael C O’Donovan, Benjamin M Neale, Nick Patterson, Alkes L. Price
Heritability analyses of GWAS cohorts have yielded important insights into complex disease architecture, and increasing sample sizes hold the promise of further discoveries. Here, we analyze the genetic architecture of schizophrenia in 49,806 samples from the PGC, and nine complex diseases in 54,734 samples from the GERA cohort. For schizophrenia, we infer an overwhelmingly polygenic disease architecture in which ≥71% of 1Mb genomic regions harbor at least one variant influencing schizophrenia risk. We also observe significant enrichment of heritability in GC-rich regions and in higher-frequency SNPs for both schizophrenia and GERA diseases. In bivariate analyses, we observe significant genetic correlations (ranging from 0.18 to 0.85) among several pairs of GERA diseases; genetic correlations were on average 1.3x stronger than correlations of overall disease liabilities. To accomplish these analyses, we developed a fast algorithm for multi-component, multi-trait variance components analysis that overcomes prior computational barriers that made such analyses intractable at this scale.
2,950 downloads genetics
This study investigates the creation of polygenic scores (PGSs) for human population research. PGSs are a linear, usually weighted, combination of risk alleles that estimate the cumulative genetic risk of an individual for a particular trait. While conceptually simple, there are numerous ways to estimate PGSs, not all achieving the same end goals. In this paper, we systematically investigate the impact of four key decisions in the building of PGSs from published genome-wide association meta-analysis results: 1) whether to use single nucleotide polymorphisms (SNPs) assessed by imputation, 2) criteria for selecting which SNPs to include in the score, 3) whether to account for linkage disequilibrium (LD), and 4) if accounting for LD, which type of method best captures the correlation structure among SNPs (i.e. clumping vs. pruning). Using the Health and Retirement Study (HRS), a nationally representative, population-based longitudinal panel study of Americans over the age of 50, we examine the predictive ability as well as the variability and co-variability in PGSs arising from these different estimation approaches. We examine four traits with large published and replicated genome-wide association studies (height, body mass index, educational attainment, and depression). Our central finding demonstrates that PGSs that include all available SNPs either explain the most variation in an outcome or are not significantly different from the PGS that does. Thus, for reproducibility through rigor and transparency, we recommend that researchers include a PGS with all available SNPs as a reference, and provide substantial justification for using alternative methods.
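The "linear, usually weighted, combination of risk alleles" in the abstract above can be illustrated with a minimal sketch. The function name, the dosage coding (0, 1 or 2 copies of the risk allele per SNP) and the example weights are assumptions made here for illustration; they are not taken from the preprint, which compares several ways of choosing SNPs and weights.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of risk-allele counts for one individual.

    dosages: risk-allele counts per SNP (assumed coding: 0, 1 or 2).
    weights: per-SNP effect sizes, e.g. from GWAS summary statistics
             (the values used below are invented for illustration).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual genotyped at four SNPs.
dosages = [0, 1, 2, 1]
weights = [0.5, 0.25, 0.125, 1.0]

weighted = polygenic_score(dosages, weights)
# An unweighted score is the special case where every weight is 1.
unweighted = polygenic_score(dosages, [1] * len(dosages))
print(weighted, unweighted)
```

Steps such as SNP selection, imputation, and LD clumping or pruning happen before this sum and are not modeled here.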
2,936 downloads genetics
Biobank-based genome-wide association studies are enabling exciting insights in complex trait genetics, but much uncertainty remains over best practices for optimizing statistical power and computational efficiency in GWAS while controlling confounders. Here, we introduce a much faster version of our BOLT-LMM Bayesian mixed model association method --- capable of running analyses of the full UK Biobank cohort in a few days on a single compute node --- and show that it produces highly powered, robust test statistics when run on all 459K European samples (retaining related individuals). When used to conduct a GWAS for height in UK Biobank, BOLT-LMM achieved power equivalent to linear regression on 650K samples --- a 93% increase in effective sample size versus the common practice of analyzing unrelated British samples using linear regression (UK Biobank documentation; Bycroft et al. bioRxiv). Across a broader set of 23 highly heritable traits, the total number of independent GWAS loci detected increased from 5,839 to 10,759, an 84% increase. We recommend the use of BOLT-LMM (retaining related individuals) for biobank-scale analyses, and we have publicly released BOLT-LMM summary association statistics for the 23 traits analyzed as a resource for all researchers.
2,935 downloads genetics
Rare diseases and their underlying molecular causes are often poorly studied, posing challenges for patient diagnosis and prognosis. The development of next-generation sequencing and its decreasing costs promises to alleviate such issues by supplying personal genomic information at a moderate price. Here, we used crowdfunding as an alternative funding source to sequence the genome of Lil BUB, a celebrity cat affected by rare disease phenotypes characterized by supernumerary digits, osteopetrosis and dwarfism, all phenotypic traits that also occur in human patients. We discovered that Lil BUB is affected by two distinct mutations: a heterozygous mutation in the limb enhancer of the Sonic hedgehog gene, previously associated with polydactyly in Hemingway cats; and a novel homozygous frameshift deletion affecting the TNFRSF11A (RANK) gene, which has been linked to osteopetrosis in humans. We communicated the progress of this project to a large online audience, detailing the 'inner workings' of personalized whole genome sequencing with the aim of improving genetic literacy. Our results highlight the importance of genomic analysis in the identification of disease-causing mutations and support crowdfunding as a means to fund low-budget projects and as a platform for scientific communication.
2,928 downloads genetics
Gail Davies, Max Lam, Sarah E Harris, Joey W. Trampush, Michelle Luciano, W. David Hill, Saskia P Hagenaars, Stuart J Ritchie, Riccardo E Marioni, Chloe Fawns-Ritchie, David CM Liewald, Judith A Okely, Ari V Ahola-Olli, Catriona LK Barnes, Lars Bertram, Joshua C Bis, Katherine E. Burdick, Andrea Christoforou, Pamela DeRosse, Srdjan Djurovic, Thomas Espeseth, Stella Giakoumaki, Sudheer Giddaluru, Daniel E Gustavson, Caroline Hayward, Edith Hofer, M Arfan Ikram, Robert Karlsson, Emma Knowles, Jari Lahti, Markus Leber, Shuo Li, Karen A. Mather, Ingrid Melle, Derek Morris, Christopher Oldmeadow, Teemu Palviainen, Antony Payton, Raha Pazoki, Katja Petrovic, Chandra A Reynolds, Muralidharan Sargurupremraj, Markus Scholz, Jennifer A. Smith, Albert V Smith, Natalie Terzikhan, Anbu Thalamuthu, Stella Trompet, Sven J. van der Lee, Erin B. Ware, B Gwen Windham, Margaret J Wright, Jingyun Yang, Jin Yu, David Ames, Najaf Amin, Philippe Amouyel, Ole A Andreassen, Nicola J. Armstrong, Amelia A. Assareh, John R. Attia, Deborah Attix, Dimitrios Avramopoulos, David A. Bennett, Anne C. Böhmer, Patricia A. Boyle, Henry Brodaty, Harry Campbell, Tyrone D. Cannon, Elizabeth T. Cirulli, Eliza Congdon, Emily Drabant Conley, Janie Corley, Simon R Cox, Anders M Dale, Abbas Dehghan, Danielle Dick, Dwight Dickinson, Johan G. Eriksson, Evangelos Evangelou, Jessica D. Faul, Ian Ford, Nelson A. Freimer, He Gao, Ina Giegling, Nathan A Gillespie, Scott D Gordon, Rebecca F. Gottesman, Michael E. Griswold, Vilmundur Gudnason, Tamara B. Harris, Annette M Hartmann, Alex Hatzimanolis, Gerardo Heiss, Elizabeth G. Holliday, Peter K Joshi, Mika Kähönen, Sharon LR Kardia, Ida Karlsson, Luca Kleineidam, David S. Knopman, Nicole A Kochan, Bettina Konte, John B. Kwok, Stephanie Le Hellard, Teresa Lee, Terho Lehtimäki, Shu-Chen Li, Tian Liu, Marisa Koini, Edythe London, Will T Longstreth, Oscar L. Lopez, Anu Loukola, Tobias Luck, Astri J. 
Lundervold, Anders Lundquist, Leo-Pekka Lyytikäinen, Nicholas G Martin, Grant W. Montgomery, Alison D Murray, Anna C. Need, Raymond Noordam, Lars Nyberg, William Ollier, Goran Papenberg, Alison Pattie, Ozren Polasek, Russell A. Poldrack, Bruce M Psaty, Simone Reppermund, Steffi G. Riedel-Heller, Richard J Rose, Jerome I Rotter, Panos Roussos, Suvi P Rovio, Yasaman Saba, Fred W. Sabb, Perminder S. Sachdev, Claudia Satizabal, Matthias Schmid, Rodney J. Scott, Matthew A. Scult, Jeannette Simino, P. Eline Slagboom, Nikolaos Smyrnis, Aïcha Soumaré, Nikos C. Stefanis, David J. Stott, Richard E Straub, Kjetil Sundet, Adele M. Taylor, Kent D Taylor, Ioanna Tzoulaki, Christophe Tzourio, André Uitterlinden, Veronique Vitart, Aristotle N. Voineskos, Jaakko Kaprio, Michael Wagner, Holger Wagner, Leonie Weinhold, K. Hoyan Wen, Elisabeth Widen, Qiong Yang, Wei Zhao, Hieab HH Adams, Dan E Arking, Robert M. Bilder, Panos Bitsios, Eric Boerwinkle, Ornit Chiba-Falek, Aiden Corvin, Philip L. De Jager, Stéphanie Debette, Gary Donohoe, Paul Elliott, Annette L. Fitzpatrick, Michael Gill, David C Glahn, Sara Hägg, Narelle K. Hansell, Ahmad R Hariri, M Kamran Ikram, J Wouter Jukema, Eero Vuoksimaa, Matthew C. Keller, William S Kremen, Lenore Launer, Ulman Lindenberger, Aarno Palotie, Nancy L. Pedersen, Neil Pendleton, David J Porteous, Katri Räikkönen, Olli T Raitakari, Alfredo Ramirez, Ivar Reinvang, Igor Rudan, Dan Rujescu, Reinhold Schmidt, Helena Schmidt, Peter W. Schofield, Peter R. Schofield, John M. Starr, Vidar M. Steen, Julian N. Trollor, Steven T. Turner, Cornelia M Van Duijn, Arno Villringer, Daniel R Weinberger, David R. Weir, James F Wilson, Anil Malhotra, Andrew M McIntosh, Catharine R Gale, Sudha Seshadri, Thomas H Mosley, Jan Bressler, Todd Lencz, Ian J Deary
General cognitive function is a prominent human trait associated with many important life outcomes including longevity. The substantial heritability of general cognitive function is known to be polygenic, but it has had little explication in terms of the contributing genetic variants. Here, we combined cognitive and genetic data from the CHARGE and COGENT consortia, and UK Biobank (total N=280,360; age range = 16 to 102). We found 9,714 genome-wide significant SNPs (P < 5 × 10^-8) in 99 independent loci. Most showed clear evidence of functional importance. Among many novel genes associated with general cognitive function were SGCZ, ATXN1, MAPT, AUTS2, and P2RY6. Within the novel genetic loci were variants associated with neurodegenerative disorders, neurodevelopmental disorders, physical and psychiatric illnesses, brain structure, and BMI. Gene-based analyses found 536 genes significantly associated with general cognitive function; many were highly expressed in the brain, and associated with neurogenesis and dendrite gene sets. Genetic association results predicted up to 4% of general cognitive function variance in independent samples. There was significant genetic overlap between general cognitive function and information processing speed, as well as many health variables including longevity.
2,922 downloads genetics
Gaurav Bhatia, Alexander Gusev, Po-Ru Loh, Bjarni J Vilhjálmsson, Stephan Ripke, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Shaun Purcell, Eli Stahl, Mark Daly, Teresa R de Candia, Kenneth S. Kendler, Michael C O’Donovan, Sang Hong Lee, Naomi R. Wray, Benjamin M Neale, Matthew C. Keller, Noah A Zaitlen, Bogdan Pasaniuc, Jian Yang, Alkes L. Price
While genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease (h²), recent work has shown that more heritability is explained by all genotyped SNPs (h_g²). However, much of the heritability is still missing (h_g² < h²). For example, for schizophrenia, h² is estimated at 0.7-0.8 but h_g² is estimated at ~0.3. Efforts at increasing coverage through accurately imputed variants have yielded only small increases in the heritability explained, and poorly imputed variants can lead to assay artifacts for case-control traits. We propose to estimate the heritability explained by a set of haplotype variants (haploSNPs) constructed directly from the study sample (h_hap²). Our method constructs a set of haplotypes from phased genotypes by extending shared haplotypes subject to the 4-gamete test. In a large schizophrenia data set (PGC2-SCZ), haploSNPs with MAF > 0.1% explained substantially more phenotypic variance (h_hap² = 0.64 (S.E. 0.084)) than genotyped SNPs alone (h_g² = 0.32 (S.E. 0.029)). These estimates were based on cross-cohort comparisons, ensuring that cohort-specific assay artifacts did not contribute to our estimates. In a large multiple sclerosis data set (WTCCC2-MS), we observed an even larger difference between h_hap² and h_g², though data from other cohorts will be required to validate this result. Overall, our results suggest that haplotypes of common SNPs can explain a large fraction of missing heritability of complex disease, shedding light on genetic architecture and informing disease mapping strategies.
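The abstract above mentions the 4-gamete test without describing it. As a rough sketch, the standard pairwise form of the test checks whether two biallelic sites show all four allele combinations across a set of phased haplotypes; seeing all four is usually taken as evidence of a historical recombination between the sites. The function name and the 0/1 allele coding are assumptions for illustration, and this is not the haploSNP construction procedure itself, which the abstract does not detail.

```python
def passes_four_gamete_test(site_a, site_b):
    """Return True if fewer than four distinct gametes are observed.

    site_a, site_b: alleles (assumed coded 0/1) at two biallelic sites,
    listed in the same haplotype order. Observing all four combinations
    (0,0), (0,1), (1,0), (1,1) means the pair fails the test.
    """
    if len(site_a) != len(site_b):
        raise ValueError("both sites must cover the same haplotypes")
    gametes = set(zip(site_a, site_b))
    return len(gametes) < 4


# Two sites in perfect association: only (0,0) and (1,1) occur.
print(passes_four_gamete_test([0, 0, 1, 1], [0, 0, 1, 1]))  # True
# All four combinations occur, so the pair fails the test.
print(passes_four_gamete_test([0, 0, 1, 1], [0, 1, 0, 1]))  # False
```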
2,900 downloads genetics
Any individual's genome contains ~4-5 million genetic variants that differ from reference, and understanding how these variants give rise to trait diversity and disease susceptibility is a central goal of human genetics. A vast majority (96-99%) of an individual's variants are common, though at a population level the overwhelming majority of variants are rare. Because of their scarcity in an individual's genome, rare variants that play important roles in complex traits are likely to have large functional effects. Mutations that cause an exon to be skipped can have severe functional consequences on gene function, and many known disease-causing mutations reduce or eliminate exon recognition. Here we explore the extent to which rare genetic variation in humans results in near complete loss of exon recognition. We developed a Multiplexed Functional Assay of Splicing using Sort-seq (MFASS) that allows us to measure exon inclusion in thousands of human exons and surrounding intronic sequence simultaneously. We assayed 27,733 extant variants in the Exome Aggregation Consortium (ExAC) within or adjacent to 2,339 human exons, and found that 3.8% (1,050) of the variants, almost all of which were extremely rare, led to large-effect defects in exon recognition. Importantly, we find that 83% of these splice-disrupting variants (SDVs) are located outside of canonical splice sites, are distributed evenly across distinct exonic and intronic regions, and are difficult to predict a priori. Our results indicate that loss of exon recognition is an important and underappreciated means by which rare variants exert large functional effects, and that MFASS enables their empirical assessment for splicing defects at scale.
2,899 downloads genetics
CRISPR/Cas technology allows rapid, site-specific genome modification in a wide variety of organisms. CRISPR components produced by integrated transgenes have been shown to mutagenise some genomic target sites in Drosophila melanogaster with high efficiency, but whether this is a general feature of this system remains unknown. Here, we systematically evaluate available CRISPR/Cas reagents and experimental designs in Drosophila. Our findings allow evidence-based choices of Cas9 sources and strategies for generating knock-in alleles. We perform gene editing at a large number of target sites using a highly active Cas9 line and a collection of transgenic gRNA strains. The vast majority of target sites can be mutated with remarkable efficiency using these tools. We contrast our method to recently developed autonomous gene drive technology for genome engineering (Gantz & Bier, 2015) and conclude that optimised CRISPR with independent transgenes is as efficient, more versatile and does not represent a biosafety risk.
2,884 downloads genetics
It has been widely accepted that the Finno-Ugric Hungarian language, which originated from proto-Uralic people, was brought into the Carpathian Basin by the Hungarian Conquerors. From the middle of the 19th century this view prevailed against the deep-rooted Hungarian Hun tradition, maintained in folk memory as well as in Hungarian and foreign written medieval sources, which claimed that Hungarians were kinsfolk of the Huns. In order to shed light on the genetic origin of the Conquerors we sequenced 102 mitogenomes from early Conqueror cemeteries and compared them to sequences of all available databases. We applied novel population genetic algorithms, named Shared Haplogroup Distance and MITOMIX, to reveal past admixture of maternal lineages. Phylogenetic and population genetic analysis indicated that more than one third of the Conqueror maternal lineages were derived from Central-Inner Asia and their most probable ultimate sources were the Asian Huns. The rest of the lineages most likely originated from the Bronze Age Potapovka-Poltavka-Srubnaya cultures of the Pontic-Caspian steppe, an area that was part of the later European Hun empire. Our data give support to the Hungarian Hun tradition and provide indirect evidence for the genetic connection between Asian and European Huns. Available data imply that the Conquerors did not have a major contribution to the gene pool of the Carpathian Basin, raising doubts about the Conqueror origin of the Hungarian language.
2,858 downloads genetics
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a "Regression with Summary Statistics" (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously-proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise, than previous results using subsets of these data. We also identify many previously-unreported loci that show evidence for association with height in our analyses. Software implementing RSS is available at https://github.com/stephenslab/rss.
2,849 downloads genetics
One of the major goals of population genetics is to quantitatively understand variation of genetic polymorphisms among individuals. To this end, researchers have developed sophisticated statistical methods to capture the complex population structure that underlies observed genotypes in humans, and such methods have been effective for analyzing modestly sized genomic data sets. However, the number of genotyped humans has grown significantly in recent years, and it is accelerating. In aggregate about 1M individuals have been genotyped to date. Analyzing these data will bring us closer to a nearly complete picture of human genetic variation; but existing methods for population genetics analysis do not scale to data of this size. To solve this problem we developed TeraStructure. TeraStructure is a new algorithm to fit Bayesian models of genetic variation in human populations on tera-sample-sized data sets (10^12 observed genotypes, e.g., 1M individuals at 1M SNPs). It is a principled approach to Bayesian inference that iterates between subsampling locations of the genome and updating an estimate of the latent population structure of the individuals. On data sets of up to 2K individuals, TeraStructure matches the existing state of the art in terms of both speed and accuracy. On simulated data sets of up to 10K individuals, TeraStructure is twice as fast as existing methods and has higher accuracy in recovering the latent population structure. On genomic data simulated at the tera-sample-size scales, TeraStructure continues to be accurate and is the only method that can complete its analysis.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!