Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,472 bioRxiv papers from 277,337 authors.
Most downloaded bioRxiv papers, all time
in category genetics
3,478 results found.
4,095 downloads genetics
Carlos Eduardo G. Amorim, Stefania Vai, Cosimo Posth, Alessandra Modi, István Koncz, Susanne Hakenbeck, Maria Cristina La Rocca, Balazs Mende, Dean Bobo, Walter Pohl, Luisella Pejrani Baricco, Elena Bedini, Paolo Francalacci, Caterina Giostra, Tivadar Vida, Daniel Winger, Uta von Freeden, Silvia Ghirotto, Martina Lari, Guido Barbujani, Johannes Krause, David Caramelli, Patrick J Geary, Krishna R Veeramah
Despite centuries of research, much about the barbarian migrations that took place between the fourth and sixth centuries in Europe remains hotly debated. To better understand this key era that marks the dawn of modern European societies, we obtained ancient genomic DNA from 63 samples from two cemeteries (from Hungary and Northern Italy) that have been previously associated with the Longobards, a barbarian people that ruled large parts of Italy for over 200 years after invading from Pannonia in 568 CE. Our dense cemetery-based sampling revealed that each cemetery was primarily organized around one large pedigree, suggesting that biological relationships played an important role in these early Medieval societies. Moreover, we identified genetic structure in each cemetery involving at least two groups with different ancestry that were very distinct in terms of their funerary customs. Finally, our data was consistent with the proposed long-distance migration from Pannonia to Northern Italy.
4,078 downloads genetics
Mendelian randomization (MR) is widely used to identify causal relationships among heritable traits, but can be confounded by genetic correlations reflecting shared etiology. We propose a model in which a latent causal variable mediates the genetic correlation between two traits. Under the latent causal variable (LCV) model, trait 1 is fully genetically causal for trait 2 if it is perfectly genetically correlated with the latent variable, implying that the entire genetic component of trait 1 is causal for trait 2; it is partially genetically causal for trait 2 if it has a high genetic correlation with the latent variable, implying that part of the genetic component of trait 1 is causal for trait 2. To quantify the degree of partial genetic causality, we define the genetic causality proportion (gcp). We fit this model using mixed fourth moments E(α₁²α₁α₂) and E(α₂²α₁α₂) of marginal effect sizes for each trait, exploiting the fact that if trait 1 is causal for trait 2 then SNPs with large effects on trait 1 (large E(α₁²)) will have correlated effects on trait 2 (large E(α₁α₂)), but not vice versa. We performed simulations under a wide range of genetic architectures and determined that LCV, unlike state-of-the-art MR methods, produced well-calibrated false positive rates and reliable gcp estimates in the presence of genetic correlations and asymmetric genetic architectures; we also determined that LCV is well-powered to detect a causal effect. We applied LCV to GWAS summary statistics for 52 traits (average N=331k), identifying partially or fully genetically causal effects (1% FDR) for 59 pairs of traits, including 30 pairs of traits with high gcp estimates (gcp>0.6). Results consistent with the published literature included causal effects on myocardial infarction (MI) for LDL, triglycerides and BMI. Novel findings included an effect of LDL on bone mineral density, consistent with clinical trials of statins in osteoporosis. These results demonstrate that it is possible to distinguish between correlation and causation using genetic data.
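As a rough illustration of the two mixed fourth moments named in this abstract (the averages of α1²·α1α2 and α2²·α1α2 over SNPs), the sketch below computes their sample versions from two vectors of per-SNP marginal effect sizes. It is not the authors' LCV estimator: the function name `mixed_fourth_moments` and the toy effect sizes are hypothetical, and any normalization, noise correction, or gcp fitting the published method performs is not described here and is left out.

```python
import numpy as np

def mixed_fourth_moments(alpha1, alpha2):
    """Sample versions of E(a1^2 * a1*a2) and E(a2^2 * a1*a2).

    alpha1, alpha2: per-SNP marginal effect sizes for trait 1 and trait 2
    (hypothetical inputs; same length, one entry per SNP).
    """
    a1 = np.asarray(alpha1, dtype=float)
    a2 = np.asarray(alpha2, dtype=float)
    if a1.shape != a2.shape:
        raise ValueError("effect-size vectors must have the same shape")
    cross = a1 * a2  # per-SNP product of the two effects
    m1 = float(np.mean(a1**2 * cross))  # weights SNPs with large trait-1 effects
    m2 = float(np.mean(a2**2 * cross))  # weights SNPs with large trait-2 effects
    return m1, m2

# Toy data (made up): one SNP with a large trait-1 effect that also
# affects trait 2, plus SNPs that affect trait 2 only.
toy_alpha1 = [2.0, 0.0, 0.0, 0.0]
toy_alpha2 = [1.0, 1.0, -1.0, 0.0]
m1, m2 = mixed_fourth_moments(toy_alpha1, toy_alpha2)
print(m1, m2)
```

In this toy case the first moment is larger than the second, which is the kind of asymmetry the abstract says the model exploits; real data would need the full method to interpret.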
3,905 downloads genetics
Charleston W.K. Chiang, Joseph H. Marcus, Carlo Sidore, Hussein Al-Asadi, Magdalena Zoledziewska, Maristella Pitzalis, Fabio Busonero, Andrea Maschio, Giorgio Pistis, Maristella Steri, Andrea Angius, Kirk E Lohmueller, Goncalo R Abecasis, David Schlessinger, Francesco Cucca, John Novembre
The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of traits and diseases. The history of the Sardinian population has also been the focus of much research, and in recent ancient DNA (aDNA) studies, Sardinia has provided unique insight into the peopling of Europe and the spread of agriculture. In this study, we analyze whole-genome sequences of 3,514 Sardinians to address hypotheses regarding the founding of Sardinia and its relation to the peopling of Europe, including examining fine-scale substructure, population size history, and signals of admixture. We find the population of the mountainous Gennargentu region shows elevated genetic isolation with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland. Notably, the Gennargentu region also has elevated levels of pre-Neolithic hunter-gatherer ancestry and increased affinity to Basque populations. Further, allele sharing with pre-Neolithic and Neolithic mainland populations is larger on the X chromosome than on the autosomes, providing evidence for a sex-biased demographic history in Sardinia. These results give new insight into the demography of ancestral Sardinians and help further the understanding of sharing of disease risk alleles between Sardinia and mainland populations.
3,867 downloads genetics
Heritability estimation provides important information about the relative contribution of genetic and environmental factors to phenotypic variation, and provides an upper bound for the utility of genetic risk prediction models. Recent technological and statistical advances have enabled the estimation of additive heritability attributable to common genetic variants (SNP heritability) across a broad phenotypic spectrum. However, assessing the comparative heritability of multiple traits estimated in different cohorts may be misleading due to the population-specific nature of heritability. Here we report the SNP heritability for 551 complex traits derived from the large-scale, population-based UK Biobank, comprising both quantitative phenotypes and disease codes, and examine the moderating effect of three major demographic variables (age, sex and socioeconomic status) on the heritability estimates. Our study represents the first comprehensive phenome-wide heritability analysis in the UK Biobank, and underscores the importance of considering population characteristics in comparing and interpreting heritability.
3,859 downloads genetics
SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but the assumptions in current use have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency, linkage disequilibrium and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (SD 3) higher than those obtained from the widely-used software GCTA, and 25% (SD 2) higher than those from the recently-proposed extension GCTA-LDMS. Previously, DNaseI hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model their estimated contribution is only 24%.
3,825 downloads genetics
Target identification (identifying the correct drug targets for each disease) and target validation (demonstrating the effect of target perturbation on disease biomarkers and disease end-points) are essential steps in drug development. We showed previously that biomarker and disease endpoint associations of single nucleotide polymorphisms (SNPs) in a gene encoding a drug target accurately depict the effect of modifying the same target with a pharmacological agent; others have shown that genomic support for a target is associated with a higher rate of drug development success. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome wide association studies (GWAS) to an updated set of genes encoding druggable human proteins, to compounds with bioactivity against these targets and, where these were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, to enable druggable genome-wide association studies for drug target selection and validation in human disease.
3,820 downloads genetics
The ever-growing number of genome-wide association studies (GWAS) has revealed widespread pleiotropy. To exploit this, various methods that consider variant association with multiple traits jointly have been developed. However, most effort has gone into improving discovery power; how to replicate and interpret these discovered pleiotropic loci using multivariate methods has yet to be fully discussed. Using only multiple publicly available single-trait GWAS summary statistics, we develop a fast and flexible multi-trait framework that contains modules for (i) multi-trait genetic discovery, (ii) replication of locus pleiotropic profile, and (iii) multi-trait conditional analysis. The procedure is able to handle any level of sample overlap. As an empirical example, we discovered and replicated 23 novel pleiotropic loci for human anthropometry and evaluated their pleiotropic effects on other traits. By applying conditional multivariate analysis on the 23 loci, we discovered and replicated two additional multi-trait associated SNPs. Our results provide empirical evidence that multi-trait analysis allows detection of additional, replicable, highly pleiotropic genetic associations without genotyping additional individuals. The methods are implemented in a free and open source R package MultiABEL.
3,807 downloads genetics
Hilary K Finucane, Yakir A. Reshef, Verneri Anttila, Kamil Slowikowski, Alexander Gusev, Andrea Byrnes, Steven Gazal, Po-Ru Loh, Caleb Lareau, Noam Shoresh, Giulio Genovese, Arpiar Saunders, Evan Macosko, Samuela Pollack, The Brainstorm Consortium, John RB Perry, Jason D Buenrostro, Bradley E. Bernstein, Soumya Raychaudhuri, Steven McCarroll, Benjamin M Neale, Alkes L. Price
Genetics can provide a systematic approach to discovering the tissues and cell types relevant for a complex disease or trait. Identifying these tissues and cell types is critical for following up on non-coding allelic function, developing ex-vivo models, and identifying therapeutic targets. Here, we analyze gene expression data from several sources, including the GTEx and PsychENCODE consortia, together with genome-wide association study (GWAS) summary statistics for 48 diseases and traits with an average sample size of 169,331, to identify disease-relevant tissues and cell types. We develop and apply an approach that uses stratified LD score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We detect tissue-specific enrichments at FDR < 5% for 34 diseases and traits across a broad range of tissues that recapitulate known biology. In our analysis of traits with observed central nervous system enrichment, we detect an enrichment of neurons over other brain cell types for several brain-related traits, enrichment of inhibitory over excitatory neurons for bipolar disorder but excitatory over inhibitory neurons for schizophrenia and body mass index, and enrichments in the cortex for schizophrenia and in the striatum for migraine. In our analysis of traits with observed immunological enrichment, we identify enrichments of T cells for asthma and eczema, B cells for primary biliary cirrhosis, and myeloid cells for Alzheimer's disease, which we validated with independent chromatin data. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signal.
3,745 downloads genetics
Pedigree-based analyses of intelligence have reported that genetic differences account for 50-80% of the phenotypic variation. For personality traits these effects are smaller, with 34-48% of the variance being explained by genetic differences. However, molecular genetic studies using unrelated individuals typically report a heritability estimate of around 30% for intelligence and between 0% and 15% for personality variables. Pedigree-based estimates and molecular genetic estimates may differ because current genotyping platforms are poor at tagging causal variants, variants with low minor allele frequency, copy number variants, and structural variants. Using ~20 000 individuals in the Generation Scotland family cohort genotyped for ~700 000 single nucleotide polymorphisms (SNPs), we exploit the high levels of linkage disequilibrium (LD) found in members of the same family to quantify the total effect of genetic variants that are not tagged in GWASs of unrelated individuals. In our models, genetic variants in low LD with genotyped SNPs explain over half of the genetic variance in intelligence, education, and neuroticism. By capturing these additional genetic effects our models closely approximate the heritability estimates from twin studies for intelligence and education, but not for neuroticism and extraversion. We then replicated our finding using imputed molecular genetic data from unrelated individuals to show that ~50% of differences in intelligence, and ~40% of the differences in education, can be explained by genetic effects when a larger number of rare SNPs are included. From an evolutionary genetic perspective, a substantial contribution of rare genetic variants to individual differences in intelligence and education is consistent with mutation-selection balance.
3,731 downloads genetics
Coding variants represent many of the strongest associations between genotype and phenotype; however, they exhibit inter-individual differences in effect, known as variable penetrance. In this work, we study how cis-regulatory variation modifies the penetrance of coding variants in their target gene. Using functional genomic and genetic data from GTEx, we observed that in the general population, purifying selection has depleted haplotype combinations that lead to higher penetrance of pathogenic coding variants. Conversely, in cancer and autism patients, we observed an enrichment of haplotype combinations that lead to higher penetrance of pathogenic coding variants in disease-implicated genes, which provides direct evidence that regulatory haplotype configuration of causal coding variants affects disease risk. Finally, we experimentally demonstrated that a regulatory variant can modify the penetrance of a coding variant by introducing a Mendelian SNP using CRISPR/Cas9 on distinct expression haplotypes and using the transcriptome as a phenotypic readout. Our results demonstrate that joint effects of regulatory and coding variants are an important part of the genetic architecture of human traits, and contribute to modified penetrance of disease-causing variants.
3,684 downloads genetics
CRISPR-Cas mediated gene editing has enabled the direct manipulation of gene function in many species. However, the reproductive biology of reptiles presents unique barriers for the use of this technology, and there are currently no reptiles with effective methods for targeted mutagenesis. Here we present a new approach that enables the efficient production of CRISPR-Cas induced mutations in Anolis lizards, an important model for studies of reptile evolution and development.
3,684 downloads genetics
David M. Howard, Mark James Adams, Toni-Kim Clarke, Jonathan D. Hafferty, Jude Gibson, Masoud Shirali, Jonathan Coleman, Saskia P Hagenaars, Joey Ward, Eleanor M. Wigmore, Clara Alloza, Xueyi Shen, Miruna C. Barbu, Eileen Y. Xu, Heather C Whalley, Riccardo E Marioni, David J Porteous, Gail Davies, Ian J Deary, Gibran Hemani, Klaus Berger, Henning Teismann, Rajesh Rawal, Volker Arolt, Bernhard T. Baune, Udo Dannlowski, Katharina Domschke, Chao Tian, David A. Hinds, 23andMe Research Team, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Maciej Trzaskowski, Enda M. Byrne, Stephan Ripke, Daniel J Smith, Patrick F Sullivan, Naomi R. Wray, Gerome Breen, Cathryn M Lewis, Andrew M McIntosh
Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene-sets associated with depression, including both genes and gene-pathways associated with synaptic structure and neurotransmission. Further evidence of the importance of prefrontal brain regions in depression was provided by an enrichment analysis. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches.
3,632 downloads genetics
Andrew D Grotzinger, Mijke Rhemtulla, Ronald de Vlaming, Stuart J Ritchie, Travis T. Mallard, W. David Hill, Hill F. Ip, Andrew M McIntosh, Ian J Deary, Philipp D Koellinger, K Paige Harden, Michel G. Nivard, Elliot M Tucker-Drob
Methods for using GWAS to estimate genetic correlations between pairwise combinations of traits have produced 'atlases' of genetic architecture. Genetic atlases reveal pervasive pleiotropy, and genome-wide significant loci are often shared across different phenotypes. We introduce genomic structural equation modeling (Genomic SEM), a multivariate method for analyzing the joint genetic architectures of complex traits. Using formal methods for modeling covariance structure, Genomic SEM synthesizes genetic correlations and SNP-heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to identify variants with effects on general dimensions of cross-trait liability, boost power for discovery, and calculate more predictive polygenic scores. Finally, Genomic SEM can be used to identify loci that cause divergence between traits, aiding the search for what uniquely differentiates highly correlated phenotypes. We demonstrate several applications of Genomic SEM, including a joint analysis of GWAS summary statistics from five genetically correlated psychiatric traits. We identify 27 independent SNPs not previously identified in the univariate GWASs, 5 of which have been reported in other published GWASs of the included traits. Polygenic scores derived from Genomic SEM consistently outperform polygenic scores derived from GWASs of the individual traits. Genomic SEM is flexible, open ended, and allows for continuous innovations in how multivariate genetic architecture is modeled.
3,619 downloads genetics
Michael Inouye, Gad Abraham, Christopher P Nelson, Angela M. Wood, Michael J Sweeting, Frank Dudbridge, Florence Y Lai, Stephen Kaptoge, Marta Brozynska, Tingting Wang, Shu Ye, Thomas R Webb, Martin K. Rutter, Ioanna Tzoulaki, Riyaz S Patel, Ruth JF Loos, Bernard Keavney, Harry Hemingway, John Thompson, Hugh Watkins, Panos Deloukas, Emanuele Di Angelantonio, Adam S. Butterworth, John Danesh, Nilesh J Samani, for The UK Biobank CardioMetabolic Consortium CHD Working Group
Background: Coronary artery disease (CAD) has substantial heritability and a polygenic architecture; however, genomic risk scores have not yet leveraged the totality of genetic information available nor been externally tested at population-scale to show potential utility in primary prevention. Methods: Using a meta-analytic approach to combine large-scale genome-wide and targeted genetic association data, we developed a new genomic risk score for CAD (metaGRS), consisting of 1.7 million genetic variants. We externally tested metaGRS, individually and in combination with available conventional risk factors, in 22,242 CAD cases and 460,387 non-cases from UK Biobank. Findings: In UK Biobank, a standard deviation increase in metaGRS had a hazard ratio (HR) of 1.71 (95% CI 1.68-1.73) for CAD, greater than any other externally tested genetic risk score. Individuals in the top 20% of the metaGRS distribution had a HR of 4.17 (95% CI 3.97-4.38) compared with those in the bottom 20%. The metaGRS had higher C-index (C=0.623, 95% CI 0.615-0.631) for incident CAD than any of four conventional factors (smoking, diabetes, hypertension, and body mass index), and addition of the metaGRS to a model of conventional risk factors increased C-index by 3.7%. In individuals on lipid-lowering or anti-hypertensive medications at recruitment, metaGRS hazard for incident CAD was significantly but only partially attenuated with HR of 2.83 (95% CI 2.61-3.07) between the top and bottom 20% of the metaGRS distribution. Interpretation: Recent genetic association studies have yielded enough information to meaningfully stratify individuals using the metaGRS for CAD risk in both early and later life, thus enabling targeted primary intervention in combination with conventional risk factors. The metaGRS effect was partially attenuated by lipid and blood pressure-lowering medication, however other prevention strategies will be required to fully benefit from earlier genomic risk stratification.
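To make the idea of a genomic risk score concrete, the sketch below treats a score as a weighted sum of per-variant allele dosages, standardizes it so that effects can be reported per standard deviation, and picks out the top and bottom 20% of the distribution as in the comparison above. This is the generic form of a polygenic score, assumed here for illustration; the abstract does not specify how metaGRS weights are derived or applied, and the function names, dosages, and weights are all hypothetical.

```python
import numpy as np

def genomic_risk_score(dosages, weights):
    """Weighted sum of allele dosages, one score per individual.

    dosages: (n_individuals, n_variants) array of allele counts (0, 1, 2)
    weights: (n_variants,) array of per-variant weights (hypothetical values)
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights

def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

def extreme_quintiles(scores):
    """Boolean masks for the bottom 20% and top 20% of the score distribution."""
    scores = np.asarray(scores, dtype=float)
    low_cut, high_cut = np.quantile(scores, [0.2, 0.8])
    return scores <= low_cut, scores >= high_cut

# Toy example: three individuals, three variants, made-up weights.
toy_dosages = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]
toy_weights = [0.1, -0.2, 0.3]
toy_scores = genomic_risk_score(toy_dosages, toy_weights)
print(toy_scores)
print(standardize(toy_scores))
```

Estimating hazard ratios between the two groups, as reported in the abstract, would additionally require follow-up data and a survival model, which this sketch does not attempt.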
3,616 downloads genetics
Early genome-wide association studies (GWAS) led to the surprising discovery that, for typical complex traits, the most significant genetic variants contribute only a small fraction of the estimated heritability. Instead, it has become clear that a huge number of common variants, each with tiny effects, explain most of the heritability. Previously, we argued that these patterns conflict with standard conceptual models, and that new models are needed. Here we provide a formal model in which genetic contributions to complex traits can be partitioned into direct effects from core genes, and indirect effects from peripheral genes acting as trans-regulators. We argue that the central importance of peripheral genes is a direct consequence of the large contribution of trans-acting variation to gene expression variation. In particular, we propose that if the core genes for a trait are co-regulated - as seems likely - then the effects of peripheral variation can be amplified by these co-regulated networks such that nearly all of the genetic variance is driven by peripheral genes. Thus our model proposes a framework for understanding key features of the architecture of complex traits.
3,550 downloads genetics
Hexanucleotide repeat expansions in the C9orf72 gene are the most common cause of amyotrophic lateral sclerosis and frontotemporal dementia (c9FTD/ALS). The nucleotide repeat expansions are translated into dipeptide repeat (DPR) proteins, which are aggregation-prone and may contribute to neurodegeneration. Studies in model organisms, including yeast and flies have converged upon nucleocytoplasmic transport as one underlying pathogenic mechanism, but a comprehensive understanding of the molecular and cellular underpinnings of DPR toxicity in human cells is still lacking. We used the bacteria-derived clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system to perform genome-wide gene knockout screens for suppressors and enhancers of C9orf72 DPR toxicity in human cells. We validated hits by performing secondary CRISPR-Cas9 screens in primary mouse neurons. Our screens revealed genes involved in nucleocytoplasmic transport, reinforcing the previous findings from model systems. We also uncovered new potent modifiers of DPR toxicity whose gene products function in the endoplasmic reticulum (ER), proteasome, RNA processing pathways, and in chromatin modification. Since regulators of ER stress emerged prominently from the screens, we further investigated one such modifier, TMX2, which we identified as a modulator of the ER-stress signature elicited by C9orf72 DPRs in neurons. Together, this work identifies novel suppressors of DPR toxicity that represent potential therapeutic targets and demonstrates the promise of CRISPR-Cas9 screens to define mechanisms of neurodegenerative diseases.
3,502 downloads genetics
Stephen E Lincoln, Justin M Zook, Shimul Chowdhury, Shazia Mahamdallie, Andrew Fellowes, Eric W Klee, Rebecca Truty, Catherine Huang, Farol L Tomson, Megan H Cleveland, Peter M Vallone, Yan Ding, Sheila Seal, Wasanthi DeSilva, Russell K Garlick, Marc Salit, Nazneen Rahman, Stephen F Kingsmore, Swaroop Aradhya, Robert Nussbaum, Matthew J Ferber, Brian H Shirts
Purpose: Next-generation sequencing (NGS) is widely used and cost-effective. Depending on the specific methods, NGS can have limitations detecting certain technically challenging variant types even though they are both prevalent in patients and medically important. These types are underrepresented in validation studies, hindering the uniform assessment of test methodologies by laboratory directors and clinicians. Specimens containing such variants can be difficult to obtain; thus, we evaluated a novel solution to this problem. Methods: A diverse set of technically challenging variants was synthesized and introduced into a known genomic background. This specimen was sequenced by 7 laboratories using 10 different NGS workflows. Results: The specimen was compatible with all 10 workflows and presented biochemical and bioinformatic challenges similar to those of patient specimens. Only 10 of 22 challenging variants were correctly identified by all 10 workflows, and only 3 workflows detected all 22. Many, but not all, of the sensitivity limitations were bioinformatic in nature. Conclusions: Synthetic controls can provide an efficient and informative mechanism to augment studies with technically challenging variants that are difficult to obtain otherwise. Data from such specimens can facilitate inter-laboratory methodologic comparisons and can help establish standards that improve communication between clinicians and laboratories.
3,496 downloads genetics
Daphna Rothschild, Omer Weissbrod, Elad Barkan, Tal Korem, David Zeevi, Paul I Costea, Anastasia Godneva, Iris Kalka, Noam Bar, Niv Zmora, Meirav Pevsner-Fischer, David Israeli, Noa Kosower, Gal Malka, Bat Chen Wolf, Tali Avnit-Sagi, Maya Lotan-Pompan, Adina Weinberger, Zamir Halpern, Shai Carmi, Eran Elinav, Eran Segal
Human gut microbiome composition is shaped by multiple host intrinsic and extrinsic factors, but the relative contribution of host genetic compared to environmental factors remains elusive. Here, we genotyped a cohort of 696 healthy individuals from several distinct ancestral origins and a relatively common environment, and demonstrate that there is no statistically significant association between microbiome composition and ethnicity, single nucleotide polymorphisms (SNPs), or overall genetic similarity, and that only 5 of 211 (2.4%) previously reported microbiome-SNP associations replicate in our cohort. In contrast, we find similarities in the microbiome composition of genetically unrelated individuals who share a household. We define the term biome-explainability as the variance of a host phenotype explained by the microbiome after accounting for the contribution of human genetics. Consistent with our finding that microbiome and host genetics are largely independent, we find significant biome-explainability levels of 16-33% for body mass index (BMI), fasting glucose, high-density lipoprotein (HDL) cholesterol, waist circumference, waist-hip ratio (WHR), and lactose consumption. We further show that several human phenotypes can be predicted substantially more accurately when adding microbiome data to host genetics data, and that the contribution of both data sources to prediction accuracy is largely additive. Overall, our results suggest that human microbiome composition is dominated by environmental factors rather than by host genetics.
3,485 downloads genetics
CRISPR-based genome editing using ribonucleoprotein (RNP) complexes and synthetic single-stranded oligodeoxynucleotide (ssODN) donors can be highly effective. However, reproducibility can vary, and precise, targeted integration of longer constructs, such as green fluorescent protein (GFP) tags, remains challenging in many systems. Here we describe a streamlined and optimized editing protocol for the nematode C. elegans. We demonstrate its efficacy, flexibility, and cost-effectiveness by affinity-tagging all twelve of the Worm-specific Argonaute (WAGO) proteins in C. elegans using ssODN donors. In addition, we describe a novel PCR-based partially single-stranded "hybrid" donor design that yields high-efficiency editing with large (kilobase-scale) constructs. We use these hybrid donors to introduce fluorescent protein tags into multiple loci, achieving editing efficiencies that approach those previously obtained only with much shorter ssODN donors. The principles and strategies described here are likely to translate to other systems and should allow researchers to reproducibly and efficiently obtain both long and short precision genome edits.
3,462 downloads genetics
Michael Wainberg, Nasa Sinnott-Armstrong, Nicholas Mancuso, Alvaro N Barbeira, David A. Knowles, David Golan, Raili Ermel, Arno Ruusalepp, Thomas Quertermous, Ke Hao, Johan LM Björkegren, Hae Kyung Im, Bogdan Pasaniuc, Manuel A Rivas, Anshul Kundaje
Transcriptome-wide association studies (TWAS) integrate GWAS and expression quantitative trait locus (eQTL) datasets to discover candidate causal gene-trait associations. We integrate multi-tissue expression panels and summary GWAS for LDL cholesterol and Crohn's disease to show that TWAS are highly vulnerable to discovering non-causal genes, because variants at a single GWAS hit locus are often eQTLs for multiple genes. TWAS exhibit acute instability when the tissue of the expression panel is changed: candidate causal genes that are TWAS hits in one tissue are usually no longer hits in another, due to lack of expression or strong eQTLs, while non-causal genes at the same loci remain. While TWAS is statistically valid when used as a weighted burden test to identify trait-associated loci, it is invalid to interpret TWAS associations as causal genes because the false discovery rate for TWAS causal gene discovery is not only high, but unquantifiable. More broadly, our results showcase limitations of using expression variation across individuals to determine causal genes at GWAS loci.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!