81: Robust genome editing with short single-stranded and long, partially single-stranded DNA donors in C. elegans
Posted to bioRxiv 20 Jun 2018

3,503 downloads genetics

Gregoriy A Dokshin, Krishna S Ghanta, Katherine M Piscopo, Craig C Mello

CRISPR-based genome editing using ribonucleoprotein (RNP) complexes and synthetic single-stranded oligodeoxynucleotide (ssODN) donors can be highly effective. However, reproducibility can vary, and precise, targeted integration of longer constructs, such as green fluorescent protein (GFP) tags, remains challenging in many systems. Here we describe a streamlined and optimized editing protocol for the nematode C. elegans. We demonstrate its efficacy, flexibility, and cost-effectiveness by affinity-tagging all twelve of the Worm-specific Argonaute (WAGO) proteins in C. elegans using ssODN donors. In addition, we describe a novel PCR-based partially single-stranded "hybrid" donor design that yields high-efficiency editing with large (kilobase-scale) constructs. We use these hybrid donors to introduce fluorescent protein tags into multiple loci, achieving editing efficiencies that approach those previously obtained only with much shorter ssODN donors. The principles and strategies described here are likely to translate to other systems and should allow researchers to reproducibly and efficiently obtain both long and short precision genome edits.

82: Variable prediction accuracy of polygenic scores within an ancestry group
Posted to bioRxiv 07 May 2019

3,428 downloads genetics

Hakhamanesh Mostafavi, Arbel Harpak, Dalton Conley, Jonathan K Pritchard, Molly Przeworski

Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group, the prediction accuracy of polygenic scores depends on characteristics such as the age or sex composition of the individuals in which the GWAS and the prediction were conducted, and on the GWAS study design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.

83: The nature of nurture: effects of parental genotypes
Posted to bioRxiv 14 Nov 2017

3,397 downloads genetics

Augustine Kong, Gudmar Thorleifsson, Michael L. Frigge, Bjarni J Vilhjálmsson, Alexander I. Young, Thorgeir E. Thorgeirsson, Stefania Benonisdottir, Asmundur Oddsson, Bjarni V. Halldórsson, Gísli Masson, Daniel F. Gudbjartsson, Agnar Helgason, Gyda Bjornsdottir, Unnur Thorsteinsdottir, Kari Stefansson

Sequence variants in the parental genomes that are not transmitted to a child/proband are often ignored in genetic studies. Here we show that non-transmitted alleles can impact a child through their effects on the parents and other relatives, a phenomenon we call genetic nurture. Using results from a meta-analysis of educational attainment, the polygenic score computed for the non-transmitted alleles of 21,637 probands with at least one parent genotyped has an estimated effect on the educational attainment of the proband that is 29.9% (P = 1.6×10^-14) of that of the transmitted polygenic score. Genetic nurturing effects of this polygenic score extend to other traits. Paternal and maternal polygenic scores have similar effects on educational attainment, but mothers contribute more than fathers to nutrition/health-related traits.

84: Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants
Posted to bioRxiv 14 Oct 2014

3,361 downloads genetics

Aziz Belkadi, Alexandre Bolze, Yuval Itan, Aurélie Cobat, Quentin B Vincent, Alexander Antipenko, Lei Shang, Bertrand Boisson, Jean-Laurent Casanova, Laurent Abel

We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels were identified exclusively by WES, whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656, ~3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs.

85: Association mapping of inflammatory bowel disease loci to single variant resolution
Posted to bioRxiv 08 Oct 2015

3,320 downloads genetics

Hailiang Huang, Ming Fang, Luke Jostins, Maša Umićević Mirkov, Gabrielle Boucher, Carl A. Anderson, Vibeke Andersen, Isabelle Cleynen, Adrian Cortes, François Crins, Mauro D’Amato, Valérie Deffontaine, Julia Dimitrieva, Elisa Docampo, Mahmoud Elansary, Kyle Kai-How Farh, Andre Franke, Ann-Stephan Gori, Philippe Goyette, Jonas Halfvarson, Talin Haritunians, Jo Knight, Ian C Lawrance, Charlie W Lees, Edouard Louis, Rob Mariman, Theo Meuwissen, Myriam Mni, Yukihide Momozawa, Miles Parkes, Sarah L. Spain, Emilie Théâtre, Gosia Trynka, Jack Satsangi, Suzanne van Sommeren, Severine Vermeire, Ramnik J. Xavier, International IBD Genetics Consortium, Rinse K Weersma, Richard H Duerr, Christopher G. Mathew, John D Rioux, Dermot P.B. McGovern, Judy H Cho, Michel Georges, Mark J. Daly, Jeffrey C Barrett

Inflammatory bowel disease (IBD) is a chronic gastrointestinal inflammatory disorder that affects millions worldwide. Genome-wide association studies (GWAS) have identified 200 IBD-associated loci, but few have been conclusively resolved to specific functional variants. Here we report fine-mapping of 94 IBD loci using high-density genotyping in 67,852 individuals. Of the 139 independent associations identified in these regions, 18 were pinpointed to a single causal variant with >95% certainty, and an additional 27 associations to a single variant with >50% certainty. These 45 variants are significantly enriched for protein-coding changes (n=13), direct disruption of transcription factor binding sites (n=3) and tissue-specific epigenetic marks (n=10), with the latter category showing enrichment in specific immune cells among associations stronger in Crohn's disease (CD) and gut mucosa among associations stronger in ulcerative colitis (UC). The results of this study suggest that high-resolution fine-mapping in large samples can convert many GWAS discoveries into statistically convincing causal variants, providing a powerful substrate for experimental elucidation of disease mechanisms.
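
The ">95% certainty" figures above are posterior probabilities that a particular variant is the causal one. The following is a minimal sketch of how such a probability can be computed for a single association signal, assuming exactly one causal variant, equal prior weight on every variant, and per-variant Bayes factors that are already available. The function names and the log Bayes factor values are hypothetical; this is not the study's own pipeline and does not reproduce its results.

```python
import math


def posterior_probabilities(log_bayes_factors):
    """Posterior probability that each variant is causal.

    Assumes a single causal variant per signal and equal priors,
    so each posterior is that variant's Bayes factor divided by the sum.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_bayes_factors.values())
    weights = {v: math.exp(lbf - top) for v, lbf in log_bayes_factors.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose posteriors sum to at least `coverage`."""
    chosen, cumulative = [], 0.0
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical natural-log Bayes factors for three variants at one signal.
example_lbf = {"variant_a": 12.0, "variant_b": 7.5, "variant_c": 3.0}
example_post = posterior_probabilities(example_lbf)
example_set = credible_set(example_post)
```

With these made-up inputs the top variant alone exceeds the 95% threshold, so the credible set contains a single variant, which is the situation the abstract describes as resolution to a single causal variant.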

86: An optimized CRISPR/Cas toolbox for efficient germline and somatic genome engineering in Drosophila
Posted to bioRxiv 24 Mar 2014

3,277 downloads genetics

Fillip Port, Hui-Min Chen, Tzumin Lee, Simon L Bullock

The type II CRISPR/Cas system has recently emerged as a powerful method to manipulate the genomes of various organisms. Here, we report a novel toolbox for high-efficiency genome engineering of Drosophila melanogaster consisting of transgenic Cas9 lines and versatile guide RNA (gRNA) expression plasmids. Systematic evaluation reveals Cas9 lines with ubiquitous or germline-restricted patterns of activity. We also demonstrate differential activity of the same gRNA expressed from different U6 snRNA promoters, with the previously untested U6:3 promoter giving the most potent effect. Choosing an appropriate combination of Cas9 and gRNA allows targeting of essential and non-essential genes with transmission rates ranging from 25% to 100%. We also provide evidence that our optimized CRISPR/Cas tools can be used for offset nicking-based mutagenesis and, in combination with oligonucleotide donors, to precisely edit the genome by homologous recombination with efficiencies that do not require the use of visible markers. Lastly, we demonstrate a novel application of CRISPR/Cas-mediated technology in revealing loss-of-function phenotypes in somatic cells following efficient biallelic targeting by Cas9 expressed in a ubiquitous or tissue-restricted manner. In summary, our CRISPR/Cas tools will facilitate the rapid evaluation of mutant phenotypes of specific genes and the precise modification of the genome with single-nucleotide precision. Our results also pave the way for high-throughput genetic screening with CRISPR/Cas.

87: Expansion of the CRISPR toolbox in an animal with tRNA-flanked Cas9 and Cpf1 gRNAs
Posted to bioRxiv 31 Mar 2016

3,274 downloads genetics

Fillip Port, Simon L Bullock

We present vectors for producing multiple CRISPR gRNAs from a single RNA polymerase II or III transcript in Drosophila. The system, which is based on liberation of gRNAs by processing of flanking tRNAs, permits highly efficient multiplexing of Cas9-based mutagenesis. We also demonstrate that the tRNA-gRNA system markedly increases the efficacy of conditional gene disruption by Cas9 and can promote editing by the recently discovered RNA-guided endonuclease Cpf1.

88: Medical relevance of protein-truncating variants across 337,208 individuals in the UK Biobank study
Posted to bioRxiv 23 Aug 2017

3,223 downloads genetics

Christopher DeBoever, Yosuke Tanigawa, Greg McInnes, Adam Lavertu, Chris Chang, Carlos D Bustamante, Mark J. Daly, Manuel A Rivas

Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. We characterized the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank and found 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex. We performed phenome-wide analyses and directly measured the effect of homozygous carriers, commonly referred to as "human knockouts," across medical phenotypes for genes implicated to be protective against disease or associated with at least one phenotype in our study and found several genes with strong pleiotropic or non-additive effects. Our results illustrate the importance of protein-truncating variants in a variety of diseases.

89: Visualizing the Structure of RNA-seq Expression Data using Grade of Membership Models
Posted to bioRxiv 04 May 2016

3,203 downloads genetics

Kushal K Dey, Chiaowen Joyce Hsiao, Matthew Stephens

Grade of membership models, also known as "admixture models", "topic models" or "Latent Dirichlet Allocation", are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple "populations", and in natural language processing to model documents having words from multiple "topics". Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 51 human tissues, the approach highlights similarities among biologically related tissues and identifies distinctively expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes, from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.
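
As a rough illustration of what "membership in multiple clusters" means for count data: under a grade of membership model, a sample's reads are treated as draws from a mixture of cluster-specific gene profiles, weighted by that sample's membership proportions. The sketch below only evaluates that mixture and its multinomial log-likelihood for fixed parameters. It does not fit a model and is not the CountClust implementation; all names and numbers are hypothetical.

```python
import math


def mixed_profile(memberships, cluster_profiles):
    """Expected relative expression per gene for one sample.

    memberships: proportions over clusters, summing to 1.
    cluster_profiles: one list per cluster of gene frequencies, each summing to 1.
    """
    n_genes = len(cluster_profiles[0])
    return [
        sum(q * profile[g] for q, profile in zip(memberships, cluster_profiles))
        for g in range(n_genes)
    ]


def log_likelihood(counts, memberships, cluster_profiles):
    """Multinomial log-likelihood of read counts, up to an additive constant."""
    probs = mixed_profile(memberships, cluster_profiles)
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


# Two hypothetical clusters over three genes.
profiles = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
# A sample with 75% membership in the first cluster and 25% in the second.
sample_memberships = [0.75, 0.25]
sample_profile = mixed_profile(sample_memberships, profiles)
```

A hard clustering is the special case in which every sample's membership vector puts all its weight on one cluster; allowing fractional memberships is what lets the model describe continuous variation between clusters.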

90: The Genetic Legacy of the Expansion of Turkic-Speaking Nomads Across Eurasia
Posted to bioRxiv 30 Jul 2014

3,185 downloads genetics

Bayazit Yunusbayev, Mait Metspalu, Ene Metspalu, Albert Valeev, Sergei Litvinov, Ruslan Valiev, Vita Akhmetova, Elena Balanovska, Oleg Balanovsky, Shahlo Turdikulova, Dilbar Dalimova, Pagbajabyn Nymadawa, Ardeshir Bahmanimehr, Hovhannes Sahakyan, Kristiina Tambets, Sardana Fedorova, Nikolay Barashkov, Irina Khidiatova, Evelin Mihailov, Rita Khusainova, Larisa Damba, Miroslava Derenko, Boris Malyarchuk, Ludmila Osipova, Mikhail Voevoda, Levon Yepiskoposyan, Toomas Kivisild, Elza Khusnutdinova, Richard Villems

The Turkic peoples represent a diverse collection of ethnic groups defined by the Turkic languages. These groups have dispersed across a vast area, including Siberia, Northwest China, Central Asia, East Europe, the Caucasus, Anatolia, the Middle East, and Afghanistan. The origin and early dispersal history of the Turkic peoples is disputed, with candidates for their ancient homeland ranging from the Transcaspian steppe to Manchuria in Northeast Asia. Previous genetic studies have not identified a clear-cut unifying genetic signal for the Turkic peoples, which lends support for language replacement rather than demic diffusion as the model for the Turkic language's expansion. We addressed the genetic origin of 373 individuals from 22 Turkic-speaking populations, representing their current geographic range, by analyzing genome-wide high-density genotype data. Most of the Turkic peoples studied, except those in Central Asia, genetically resembled their geographic neighbors, in agreement with the elite dominance model of language expansion. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area where historians center a series of early Turkic and non-Turkic steppe polities. The observed excess of long chromosomal tracts IBD (>1cM) between populations from SSM and Turkic peoples across West Eurasia was statistically significant. Finally, we used the ALDER method and inferred admixture dates (~9th-17th centuries) that overlap with the Turkic migrations of the 5th-16th centuries. Thus, our results indicate historical admixture among Turkic peoples, and the recent shared ancestry with modern populations in SSM supports one of the hypothesized homelands for their nomadic Turkic and related Mongolic ancestors.

91: Psychiatric Genomics: An Update and an Agenda
Posted to bioRxiv 10 Mar 2017

3,101 downloads genetics

Patrick F Sullivan, Arpana Agrawal, Cynthia Bulik, Ole A Andreassen, Anders Boerglum, Gerome Breen, Sven Cichon, Howard Edenberg, Stephen V. Faraone, Joel Gelernter, Carol A Mathews, Caroline M Nievergelt, Jordan Smoller, Michael O'Donovan, for the Psychiatric Genomics Consortium

The Psychiatric Genomics Consortium (PGC) is the largest consortium in the history of psychiatry. In the past decade, this global effort has delivered a rapidly increasing flow of new knowledge about the fundamental basis of common psychiatric disorders, particularly given its dedication to rapid progress and open science. The PGC has recently commenced a program of research designed to deliver “actionable” findings - genomic results that (a) reveal the fundamental biology, (b) inform clinical practice, and (c) deliver new therapeutic targets. This is the central idea of the PGC: to convert the family history risk factor into biologically, clinically, and therapeutically meaningful insights. The emerging findings suggest that we are entering into a phase of accelerated translation of genetic discoveries to impact psychiatric practice within a precision medicine framework.

92: Models of archaic admixture and recent history from two-locus statistics
Posted to bioRxiv 07 Dec 2018

3,098 downloads genetics

Aaron P Ragsdale, Simon Gravel

We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4−8% genetic ancestry to individuals in world-wide populations.

93: Rapid, one-step generation of biallelic conditional gene knockouts
Posted to bioRxiv 01 Jun 2016

3,096 downloads genetics

Amanda Andersson-Rolf, Roxana C. Mustata, Alessandra Merenda, Sajith Perera, Tiago Grego, Jihoon Kim, Katie Andrews, Juergen Fink, William C. Skarnes, Bon-Kyoung Koo

Loss-of-function studies are key to investigating gene function, and CRISPR technology has made genome editing widely accessible in model organisms and cells. However, conditional gene inactivation in diploid cells is still difficult to achieve. Here, we present CRISPR-FLIP, a strategy that provides an efficient, rapid, and scalable method for bi-allelic conditional gene knockouts in diploid cells by co-delivery of CRISPR/Cas9 and a universal conditional intron cassette.

94: Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences
Posted to bioRxiv 08 Feb 2018

3,075 downloads genetics

Richard Karlsson Linnér, Pietro Biroli, Edward Kong, S Fleur W Meddens, Robbee Wedow, Mark Alan Fontana, Maël Lebreton, Abdel Abdellaoui, Anke R Hammerschlag, Michel G. Nivard, Aysu Okbay, Cornelius A Rietveld, Pascal N Timshel, Stephen P Tino, Maciej Trzaskowski, Ronald de Vlaming, Christian L Zünd, Yanchun Bao, Laura Buzdugan, Ann H Caplin, Chia-Yen Chen, Peter Eibich, Pierre Fontanillas, Juan R Gonzalez, Peter K Joshi, Ville Karhunen, Aaron Kleinman, Remy Z Levin, Christina M Lill, Gerardus A Meddens, Gerard Muntané, Sandra Sanchez-Roige, Frank J van Rooij, Erdogan Taskesen, Yang Wu, Futao Zhang, 23andMe Research Team, eQTLgen Consortium, International Cannabis Consortium, Psychiatric Genomics Consortium, Social Science Genetic Association Consortium, Adam Auton, Jason D. Boardman, David W Clark, Andrew Conlin, Conor C Dolan, Urs Fischbacher, Patrick JF Groenen, Kathleen Mullan Harris, Gregor Hasler, Albert Hofman, Mohammad A Ikram, Sonia Jain, Ronald C Kessler, Maarten Kooyman, James MacKillop, Minna Männikkö, Carlos Morcillo-Suarez, Matthew B McQueen, Klaus M Schmidt, Melissa C Smart, Matthias Sutter, A Roy Thurik, Andre G Uitterlinden, Jon White, Harriet de Wit, Jian Yang, Lars Bertram, Dorret Boomsma, Tõnu Esko, Ernst Fehr, David A. Hinds, Magnus Johannesson, Meena Kumari, David Laibson, Patrik K.E. Magnusson, Michelle N Meyer, Arcadi Navarro, Abraham A Palmer, Tune H. Pers, Danielle Posthuma, Daniel Schunk, Murray B Stein, Rauli Svento, Henning Tiemeier, Paul RHJ Timmers, Patrick Turley, Robert J Ursano, Gert G Wagner, James F Wilson, Jacob Gratten, James J Lee, David Cesarini, Daniel J Benjamin, Philipp D Koellinger, Jonathan P Beauchamp

Humans vary substantially in their willingness to take risks. In a combined sample of over one million individuals, we conducted genome-wide association studies (GWAS) of general risk tolerance, adventurousness, and risky behaviors in the driving, drinking, smoking, and sexual domains. We identified 611 approximately independent genetic loci associated with at least one of our phenotypes, including 124 with general risk tolerance. We report evidence of substantial shared genetic influences across general risk tolerance and risky behaviors: 72 of the 124 general risk tolerance loci contain a lead SNP for at least one of our other GWAS, and general risk tolerance is moderately to strongly genetically correlated (|r̂_g| ~ 0.25 to 0.50) with a range of risky behaviors. Bioinformatics analyses imply that genes near general-risk-tolerance-associated SNPs are highly expressed in brain tissues and point to a role for glutamatergic and GABAergic neurotransmission. We find no evidence of enrichment for genes previously hypothesized to relate to risk tolerance.

95: Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval.
Posted to bioRxiv 08 Jan 2019

3,066 downloads genetics

Emily A King, J Wade Davis, Jacob F Degner

Despite strong vetting for disease activity, only 10% of candidate new molecular entities in early stage clinical trials are eventually approved. Analyzing historical pipeline data, Nelson et al. 2015 (Nat. Genet.) concluded pipeline drug targets with human genetic evidence of disease association are twice as likely to lead to approved drugs. Taking advantage of recent clinical development advances and rapid growth in GWAS datasets, we extend the original work using updated data, test whether genetic evidence predicts future successes and introduce statistical models adjusting for target and indication-level properties. Our work confirms drugs with genetically supported targets were more likely to be successful in Phases II and III. When causal genes are clear (Mendelian traits and GWAS associations linked to coding variants), we find the use of human genetic evidence increases approval from Phase I by greater than two-fold, and, for Mendelian associations, the positive association holds prospectively. Our findings suggest investments into genomics and genetics are likely to be beneficial to companies deploying this strategy.

96: Phenome-wide association studies (PheWAS) across large "real-world data" population cohorts support drug target validation
Posted to bioRxiv 13 Nov 2017

3,062 downloads genetics

Dorothée Diogo, Chao Tian, Christopher S. Franklin, Mervi Alanne-Kinnunen, Michael March, Chris C. A. Spencer, Ciara Vangjeli, Michael E Weale, Hannele Mattsson, Elina Kilpeläinen, Patrick M.A. Sleiman, Dermot F Reilly, Joshua McElwee, Joseph C. Maranville, Arnaub K Chatterjee, Aman Bhandari, the 23andMe Research Team, Mary-Pat Reeve, Janna Hutz, Nan Bing, Sally John, Daniel MacArthur, Veikko Salomaa, Samuli Ripatti, Hakon Hakonarson, Mark J. Daly, Aarno Palotie, David Hinds, Peter Donnelly, Caroline S. Fox, Aaron Day-Williams, Robert M. Plenge, Heiko Runz

Phenome-wide association studies (PheWAS), which assess whether a genetic variant is associated with multiple phenotypes across a phenotypic spectrum, have been proposed as a possible aid to drug development through elucidating mechanisms of action, identifying alternative indications, or predicting adverse drug events (ADEs). Here, we evaluate whether PheWAS can inform target validation during drug development. We selected 25 single nucleotide polymorphisms (SNPs) linked through genome-wide association studies (GWAS) to 19 candidate drug targets for common disease therapeutic indications. We independently interrogated these SNPs through PheWAS in four large real-world data cohorts (23andMe, UK Biobank, FINRISK, CHOP) for association with a total of 1,892 binary endpoints. We then conducted meta-analyses for 145 harmonized disease endpoints in up to 697,815 individuals and joined results with summary statistics from 57 published GWAS. Our analyses replicate 70% of known GWAS associations and identify 10 novel associations with study-wide significance after multiple test correction (P < 1.8×10^-6; out of 72 novel associations with FDR < 0.1). By leveraging directionality and point estimate of the effect sizes, we describe new associations that may predict ADEs, e.g., acne, high cholesterol, gout and gallstones for rs738409 (p.I148M) in PNPLA3; or asthma for rs1990760 (p.T946A) in IFIH1. We further propose how quantitative estimates of genetic safety/efficacy profiles can be used to help prioritize candidate targets for a specific indication. Our results demonstrate PheWAS as a powerful addition to the toolkit for drug discovery.

97: Genetics of 38 blood and urine biomarkers in the UK Biobank
Posted to bioRxiv 05 Jun 2019

3,062 downloads genetics

Nicholas A Sinnott-Armstrong, Yosuke Tanigawa, David Amar, Nina J Mars, Matthew Aguirre, Guhan R. Venkataraman, Michael Wainberg, Hanna M Ollila, James P. Pirruccello, Junyang Qian, Anna A Shcherbina, FinnGen, Fatima Rodriguez, Themistocles L Assimes, Vineeta Agarwala, Robert Tibshirani, Trevor Hastie, Samuli Ripatti, Jonathan K Pritchard, Mark Daly, Manuel A Rivas

Clinical laboratory tests are a critical component of the continuum of care and provide a means for rapid diagnosis and monitoring of chronic disease. In this study, we systematically evaluated the genetic basis of 38 blood and urine laboratory tests measured in 358,072 participants in the UK Biobank and identified 1,857 independent loci associated with at least one laboratory test, including 488 large-effect protein truncating, missense, and copy-number variants. We tested these loci for enrichment in specific single cell types in kidney, liver, and pancreas relevant to disease aetiology. We then causally linked the biomarkers to medically relevant phenotypes through genetic correlation and Mendelian randomization. Finally, we developed polygenic risk scores (PRS) for each biomarker and built multi-PRS models using all 38 PRSs simultaneously. We found substantially improved prediction of incidence in FinnGen (n=135,500) with the multi-PRS relative to single-disease PRSs for renal failure, myocardial infarction, liver fat percentage, and alcoholic cirrhosis. Together, our results show the genetic basis of these biomarkers, which tissues contribute to the biomarker function, the causal influences of the biomarkers, and how we can use this to predict disease.

98: Estimate of disease heritability using 7.4 million familial relationships inferred from electronic health records
Posted to bioRxiv 28 Jul 2016

3,058 downloads genetics

Fernanda Polubriaginof, Rami Vanguri, Kayla Quinnies, Gillian M. Belbin, Alexandre Yahi, Hojjat Salmasian, Tal Lorberbaum, Victor Nwankwo, Li Li, Mark Shervey, Patricia Glowe, Iuliana Ionita-Laza, Mary Simmerling, George Hripcsak, Suzanne Bakken, David Goldstein, Krzysztof Kiryluk, Eimear E Kenny, Joel Dudley, David K Vawdrey, Nicholas Tatonetti

Heritability is essential for understanding the biological causes of disease, but requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHR) passively capture a wide range of clinically relevant data and provide a novel resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now, these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified millions of familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically-derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a novel validation of the use of EHRs for genetics and disease research.

99: Genome-wide Enhancer Maps Differ Significantly in Genomic Distribution, Evolution, and Function
Posted to bioRxiv 15 Aug 2017

3,041 downloads genetics

Mary Lauren Benton, Sai Charan Talipineni, Dennis Kostka, John A Capra

Non-coding gene regulatory enhancers are essential to transcription in mammalian cells. As a result, a large variety of experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. In practice, most studies consider enhancers identified by only a single method, and the concordance of enhancers identified by different methods has not been comprehensively evaluated. Here, we assess the similarities of enhancer sets identified by ten representative strategies in four biological contexts and evaluate the robustness of downstream conclusions to the choice of identification strategy. All pairs of enhancer sets we evaluated overlap significantly more than expected by chance; however, we also found significant dissimilarity between enhancer sets in their genomic characteristics, evolutionary conservation, and association with functional loci within each context. We find most regions identified as enhancers are supported by only one method. The disagreement is sufficient to influence interpretation of GWAS SNPs and eQTL, and to lead to disparate conclusions about enhancer biology and disease mechanisms. We also find only limited evidence that regions identified by multiple enhancer identification methods are better candidates than those identified by a single method. Our results highlight the inherent complexity of enhancer biology and argue that current approaches have yet to adequately account for enhancer diversity. As a result, we cannot recommend the use of any single enhancer identification strategy in isolation. To facilitate assessment of enhancer diversity on studies' conclusions, we developed creDB, a database of enhancer annotations designed to integrate into bioinformatics workflows. 
While our findings highlight a major challenge to mapping the genetic architecture of complex disease and interpreting regulatory variants found in patient genomes, a systematic understanding of similarities and differences in enhancer identification methodology will ultimately enable robust inferences about gene regulatory sequences.

100: Genome-wide analysis of over 106,000 individuals identifies 9 neuroticism-associated loci
Posted to bioRxiv 20 Nov 2015

Genome-wide analysis of over 106,000 individuals identifies 9 neuroticism-associated loci
3,022 downloads genetics

Daniel J Smith

Neuroticism is a personality trait of fundamental importance for psychological wellbeing and public health. It is strongly associated with major depressive disorder (MDD) and several other psychiatric conditions. Although neuroticism is heritable, attempts to identify the alleles involved in previous studies have been limited by relatively small sample sizes and heterogeneity in the measurement of neuroticism. Here we report a genome-wide association study of neuroticism in 91,370 participants of the UK Biobank cohort and a combined meta-analysis which includes a further 6,659 participants from the Generation Scotland Scottish Family Health Study (GS:SFHS) and 8,687 participants from a QIMR Berghofer Medical Research Institute (QIMR) cohort. All participants were assessed using the same neuroticism instrument, the Eysenck Personality Questionnaire-Revised (EPQ-R-S) Short Form Neuroticism scale. We found a SNP-based heritability estimate for neuroticism of approximately 15% (SE = 0.7%). Meta-analysis identified 9 novel loci associated with neuroticism. The strongest evidence for association was at a locus on chromosome 8 (p = 1.5 × 10⁻¹⁵) spanning 4 Mb and containing at least 36 genes. Other associated loci included interesting candidate genes on chromosome 1 (GRIK3, glutamate receptor ionotropic kainate 3), chromosome 4 (KLHL2, Kelch-like protein 2), chromosome 17 (CRHR1, corticotropin-releasing hormone receptor 1 and MAPT, microtubule-associated protein Tau), and on chromosome 18 (CELF4, CUGBP elav-like family member 4). We found no evidence for genetic differences in the common allelic architecture of neuroticism by sex. By comparing our findings with those of the Psychiatric Genetics Consortia, we identified a strong genetic correlation between neuroticism and MDD (0.64) and a less strong but significant genetic correlation with schizophrenia (0.22), although not with bipolar disorder.
Polygenic risk scores derived from the primary UK Biobank sample captured about 1% of the variance in neuroticism in independent samples. Overall, our findings confirm a polygenic basis for neuroticism and substantial shared genetic architecture between neuroticism and MDD. The identification of 9 new neuroticism-associated loci will drive forward future work on the neurobiology of neuroticism and related phenotypes.

