61: A reference panel of 64,976 haplotypes for genotype imputation
Posted to bioRxiv 23 Dec 2015

A reference panel of 64,976 haplotypes for genotype imputation
4,497 downloads genetics

Shane McCarthy, Sayantan Das, Warren Kretzschmar, Olivier Delaneau, Andrew R. Wood, Alexander Teumer, Hyun Min Kang, Christian Fuchsberger, Petr Danecek, Kevin Sharp, Yang Luo, Carlo Sidore, Alan Kwong, Nicholas Timpson, Seppo Koskinen, Scott Vrieze, Laura J Scott, He Zhang, Anubha Mahajan, Jan Veldink, Ulrike Peters, Carlos Pato, Cornelia Van Duijn, Christopher E Gillies, Ilaria Gandin, Massimo Mezzavilla, Arthur Gilly, Massimiliano Cocca, Michela Traglia, Andrea Angius, Jeffrey Barrett, Dorret I. Boomsma, Kari Branham, Gerome Breen, Chad Brummett, Fabio Busonero, Harry Campbell, Andrew Chan, Sai Chen, Emily Chew, Francis S. Collins, Laura Corbin, George Davey Smith, George Dedoussis, Marcus Dorr, Aliki-Eleni Farmaki, Luigi Ferrucci, Lukas Forer, Ross M Fraser, Stacey Gabriel, Shawn Levy, Leif Groop, Tabitha Harrison, Andrew Hattersley, Oddgeir L Holmen, Kristian Hveem, Matthias Kretzler, James Lee, Matt McGue, Thomas Meitinger, David Melzer, Josine Min, Karen L. Mohlke, John Vincent, Matthias Nauck, Deborah Nickerson, Aarno Palotie, Michele Pato, Nicola Pirastu, Melvin McInnis, Brent Richards, Cinzia Sala, Veikko Salomaa, David Schlessinger, Sebastian Schoenherr, P Eline Slagboom, Kerrin Small, Timothy Spector, Dwight Stambolian, Marcus Tuke, Jaakko Tuomilehto, Leonard Van den Berg, Wouter Van Rheenen, Uwe Volker, Cisca Wijmenga, Daniela Toniolo, Eleftheria Zeggini, Paolo Gasparini, Matthew G. Sampson, James F Wilson, Timothy Frayling, Paul de Bakker, Morris A. Swertz, Steven McCarroll, Charles Kooperberg, Annelot Dekker, David Altshuler, Cristen Willer, William Iacono, Samuli Ripatti, Nicole Soranzo, Klaudia Walter, Anand Swaroop, Francesco Cucca, Carl Anderson, Michael Boehnke, Mark I McCarthy, Richard Durbin, Gonçalo Abecasis, Jonathan Marchini

We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

62: Human genetics and clinical aspects of neurodevelopmental disorders
Posted to bioRxiv 18 Nov 2013

Human genetics and clinical aspects of neurodevelopmental disorders
4,495 downloads genetics

Gholson J. Lyon, Jason O’Rawe

There are ~12 billion nucleotides in every cell of the human body, and there are ~25-100 trillion cells in each human body. Given somatic mosaicism, epigenetic changes and environmental differences, no two human beings are the same, particularly as there are only ~7 billion people on the planet. One of the next great challenges for studying human genetics will be to acknowledge and embrace complexity. Every human is unique, and the study of human disease phenotypes (and phenotypes in general) will be greatly enriched by moving from a deterministic to a more stochastic/probabilistic model. The dichotomous distinction between simple and complex diseases is completely artificial, and we argue instead for a model that considers a spectrum of diseases that are variably manifesting in each person. The rapid adoption of whole genome sequencing (WGS) and the Internet-mediated networking of people promise to yield more insight into this century-old debate. Comprehensive ancestry tracking and detailed family history data, when combined with WGS or at least cascade-carrier screening, might eventually facilitate a degree of genetic prediction for some diseases in the context of their familial and ancestral etiologies. However, it is important to remain humble, as our current state of knowledge is not yet sufficient, and in principle, any number of nucleotides in the genome, if mutated or modified in a certain way and at a certain time and place, might influence some phenotype during embryogenesis or postnatal life.

63: Parallel paleogenomic transects reveal complex genetic history of early European farmers
Posted to bioRxiv 06 Mar 2017

Parallel paleogenomic transects reveal complex genetic history of early European farmers
4,492 downloads genetics

Mark Lipson, Anna Szécsényi-Nagy, Swapan Mallick, Annamária Pósa, Balázs Stégmár, Victoria Keerl, Nadin Rohland, Kristin Stewardson, Matthew Ferry, Megan Michel, Jonas Oppenheimer, Nasreen Broomandkhoshbacht, Eadaoin Harney, Susanne Nordenfelt, Bastien Llamas, Balázs Gusztáv Mende, Kitti Köhler, Krisztián Oross, Mária Bondár, Tibor Marton, Anett Osztás, János Jakucs, Tibor Paluch, Ferenc Horváth, Piroska Csengeri, Judit Koós, Katalin Sebők, Alexandra Anders, Pál Raczky, Judit Regenye, Judit P. Barna, Szilvia Fábián, Gábor Serlegi, Zoltán Toldi, Emese Gyöngyvér Nagy, János Dani, Erika Molnár, György Pálfi, László Márk, Béla Melegh, Zsolt Bánfai, László Domboróczki, Javier Fernández-Eraso, José Antonio Mujika-Alustiza, Carmen Alonso Fernández, Javier Jiménez Echevarría, Ruth Bollongino, Jörg Orschiedt, Kerstin Schierhold, Harald Meller, Alan Cooper, Joachim Burger, Eszter Bánffy, Kurt W. Alt, Carles Lalueza-Fox, Wolfgang Haak, David Reich

Ancient DNA studies have established that European Neolithic populations were descended from Anatolian migrants who received a limited amount of admixture from resident hunter-gatherers. Many open questions remain, however, about the spatial and temporal dynamics of population interactions and admixture during the Neolithic period. Using the highest-resolution genome-wide ancient DNA data set assembled to date (a total of 177 samples, 127 newly reported here, from the Neolithic and Chalcolithic of Hungary (6000-2900 BCE, n = 98), Germany (5500-3000 BCE, n = 42), and Spain (5500-2200 BCE, n = 37)), we investigate the population dynamics of Neolithization across Europe. We find that genetic diversity was shaped predominantly by local processes, with varied sources and proportions of hunter-gatherer ancestry among the three regions and through time. Admixture between groups with different ancestry profiles was pervasive and resulted in observable population transformation across almost all cultural transitions. Our results shed new light on the ways that gene flow reshaped European populations throughout the Neolithic period and demonstrate the potential of time-series-based sampling and modeling approaches to elucidate multiple dimensions of historical population interactions.

64: Fulfilling the promise of Mendelian randomization
Posted to bioRxiv 16 Apr 2015

Fulfilling the promise of Mendelian randomization
4,488 downloads genetics

Joseph K. Pickrell

Many important questions in medicine involve questions about causality. For example, do low levels of high-density lipoproteins (HDL) cause heart disease? Does high body mass index (BMI) cause type 2 diabetes? Or are these traits simply correlated in the population for other reasons? A popular approach to answering these questions using human genetics is called "Mendelian randomization". We discuss the prospects and limitations of this approach, and some ways forward.

65: Distinguishing genetic correlation from causation across 52 diseases and complex traits
Posted to bioRxiv 18 Oct 2017

Distinguishing genetic correlation from causation across 52 diseases and complex traits
4,372 downloads genetics

Luke J. O’Connor, Alkes L. Price

Mendelian randomization (MR) is widely used to identify causal relationships among heritable traits, but can be confounded by genetic correlations reflecting shared etiology. We propose a model in which a latent causal variable mediates the genetic correlation between two traits. Under the latent causal variable (LCV) model, trait 1 is fully genetically causal for trait 2 if it is perfectly genetically correlated with the latent variable, implying that the entire genetic component of trait 1 is causal for trait 2; it is partially genetically causal for trait 2 if the latent variable has a higher genetic correlation with trait 1 than with trait 2, implying that part of the genetic component of trait 1 is causal for trait 2. To quantify the degree of partial genetic causality, we define the genetic causality proportion (gcp). We fit this model using the mixed fourth moments E(α₁²α₁α₂) and E(α₂²α₁α₂) of marginal effect sizes for each trait, exploiting the fact that if trait 1 is causal for trait 2, then SNPs with large effects on trait 1 (large E(α₁²)) will have correlated effects on trait 2 (large E(α₁α₂)), but not vice versa. We performed simulations under a wide range of genetic architectures and determined that LCV, unlike state-of-the-art MR methods, produced well-calibrated false positive rates and reliable gcp estimates in the presence of genetic correlations and asymmetric genetic architectures; we also determined that LCV is well-powered to detect a causal effect. We applied LCV to GWAS summary statistics for 52 traits (average N = 331k), identifying partially or fully genetically causal effects (1% FDR) for 59 pairs of traits, including 30 pairs of traits with high gcp estimates (gcp > 0.6). Results consistent with the published literature included causal effects on myocardial infarction (MI) for LDL, triglycerides and BMI. Novel findings included an effect of LDL on bone mineral density, consistent with clinical trials of statins in osteoporosis. These results demonstrate that it is possible to distinguish between correlation and causation using genetic data.
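The asymmetry that LCV exploits can be illustrated with a small simulation. This is a minimal sketch under an assumed toy model, not the authors' estimator or simulation design: marginal effects are generated as α_k = q_k·L + π_k, with a sparse, heavy-tailed latent effect L and Gaussian trait-specific terms π_k, and the parameter values (q1 = 1, q2 = 0.3, 5% non-zero latent effects) and helper names are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_lcv_effects(n_snps, q1, q2, p_nonzero, seed=0):
    """Toy LCV-style simulation (assumed model, not the authors' design).

    alpha_k = q_k * L + pi_k for k = 1, 2, where L is a sparse latent effect
    with E(L^2) = 1 and pi_k is Gaussian noise scaled so var(alpha_k) = 1.
    """
    rng = random.Random(seed)
    alpha1, alpha2 = [], []
    sd1 = math.sqrt(1.0 - q1 * q1)
    sd2 = math.sqrt(1.0 - q2 * q2)
    for _ in range(n_snps):
        # Latent effect is nonzero with probability p_nonzero; the scaling
        # keeps E(L^2) = 1 and makes the distribution heavy-tailed.
        if rng.random() < p_nonzero:
            latent = rng.gauss(0.0, 1.0) / math.sqrt(p_nonzero)
        else:
            latent = 0.0
        alpha1.append(q1 * latent + rng.gauss(0.0, sd1))
        alpha2.append(q2 * latent + rng.gauss(0.0, sd2))
    return alpha1, alpha2


def mixed_fourth_moments(alpha1, alpha2):
    """Sample estimates of E(a1^2 * a1 * a2) and E(a2^2 * a1 * a2)."""
    n = len(alpha1)
    m1 = sum(x ** 3 * y for x, y in zip(alpha1, alpha2)) / n
    m2 = sum(x * y ** 3 for x, y in zip(alpha1, alpha2)) / n
    return m1, m2


# q1 = 1: all of trait 1's genetic signal is the latent variable, i.e. trait 1
# is "fully genetically causal" for trait 2 in this toy setting.
a1, a2 = simulate_lcv_effects(n_snps=100_000, q1=1.0, q2=0.3, p_nonzero=0.05)
m1, m2 = mixed_fourth_moments(a1, a2)
print(f"E(a1^2 a1 a2) ~ {m1:.1f}, E(a2^2 a1 a2) ~ {m2:.1f}")
```

Under this toy model the first moment has expectation q1³q2·E(L⁴) + 3q1q2(1 − q1²) and the second q1q2³·E(L⁴) + 3q1q2(1 − q2²); with E(L⁴) = 3/0.05 = 60 these are roughly 18 and 2.4, so the first comes out several-fold larger. The published method goes on to estimate gcp from these moments, which this sketch does not attempt.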

66: Understanding 6th-Century Barbarian Social Organization and Migration through Paleogenomics
Posted to bioRxiv 20 Feb 2018

Understanding 6th-Century Barbarian Social Organization and Migration through Paleogenomics
4,369 downloads genetics

Carlos Eduardo G. Amorim, Stefania Vai, Cosimo Posth, Alessandra Modi, István Koncz, Susanne Hakenbeck, Maria Cristina La Rocca, Balazs Mende, Dean Bobo, Walter Pohl, Luisella Pejrani Baricco, Elena Bedini, Paolo Francalacci, Caterina Giostra, Tivadar Vida, Daniel Winger, Uta von Freeden, Silvia Ghirotto, Martina Lari, Guido Barbujani, Johannes Krause, David Caramelli, Patrick J Geary, Krishna R Veeramah

Despite centuries of research, much about the barbarian migrations that took place between the fourth and sixth centuries in Europe remains hotly debated. To better understand this key era that marks the dawn of modern European societies, we obtained ancient genomic DNA from 63 samples from two cemeteries (from Hungary and Northern Italy) that have been previously associated with the Longobards, a barbarian people that ruled large parts of Italy for over 200 years after invading from Pannonia in 568 CE. Our dense cemetery-based sampling revealed that each cemetery was primarily organized around one large pedigree, suggesting that biological relationships played an important role in these early Medieval societies. Moreover, we identified genetic structure in each cemetery involving at least two groups with different ancestry that were very distinct in terms of their funerary customs. Finally, our data were consistent with the proposed long-distance migration from Pannonia to Northern Italy.

67: Regulatory variants explain much more heritability than coding variants across 11 common diseases
Posted to bioRxiv 21 Apr 2014

Regulatory variants explain much more heritability than coding variants across 11 common diseases
4,367 downloads genetics

Alexander Gusev, S. Hong Lee, Benjamin Neale, Gosia Trynka, Bjarni J. Vilhjálmsson, Hilary Finucane, Han Xu, Chongzhi Zang, Stephan Ripke, Eli Stahl, Schizophrenia Working Group of the Psychiatric Genomics Consortium, SWE-SCZ Consortium, Anna K Kähler, Christina M Hultman, Shaun M Purcell, Steven A. McCarroll, Mark Daly, Bogdan Pasaniuc, Patrick F Sullivan, Naomi R. Wray, Soumya Raychaudhuri, Alkes L. Price

Common variants implicated by genome-wide association studies (GWAS) of complex diseases are known to be enriched for coding and regulatory variants. We applied methods to partition the heritability explained by genotyped SNPs (h²g) across functional categories (while accounting for shared variance due to linkage disequilibrium) to genotyped and imputed data for 11 common diseases. DNaseI hypersensitivity sites (DHS) from 218 cell types, spanning 16% of the genome, explained an average of 79% of h²g (5.1× enrichment; P < 10⁻²⁰); further enrichment was observed at enhancer and cell-type-specific DHS elements. The enrichments were much smaller in analyses that did not use imputed data or were restricted to GWAS-associated SNPs. In contrast, coding variants, spanning 1% of the genome, explained only 8% of h²g (13.8× enrichment; P = 5 × 10⁻⁴). We replicated these findings but found no significant contribution from rare coding variants in an independent schizophrenia cohort genotyped on GWAS and exome chips.

68: Comparative ACE2 variation and primate COVID-19 risk
Posted to bioRxiv 11 Apr 2020

Comparative ACE2 variation and primate COVID-19 risk
4,361 downloads genetics

Amanda D. Melin, Mareike C. Janiak, Frank Marrone, Paramjit S. Arora, James P. Higham

The emergence of the novel coronavirus SARS-CoV-2, which in humans is highly infectious and leads to the potentially fatal disease COVID-19, has caused tens of thousands of deaths and huge global disruption. The viral infection may also represent an existential threat to our closest living relatives, the nonhuman primates, many of which have already been reduced to small and endangered populations. The virus engages the host cell receptor, angiotensin-converting enzyme-2 (ACE2), through the receptor binding domain (RBD) on the spike protein. The contact surface of ACE2 displays amino acid residues that are critical for virus recognition, and variations at these critical residues are likely to modulate infection susceptibility across species. While infection studies have shown that rhesus macaques exposed to the virus develop COVID-19-like symptoms, the susceptibility of other nonhuman primates is unknown. Here, we show that all apes, including chimpanzees, bonobos, gorillas, and orangutans, and all African and Asian monkeys (catarrhines), exhibit the same set of twelve key amino acid residues as human ACE2. Monkeys in the Americas, and some tarsiers, lemurs and lorisoids, differ at significant contact residues, and protein modeling predicts that these differences should greatly reduce the binding affinity of ACE2 for the virus, hence moderating their susceptibility to infection. Other lemurs are predicted to be closer to catarrhines in their susceptibility. Our study suggests that apes and African and Asian monkeys, as well as some lemurs, are all likely to be highly susceptible to SARS-CoV-2, representing a critical threat to their survival. Urgent actions may be necessary to limit their exposure to humans.
Competing Interest Statement: The authors have declared no competing interest.

69: How to identify the best index case in families with hereditary breast and ovarian cancer
Posted to bioRxiv 23 Jan 2019

How to identify the best index case in families with hereditary breast and ovarian cancer
4,306 downloads genetics

Margot J. Wyrwoll, Lea Fuchs, Daniel E.J. Waschk

To date, a disease-causing mutation can be found in approximately 15-30% of families with hereditary breast and ovarian cancer, and more than half of the cases still remain unsolved. Usually, genetic analyses are intended to be performed in the family member with the most severe phenotype; however, this is not always possible. Moreover, no standard criteria have been established to define the person who is most suitable for genetic testing within a family: the best index case. This study establishes clinical selection criteria to identify the best index case in families with hereditary breast and ovarian cancer and analyses their impact on genetic testing. A total of 130 patients who presented at our department from 2016 to 2018 were divided into two groups. In group A, genetic analyses were performed in the best index case (N = 98). In group B, at least one family member had a more severe phenotype than the person who was tested (N = 32). The mutation detection rate was significantly higher in group A than in group B (64.3% vs. 32.0%, p = 0.034), even though there was no significant difference in calculated mutation carrier risks between these groups. Furthermore, the mutation detection rate in group A was notably higher than in previous studies. We conclude that the mutation detection rate in families with hereditary breast and ovarian cancer can be improved by identifying the best index case for genetic testing according to the clinical selection criteria reported here, and suggest that these criteria can be used as a guideline for genetic counseling.

70: Population history of the Sardinian people inferred from whole-genome sequencing
Posted to bioRxiv 07 Dec 2016

Population history of the Sardinian people inferred from whole-genome sequencing
4,224 downloads genetics

Charleston W. K. Chiang, Joseph H. Marcus, Carlo Sidore, Hussein Al-Asadi, Magdalena Zoledziewska, Maristella Pitzalis, Fabio Busonero, Andrea Maschio, Giorgio Pistis, Maristella Steri, Andrea Angius, Kirk E Lohmueller, Goncalo R. Abecasis, David Schlessinger, Francesco Cucca, John Novembre

The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of traits and diseases. The history of the Sardinian population has also been the focus of much research, and in recent ancient DNA (aDNA) studies, Sardinia has provided unique insight into the peopling of Europe and the spread of agriculture. In this study, we analyze whole-genome sequences of 3,514 Sardinians to address hypotheses regarding the founding of Sardinia and its relation to the peopling of Europe, including examining fine-scale substructure, population size history, and signals of admixture. We find the population of the mountainous Gennargentu region shows elevated genetic isolation with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland. Notably, the Gennargentu region also has elevated levels of pre-Neolithic hunter-gatherer ancestry and increased affinity to Basque populations. Further, allele sharing with pre-Neolithic and Neolithic mainland populations is larger on the X chromosome than on the autosomes, providing evidence for a sex-biased demographic history in Sardinia. These results give new insight into the demography of ancestral Sardinians and help further the understanding of sharing of disease risk alleles between Sardinia and mainland populations.

71: Transcriptome-wide association studies: opportunities and challenges
Posted to bioRxiv 20 Oct 2017

Transcriptome-wide association studies: opportunities and challenges
4,179 downloads genetics

Michael Wainberg, Nasa Sinnott-Armstrong, Nicholas Mancuso, Alvaro N Barbeira, David A Knowles, David Golan, Raili Ermel, Arno Ruusalepp, Thomas Quertermous, Ke Hao, Johan LM Björkegren, Hae Kyung Im, Bogdan Pasaniuc, Manuel A. Rivas, Anshul Kundaje

Transcriptome-wide association studies (TWAS) integrate GWAS and expression quantitative trait locus (eQTL) datasets to discover candidate causal gene-trait associations. We integrate multi-tissue expression panels and summary GWAS for LDL cholesterol and Crohn's disease to show that TWAS are highly vulnerable to discovering non-causal genes, because variants at a single GWAS hit locus are often eQTLs for multiple genes. TWAS exhibit acute instability when the tissue of the expression panel is changed: candidate causal genes that are TWAS hits in one tissue are usually no longer hits in another, due to lack of expression or strong eQTLs, while non-causal genes at the same loci remain. While TWAS is statistically valid when used as a weighted burden test to identify trait-associated loci, it is invalid to interpret TWAS associations as causal genes because the false discovery rate for TWAS causal gene discovery is not only high, but unquantifiable. More broadly, our results showcase limitations of using expression variation across individuals to determine causal genes at GWAS loci.
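The distinction the abstract draws between TWAS as a weighted burden test and TWAS as causal-gene discovery can be made concrete with the summary-statistic form commonly used for TWAS, z_TWAS = wᵀz / √(wᵀΣw), where w holds a gene's eQTL weights, z the GWAS z-scores at the same SNPs, and Σ the SNP LD (correlation) matrix. The sketch below assumes this form; the three-SNP locus, the LD values, and both genes' weights are hypothetical numbers chosen only to show two genes sharing eQTL SNPs at one GWAS hit.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: w'z / sqrt(w' Sigma w).

    weights: eQTL weights for one gene (hypothetical values below)
    gwas_z:  GWAS z-scores at the same SNPs
    ld:      SNP-by-SNP LD correlation matrix as a list of lists
    """
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return numerator / math.sqrt(variance)


# Toy locus with three SNPs (all numbers hypothetical).
ld = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
gwas_z = [5.0, 4.0, 1.0]
gene_a = [1.0, 0.5, 0.0]  # eQTL weights on SNPs 1 and 2
gene_b = [0.0, 1.0, 0.5]  # a second gene that shares SNP 2 as an eQTL

z_a = twas_z(gene_a, gwas_z, ld)
z_b = twas_z(gene_b, gwas_z, ld)
print(f"gene A: {z_a:.2f}, gene B: {z_b:.2f}")  # gene A: 5.29, gene B: 3.61
```

Both genes clear a conventional significance threshold because their weights load on the same associated SNPs. The statistic is a valid test of the weighted burden at the locus but, as the abstract argues, it does not by itself indicate which gene, if either, is causal.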

72: Genomic SEM Provides Insights into the Multivariate Genetic Architecture of Complex Traits
Posted to bioRxiv 21 Apr 2018

Genomic SEM Provides Insights into the Multivariate Genetic Architecture of Complex Traits
4,167 downloads genetics

Andrew D Grotzinger, Mijke Rhemtulla, Ronald de Vlaming, Stuart J. Ritchie, Travis T. Mallard, W. David Hill, Hill F. Ip, Andrew McIntosh, Ian J. Deary, Philipp D Koellinger, K. Paige Harden, Michel G. Nivard, Elliot M Tucker-Drob

Methods for using GWAS to estimate genetic correlations between pairwise combinations of traits have produced 'atlases' of genetic architecture. Genetic atlases reveal pervasive pleiotropy, and genome-wide significant loci are often shared across different phenotypes. We introduce genomic structural equation modeling (Genomic SEM), a multivariate method for analyzing the joint genetic architectures of complex traits. Using formal methods for modeling covariance structure, Genomic SEM synthesizes genetic correlations and SNP-heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to identify variants with effects on general dimensions of cross-trait liability, boost power for discovery, and calculate more predictive polygenic scores. Finally, Genomic SEM can be used to identify loci that cause divergence between traits, aiding the search for what uniquely differentiates highly correlated phenotypes. We demonstrate several applications of Genomic SEM, including a joint analysis of GWAS summary statistics from five genetically correlated psychiatric traits. We identify 27 independent SNPs not previously identified in the univariate GWASs, 5 of which have been reported in other published GWASs of the included traits. Polygenic scores derived from Genomic SEM consistently outperform polygenic scores derived from GWASs of the individual traits. Genomic SEM is flexible, open-ended, and allows for continuous innovations in how multivariate genetic architecture is modeled.

73: CRISPR-Cas9 Gene Editing in Lizards Through Microinjection of Unfertilized Oocytes
Posted to bioRxiv 31 Mar 2019

CRISPR-Cas9 Gene Editing in Lizards Through Microinjection of Unfertilized Oocytes
4,128 downloads genetics

Ashley M. Rasys, Sungdae Park, Rebecca E. Ball, Aaron J. Alcala, James D. Lauderdale, Douglas B. Menke

CRISPR-Cas mediated gene editing has enabled the direct manipulation of gene function in many species. However, the reproductive biology of reptiles presents unique barriers to the use of this technology, and there are currently no effective methods for targeted mutagenesis in any reptile. Here we present a new approach that enables the efficient production of CRISPR-Cas induced mutations in Anolis lizards, an important model for studies of reptile evolution and development.

74: Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions.
Posted to bioRxiv 09 Oct 2018

Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions.
4,109 downloads genetics

David M. Howard, Mark J. Adams, Toni-Kim Clarke, Jonathan D. Hafferty, Jude Gibson, Masoud Shirali, Jonathan R. I. Coleman, Saskia P Hagenaars, Joey Ward, Eleanor M. Wigmore, Clara Alloza, Xueyi Shen, Miruna C. Barbu, Eileen Y. Xu, Heather Whalley, Riccardo E. Marioni, David J. Porteous, Gail Davies, Ian J. Deary, Gibran Hemani, Klaus Berger, Henning Teismann, Rajesh Rawal, Volker Arolt, Bernhard T. Baune, Udo Dannlowski, Katharina Domschke, Chao Tian, David A. Hinds, 23andMe Research Team, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Maciej Trzaskowski, Enda M. Byrne, Stephan Ripke, Daniel J. Smith, Patrick F Sullivan, Naomi R. Wray, Gerome Breen, Cathryn M. Lewis, Andrew McIntosh

Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene-sets associated with depression, including both genes and gene-pathways associated with synaptic structure and neurotransmission. Further evidence of the importance of prefrontal brain regions in depression was provided by an enrichment analysis. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches.

75: The druggable genome and support for target identification and validation in drug development
Posted to bioRxiv 26 Jul 2016

The druggable genome and support for target identification and validation in drug development
4,083 downloads genetics

Chris Finan, Anna Gaulton, Felix A. Kruger, Tom Lumbers, Tina Shah, Jorgen Engmann, Luana Galver, Ryan Kelley, Anneli Karlsson, Rita Santos, John P. Overington, Aroon D. Hingorani, Juan P Casas

Target identification (identifying the correct drug targets for each disease) and target validation (demonstrating the effect of target perturbation on disease biomarkers and disease end-points) are essential steps in drug development. We showed previously that biomarker and disease endpoint associations of single nucleotide polymorphisms (SNPs) in a gene encoding a drug target accurately depict the effect of modifying the same target with a pharmacological agent; others have shown that genomic support for a target is associated with a higher rate of drug development success. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome-wide association studies (GWAS) to an updated set of genes encoding druggable human proteins, to compounds with bioactivity against these targets and, where these were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, to enable druggable genome-wide association studies for drug target selection and validation in human disease.

76: Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types
Posted to bioRxiv 25 Jan 2017

Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types
4,070 downloads genetics

Hilary K. Finucane, Yakir A. Reshef, Verneri Anttila, Kamil Slowikowski, Alexander Gusev, Andrea Byrnes, Steven Gazal, Po-Ru Loh, Caleb Lareau, Noam Shoresh, Giulio Genovese, Arpiar Saunders, Evan Z. Macosko, Samuela Pollack, The Brainstorm Consortium, John R.B. Perry, Jason D. Buenrostro, Bradley E. Bernstein, Soumya Raychaudhuri, Steven McCarroll, Benjamin Neale, Alkes L. Price

Genetics can provide a systematic approach to discovering the tissues and cell types relevant for a complex disease or trait. Identifying these tissues and cell types is critical for following up on non-coding allelic function, developing ex vivo models, and identifying therapeutic targets. Here, we analyze gene expression data from several sources, including the GTEx and PsychENCODE consortia, together with genome-wide association study (GWAS) summary statistics for 48 diseases and traits with an average sample size of 169,331, to identify disease-relevant tissues and cell types. We develop and apply an approach that uses stratified LD score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We detect tissue-specific enrichments at FDR < 5% for 34 diseases and traits across a broad range of tissues that recapitulate known biology. In our analysis of traits with observed central nervous system enrichment, we detect an enrichment of neurons over other brain cell types for several brain-related traits, enrichment of inhibitory over excitatory neurons for bipolar disorder but excitatory over inhibitory neurons for schizophrenia and body mass index, and enrichments in the cortex for schizophrenia and in the striatum for migraine. In our analysis of traits with observed immunological enrichment, we identify enrichments of T cells for asthma and eczema, B cells for primary biliary cirrhosis, and myeloid cells for Alzheimer's disease, which we validated with independent chromatin data. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signal.

77: Trans effects on gene expression can drive omnigenic inheritance
Posted to bioRxiv 24 Sep 2018

Trans effects on gene expression can drive omnigenic inheritance
4,069 downloads genetics

Xuanyao Liu, Yang I. Li, Jonathan K. Pritchard

Early genome-wide association studies (GWAS) led to the surprising discovery that, for typical complex traits, the most significant genetic variants contribute only a small fraction of the estimated heritability. Instead, it has become clear that a huge number of common variants, each with tiny effects, explain most of the heritability. Previously, we argued that these patterns conflict with standard conceptual models, and that new models are needed. Here we provide a formal model in which genetic contributions to complex traits can be partitioned into direct effects from core genes, and indirect effects from peripheral genes acting as trans-regulators. We argue that the central importance of peripheral genes is a direct consequence of the large contribution of trans-acting variation to gene expression variation. In particular, we propose that if the core genes for a trait are co-regulated (as seems likely), then the effects of peripheral variation can be amplified by these co-regulated networks such that nearly all of the genetic variance is driven by peripheral genes. Thus our model proposes a framework for understanding key features of the architecture of complex traits.

78: GeneWeld: a method for efficient targeted integration directed by short homology
Posted to bioRxiv 03 Oct 2018

GeneWeld: a method for efficient targeted integration directed by short homology
4,064 downloads genetics

Wesley A. Wierson, Jordan M. Welker, Maira P. Almeida, Carla M. Mann, Dennis A. Webster, Melanie E. Torrie, Trevor J. Weiss, Macy K. Vollbrecht, Merrina Lan, Kenna C. McKeighan, Jacklyn Levey, Zhitao Ming, Alec Wehmeier, Christopher S. Mikelson, Jeffrey A. Haltom, Kristen M. Kwan, Chi-Bin Chien, Darius Balciunas, Maura McGrail, Karl J Clark, Beau R. Webber, Branden Moriarity, Staci L. Solin, Daniel F. Carlson, Drena L. Dobbs, Maura McGrail, Jeffrey J Essner

Choices for genome engineering and integration involve either high efficiency with little or no target specificity, or high specificity with low activity. Here, we describe a targeted integration strategy, called GeneWeld, and a vector series for gene tagging, pGTag (plasmids for Gene Tagging), which promote highly efficient and precise targeted integration in zebrafish embryos, pig fibroblasts, and human cells utilizing the CRISPR/Cas9 system. Our work demonstrates that in vivo targeting of a genomic locus of interest with CRISPR/Cas9 and a donor vector containing as little as 24 to 48 base pairs of homology directs precise and efficient knock-in when the homology arms are exposed with a double-strand break in vivo. Our results suggest that the length of homology is not important in the design of knock-in vectors, but rather how the homology is presented to a double-strand break in the genome. Given our results targeting multiple loci in different species, we expect the accompanying protocols, vectors, and web interface for homology arm design to help streamline gene targeting and applications in CRISPR- and TALEN-compatible systems.

79: Re-evaluation of SNP heritability in complex human traits
Posted to bioRxiv 09 Sep 2016

4,005 downloads genetics

Doug Speed, Na Cai, The UCLEB Consortium, Michael R. Johnson, Sergey Nejentsev, David J Balding

SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but the assumptions in current use have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency, linkage disequilibrium and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (SD 3) higher than those obtained from the widely used software GCTA, and 25% (SD 2) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNase I hypersensitivity sites were reported to explain 79% of SNP heritability; under our improved heritability model, their estimated contribution is only 24%.
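The abstract does not spell out the model's formula. One common way to parameterize how expected per-SNP heritability depends on minor allele frequency f_j is E[h²_j] ∝ [f_j(1 − f_j)]^(1+α), optionally multiplied by an LD weight w_j and a genotype-certainty (info) score r_j. Setting α = −1 makes every SNP contribute equally regardless of MAF, which is the assumption implied by GCTA's standardized genotypes. The sketch below assumes this form. The function name, the α values and the example MAFs are illustrative inputs, not values reported in the abstract.

```python
def expected_h2_shares(mafs, alpha=-1.0, ld_weights=None, info_scores=None):
    """Each SNP's share of total expected heritability under the ASSUMED form
    E[h2_j] proportional to w_j * [f_j * (1 - f_j)] ** (1 + alpha) * r_j.

    mafs: minor allele frequencies f_j in (0, 0.5]
    ld_weights: optional LD weights w_j (default 1.0 for every SNP)
    info_scores: optional genotype-certainty scores r_j (default 1.0)
    """
    n = len(mafs)
    w = ld_weights if ld_weights is not None else [1.0] * n
    r = info_scores if info_scores is not None else [1.0] * n
    if not (len(w) == len(r) == n):
        raise ValueError("mafs, ld_weights and info_scores must have equal length")
    raw = [wj * (f * (1.0 - f)) ** (1.0 + alpha) * rj for f, wj, rj in zip(mafs, w, r)]
    total = sum(raw)
    return [x / total for x in raw]


mafs = [0.01, 0.1, 0.5]
# alpha = -1: every SNP gets the same expected share (1/3 each here).
print(expected_h2_shares(mafs, alpha=-1.0))
# alpha = -0.25 (an illustrative value): common SNPs get larger shares than rare ones.
print(expected_h2_shares(mafs, alpha=-0.25))
```

Changing α, or the LD and certainty weights, shifts expected heritability between rare and common or well- and poorly-imputed SNPs. This is why the choice of prior model can move the total SNP heritability estimate.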

80: The Genetic History of France
Posted to bioRxiv 23 Jul 2019

3,991 downloads genetics

Aude Saint Pierre, Joanna Giemza, Matilde Karakachoff, Isabel Alves, Philippe Amouyel, Jean-François Dartigues, Christophe Tzourio, Martial Monteil, Pilar Galan, Serge Hercberg, Richard Redon, Emmanuelle Génin, Christian Dina

The study of the genetic structure of different countries within Europe has provided significant insights into their demographic history and their current stratification. Although France occupies a particular location at the end of the European peninsula and at the crossroads of migration routes, few population genetic studies have been conducted so far with genome-wide data. In this study, we analyzed SNP-chip genetic data from 2,184 individuals born in France who were enrolled in two independent population cohorts. Using FineStructure, we identified six genetic clusters of individuals that were highly consistent between the two cohorts. These clusters match geography extremely well and overlap with historical and linguistic divisions of France. By modeling the relationship between genetics and geography with the EEMS software, we detected gene flow barriers that are similar in the two cohorts and correspond to major French rivers or mountains. Estimates of effective population size obtained with the IBDNe program also revealed very similar patterns in both cohorts, with a rapid increase in effective population size over the last 150 generations, similar to what has been observed in other European countries. A marked bottleneck is also consistently seen in both datasets, starting in the fourteenth century, when the Black Death raged in Europe. In conclusion, by performing the first exhaustive study of the genetic structure of France, we fill a gap in the genetic studies of Europe, which should be useful to medical geneticists as well as to historians and archeologists.

