Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 92,643 bioRxiv papers from 395,587 authors.
Most downloaded bioRxiv papers, all time
in category genetics
4,686 results found.
3,622 downloads genetics
Doruk Beyter, Helga Ingimundardottir, Hannes P. Eggertsson, Eythor Bjornsson, Snaedis Kristmundsdottir, Svenja Mehringer, Hakon Jonsson, Marteinn T Hardarson, Droplaug N Magnusdottir, Ragnar P. Kristjansson, Sigurjon A Gudjonsson, Sverrir T Sverrisson, Guillaume Holley, Gudmundur Eyjolfsson, Isleifur Olafsson, Olof Sigurdardottir, Gisli Masson, Unnur Thorsteinsdottir, Daniel F. Gudbjartsson, Patrick Sulem, Olafur T Magnusson, Bjarni V. Halldorsson, Kari Stefansson
Long-read sequencing (LRS) promises to improve characterization of structural variants (SVs), a major source of genetic diversity. We generated LRS data on 1,817 Icelanders using Oxford Nanopore Technologies, and identified a median of 23,111 autosomal structural variants per individual (a median of 11,506 insertions and 11,576 deletions), spanning cumulatively a median of 9.9 Mb. We found that rare SVs are larger in size than common ones and are more likely to impact protein function. We discovered an association with a rare deletion of the first exon of PCSK9. Carriers of this deletion have 0.93 mmol/L (1.36 sd) lower LDL cholesterol levels than the population average (p-value = 2.4 × 10^-22). We show that SVs can be accurately characterized at population scale using long read sequence data in a genomewide non-targeted fashion and how these variants impact disease.
3,593 downloads genetics
Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a new method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, which includes coding, conserved, regulatory and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. LDpred-funct attained higher prediction accuracy than other polygenic prediction methods in simulations using real genotypes. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank. We used association statistics from British-ancestry samples as training data (avg N=373K) and samples of other European ancestries as validation data (avg N=22K), to minimize confounding. LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R²=0.144; highest R²=0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (total N=1107K; higher heritability in UK Biobank cohort) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
Competing Interest Statement: N.F. and A.A. work for 23andMe.
3,568 downloads genetics
Rare diseases and their underlying molecular causes are often poorly studied, posing challenges for patient diagnosis and prognosis. The development of next-generation sequencing and its decreasing costs promises to alleviate such issues by supplying personal genomic information at a moderate price. Here, we used crowdfunding as an alternative funding source to sequence the genome of Lil BUB, a celebrity cat affected by rare disease phenotypes characterized by supernumerary digits, osteopetrosis and dwarfism, all phenotypic traits that also occur in human patients. We discovered that Lil BUB is affected by two distinct mutations: a heterozygous mutation in the limb enhancer of the Sonic hedgehog gene, previously associated with polydactyly in Hemingway cats; and a novel homozygous frameshift deletion affecting the TNFRSF11A (RANK) gene, which has been linked to osteopetrosis in humans. We communicated the progress of this project to a large online audience, detailing the 'inner workings' of personalized whole genome sequencing with the aim of improving genetic literacy. Our results highlight the importance of genomic analysis in the identification of disease-causing mutations and support crowdfunding as a means to fund low-budget projects and as a platform for scientific communication.
3,510 downloads genetics
We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels were identified exclusively by WES, whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656, ~3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs.
3,503 downloads genetics
Bayazit Yunusbayev, Mait Metspalu, Ene Metspalu, Albert Valeev, Sergei Litvinov, Ruslan Valiev, Vita Akhmetova, Elena Balanovska, Oleg Balanovsky, Shahlo Turdikulova, Dilbar Dalimova, Pagbajabyn Nymadawa, Ardeshir Bahmanimehr, Hovhannes Sahakyan, Kristiina Tambets, Sardana Fedorova, Nikolay Barashkov, Irina Khidiatova, Evelin Mihailov, Rita Khusainova, Larisa Damba, Miroslava Derenko, Boris Malyarchuk, Ludmila Osipova, Mikhail Voevoda, Levon Yepiskoposyan, Toomas Kivisild, Elza Khusnutdinova, Richard Villems
The Turkic peoples represent a diverse collection of ethnic groups defined by the Turkic languages. These groups have dispersed across a vast area, including Siberia, Northwest China, Central Asia, East Europe, the Caucasus, Anatolia, the Middle East, and Afghanistan. The origin and early dispersal history of the Turkic peoples is disputed, with candidates for their ancient homeland ranging from the Transcaspian steppe to Manchuria in Northeast Asia. Previous genetic studies have not identified a clear-cut unifying genetic signal for the Turkic peoples, which lends support for language replacement rather than demic diffusion as the model for the Turkic language's expansion. We addressed the genetic origin of 373 individuals from 22 Turkic-speaking populations, representing their current geographic range, by analyzing genome-wide high-density genotype data. Most of the Turkic peoples studied, except those in Central Asia, genetically resembled their geographic neighbors, in agreement with the elite dominance model of language expansion. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area where historians center a series of early Turkic and non-Turkic steppe polities. The observed excess of long chromosomal tracts IBD (>1cM) between populations from SSM and Turkic peoples across West Eurasia was statistically significant. Finally, we used the ALDER method and inferred admixture dates (~9th-17th centuries) that overlap with the Turkic migrations of the 5th-16th centuries. Thus, our results indicate historical admixture among Turkic peoples, and the recent shared ancestry with modern populations in SSM supports one of the hypothesized homelands for their nomadic Turkic and related Mongolic ancestors.
3,488 downloads genetics
Hailieng Huang, Ming Fang, Luke Jostins, Maša Umićević Mirkov, Gabrielle Boucher, Carl A. Anderson, Vibeke Andersen, Isabelle Cleynen, Adrian Cortes, François Crins, Mauro D’Amato, Valérie Deffontaine, Julia Dimitrieva, Elisa Docampo, Mahmoud Elansary, Kyle Kai-How Farh, Andre Franke, Ann-Stephan Gori, Philippe Goyette, Jonas Halfvarson, Talin Haritunians, Jo Knight, Ian C Lawrance, Charlie W Lees, Edouard Louis, Rob Mariman, Theo Meuwissen, Myriam Mni, Yukihide Momozawa, Miles Parkes, Sarah L. Spain, Emilie Théâtre, Gosia Trynka, Jack Satsangi, Suzanne van Sommeren, Severine Vermeire, Ramnik J. Xavier, International IBD Genetics Consortium, Rinse K Weersma, Richard H Duerr, Christopher G. Mathew, John D Rioux, Dermot P.B. McGovern, Judy H Cho, Michel Georges, Mark J. Daly, Jeffrey C Barrett
Inflammatory bowel disease (IBD) is a chronic gastrointestinal inflammatory disorder that affects millions worldwide. Genome-wide association studies (GWAS) have identified 200 IBD-associated loci, but few have been conclusively resolved to specific functional variants. Here we report fine-mapping of 94 IBD loci using high-density genotyping in 67,852 individuals. Of the 139 independent associations identified in these regions, 18 were pinpointed to a single causal variant with >95% certainty, and an additional 27 associations to a single variant with >50% certainty. These 45 variants are significantly enriched for protein-coding changes (n=13), direct disruption of transcription factor binding sites (n=3) and tissue specific epigenetic marks (n=10), with the latter category showing enrichment in specific immune cells among associations stronger in CD and gut mucosa among associations stronger in UC. The results of this study suggest that high-resolution, fine-mapping in large samples can convert many GWAS discoveries into statistically convincing causal variants, providing a powerful substrate for experimental elucidation of disease mechanisms.
3,459 downloads genetics
In order to infer that a single-nucleotide polymorphism (SNP) either affects a phenotype or is in linkage disequilibrium with a causal site, we must have some assurance that any SNP-phenotype correlation is not the result of confounding with environmental variables that also affect the trait. In this work we study the properties of LD Score regression, a recently developed method for using summary statistics from genome-wide association studies (GWAS) to ensure that confounding does not inflate the number of false positives. We do not treat the effects of genetic variation as a random variable and thus are able to obtain results about the unbiasedness of this method. We demonstrate that LD Score regression can produce estimates of confounding at null SNPs that are unbiased or conservative under fairly general conditions. This robustness holds in the case of the parent genotype affecting the offspring phenotype through some environmental mechanism, despite the resulting correlation over SNPs between LD Scores and the degree of confounding. Additionally, we demonstrate that LD Score regression can produce reasonably robust estimates of the genetic correlation, even when its estimates of the genetic covariance and the two univariate heritabilities are substantially biased.
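As an illustration of the idea behind LD Score regression, here is a minimal sketch. It assumes the commonly cited model in which the expected association chi-square statistic of SNP j is linear in its LD Score: E[chi2_j] = (N * h2 / M) * l_j + intercept, where N is the sample size, M the number of SNPs, h2 the SNP heritability, and an intercept above 1 suggests confounding. That equation, the unweighted least-squares fit, and all numbers below are assumptions chosen for illustration; they are not taken from the abstract, and the published method uses a weighted regression.

```python
def ols_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, noise-free toy data (not from any real GWAS).
ld_scores = [10.0, 50.0, 100.0, 200.0]
n_samples = 50_000        # N: GWAS sample size (assumed)
n_snps = 1_000_000        # M: number of SNPs (assumed)
true_h2 = 0.4             # heritability used to simulate the toy data
true_intercept = 1.1      # values above 1 mimic confounding

# Simulate expected chi-square statistics under the assumed linear model.
chi2 = [n_samples * true_h2 / n_snps * l + true_intercept for l in ld_scores]

slope, intercept = ols_line(ld_scores, chi2)
h2_estimate = slope * n_snps / n_samples  # invert slope = N * h2 / M

print(f"intercept={intercept:.3f}, h2={h2_estimate:.3f}")
```

In this toy setting the fit recovers the simulated intercept and heritability; with real summary statistics the estimates would carry sampling noise and depend on the regression weights.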
3,422 downloads genetics
Ashley L. Lennox, Ruiji Jiang, Lindsey Suit, Brieana Fregeau, Charles J. Sheehan, Kimberly A. Aldinger, Ching Moey, Iryna Lobach, Ghayda Mirzaa, Alexandra Afenjar, Dusica Babovic-Vuksanovic, Stéphane Bézieau, Patrick R. Blackburn, Jens Bunt, Lydie Burglen, Perrine Charles, Brian H.Y. Chung, Benjamin Cogné, Suzanne DeBrosse, Nataliya Di Donato, Laurence Faivre, Delphine Héron, A Micheil Innes, Bertrand Isidor, Bethany L. Johnson-Kerner, Boris Keren, Amy Kimball, Eric W Klee, Paul Kuentz, Sébastien Küry, Dominique Martin-Coignard, Cyril Mignot, Noriko Miyake, Caroline Nava, Mathilde Nizon, Diana Rodriguez, Lot Snijders Blok, Christel Thauvin-Robinet, Julien Thevenon, Marie Vincent, Alban Ziegler, William Dobyns, Linda J Richards, A. James Barkovich, Stephen N. Floor, Debra L. Silver, Elliott H. Sherr
De novo germline mutations in the RNA helicase DDX3X account for 1-3% of unexplained intellectual disability (ID) cases in females, and are associated with autism, brain malformations, and epilepsy. Yet, the developmental and molecular mechanisms by which DDX3X mutations impair brain function are unknown. Here we use human and mouse genetics, and cell biological and biochemical approaches to elucidate mechanisms by which pathogenic DDX3X variants disrupt brain development. We report the largest clinical cohort to date with DDX3X mutations (n=78), demonstrating a striking correlation between recurrent dominant missense mutations, polymicrogyria, and the most severe clinical outcomes. We show that Ddx3x controls cortical development by regulating neuronal generation and migration. Severe DDX3X missense mutations profoundly disrupt RNA helicase activity and induce ectopic RNA-protein granules and aberrant translation in neural progenitors and neurons. Together, our study demonstrates novel mechanisms underlying DDX3X syndrome, and highlights roles for RNA-protein aggregates in the pathogenesis of neurodevelopmental disease.
3,414 downloads genetics
We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4 - 8% genetic ancestry to individuals in world-wide populations.
Author Summary: Throughout human history, populations have expanded and contracted, split and merged, and exchanged migrants. Because these events affected genetic diversity, we can learn about human history by comparing predictions from evolutionary models to genetic data. Here, we show how to rapidly compute such predictions for a wide range of diversity measures within and across populations under complex demographic scenarios. While widely used models of human history accurately predict common measures of diversity, we show that they strongly underestimate the co-occurrence of low frequency mutations within human populations in Asia, Europe, and Africa. Models allowing for archaic admixture, the relatively recent mixing of human populations with deeply diverged human lineages, resolve this discrepancy. We use such models to infer demographic models that include both recent and ancient features of human history. We recover the well-characterized admixture of Neanderthals in Eurasian populations, as well as admixture from an as-yet unknown diverged human population within Africa, further suggesting that admixture with deeply diverged lineages occurred multiple times in human history. By simultaneously testing model predictions for a broad range of diversity statistics, we can assess the robustness of common evolutionary models, identify missing historical events, and build more informed models of human demography.
3,385 downloads genetics
We present vectors for producing multiple CRISPR gRNAs from a single RNA polymerase II or III transcript in Drosophila. The system, which is based on liberation of gRNAs by processing of flanking tRNAs, permits highly efficient multiplexing of Cas9-based mutagenesis. We also demonstrate that the tRNA-gRNA system markedly increases the efficacy of conditional gene disruption by Cas9 and can promote editing by the recently discovered RNA-guided endonuclease Cpf1.
3,372 downloads genetics
It has been recognized that the Merle coat pattern is not only a visually interesting feature, but it also exerts an important biological role, in terms of hearing and vision impairments. In 2006, the Merle (M) locus was mapped to the SILV gene with a SINE element in it, and the inserted retroelement was proven causative to the Merle phenotype. Mapping of the M locus was a genetic breakthrough and many breeders started implementing SILV SINE testing in their breeding programs. Unfortunately, the situation turned out complicated as genotypes of Merle tested individuals did not always correspond to expected phenotypes, sometimes with undesired health consequences in offspring. Two variants of SILV SINE, allelic to the wild type sequence, have been described so far - Mc and M. Here we report a significantly larger portfolio of existing Merle alleles (Mc, Mc+, Ma, Ma+, M, Mh) in Merle dogs, which are associated with unique coat color features and stratified health impairment risk. The refinement of allelic identification was made possible by systematic, detailed observation of Merle phenotypes in a cohort of 181 dogs from known Merle breeds, by many breeders worldwide, and the use of advanced molecular technology enabling the discrimination of individual Merle alleles with significantly higher precision than previously available. We also show that mosaicism of Merle alleles is an unexpectedly frequent phenomenon, which was identified in 30 out of 181 (16.6%) dogs in our study group. Importantly, not only major alleles, but also minor Merle alleles can be inherited by the offspring. Thus, mosaic findings cannot be neglected and must be reported to the breeder in their whole extent.
In light of negative health consequences that may be attributed to certain Merle breeding strategies, we strongly advocate implementation of the refined Merle allele testing for all dogs of Merle breeds to help the breeders in selection of suitable mating partners and production of healthy offspring.
3,365 downloads genetics
Grade of membership models, also known as "admixture models", "topic models" or "Latent Dirichlet Allocation", are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple "populations", and in natural language processing to model documents having words from multiple "topics". Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 51 human tissues, the approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes - from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.
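To make the "membership in multiple clusters" idea concrete, the following sketch computes a sample's expected relative expression as a membership-weighted mixture of per-cluster expression profiles. The function name, the three-gene profiles, and the membership values are hypothetical and chosen only for illustration; this is not the CountClust API, and fitting the memberships and profiles from count data is a separate step not shown here.

```python
def expected_expression(memberships, cluster_profiles):
    """Mix per-cluster gene profiles using one sample's membership proportions.

    memberships: proportions over clusters, summing to 1.
    cluster_profiles: one list per cluster of relative expression over genes,
        each summing to 1.
    """
    n_genes = len(cluster_profiles[0])
    return [
        sum(q * profile[g] for q, profile in zip(memberships, cluster_profiles))
        for g in range(n_genes)
    ]


# Two hypothetical clusters described over three genes.
cluster_profiles = [
    [0.7, 0.2, 0.1],  # cluster 1: dominated by gene 1
    [0.1, 0.3, 0.6],  # cluster 2: dominated by gene 3
]

# A sample with 25% membership in cluster 1 and 75% in cluster 2.
memberships = [0.25, 0.75]

mix = expected_expression(memberships, cluster_profiles)
print([round(p, 3) for p in mix])
```

A standard cluster model corresponds to the special case where each sample's memberships put all weight on a single cluster.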
3,349 downloads genetics
The type II CRISPR/Cas system has recently emerged as a powerful method to manipulate the genomes of various organisms. Here, we report a novel toolbox for high efficiency genome engineering of Drosophila melanogaster consisting of transgenic Cas9 lines and versatile guide RNA (gRNA) expression plasmids. Systematic evaluation reveals Cas9 lines with ubiquitous or germline restricted patterns of activity. We also demonstrate differential activity of the same gRNA expressed from different U6 snRNA promoters, with the previously untested U6:3 promoter giving the most potent effect. Choosing an appropriate combination of Cas9 and gRNA allows targeting of essential and non-essential genes with transmission rates ranging from 25% - 100%. We also provide evidence that our optimized CRISPR/Cas tools can be used for offset nicking-based mutagenesis and, in combination with oligonucleotide donors, to precisely edit the genome by homologous recombination with efficiencies that do not require the use of visible markers. Lastly, we demonstrate a novel application of CRISPR/Cas-mediated technology in revealing loss-of-function phenotypes in somatic cells following efficient biallelic targeting by Cas9 expressed in a ubiquitous or tissue-restricted manner. In summary, our CRISPR/Cas tools will facilitate the rapid evaluation of mutant phenotypes of specific genes and the precise modification of the genome with single nucleotide precision. Our results also pave the way for high throughput genetic screening with CRISPR/Cas.
3,347 downloads genetics
This study investigates the creation of polygenic scores (PGSs) for human population research. PGSs are a linear, usually weighted, combination of risk alleles that estimate the cumulative genetic risk of an individual for a particular trait. While conceptually simple, there are numerous ways to estimate PGSs, not all achieving the same end goals. In this paper, we systematically investigate the impact of four key decisions in the building of PGSs from published genome-wide association meta-analysis results: 1) whether to use single nucleotide polymorphisms (SNPs) assessed by imputation, 2) criteria for selecting which SNPs to include in the score, 3) whether to account for linkage disequilibrium (LD), and 4) if accounting for LD, which type of method best captures the correlation structure among SNPs (i.e. clumping vs. pruning). Using the Health and Retirement Study (HRS), a nationally representative, population-based longitudinal panel study of Americans over the age of 50, we examine the predictive ability as well as the variability and co-variability in PGSs arising from these different estimation approaches. We examine four traits with large published and replicated genome-wide association studies (height, body mass index, educational attainment, and depression). Our central finding demonstrates that PGSs that include all available SNPs either explain the largest amount of variation in an outcome or are not significantly different from the PGS that does. Thus, for reproducibility through rigor and transparency, we recommend that researchers include a PGS with all available SNPs as a reference, and provide substantial justification for using alternative methods.
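The description of a PGS as a linear, usually weighted, combination of risk alleles can be sketched as follows. This is a minimal illustration, not the study's pipeline: the SNP identifiers, weights, and allele counts are hypothetical, and the four decisions the abstract examines (imputed SNPs, SNP selection, LD handling, clumping vs. pruning) would all happen before this summation.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages over SNPs present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical per-SNP weights (for example, GWAS effect sizes).
weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}

# Hypothetical risk-allele counts (0, 1, or 2) for one individual.
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_score(dosages, weights)
print(round(score, 2))
```

Setting every weight to 1 gives the unweighted variant, a simple count of risk alleles.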
3,337 downloads genetics
Dorothée Diogo, Chao Tian, Christopher S. Franklin, Mervi Alanne-Kinnunen, Michael March, Chris C. A. Spencer, Ciara Vangjeli, Michael E Weale, Hannele Mattsson, Elina Kilpeläinen, Patrick M.A. Sleiman, Dermot F Reilly, Joshua McElwee, Joseph C. Maranville, Arnaub K Chatterjee, Aman Bhandari, the 23andMe Research Team, Mary-Pat Reeve, Janna Hutz, Nan Bing, Sally John, Daniel MacArthur, Veikko Salomaa, Samuli Ripatti, Hakon Hakonarson, Mark J. Daly, Aarno Palotie, David Hinds, Peter Donnelly, Caroline S. Fox, Aaron Day-Williams, Robert M. Plenge, Heiko Runz
Phenome-wide association studies (PheWAS), which assess whether a genetic variant is associated with multiple phenotypes across a phenotypic spectrum, have been proposed as a possible aid to drug development through elucidating mechanisms of action, identifying alternative indications, or predicting adverse drug events (ADEs). Here, we evaluate whether PheWAS can inform target validation during drug development. We selected 25 single nucleotide polymorphisms (SNPs) linked through genome-wide association studies (GWAS) to 19 candidate drug targets for common disease therapeutic indications. We independently interrogated these SNPs through PheWAS in four large real-world data cohorts (23andMe, UK Biobank, FINRISK, CHOP) for association with a total of 1,892 binary endpoints. We then conducted meta-analyses for 145 harmonized disease endpoints in up to 697,815 individuals and joined results with summary statistics from 57 published GWAS. Our analyses replicate 70% of known GWAS associations and identify 10 novel associations with study-wide significance after multiple test correction (P < 1.8 × 10^-6; out of 72 novel associations with FDR < 0.1). By leveraging directionality and point estimate of the effect sizes, we describe new associations that may predict ADEs, e.g., acne, high cholesterol, gout and gallstones for rs738409 (p.I148M) in PNPLA3; or asthma for rs1990760 (p.T946A) in IFIH1. We further propose how quantitative estimates of genetic safety/efficacy profiles can be used to help prioritize candidate targets for a specific indication. Our results demonstrate PheWAS as a powerful addition to the toolkit for drug discovery.
3,330 downloads genetics
Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. We characterized the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank and found 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex. We performed phenome-wide analyses and directly measured the effect of homozygous carriers, commonly referred to as "human knockouts," across medical phenotypes for genes implicated to be protective against disease or associated with at least one phenotype in our study and found several genes with strong pleiotropic or non-additive effects. Our results illustrate the importance of protein-truncating variants in a variety of diseases.
3,311 downloads genetics
Robert Karlsson, Pietro Biroli, Edward Kong, S Fleur W Meddens, Robbee Wedow, Mark Alan Fontana, Maël Lebreton, Abdel Abdellaoui, Anke R Hammerschlag, Michel G. Nivard, Aysu Okbay, Cornelius A. Rietveld, Pascal N Timshel, Stephen P Tino, Maciej Trzaskowski, Ronald de Vlaming, Christian L Zünd, Yanchun Bao, Laura Buzdugan, Ann H Caplin, Chia-Yen Chen, Peter Eibich, Pierre Fontanillas, Juan R Gonzalez, Peter K Joshi, Ville Karhunen, Aaron Kleinman, Remy Z Levin, Christina M. Lill, Gerardus A Meddens, Gerard Muntané, Sandra Sanchez-Roige, Frank J van Rooij, Erdogan Taskesen, Yang Wu, Futao Zhang, 23andMe Research Team, eQTLgen Consortium, International Cannabis Consortium, Psychiatric Genomics Consortium, Social Science Genetic Association Consortium, Adam Auton, Jason Boardman, David W Clark, Andrew Conlin, Conor C Dolan, Urs Fischbacher, Patrick JF Groenen, Kathleen Mullan Harris, Gregor Hasler, Albert Hofman, Mohammad A Ikram, Sonia Jain, Ronald C Kessler, Maarten Kooyman, James MacKillop, Minna Männikkö, Carlos Morcillo-Suarez, Matthew B. McQueen, Klaus M Schmidt, Melissa C Smart, Matthias Sutter, A Roy Thurik, Andre G Uitterlinden, Jon White, Harriet de Wit, Jian Yang, Lars Bertram, Dorret Boomsma, Tõnu Esko, Ernst Fehr, David A. Hinds, Magnus Johannesson, Meena Kumari, David Laibson, Patrik K.E. Magnusson, Michelle N Meyer, Arcadi Navarro, Abraham A. Palmer, Tune H Pers, Danielle Posthuma, Daniel Schunk, Murray B. Stein, Rauli Svento, Henning Tiemeier, Paul RHJ Timmers, Patrick Turley, Robert J Ursano, Gert G Wagner, James F Wilson, Jacob Gratten, James J Lee, David Cesarini, Daniel J Benjamin, Philipp D Koellinger, Tõnu Esko
Humans vary substantially in their willingness to take risks. In a combined sample of over one million individuals, we conducted genome-wide association studies (GWAS) of general risk tolerance, adventurousness, and risky behaviors in the driving, drinking, smoking, and sexual domains. We identified 611 approximately independent genetic loci associated with at least one of our phenotypes, including 124 with general risk tolerance. We report evidence of substantial shared genetic influences across general risk tolerance and risky behaviors: 72 of the 124 general risk tolerance loci contain a lead SNP for at least one of our other GWAS, and general risk tolerance is moderately to strongly genetically correlated (|r̂_g| ~ 0.25 to 0.50) with a range of risky behaviors. Bioinformatics analyses imply that genes near general-risk-tolerance-associated SNPs are highly expressed in brain tissues and point to a role for glutamatergic and GABAergic neurotransmission. We find no evidence of enrichment for genes previously hypothesized to relate to risk tolerance.
3,308 downloads genetics
The t-SNE (t-distributed stochastic neighbor embedding) is a new dimension reduction and visualization technique for high-dimensional data. t-SNE is rarely applied to human genetic data, even though it is commonly used in other data-intensive biological fields, such as single-cell genomics. We explore the applicability of t-SNE to human genetic data and make these observations: (i) similar to previously used dimension reduction techniques such as principal component analysis (PCA), t-SNE is able to separate samples from different continents; (ii) unlike PCA, t-SNE is more robust with respect to the presence of outliers; (iii) t-SNE is able to display both continental and sub-continental patterns in a single plot. We conclude that the ability for t-SNE to reveal population stratification at different scales could be useful for human genetic association studies.
3,296 downloads genetics
Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.
3,256 downloads genetics
Patrick F Sullivan, Arpana Agrawal, Cynthia M. Bulik, Ole A Andreassen, Anders D Børglum, Gerome Breen, Sven Cichon, Howard J Edenberg, Stephen V. Faraone, Joel Gelernter, CA Mathews, Caroline M Nievergelt, Jordan W Smoller, Michael C O’Donovan, for the Psychiatric Genomics Consortium
The Psychiatric Genomics Consortium (PGC) is the largest consortium in the history of psychiatry. In the past decade, this global effort has delivered a rapidly increasing flow of new knowledge about the fundamental basis of common psychiatric disorders, particularly given its dedication to rapid progress and open science. The PGC has recently commenced a program of research designed to deliver “actionable” findings - genomic results that (a) reveal the fundamental biology, (b) inform clinical practice, and (c) deliver new therapeutic targets. This is the central idea of the PGC: to convert the family history risk factor into biologically, clinically, and therapeutically meaningful insights. The emerging findings suggest that we are entering into a phase of accelerated translation of genetic discoveries to impact psychiatric practice within a precision medicine framework.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!