21: The Genomic Formation of South and Central Asia
Posted to bioRxiv 31 Mar 2018

2,116 downloads genomics

Vagheesh M Narasimhan, Nick J Patterson, Priya Moorjani, Iosif Lazaridis, Mark Lipson, Swapan Mallick, Nadin Rohland, Rebecca Bernardos, Alexander M Kim, Nathan Nakatsuka, Inigo Olalde, Alfredo Coppa, James Mallory, Vyacheslav Moiseyev, Janet Monge, Luca M Olivieri, Nicole Adamski, Nasreen Broomandkhoshbacht, Francesca Candilio, Olivia Cheronet, Brendan J Culleton, Matthew Ferry, Daniel Fernandes, Beatriz Gamarra, Daniel Gaudio, Mateja Hajdinjak, Eadaoin Harney, Thomas K Harper, Denise Keating, Ann-Marie Lawson, Megan Michel, Mario Novak, Jonas Oppenheimer, Niraj Rai, Kendra Sirak, Viviane Slon, Kristin Stewardson, Zhao Zhang, Gaziz Akhatov, Anatoly N Bagashev, Baurzhan Baitanayev, Gian Luca Bonora, Tatiana Chikisheva, Anatoly Derevianko, Dmitry Enshin, Katerina Douka, Nadezhda Dubova, Andrey Epimakhov, Suzanne Freilich, Dorian Fuller, Alexander Goryachev, Andrey Gromov, Bryan Hanks, Margaret Judd, Erlan Kazizov, Aleksander Khokhlov, Egor Kitov, Elena Kupriyanova, Pavel Kuznetsov, Donata Luiselli, Farhad Maksudov, Chris Meiklejohn, Deborah C Merrett, Roberto Micheli, Oleg Mochalov, Zahir Muhammed, Samridin Mustafakulov, Ayushi Nayak, Rykun M Petrovna, Davide Pettener, Richard Potts, Dmitry Razhev, Stefania Sarno, Kulyan Sikhymbaevae, Sergey M Slepchenko, Nadezhda Stepanova, Svetlana Svyatko, Sergey Vasilyev, Massimo Vidale, Dima Voyakin, Antonina Yermolayeva, Alisa Zubova, Vasant S Shinde, Carles Lalueza-Fox, Matthias Meyer, David Anthony, Nicole Boivin, Kumarasamy Thangaraj, Douglas Kennett, Michael Frachetti, Ron Pinhasi, David Reich

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia (consistent with the idea that the Indus Periphery individuals provide the first direct look at the ancestry of peoples of the IVC), and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identify the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.
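
The abstract describes modelling present-day South Asians as mixtures of source populations. As a toy illustration only, assume a target's allele frequencies are an exact linear mixture of two sources, p_T = alpha * p_A + (1 - alpha) * p_B, with no drift or sampling noise; alpha then has a closed-form least-squares estimate. All frequencies below are synthetic, and this simplified estimator is a stand-in, not the method used in the study.

```python
import numpy as np

def two_source_admixture(p_target, p_a, p_b):
    """Least-squares alpha in p_target ~ alpha * p_a + (1 - alpha) * p_b.

    Toy model: ignores genetic drift, sampling noise and linkage between SNPs.
    """
    p_target = np.asarray(p_target, dtype=float)
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    d = p_a - p_b                                   # per-SNP difference between sources
    alpha = np.dot(p_target - p_b, d) / np.dot(d, d)
    return float(np.clip(alpha, 0.0, 1.0))          # keep the proportion in [0, 1]

rng = np.random.default_rng(0)
p_a = rng.uniform(0.05, 0.95, size=1000)   # synthetic source A frequencies
p_b = rng.uniform(0.05, 0.95, size=1000)   # synthetic source B frequencies
p_t = 0.3 * p_a + 0.7 * p_b                # synthetic target with 30% A ancestry
print(round(two_source_admixture(p_t, p_a, p_b), 3))  # 0.3
```

With real data the residual drift on each lineage makes this estimator biased, which is why published analyses rely on more elaborate statistics.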

22: Daisyfield gene drive systems harness repeated genomic elements as a generational clock to limit spread
Posted to bioRxiv 06 Feb 2017

1,974 downloads synthetic biology

John Min, Charleston Noble, Devora Najjar, Kevin M Esvelt

Methods of altering wild populations are most useful when inherently limited to local geographic areas. Here we describe a novel form of gene drive based on the introduction of multiple copies of an engineered 'daisy' sequence into repeated elements of the genome. Each introduced copy encodes guide RNAs that target one or more engineered loci carrying the CRISPR nuclease gene and the desired traits. When organisms encoding a drive system are released into the environment, each generation of mating with wild-type organisms halves the average number of guide RNA elements per 'daisyfield' organism, serving as a generational clock. The loci encoding the nuclease and payload will exhibit drive only as long as at least one copy of the daisy element remains, placing an inherent limit on the extent of spread.
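
The generational clock can be illustrated with a small simulation. This is a minimal sketch, assuming every daisy element sits at an unlinked site and is therefore passed to a given offspring independently with probability 1/2 when a carrier mates with a wild-type organism; the starting copy number of 64 is hypothetical.

```python
import random

def expected_copies(n0, generation):
    """Expected daisy elements per carrier after a number of outcrossed generations."""
    return n0 / 2 ** generation

def simulate_lineage(n0, rng):
    """Follow one carrier lineage; return generations until no daisy element remains."""
    copies, generation = n0, 0
    while copies > 0:
        # each unlinked element is inherited independently with probability 1/2
        copies = sum(rng.random() < 0.5 for _ in range(copies))
        generation += 1
    return generation

rng = random.Random(1)
print([expected_copies(64, g) for g in range(4)])  # [64.0, 32.0, 16.0, 8.0]
runs = [simulate_lineage(64, rng) for _ in range(1000)]
print(sum(runs) / len(runs))  # mean lineage lifetime, about 7 generations for 64 copies
```

Because drive stops once the last element is lost, the lifetime grows only logarithmically with the number of introduced copies.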

23: Efficient de novo assembly of eleven human genomes using PromethION sequencing and a novel nanopore toolkit
Posted to bioRxiv 26 Jul 2019

1,929 downloads bioinformatics

Kishwar Shafin, Trevor Pesout, Ryan Lorig-Roach, Marina Haukness, Hugh E Olsen, Colleen Bosworth, Joel Armstrong, Kristof Tigyi, Nicholas Maurer, Sergey Koren, Fritz J. Sedlazeck, Tobias Marschall, Simon Mayes, Vania Costa, Justin M Zook, Kelvin J Liu, Duncan Kilburn, Melanie Sorensen, Katy M Munson, Mitchell R. Vollger, Evan E Eichler, Sofie Salama, David Haussler, Richard E. Green, Mark Akeson, Adam Phillippy, Karen H Miga, Paolo Carnevali, Miten Jain, Benedict Paten

Present workflows for producing human genome assemblies from long-read technologies have cost and production-time bottlenecks that prohibit efficient scaling to large cohorts. We demonstrate an optimized PromethION nanopore sequencing method for eleven human genomes. The sequencing, performed on one machine in nine days, achieved an average 63x coverage, 42 kb read N50, 90% median read identity and 6.5x coverage in 100 kb+ reads using just three flow cells per sample. To assemble these data we introduce new computational tools: Shasta, a de novo long-read assembler, and MarginPolish & HELEN, a suite of nanopore assembly polishing algorithms. On a single commercial compute node Shasta can produce a complete human genome assembly in under six hours, and MarginPolish & HELEN can polish the result in just over a day, achieving 99.9% identity (QV30) for haploid samples from nanopore reads alone. We evaluate assembly performance for diploid, haploid and trio-binned human samples in terms of accuracy, cost, and time and demonstrate improvements relative to current state-of-the-art methods in all areas. We further show that addition of proximity ligation (Hi-C) sequencing yields near chromosome-level scaffolds for all eleven genomes.
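
Two figures in this abstract follow from standard definitions: read N50 (the length L such that reads of length at least L contain at least half of all sequenced bases) and the Phred-scaled quality value QV = -10 * log10(error rate), so 99.9% identity means an error rate of 0.001, i.e. QV30. A minimal sketch of both; the read lengths in the example are made up.

```python
import math

def read_n50(lengths):
    """Length L such that reads of length >= L together hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no reads given")

def identity_to_qv(identity):
    """Phred-scaled quality value from a fractional identity (e.g. 0.999 -> 30)."""
    return -10 * math.log10(1 - identity)

print(read_n50([100, 80, 50, 40, 30]))   # 80
print(round(identity_to_qv(0.999), 1))   # 30.0
```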

24: ORANGE: A CRISPR/Cas9-based genome editing toolbox for epitope tagging of endogenous proteins in neurons
Posted to bioRxiv 19 Jul 2019

1,907 downloads neuroscience

Jelmer Willems, Arthur P.H. de Jong, Nicky Scheefhals, Harold D MacGillavry

The correct subcellular distribution of protein complexes establishes the complex morphology of neurons and is fundamental to their functioning. Thus, determining the dynamic distribution of proteins is essential to understand neuronal processes. Fluorescence imaging, in particular super-resolution microscopy, has become invaluable to investigate subcellular protein distribution. However, these approaches are limited by the difficulty of labeling endogenous proteins efficiently and reliably. We developed ORANGE, an Open Resource for the Application of Neuronal Genome Editing, which mediates targeted genomic integration of fluorescent tags in neurons. This toolbox includes a knock-in library for in-depth investigation of endogenous protein distribution, and a detailed protocol explaining how knock-ins can be developed for new targets. In combination with super-resolution microscopy, ORANGE revealed the dynamic nanoscale organization of endogenous neuronal signaling molecules, synaptic scaffolding proteins, and neurotransmitter receptors. Thus, ORANGE enables quantitation of expression and distribution for virtually any protein in neurons at high resolution and will significantly further our understanding of neuronal cell biology.
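
Developing a knock-in for a new target typically starts by choosing a Cas9 cut site close to where the tag should be inserted. The sketch below lists SpCas9 protospacers (20 nt followed by an NGG PAM) on both strands whose cut site, assumed to lie 3 bp upstream of the PAM, falls within a chosen distance of the insertion point. It illustrates that general step under these assumptions and is not the ORANGE protocol itself; the function names, the `max_dist` parameter and the example sequence are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def cas9_sites_near(seq, insert_pos, max_dist=10):
    """Return (strand, protospacer, cut_pos) for SpCas9 sites cutting near insert_pos.

    cut_pos is the 0-based forward-strand index of the base just after the blunt cut.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(20, len(s) - 2):
            if s[i + 1:i + 3] == "GG":          # PAM s[i:i+3] matches NGG
                cut = i - 3                      # cut between s[cut-1] and s[cut]
                protospacer = s[i - 20:i]
                # map the cut back to forward-strand coordinates
                fwd_cut = cut if strand == "+" else n - cut
                if abs(fwd_cut - insert_pos) <= max_dist:
                    hits.append((strand, protospacer, fwd_cut))
    return hits

# Hypothetical locus: 5 nt, a 20-nt protospacer, a CGG PAM, then 10 nt.
locus = "AAAAA" + "GATTACAGATTACAGATTAC" + "CGG" + "TTTTTTTTTT"
print(cas9_sites_near(locus, insert_pos=22))
```

A real design would also score off-target matches and place the tag in frame, which this sketch does not attempt.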

25: A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions
Posted to bioRxiv 19 Dec 2016

1,898 downloads animal behavior and cognition

Eric Schulz, Maarten Speekenbrink, Andreas Krause

This tutorial introduces the reader to Gaussian process regression as an expressive tool to model, actively explore and exploit unknown functions. Gaussian process regression is a powerful, non-parametric Bayesian approach towards regression problems that can be utilized in exploration and exploitation scenarios. This tutorial aims to provide an accessible introduction to these techniques. We introduce Gaussian processes, which generate distributions over functions used for Bayesian non-parametric regression, and demonstrate their use in applications and didactic examples including simple regression problems, a demonstration of kernel-encoded prior assumptions and compositions, a pure exploration scenario within an optimal design framework, and a bandit-like exploration-exploitation scenario where the goal is to recommend movies. Beyond that, we describe a situation modelling risk-averse exploration in which an additional constraint (not to sample below a certain threshold) needs to be accounted for. Lastly, we summarize recent psychological experiments utilizing Gaussian processes. Software and literature pointers are also provided.
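
The core computation behind Gaussian process regression is the posterior mean and variance at new inputs. A minimal NumPy sketch, assuming a zero-mean GP with a squared-exponential (RBF) kernel, fixed hand-picked hyperparameters and Gaussian observation noise; the upper-confidence-bound (UCB) rule at the end is one common exploration-exploitation heuristic, included only as an illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)
    chol = np.linalg.cholesky(k_tt)                      # k_tt = L @ L.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = np.diag(k_ss) - np.sum(v ** 2, axis=0)
    return mean, var

def ucb_choice(mean, var, beta=2.0):
    """Index of the candidate maximizing mean + beta * standard deviation."""
    return int(np.argmax(mean + beta * np.sqrt(np.maximum(var, 0.0))))

x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3, 3, 61)
mean, var = gp_posterior(x_train, y_train, x_test)
print(x_test[ucb_choice(mean, var)])  # next input to sample under UCB
```

Raising `beta` favors uncertain regions (exploration); lowering it favors high predicted values (exploitation).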

26: In Situ Transcriptome Accessibility Sequencing (INSTA-seq)
Posted to bioRxiv 05 Aug 2019

1,806 downloads genomics

Daniel Furth, Victor Hatini, Je H. Lee

Subcellular RNA localization regulates spatially polarized cellular processes, but unbiased investigation of its control in vivo remains challenging. Current hybridization-based methods cannot differentiate small regulatory variants, while in situ sequencing is limited by short reads. We solved these problems using a bidirectional sequencing chemistry to efficiently image transcript-specific barcodes in situ, which are then extracted and assembled into longer reads using NGS. In the Drosophila retina, genes regulating eye development and cytoskeletal organization were enriched compared to methods using extracted RNA. We therefore named our method In Situ Transcriptome Accessibility sequencing (INSTA-seq). Sequencing reads terminated near 3' UTR cis-motifs (e.g. Zip48C, stau), revealing RNA-protein interactions. Additionally, Act5C polyadenylation isoforms retaining zipcode motifs were selectively localized to the optic stalk, consistent with their biology. Our platform provides a powerful way to visualize any RNA variants or protein interactions in situ to study their regulation in animal development.

27: Unsupervised identification of the internal states that shape natural behavior
Posted to bioRxiv 03 Jul 2019

1,740 downloads neuroscience

Adam J. Calhoun, Jonathan W. Pillow, Mala Murthy

Internal states can shape stimulus responses and decision-making, but we lack methods to identify internal states and how they evolve over time. To address this gap, we have developed an unsupervised method to identify internal states from behavioral data, and have applied it to the study of a dynamic social interaction. During courtship, Drosophila melanogaster males pattern their songs using feedback cues from their partner. Our model uncovers three latent states underlying this behavior, and is able to predict the moment-to-moment variation in natural song patterning decisions. These distinct behavioral states correspond to different sensorimotor strategies, each of which is characterized by different mappings from feedback cues to song modes. Using the model, we show that a pair of neurons previously thought to be command neurons for song production are sufficient to drive switching between states. Our results reveal how animals compose behavior from previously unidentified internal states, a necessary step for quantitative descriptions of animal behavior that link environmental cues, internal needs, neuronal activity, and motor outputs.
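
The abstract does not spell out the model's form. Latent states that switch over time are commonly handled with a hidden Markov model, so, as an assumption, the sketch below uses the HMM forward algorithm to compute the filtered probability of each of three hypothetical states from a sequence of discrete song-mode observations. The authors' model additionally maps feedback cues to song modes in a state-specific way, which this sketch omits; all parameter values are invented.

```python
import numpy as np

def forward_filter(obs, start, trans, emit):
    """Filtered state probabilities P(state_t | obs_1..t) for a discrete HMM.

    start: (K,) initial state probabilities
    trans: (K, K) with trans[i, j] = P(state_{t+1} = j | state_t = i)
    emit:  (K, M) with emit[k, m] = P(obs = m | state = k)
    """
    filtered = np.zeros((len(obs), len(start)))
    belief = start * emit[:, obs[0]]
    filtered[0] = belief / belief.sum()
    for t in range(1, len(obs)):
        belief = (filtered[t - 1] @ trans) * emit[:, obs[t]]  # predict, then update
        filtered[t] = belief / belief.sum()                   # normalize each step
    return filtered

# Three hypothetical states and three song modes (0, 1, 2); numbers are illustrative.
start = np.full(3, 1 / 3)
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
emit = np.array([[0.80, 0.10, 0.10],
                 [0.10, 0.80, 0.10],
                 [0.10, 0.10, 0.80]])
probs = forward_filter([0, 0, 0, 2, 2, 2], start, trans, emit)
print(probs.argmax(axis=1))  # most probable state at each time step
```

Because states are sticky (high self-transition probability), the inferred state lags briefly behind a change in the observed song mode.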

28: H3K4me3 is neither instructive for, nor informed by, transcription.
Posted to bioRxiv 19 Jul 2019

1,682 downloads genomics

Struan C. Murray, Philipp Lorenz, Francoise Howe, Meredith Wouters, Thomas Brown, Shidong Xi, Harry Fischl, Walaa Khushaim, Joseph Regish Rayappu, Andrew Angel, Jane Mellor

H3K4me3 is a near-universal histone modification found predominantly at the 5' region of genes, with a well-documented association with gene activity. H3K4me3 has been ascribed roles as both an instructor of gene expression and also a downstream consequence of expression, yet neither has been convincingly proven on a genome-wide scale. Here we test these relationships using a combination of bioinformatics, modelling and experimental data from budding yeast in which the levels of H3K4me3 have been massively ablated. We find that loss of H3K4me3 has no effect on nascent transcription or on transcript levels in the population. Moreover, we observe no change in the rates of transcription initiation, elongation, mRNA export or turnover, or in protein levels, or cell-to-cell variation of mRNA. Loss of H3K4me3 also has no effect on the large changes in gene expression patterns that follow galactose induction. Conversely, loss of RNA polymerase from the nucleus has no effect on the pattern of H3K4me3 deposition and little effect on its levels, despite much larger changes to other chromatin features. Furthermore, large genome-wide changes in transcription, both in response to environmental stress and during metabolic cycling, are not accompanied by corresponding changes in H3K4me3. Thus, despite the correlation between H3K4me3 and gene activity, neither appears to be necessary to maintain levels of the other, nor to influence the other's changes in response to environmental stimuli. When we compare gene classes with very different levels of H3K4me3 but highly similar transcription levels we find that H3K4me3-marked genes are those whose expression is unresponsive to environmental changes, and that their histones are less acetylated and less dynamically turned over. Constitutive genes are generally well-expressed, which may alone explain the correlation between H3K4me3 and gene expression, while the biological role of H3K4me3 may have more to do with this distinction in gene class.

29: A single cell framework for multi-omic analysis of disease identifies malignant regulatory signatures in mixed phenotype acute leukemia
Posted to bioRxiv 09 Jul 2019

1,667 downloads genomics

Jeffrey M. Granja, Sandy Klemm, Lisa M McGinnis, Arwa S Kathiria, Anja Mezger, Benjamin Parks, Eric Gars, Michaela Liedtke, Grace X.Y. Zheng, Howard Y. Chang, Ravindra Majeti, William J. Greenleaf

In order to identify the molecular determinants of human diseases, such as cancer, that arise from a diverse range of tissues, it is necessary to accurately distinguish normal and pathogenic cellular programs. Here we present a novel approach for single-cell multi-omic deconvolution of healthy and pathological molecular signatures within phenotypically heterogeneous malignant cells. By first creating immunophenotypic, transcriptomic and epigenetic single-cell maps of hematopoietic development from healthy peripheral blood and bone marrow mononuclear cells, we identify cancer-specific transcriptional and chromatin signatures from single cells in a cohort of mixed phenotype acute leukemia (MPAL) clinical samples. MPALs are a high-risk subtype of acute leukemia characterized by a heterogeneous malignant cell population expressing both myeloid and lymphoid lineage-specific markers. Our results reveal widespread heterogeneity in the pathogenetic gene regulatory and expression programs across patients, yet relatively consistent changes within patients even across malignant cells occupying diverse portions of the hematopoietic lineage. An integrative analysis of transcriptomic and epigenetic maps identifies 91,601 putative gene-regulatory interactions and classifies a number of transcription factors that regulate leukemia-specific genes, including RUNX1-linked regulatory elements proximal to CD69. This work provides a template for integrative, multi-omic analysis for the interpretation of pathogenic molecular signatures in the context of developmental origin.
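
Putative gene-regulatory interactions in integrative single-cell analyses are often nominated by correlating a regulatory element's accessibility with a nearby gene's expression across matched groups of cells. The sketch below assumes both matrices have already been aggregated over the same cell groups and uses a plain Pearson correlation with a hypothetical cutoff (`min_r`); it is a simplified stand-in, not the pipeline that produced the 91,601 interactions.

```python
import numpy as np

def link_peaks_to_genes(access, expr, candidate_pairs, min_r=0.5):
    """Keep (peak, gene, r) pairs whose Pearson r across cell groups is >= min_r.

    access: (n_peaks, n_groups) accessibility; expr: (n_genes, n_groups) expression.
    candidate_pairs: iterable of (peak_index, gene_index), e.g. peaks near each gene.
    """
    links = []
    for p, g in candidate_pairs:
        r = np.corrcoef(access[p], expr[g])[0, 1]
        if r >= min_r:
            links.append((p, g, float(r)))
    return links

access = np.array([[1.0, 2.0, 3.0, 4.0],     # peak 0 rises with the gene
                   [4.0, 3.0, 2.0, 1.0]])    # peak 1 falls with the gene
expr = np.array([[2.0, 4.0, 6.0, 8.0]])      # gene 0
print(link_peaks_to_genes(access, expr, [(0, 0), (1, 0)]))  # only peak 0 is linked
```

In practice, correlations are compared against a background of random peak-gene pairs to control false discoveries.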

30: Using DeepLabCut for 3D markerless pose estimation across species and behaviors
Posted to bioRxiv 24 Nov 2018

1,603 downloads neuroscience

Tanmay Nath, Alexander Mathis, An Chi Chen, Amir Patel, Matthias Bethge, Mackenzie W. Mathis

Noninvasive behavioral tracking of animals during experiments is crucial to many scientific pursuits. Extracting the poses of animals without using markers is often essential for measuring behavioral effects in biomechanics, genetics, ethology & neuroscience. Yet, extracting detailed poses without markers in dynamically changing backgrounds has been challenging. We recently introduced an open source toolbox called DeepLabCut that builds on a state-of-the-art human pose estimation algorithm to allow a user to train a deep neural network using limited training data to precisely track user-defined features with accuracy matching human labeling. Here we provide an updated toolbox, self-contained within a Python package, that includes new features such as graphical user interfaces and active-learning-based network refinement. Lastly, we provide a step-by-step guide for using DeepLabCut.
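
Moving from 2D keypoints to the 3D poses named in the title requires combining detections of the same body part from calibrated cameras. A minimal sketch of linear (direct linear transform) triangulation, assuming each camera's 3x4 projection matrix is already known from calibration; it illustrates the geometry and is not claimed to be DeepLabCut's own implementation. The two synthetic cameras are hypothetical.

```python
import numpy as np

def triangulate(proj_mats, points_2d):
    """Linear (DLT) triangulation of one 3-D point from two or more calibrated views.

    proj_mats: list of 3x4 camera projection matrices.
    points_2d: list of (u, v) image coordinates of the same keypoint in each view.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        rows.append(u * P[2] - P[0])   # from u = (P[0] . X) / (P[2] . X)
        rows.append(v * P[2] - P[1])   # from v = (P[1] . X) / (P[2] . X)
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]                         # null-space direction = homogeneous 3-D point
    return X[:3] / X[3]

def project(P, X):
    """Project a 3-D point with a 3x4 camera matrix to image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two synthetic cameras: one at the origin, one shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])
uv = [project(P1, X_true), project(P2, X_true)]
print(triangulate([P1, P2], uv))  # approximately [0.2, -0.1, 4.0]
```

With noisy 2D detections the SVD solution becomes a least-squares compromise, and adding more cameras generally stabilizes it.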

31: Clonal replacement of tumor-specific T cells following PD-1 blockade
Posted to bioRxiv 24 May 2019

1,559 downloads immunology

Kathryn E Yost, Ansuman T. Satpathy, Daniel K. Wells, Yanyan Qi, Chunlin Wang, Robin Kageyama, Katherine McNamara, Jeffrey M. Granja, Kavita Y. Sarin, Ryanne A. Brown, Rohit K. Gupta, Christina Curtis, Samantha L. Bucktrout, Mark M. Davis, Anne Lynn S. Chang, Howard Y. Chang

Immunotherapies that block inhibitory checkpoint receptors on T cells have transformed the clinical care of cancer patients. However, which tumor-specific T cells are mobilized following checkpoint blockade remains unclear. Here, we performed paired single-cell RNA- and T cell receptor (TCR)- sequencing on 79,046 cells from site-matched tumors from patients with basal cell carcinoma (BCC) or squamous cell carcinoma (SCC) pre- and post-anti-PD-1 therapy. Tracking TCR clones and transcriptional phenotypes revealed a coupling of tumor-recognition, clonal expansion, and T cell dysfunction: the T cell response to treatment was accompanied by clonal expansions of CD8+CD39+ T cells, which co-expressed markers of chronic T cell activation and exhaustion. However, this expansion did not derive from pre-existing tumor infiltrating T cell clones; rather, it comprised novel clonotypes, which were not previously observed in the same tumor. Clonal replacement of T cells was preferentially observed in exhausted CD8+ T cells, compared to other distinct T cell phenotypes, and was evident in BCC and SCC patients. These results, enabled by single-cell multi-omic profiling of clinical samples, demonstrate that pre-existing tumor-specific T cells may be limited in their capacity for re-invigoration, and that the T cell response to checkpoint blockade relies on the expansion of a distinct repertoire of T cell clones that may have just recently entered the tumor.
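
The central comparison here, whether clones that expand after therapy were already present in the tumor beforehand, reduces to set operations on TCR clonotypes. A minimal sketch, assuming each sequenced cell has already been reduced to a clonotype identifier (the short strings below are hypothetical) and using an arbitrary size cutoff (`min_size`) to call a clone expanded.

```python
from collections import Counter

def classify_post_clones(pre_cells, post_cells, min_size=2):
    """Split expanded post-treatment clonotypes into pre-existing vs novel.

    pre_cells, post_cells: iterables with one clonotype ID per sequenced T cell.
    """
    pre = set(pre_cells)
    post_counts = Counter(post_cells)
    expanded = {clone for clone, n in post_counts.items() if n >= min_size}
    return {
        "pre_existing": sorted(expanded & pre),
        "novel": sorted(expanded - pre),
    }

pre = ["CASSA", "CASSB", "CASSB", "CASSC"]
post = ["CASSB", "CASSB", "CASSD", "CASSD", "CASSD", "CASSE"]
print(classify_post_clones(pre, post))
# {'pre_existing': ['CASSB'], 'novel': ['CASSD']}
```

A clone labelled novel may still have been present at a frequency too low to be sampled before treatment, so sampling depth matters when interpreting such calls.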

32: Linking transcriptome and chromatin accessibility in nanoliter droplets for single-cell sequencing
Posted to bioRxiv 04 Jul 2019

1,499 downloads genomics

Song Chen, Blue Lake, Kun Zhang

Linked profiling of transcriptome and chromatin accessibility from single cells can provide unprecedented insights into cellular status. Here we developed a droplet-based Single-Nucleus chromatin Accessibility and mRNA Expression sequencing (SNARE-seq) assay, which we used to profile neonatal and adult mouse cerebral cortices. To demonstrate the strength of single-cell dual-omics profiling, we reconstructed transcriptome and epigenetic landscapes of cell types, uncovered lineage-specific accessible sites, and connected dynamics of promoter accessibility with transcription during neurogenesis.

33: Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion
Posted to bioRxiv 18 Apr 2019

1,410 downloads genomics

Ansuman T. Satpathy, Jeffrey M. Granja, Kathryn E Yost, Yanyan Qi, Francesca Meschi, Geoffrey P McDermott, Brett N Olsen, Maxwell R. Mumbach, Sarah E Pierce, M. Ryan Corces, Preyas Shah, Jason C. Bell, Darisha Jhutty, Corey M Nemec, Jean Wang, Li Wang, Yifeng Yin, Paul G Giresi, Anne Lynn S. Chang, Grace X Y Zheng, William J. Greenleaf, Howard Y. Chang

Understanding complex tissues requires single-cell deconstruction of gene regulation with precision and scale. Here we present a massively parallel droplet-based platform for mapping transposase-accessible chromatin in tens of thousands of single cells per sample (scATAC-seq). We obtain and analyze chromatin profiles of over 200,000 single cells in two primary human systems. In blood, scATAC-seq allows marker-free identification of cell type-specific cis- and trans-regulatory elements, mapping of disease-associated enhancer activity, and reconstruction of trajectories of differentiation from progenitors to diverse and rare immune cell types. In basal cell carcinoma, scATAC-seq reveals regulatory landscapes of malignant, stromal, and immune cell types in the tumor microenvironment. Moreover, scATAC-seq of serial tumor biopsies before and after PD-1 blockade allows identification of chromatin regulators and differentiation trajectories of therapy-responsive intratumoral T cell subsets, revealing a shared regulatory program driving CD8+ T cell exhaustion and CD4+ T follicular helper cell development. We anticipate that droplet-based single-cell chromatin accessibility will provide a broadly applicable means of identifying regulatory factors and elements that underlie cell type and function.
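
Sparse, near-binary scATAC-seq count matrices are commonly normalized with a term-frequency/inverse-document-frequency (TF-IDF) transform before dimensionality reduction and clustering. The abstract does not say which normalization was used. As an assumption for illustration, the sketch below implements one common TF-IDF variant on a tiny dense matrix; large datasets would use sparse matrices instead.

```python
import numpy as np

def tfidf(counts):
    """TF-IDF transform of a (cells x peaks) accessibility count matrix.

    Counts are first binarized (accessible or not). TF divides each cell's
    binary row by the number of accessible peaks in that cell; IDF is
    log(1 + n_cells / number of cells in which the peak is accessible).
    """
    binary = (np.asarray(counts) > 0).astype(float)
    peaks_per_cell = np.maximum(binary.sum(axis=1, keepdims=True), 1.0)
    cells_per_peak = np.maximum(binary.sum(axis=0), 1.0)
    tf = binary / peaks_per_cell
    idf = np.log1p(binary.shape[0] / cells_per_peak)
    return tf * idf

counts = np.array([[1, 0, 2],
                   [1, 1, 0]])
print(tfidf(counts))  # the peak open in every cell receives the smallest weight
```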

34: Insights into human genetic variation and population history from 929 diverse genomes
Posted to bioRxiv 27 Jun 2019

1,405 downloads genomics

Anders Bergström, Shane A McCarthy, Ruoyun Hui, Mohamed A. Almarri, Qasim Ayub, Petr Danecek, Yuan Chen, Sabine Felkel, Pille Hallast, Jack Kamm, Hélène Blanché, Jean-François Deleuze, Howard Cann, Swapan Mallick, David Reich, Manjinder S Sandhu, Pontus Skoglund, Aylwyn Scally, Yali Xue, Richard Durbin, Chris Tyler-Smith

Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented private genetic variation in southern and central Africa and in Oceania and the Americas, but an absence of fixed, private variants between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the last 10,000 years, a potentially major population growth episode after the peopling of the Americas, and a contrast between a single Neanderthal source population and multiple Denisovan source populations contributing to present-day humans. We also demonstrate the benefits of genome sequences over ascertained array genotypes for studying population relationships. These genome sequences are freely available as a resource with no access or analysis restrictions.
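
Statements about private variation (variants seen in only one region) and fixed differences come down to per-site bookkeeping over allele counts grouped by region. A toy sketch with hypothetical region labels, sample sizes and allele counts; real analyses must also handle coverage, genotype quality and unequal sample sizes, which this ignores.

```python
def private_and_fixed(alt_counts, sample_sizes):
    """Classify sites by how the alternate allele is distributed across regions.

    alt_counts: {site: {region: alternate-allele count}}
    sample_sizes: {region: number of chromosomes sampled}
    Returns (private, fixed): private maps site -> the only region carrying the
    allele; fixed lists private sites where that region carries it on every chromosome.
    """
    private, fixed = {}, []
    for site, by_region in alt_counts.items():
        carriers = [region for region, n in by_region.items() if n > 0]
        if len(carriers) == 1:
            region = carriers[0]
            private[site] = region
            if by_region[region] == sample_sizes[region]:
                fixed.append(site)
    return private, fixed

sizes = {"Africa": 4, "Oceania": 4, "Europe": 4}
alt = {"s1": {"Africa": 1, "Oceania": 0, "Europe": 0},
       "s2": {"Africa": 2, "Oceania": 3, "Europe": 0},
       "s3": {"Africa": 0, "Oceania": 4, "Europe": 0}}
print(private_and_fixed(alt, sizes))
# ({'s1': 'Africa', 's3': 'Oceania'}, ['s3'])
```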

35: Sex Chromosome Dosage Effects On Gene Expression In Humans
Posted to bioRxiv 14 May 2017

1,404 downloads genomics

Armin Raznahan, Neelroop Parikshak, Vijayendran Chandran, Jonathan Blumenthal, Liv Clasen, Aaron Alexander-Bloch, Andrew Zinn, Danny Wangsa, Jasen Wise, Declan Murphy, Patrick Bolton, Thomas Ried, Judith Ross, Jay Giedd, Daniel Geschwind

A fundamental question in the biology of sex differences has eluded direct study in humans: how does sex chromosome dosage (SCD) shape genome function? To address this, we developed a systematic map of SCD effects on gene function by analyzing genome-wide expression data in humans with diverse sex chromosome aneuploidies (XO, XXX, XXY, XYY, XXYY). For sex chromosomes, we demonstrate a pattern of obligate dosage sensitivity amongst evolutionarily preserved X-Y homologs, and update prevailing theoretical models for SCD compensation by detecting X-linked genes whose expression increases with decreasing X- and/or Y-chromosome dosage. We further show that SCD-sensitive sex chromosome genes regulate specific co-expression networks of SCD-sensitive autosomal genes with critical cellular functions and a demonstrable potential to mediate previously documented SCD effects on disease. Our findings detail wide-ranging effects of SCD on genome function with implications for human phenotypic variation.

36: The Repertoire of Mutational Signatures in Human Cancer
Posted to bioRxiv 15 May 2018

1,380 downloads cancer biology

Ludmil Alexandrov, Jaegil Kim, Nicholas J Haradhvala, Mi Ni Huang, Alvin W T Ng, Yang Wu, Arnoud Boot, Kyle R Covington, Dmitry A. Gordenin, Erik Bergstrom, S. M. Ashiqul Islam, Nuria Lopez-Bigas, Leszek J. Klimczak, John R McPherson, Sandro Morganella, Radhakrishnan Sabarinathan, David A Wheeler, Ville Mustonen, PCAWG Mutational Signatures Working Group, Gad Getz, Steven G. Rozen, Michael R. Stratton

Somatic mutations in cancer genomes are caused by multiple mutational processes each of which generates a characteristic mutational signature. Using 84,729,690 somatic mutations from 4,645 whole cancer genome and 19,184 exome sequences encompassing most cancer types we characterised 49 single base substitution, 11 doublet base substitution, four clustered base substitution, and 17 small insertion and deletion mutational signatures. The substantial dataset size compared to previous analyses enabled discovery of new signatures, separation of overlapping signatures and decomposition of signatures into components that may represent associated, but distinct, DNA damage, repair and/or replication mechanisms. Estimation of the contribution of each signature to the mutational catalogues of individual cancer genomes revealed associations with exogenous and endogenous exposures and defective DNA maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes contributing to the development of human cancer including a comprehensive reference set of mutational signatures in human cancer.
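
Estimating each signature's contribution to a genome's mutational catalogue is, in its simplest form, a non-negative least-squares fit of the observed counts against known signature profiles. A minimal sketch with a tiny, made-up six-channel alphabet (real single base substitution catalogues use 96 trinucleotide contexts); the Lee-Seung multiplicative-update solver is a simple choice for illustration, not the method used in the study.

```python
import numpy as np

def fit_exposures(catalog, signatures, n_iter=5000, eps=1e-12):
    """Non-negative exposures h that minimize ||catalog - signatures @ h||^2.

    catalog: (n_channels,) mutation counts for one genome.
    signatures: (n_channels, n_signatures), each column a probability profile.
    Uses multiplicative updates, which keep h non-negative at every step.
    """
    W = np.asarray(signatures, dtype=float)
    v = np.asarray(catalog, dtype=float)
    wtv, wtw = W.T @ v, W.T @ W
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= wtv / (wtw @ h + eps)
    return h

# Hypothetical 6-channel signatures (columns sum to 1) and a synthetic catalogue.
signatures = np.array([[0.5, 0.0, 0.1],
                       [0.3, 0.1, 0.1],
                       [0.2, 0.1, 0.0],
                       [0.0, 0.5, 0.1],
                       [0.0, 0.3, 0.2],
                       [0.0, 0.0, 0.5]])
catalog = signatures @ np.array([100.0, 50.0, 20.0])
print(np.round(fit_exposures(catalog, signatures), 2))  # close to 100, 50, 20
```

Discovering the signatures themselves, rather than fitting known ones, additionally requires factorizing many catalogues at once.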

37: Comparative evidence for the independent evolution of hair and sweat gland traits in primates
Posted to bioRxiv 29 Sep 2018

1,377 downloads evolutionary biology

Yana G Kamberov, Samantha M Guhan, Alessandra DeMarchis, Judy Jiang, Sara Sherwood Wright, Bruce A Morgan, Pardis C Sabeti, Clifford J. Tabin, Daniel E Lieberman

Humans differ in many respects from other primates, but perhaps no derived human feature is more striking than our naked skin. Long purported to be adaptive, our species' unique external appearance is characterized by changes in both the patterning of hair follicles and eccrine sweat glands, producing decreased hair cover and increased sweat gland density. Despite the conspicuousness of these features and their potential evolutionary importance, there is a lack of clarity regarding how they evolved within the primate lineage. We thus collected and quantified the density of hair follicles and eccrine sweat glands from five regions of the skin in three species of primates: macaque, chimpanzee and human. Although human hair cover is greatly attenuated relative to that of our close relatives, we find that humans have a chimpanzee-like hair density that is significantly lower than that of macaques. In contrast, eccrine gland density is on average 10-fold higher in humans compared to chimpanzees and macaques, whose density is strikingly similar. Our findings suggest that a decrease in hair density in the ancestors of humans and apes was followed by an increase in eccrine gland density and a reduction in fur cover in humans. This work answers long-standing questions about the traits that make human skin unique and substantiates a model in which the evolution of expanded eccrine gland density was exclusive to the human lineage.

38: Very rare pathogenic genetic variants detected by SNP-chips are usually false positives: implications for direct-to-consumer genetic testing
Posted to bioRxiv 09 Jul 2019

1,366 downloads genetics

Michael N Weedon, Leigh Jackson, Jamie W Harrison, Kate S Ruth, Jessica Tyrrell, Andrew T Hattersley, Caroline F Wright

Objective: To determine the diagnostic accuracy of SNP-chips frequently used by direct-to-consumer genetic testing companies for genotyping rare genetic variants. Methods: We assessed the diagnostic accuracy of genotypes from SNP-chips (index test) against next-generation sequencing data (reference test) in 49,908 individuals recruited to UK Biobank. We compared the genotyping accuracy of SNP-chip variants covered by the next-generation sequencing data, stratified by allele frequency. We further used the ClinVar database to select rare pathogenic variants in the BRCA1 and BRCA2 genes as an exemplar for detailed analysis. Cancer registry data were gathered for BRCA-related cancers (breast, ovarian, prostate and pancreatic) across all participants. Results: SNP-chip genotype accuracy is high overall, but the likelihood of a true positive result reduces substantially with decreasing allele frequency. The sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for heterozygous genotypes are all >99% for 108,574 single nucleotide variants directly genotyped by the UK Biobank SNP-chips. However, for variants with a frequency <0.001% in UK Biobank the PPV is very low, and only 16% of 4,711 heterozygous genotypes from the SNP-chip were confirmed by sequencing data. For pathogenic variants in the BRCA1 and BRCA2 genes, the overall performance metrics of the SNP-chips in UK Biobank are: sensitivity 34.6%, specificity 98.3%, PPV 4.2% and NPV 99.9%. Rates of BRCA-related cancers in individuals with a positive SNP-chip result are similar to age-matched controls (OR 1.28, P=0.07, 95% CI: 0.98-1.67), while sequence-positive individuals have a significantly increased risk (OR 3.73, P=3.5×10⁻¹², 95% CI: 2.57-5.40). Discussion: SNP-chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation.
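
The collapse in PPV for very rare variants follows from Bayes' theorem: when true carriers are rare, even a small false-positive rate yields more false positives than true ones. A sketch of that arithmetic; the sensitivity, specificity and carrier frequencies below are illustrative, not values from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(true carrier | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test (99% sensitivity, 99.99% specificity) at falling carrier frequencies.
for prevalence in (1e-2, 1e-4, 1e-5):
    ppv = positive_predictive_value(0.99, 0.9999, prevalence)
    print(f"prevalence {prevalence:g}: PPV {ppv:.3f}")
# PPV falls from about 0.99 to below 0.1 as the variant becomes rarer
```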

39: Benchmarking Single-Cell RNA Sequencing Protocols for Cell Atlas Projects
Posted to bioRxiv 13 May 2019

1,335 downloads genomics

Elisabetta Mereu, Atefeh Lafzi, Catia Moutinho, Christoph Ziegenhain, Davis J. MacCarthy, Adrian Alvarez, Eduard Batlle, Sagar, Dominic Grün, Julia K. Lau, Stéphane Boutet, Chad Sanada, Aik Ooi, Robert C. Jones, Kelly Kaihara, Chris Brampton, Yasha Talaga, Yohei Sasagawa, Kaori Tanaka, Tetsutaro Hayashi, Itoshi Nikaido, Cornelius Fischer, Sascha Sauer, Timo Trefzer, Christian Conrad, Xian Adiconis, Lan T. Nguyen, Aviv Regev, Joshua Z Levin, Aleksandar Janjic, Lucas E. Wange, Johannes W. Bagnoli, Swati Parekh, Wolfgang Enard, Marta Gut, Rickard Sandberg, Ivo G Gut, Oliver Stegle, Holger Heyn

Single-cell RNA sequencing (scRNA-seq) is the leading technique for charting the molecular properties of individual cells. The latest methods scale to thousands of cells, enabling in-depth characterization of sample composition without prior knowledge. However, scRNA-seq techniques differ in important ways, and it remains unclear which protocols are most suitable for constructing cell atlases of tissues, organs and organisms. We have generated benchmark datasets to systematically evaluate techniques in terms of their power to comprehensively describe cell types and states. We performed a multi-center study comparing 13 commonly used single-cell and single-nucleus RNA-seq protocols on a highly heterogeneous reference sample resource. Comparative and integrative analysis at the cell type and state level revealed marked differences in protocol performance, highlighting a series of key features for cell atlas projects. These should be considered when defining guidelines and standards for international consortia, such as the Human Cell Atlas project.

40: Distinct RhoGEFs activate apical and junctional actomyosin contractility under control of G proteins during epithelial morphogenesis
Posted to bioRxiv 04 Mar 2019

1,257 downloads developmental biology

Alain Garcia De Las Bayonas, Jean-Marc Philippe, Annemarie Lellouch, Thomas Lecuit

Small Rho GTPases and Myosin-II direct cell shape changes and movements during tissue morphogenesis. Their activities are tightly regulated in space and time to specify the pattern of contractility that supports tissue morphogenesis. This regulation is expected to stem both from polarized surface stimuli and from polarized signal processing inside cells. We examined this general problem in the context of cell intercalation, which drives extension of the Drosophila ectoderm. In the ectoderm, G protein-coupled receptors (GPCRs) and their downstream heterotrimeric G proteins (Gα and Gβγ) activate Rho1 both medial-apically, where it exhibits pulsed dynamics, and at junctions, where its activity is planar polarized (Kerridge et al., 2016; Munjal et al., 2015). However, the mechanisms responsible for polarizing Rho1 activity are unclear; in particular, it is unknown how Rho1 activity is controlled at junctions. We report a division of labor in Rho1 activation: distinct guanine nucleotide exchange factors (GEFs), which activate Rho1, operate in these two cellular compartments. RhoGEF2 acts exclusively to activate medial-apical Rho1. Although RhoGEF2 is recruited both medial-apically and at junctions by Gα12/13-GTP, called Concertina (Cta) in Drosophila, its activity is restricted to the medial-apical compartment. Furthermore, we characterize a novel RhoGEF, p114RhoGEF/Wireless (Wrl), and report that it is required for cell intercalation in the extending ectoderm. p114RhoGEF/Wireless activates Rho1 specifically at junctions. Strikingly, it is restricted to adherens junctions and is under Gβ13F/Gγ1 control. Gβ13F/Gγ1 activates junctional Rho1 and exerts quantitative control over the planar polarization of Rho1: overexpression of Gβ13F/Gγ1 leads to hyper-planar polarization of Rho1 and MyoII. Finally, we found that p114RhoGEF/Wireless is absent from the mesoderm, arguing for tissue-specific control of junctional Rho1 activity.
These results shed light on the mechanisms of polarization of Rho1 activity in different cellular compartments and reveal that distinct GEFs are sensitive tuning parameters of cell contractility in remodeling epithelia.

