1: The Genomic Formation of South and Central Asia
Posted to bioRxiv 31 Mar 2018

3,997 downloads genomics

Vagheesh M Narasimhan, Nick Patterson, Priya Moorjani, Iosif Lazaridis, Mark Lipson, Swapan Mallick, Nadin Rohland, Rebecca Bernardos, Alexander M Kim, Nathan Nakatsuka, Iñigo Olalde, Alfredo Coppa, James Mallory, Vyacheslav Moiseyev, Janet Monge, Luca M Olivieri, Nicole Adamski, Nasreen Broomandkhoshbacht, Francesca Candilio, Olivia Cheronet, Brendan J Culleton, Matthew Ferry, Daniel Fernandes, Beatriz Gamarra, Daniel Gaudio, Mateja Hajdinjak, Éadaoin Harney, Thomas K Harper, Denise Keating, Ann Marie Lawson, Megan Michel, Mario Novak, Jonas Oppenheimer, Niraj Rai, Kendra Sirak, Viviane Slon, Kristin Stewardson, Zhao Zhang, Gaziz Akhatov, Anatoly N Bagashev, Bauryzhan Baitanayev, Gian Luca Bonora, Tatiana Chikisheva, Anatoly Derevianko, Enshin Dmitry, Katerina Douka, Nadezhda Dubova, Andrey Epimakhov, Suzanne Freilich, Dorian Fuller, Alexander Goryachev, Andrey Gromov, Bryan Hanks, Margaret Judd, Erlan Kazizov, Aleksander Khokhlov, Egor Kitov, Elena Kupriyanova, Pavel Kuznetsov, Donata Luiselli, Farhod Maksudov, Christopher Meiklejohn, Deborah Merrett, Roberto Micheli, Oleg Mochalov, Zahir Muhammed, Samariddin Mustafokulov, Ayushi Nayak, Rykun M Petrovna, Davide Pettener, Richard Potts, Dmitry Razhev, Stefania Sarno, Kulyan Sikhymbaeva, Sergey M Slepchenko, Nadezhda Stepanova, Svetlana Svyatko, Sergey Vasilyev, Massimo Vidale, Dmitriy Voyakin, Antonina Yermolayeva, Alisa Zubova, Vasant S Shinde, Carles Lalueza-Fox, Matthias Meyer, David Anthony, Nicole Boivin, Kumarasamy Thangaraj, Douglas J. Kennett, Michael Frachetti, Ron Pinhasi, David Reich

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia (consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC), and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identify the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.
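The idea of modeling a present-day population as a mixture of three ancestral sources can be illustrated with a toy calculation. This is not the study's statistical machinery: it simply assumes that a target's allele frequencies are an exact linear mixture of three sources and recovers the weights (constrained to sum to one) by least squares. All frequencies below are hypothetical, and the function name `mixture_weights` is illustrative.

```python
def mixture_weights(target, sources):
    """Weights (w1, w2, w3), summing to 1, that best fit target ~ sum_i w_i * sources[i].

    Toy model: every population is a list of allele frequencies at the same SNPs,
    and the target is treated as a linear mixture of exactly three sources.
    """
    a, b, c = sources
    # Impose w3 = 1 - w1 - w2, turning the fit into target - c = w1*(a - c) + w2*(b - c).
    y = [t - z for t, z in zip(target, c)]
    u = [x - z for x, z in zip(a, c)]
    v = [x - z for x, z in zip(b, c)]
    # Normal equations of the two-parameter least-squares problem, solved by Cramer's rule.
    uu = sum(x * x for x in u)
    vv = sum(x * x for x in v)
    uv = sum(x * z for x, z in zip(u, v))
    uy = sum(x * z for x, z in zip(u, y))
    vy = sum(x * z for x, z in zip(v, y))
    det = uu * vv - uv * uv
    if abs(det) < 1e-12:
        raise ValueError("sources cannot be distinguished at these SNPs")
    w1 = (uy * vv - vy * uv) / det
    w2 = (vy * uu - uy * uv) / det
    return w1, w2, 1.0 - w1 - w2


# Hypothetical allele frequencies at five SNPs (not data from the study).
indus_periphery = [0.2, 0.5, 0.7, 0.1, 0.4]
steppe = [0.6, 0.3, 0.2, 0.5, 0.9]
hunter_gatherer = [0.1, 0.8, 0.4, 0.3, 0.2]
# A synthetic target built as a 50/20/30 mixture, so the correct answer is known.
target = [0.5 * p + 0.2 * q + 0.3 * r
          for p, q, r in zip(indus_periphery, steppe, hunter_gatherer)]

weights = mixture_weights(target, [indus_periphery, steppe, hunter_gatherer])
print([round(w, 3) for w in weights])  # [0.5, 0.2, 0.3]
```

Real admixture analyses must also handle genetic drift, sampling noise, and tests of model fit, and they check that every weight lies between 0 and 1; none of that is attempted here.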

2: A guide to performing Polygenic Risk Score analyses
Posted to bioRxiv 14 Sep 2018

1,942 downloads genomics

Shing Wan Choi, Timothy Mak, Paul F O'Reilly

The application of polygenic risk scores (PRS) has become routine in genetic epidemiological studies. Among a range of applications, PRS are commonly used to assess shared aetiology among different phenotypes and to evaluate the predictive power of genetic data, while they are also now being exploited as part of study design, in which experiments are performed on individuals, or their biological samples (e.g., tissues, cells), at the tails of the PRS distribution and contrasted. As GWAS sample sizes increase and PRS become more powerful, they are also set to play a key role in personalised medicine. Despite their growing application and importance, there are limited guidelines for performing PRS analyses, which can lead to inconsistency between studies and misinterpretation of results. Here we provide detailed guidelines for performing polygenic risk score analyses relevant to different methods for their calculation, outlining standard quality control steps and offering best-practice recommendations. We also discuss different methods for the calculation of PRS, common misconceptions regarding the interpretation of results and future challenges.
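At its simplest, a PRS is a weighted sum of an individual's effect-allele dosages, with GWAS effect sizes as weights. The sketch below shows only that core step with a p-value threshold, in the style of the "clumping and thresholding" approach; it assumes the SNPs have already been clumped, quality-controlled, and aligned to the same effect allele, and it does not represent the other calculation methods the guide discusses. The function name and all numbers are hypothetical.

```python
def polygenic_risk_score(dosages, betas, pvalues, p_threshold=5e-8):
    """Sum of effect size x effect-allele dosage over SNPs with p-value below p_threshold.

    dosages: effect-allele counts for one individual (0, 1 or 2; may be fractional if imputed)
    betas:   GWAS effect sizes (e.g. log odds ratios) for the same effect alleles
    pvalues: GWAS p-values for the same SNPs, in the same order
    """
    if not (len(dosages) == len(betas) == len(pvalues)):
        raise ValueError("dosages, betas and pvalues must cover the same SNPs")
    return sum(b * g for g, b, p in zip(dosages, betas, pvalues) if p < p_threshold)


# Hypothetical summary statistics for four SNPs, assumed already clumped and QC'd.
betas = [0.12, -0.05, 0.30, 0.08]
pvalues = [1e-9, 0.2, 3e-10, 1e-3]
person_a = [2, 1, 0, 2]
person_b = [0, 2, 2, 1]

# Scores depend on the threshold, which is why PRS analyses often compare several.
for threshold in (5e-8, 0.05):
    print(threshold,
          polygenic_risk_score(person_a, betas, pvalues, threshold),
          polygenic_risk_score(person_b, betas, pvalues, threshold))
```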

3: Comprehensive integration of single cell data
Posted to bioRxiv 02 Nov 2018

1,931 downloads genomics

Tim Stuart, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M. Mauck, Marlon Stoeckius, Peter Smibert, Rahul Satija

Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to "anchor" diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but also across different modalities. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat
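The core of anchoring can be sketched as finding mutual nearest neighbours (MNNs) between two datasets: a pair of cells is an anchor when each is among the other's nearest neighbours. As we understand the published method, anchors are found this way in a shared space built by canonical correlation analysis and are then scored and filtered; this sketch assumes the two embeddings already share a space and skips scoring. Function names and coordinates are hypothetical.

```python
import math


def _k_nearest(query, points, k):
    """Indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return set(order[:k])


def find_anchors(emb_a, emb_b, k=1):
    """Pairs (i, j) where cell i of dataset A and cell j of dataset B are mutual k-nearest neighbours.

    Assumes both embeddings already live in one shared low-dimensional space.
    """
    nn_a_to_b = [_k_nearest(x, emb_b, k) for x in emb_a]  # B-neighbours of each A cell
    nn_b_to_a = [_k_nearest(y, emb_a, k) for y in emb_b]  # A-neighbours of each B cell
    return sorted((i, j) for i in range(len(emb_a)) for j in nn_a_to_b[i] if i in nn_b_to_a[j])


# Hypothetical 2-D coordinates for three cells per dataset.
emb_a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
emb_b = [(0.2, 0.1), (9.8, 0.3), (20.0, 20.0)]
print(find_anchors(emb_a, emb_b))  # [(0, 0), (2, 1)]
```

Cell 1 of A and cell 2 of B have no close counterpart in the other dataset, so neither forms an anchor; requiring mutuality is what keeps unmatched cell states from being forced together.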

4: Sex Chromosome Dosage Effects On Gene Expression In Humans
Posted to bioRxiv 14 May 2017

1,893 downloads genomics

Armin Raznahan, Neelroop Parikshak, Vijayendran Chandran, Jonathan Blumenthal, Liv Clasen, Aaron Alexander-Bloch, Andrew Zinn, Danny Wangsa, Jasen Wise, Declan Murphy, Patrick Bolton, Thomas Ried, Judith Ross, Jay Giedd, Daniel Geschwind

A fundamental question in the biology of sex differences has eluded direct study in humans: how does sex chromosome dosage (SCD) shape genome function? To address this, we developed a systematic map of SCD effects on gene function by analyzing genome-wide expression data in humans with diverse sex chromosome aneuploidies (XO, XXX, XXY, XYY, XXYY). For sex chromosomes, we demonstrate a pattern of obligate dosage sensitivity amongst evolutionarily preserved X-Y homologs, and update prevailing theoretical models for SCD compensation by detecting X-linked genes whose expression increases with decreasing X- and/or Y-chromosome dosage. We further show that SCD-sensitive sex chromosome genes regulate specific co-expression networks of SCD-sensitive autosomal genes with critical cellular functions and a demonstrable potential to mediate previously documented SCD effects on disease. Our findings detail wide-ranging effects of SCD on genome function with implications for human phenotypic variation.

5: Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq
Posted to bioRxiv 09 Sep 2019

1,875 downloads genomics

Valentine Svensson, Eduardo da Veiga Beltrame, Lior Pachter

The allocation of a sequencing budget when designing single-cell RNA-seq experiments requires consideration of the tradeoff between the number of cells sequenced and the read depth per cell. One approach to the problem is to perform a power analysis for a univariate objective such as differential expression. However, many goals of single-cell analysis, such as clustering, require consideration of the multivariate structure of gene expression. We introduce an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression, based on error analysis in the framework of a variational autoencoder. We find that at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers. Above about 15,000 reads per cell, the benefit of increased sequencing depth is minor. Code for the workflow reproducing the results of the paper is available at https://github.com/pachterlab/SBP_2019/.
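The tradeoff starts from simple arithmetic: under a fixed read budget, cells multiplied by reads per cell is constant, so doubling depth halves the number of cells. The sketch below tabulates that relationship and flags depths above a saturation point. The 400-million-read budget is hypothetical, and `saturation_depth` is a hypothetical parameter whose default mirrors the roughly 15,000 reads per cell reported above, a figure that may not transfer to other tissues or protocols. The paper's variational-autoencoder error analysis is not reproduced.

```python
def cells_affordable(total_reads, reads_per_cell):
    """Number of cells a fixed total read budget covers at a given per-cell depth."""
    if reads_per_cell <= 0:
        raise ValueError("reads_per_cell must be positive")
    return total_reads // reads_per_cell


def budget_table(total_reads, depths, saturation_depth=15_000):
    """Rows of (depth, cells, beyond_saturation) for one fixed read budget.

    saturation_depth is a hypothetical parameter; its default mirrors the ~15,000
    reads per cell reported in the abstract for the datasets studied there.
    """
    return [(d, cells_affordable(total_reads, d), d > saturation_depth) for d in depths]


# Hypothetical budget of 400 million reads.
for depth, cells, beyond in budget_table(400_000_000, [5_000, 10_000, 15_000, 20_000, 40_000]):
    note = "  (little gain expected from extra depth)" if beyond else ""
    print(f"{depth:>6} reads/cell -> {cells:>6} cells{note}")
```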

6: Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
Posted to bioRxiv 14 Mar 2019

1,701 downloads genomics

Christoph Hafemeister, Rahul Satija

Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from 'regularized negative binomial regression', where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure removes the need for heuristic steps including pseudocount addition or log-transformation, and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform (https://github.com/ChristophH/sctransform), with a direct interface to our single-cell toolkit Seurat.
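Once a gene's regression parameters are known, the Pearson residual for each cell is the observed count minus its expected count, divided by the negative binomial standard deviation. The sketch below computes that final step only; it assumes the (already regularized) intercept, slope on log10 total UMIs, and inverse-dispersion `theta` are given, and omits the fitting and cross-gene smoothing described above. Clipping residuals at the square root of the number of cells is our understanding of the sctransform default and should be treated as an assumption. The function name and parameter values are hypothetical.

```python
import math


def nb_pearson_residuals(gene_counts, cell_umi_totals, beta0, beta1, theta, clip=None):
    """Pearson residuals of one gene under a negative binomial GLM with a log link.

    gene_counts:     UMI counts of the gene in each cell
    cell_umi_totals: total UMIs per cell (sequencing depth), used as the covariate
    beta0, beta1:    gene's intercept and slope on log10(total UMIs), assumed already regularized
    theta:           gene's NB inverse-dispersion (variance = mu + mu**2 / theta)
    clip:            residuals are clipped to [-clip, clip]; defaults to sqrt(number of cells)
    """
    if clip is None:
        clip = math.sqrt(len(gene_counts))
    residuals = []
    for x, m in zip(gene_counts, cell_umi_totals):
        mu = math.exp(beta0 + beta1 * math.log10(m))  # expected count given this cell's depth
        sd = math.sqrt(mu + mu * mu / theta)           # NB standard deviation
        residuals.append(max(-clip, min(clip, (x - mu) / sd)))
    return residuals


# Hypothetical gene parameters: beta1 = ln(10) makes the expected count proportional
# to depth (mu = 0.002 * total UMIs), with theta = 10.
beta0, beta1, theta = math.log(0.002), math.log(10), 10.0
totals = [1000, 2000, 4000, 8000]
# Counts that simply track depth give residuals near zero.
print(nb_pearson_residuals([2, 4, 8, 16], totals, beta0, beta1, theta))
# An excess count in the second cell stands out and is clipped at sqrt(4) = 2.
print(nb_pearson_residuals([2, 40, 8, 16], totals, beta0, beta1, theta))
```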

7: A molecular cell atlas of the human lung from single cell RNA sequencing
Posted to bioRxiv 27 Aug 2019

1,565 downloads genomics

Kyle J Travaglini, Ahmad N Nabhan, Lolita Penland, Rahul Sinha, Astrid Gillich, Rene V Sit, Stephen Chang, Stephanie D Conley, Yasuo Mori, Jun Seita, Gerald J. Berry, Joseph B Shrager, Ross J Metzger, Christin S Kuo, Norma Neff, Irving L Weissman, Stephen R. Quake, Mark A Krasnow

Although single cell RNA sequencing studies have begun providing compendia of cell expression profiles, it has proven more difficult to systematically identify and localize all molecular cell types in individual organs to create a full molecular cell atlas. Here we describe droplet- and plate-based single cell RNA sequencing applied to ~70,000 human lung and blood cells, combined with a multi-pronged cell annotation approach, which have allowed us to define the gene expression profiles and anatomical locations of 58 cell populations in the human lung, including 41 of 45 previously known cell types or subtypes and 14 new ones. This comprehensive molecular atlas elucidates the biochemical functions of lung cell types and the cell-selective transcription factors and optimal markers for making and monitoring them; defines the cell targets of circulating hormones and predicts local signaling interactions including sources and targets of chemokines in immune cell trafficking and expression changes on lung homing; and identifies the cell types directly affected by lung disease genes. Comparison to mouse identified 17 molecular types that appear to have been gained or lost during lung evolution and others whose expression profiles have been substantially altered, revealing extensive plasticity of cell types and cell-type-specific gene expression during organ evolution including expression switches between cell types. This lung atlas provides the molecular foundation for investigating how lung cell identities, functions, and interactions are achieved in development and tissue engineering and altered in disease and evolution.

8: A single-cell and single-nucleus RNA-seq toolbox for fresh and frozen human tumors
Posted to bioRxiv 12 Sep 2019

1,511 downloads genomics

Michal Slyper, Caroline B. M. Porter, Orr Ashenberg, Julia Waldman, Eugene Drokhlyansky, Isaac Wakiro, Christopher Smillie, Gabriela Smith-Rosario, Jingyi Wu, Danielle Dionne, Sébastien Vigneau, Judit Jané-Valbuena, Sara Napolitano, Mei-Ju Su, Anand G. Patel, Asa Karlstrom, Simon Gritsch, Masashi Nomura, Avinash Waghray, Satyen H. Gohil, Alexander M. Tsankov, Livnat Jerby-Arnon, Ofir Cohen, Johanna Klughammer, Yanay Rosen, Joshua Gould, Bo Li, Lan Nguyen, Catherine J Wu, Benjamin Izar, Rizwan Haq, F. Stephen Hodi, Charles H. Yoon, Aaron N. Hata, Suzanne J. Baker, Mario L. Suvà, Raphael Bueno, Elizabeth H. Stover, Ursula A. Matulonis, Michael R. Clay, Micheal A. Dyer, Natalie B. Collins, Nikhil Wagle, Asaf Rotem, Bruce E. Johnson, Orit Rozenblatt-Rosen, Aviv Regev

Single cell genomics is essential to chart the complex tumor ecosystem. While single cell RNA-Seq (scRNA-Seq) profiles RNA from cells dissociated from fresh tumor tissues, single nucleus RNA-Seq (snRNA-Seq) is needed to profile frozen or hard-to-dissociate tumors. Each strategy requires modifications to fit the unique characteristics of different tissue and tumor types, posing a barrier to adoption. Here, we developed a systematic toolbox for profiling fresh and frozen clinical tumor samples using scRNA-Seq and snRNA-Seq, respectively. We tested eight tumor types of varying tissue and sample characteristics (resection, biopsy, ascites, and orthotopic patient-derived xenograft): lung cancer, metastatic breast cancer, ovarian cancer, melanoma, neuroblastoma, pediatric sarcoma, glioblastoma, pediatric high-grade glioma, and chronic lymphocytic leukemia. Analyzing 212,498 cells and nuclei from 39 clinical samples, we evaluated protocols by cell quality, recovery rate, and cellular composition. We optimized protocols for fresh tissue dissociation for different tumor types using a decision tree to account for the technical and biological variation between clinical samples. We established methods for nucleus isolation from OCT-embedded and fresh-frozen tissues, with an optimization matrix varying mechanical force, buffer, and detergent. scRNA-Seq and snRNA-Seq from matched samples recovered the same cell types and intrinsic expression profiles, but at different proportions. Our work provides direct guidance across a broad range of tumors, including criteria for testing and selecting methods from the toolbox for other tumors, thus paving the way for charting tumor atlases.

9: Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes
Posted to bioRxiv 28 Jan 2019

1,384 downloads genomics

Konrad Karczewski, Laurent C Francioli, Grace Tiao, Beryl B Cummings, Jessica Alföldi, Qingbo Wang, Ryan L Collins, Kristen M Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A Watts, Daniel Rhodes, Moriel Singer-Berk, Eleina M England, Eleanor G Seaby, Jack A. Kosmicki, Raymond K Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S Ware, Christopher Vittal, Irina M Armean, Louis Bergelson, Kristian Cibulskis, Kristen M Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, The Genome Aggregation Database Consortium, Benjamin M Neale, Mark J. Daly, Daniel G. MacArthur

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
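Placing genes along a spectrum of tolerance to inactivation amounts to comparing the observed number of pLoF variants with the number expected from a mutation rate model. As we understand the gnomAD paper, genes are summarized by the observed/expected ratio and the upper bound of a 90% confidence interval on it, computed from a Poisson likelihood over a grid of candidate ratios. The sketch below follows that idea; the grid range (0 to 2), step size, function name, and the example counts are assumptions, and the expected counts themselves (which require the mutation model) are taken as given.

```python
import math


def poisson_logpmf(k, lam):
    """Log probability of observing k events under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def oe_with_upper_bound(observed, expected, upper=0.95, step=0.001, max_ratio=2.0):
    """Observed/expected pLoF ratio and an upper confidence bound on that ratio.

    The likelihood of `observed` given a Poisson mean of expected * r is evaluated on
    a grid of r values, normalised, and the bound is where its cumulative sum reaches `upper`.
    """
    grid = [i * step for i in range(1, int(round(max_ratio / step)) + 1)]
    logl = [poisson_logpmf(observed, expected * r) for r in grid]
    peak = max(logl)
    weights = [math.exp(v - peak) for v in logl]  # rescale before exponentiating to avoid underflow
    total = sum(weights)
    cumulative, bound = 0.0, grid[-1]
    for r, w in zip(grid, weights):
        cumulative += w / total
        if cumulative >= upper:
            bound = r
            break
    return observed / expected, bound


# Hypothetical genes; a low upper bound marks strong depletion of pLoF variants.
for gene, obs, exp in [("gene_x", 2, 40.0), ("gene_y", 38, 40.0)]:
    ratio, upper_bound = oe_with_upper_bound(obs, exp)
    print(gene, round(ratio, 3), upper_bound)
```

Using the upper bound rather than the point estimate is conservative: a gene with few expected variants cannot look intolerant merely because, by chance, none were observed.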

10: The enteric nervous system of the human and mouse colon at a single-cell resolution
Posted to bioRxiv 28 Aug 2019

1,312 downloads genomics

Eugene Drokhlyansky, Christopher S. Smillie, Nicholas Van Wittenberghe, Maria Ericsson, Gabriel K. Griffin, Danielle Dionne, Michael S Cuoco, Max N. Goder-Reiser, Tatyana Sharova, Andrew J. Aguirre, Genevieve M. Boland, Daniel Graham, Orit Rozenblatt-Rosen, Ramnik J. Xavier, Aviv Regev

As the largest branch of the autonomic nervous system, the enteric nervous system (ENS) controls the entire gastrointestinal tract, but remains incompletely characterized. Here, we develop RAISIN RNA-seq, which enables the capture of intact single nuclei along with ribosome-bound mRNA, and use it to profile the adult mouse and human colon to generate a reference map of the ENS at a single-cell resolution. This map reveals an extraordinary diversity of neuron subsets across intestinal locations, ages, and circadian phases, with conserved transcriptional programs that are shared between human and mouse. These data suggest possible revisions to the current model of peristalsis and molecular mechanisms that may allow enteric neurons to orchestrate tissue homeostasis, including immune regulation and stem cell maintenance. Human enteric neurons specifically express risk genes for neuropathic, inflammatory, and extra-intestinal diseases with concomitant gut dysmotility. Our study therefore provides a roadmap to understanding the ENS in health and disease.

11: Hackflex: low cost Illumina sequencing library construction for high sample counts
Posted to bioRxiv 23 Sep 2019

1,247 downloads genomics

Daniela Gaio, Joyce To, Michael Liu, Leigh Monahan, Kay Anantanawat, Aaron E Darling

We developed Hackflex, a low-cost method for the production of Illumina-compatible sequencing libraries that allows up to 11 times more libraries for high-throughput Illumina sequencing to be generated at a fixed cost. Library preparation quality was tested by constructing libraries from E. coli MG1655 genomic DNA using Hackflex, standard Nextera Flex, or a variant of standard Nextera Flex in which the bead-linked transposase is diluted prior to use. We demonstrated that Hackflex produces high-quality libraries with highly uniform coverage, equivalent to the standard Nextera Flex kit. Using Hackflex, we achieved a per-sample reagent cost for library preparation of A$8.66, which is 8.23 times lower than that of the standard Nextera Flex protocol at the advertised retail price. An additional simple modification to the protocol enables a further price reduction to about A$6.50 per sample, up to 11-fold below the standard protocol. This method will allow researchers to construct more libraries within a given budget, thereby yielding more data and facilitating research programs where sequencing large numbers of libraries is beneficial.
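The cost figures in this abstract can be cross-checked with a few lines of arithmetic. The standard per-sample cost is not stated directly; it is implied by multiplying the Hackflex cost by the reported 8.23-fold saving, and dividing that by A$6.50 recovers the roughly 11-fold figure. The A$1,000 budget and the helper name `libraries_for_budget` are hypothetical.

```python
HACKFLEX_COST = 8.66   # A$ reagent cost per sample, from the abstract
FOLD_CHEAPER = 8.23    # Hackflex vs standard Nextera Flex at advertised retail price
MODIFIED_COST = 6.50   # approximate A$ per sample with the additional modification

# Standard per-sample cost implied by the two reported figures (not stated directly).
implied_standard_cost = HACKFLEX_COST * FOLD_CHEAPER
# Fold reduction of the modified protocol relative to that implied standard cost.
modified_fold = implied_standard_cost / MODIFIED_COST


def libraries_for_budget(budget, cost_per_sample):
    """Whole libraries that a reagent budget covers at a given per-sample cost."""
    return int(budget // cost_per_sample)


print(f"implied standard cost: A${implied_standard_cost:.2f}")  # A$71.27
print(f"modified protocol: {modified_fold:.1f}-fold cheaper")    # 11.0-fold cheaper
for label, cost in [("standard", implied_standard_cost),
                    ("Hackflex", HACKFLEX_COST),
                    ("modified", MODIFIED_COST)]:
    print(label, libraries_for_budget(1000, cost))  # 14, 115 and 153 libraries
```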

12: Single cell ATAC-seq in human pancreatic islets and deep learning upscaling of rare cells reveals cell-specific type 2 diabetes regulatory signatures
Posted to bioRxiv 07 Sep 2019

1,073 downloads genomics

Vivek Rai, Daniel X. Quang, Michael R. Erdos, Darren A. Cusanovich, Riza M. Daza, Narisu Narisu, Luli S. Zou, John P. Didion, Yuanfang Guan, Jay Shendure, Stephen C.J. Parker, Francis S. Collins

Objective: Type 2 diabetes (T2D) is a complex disease characterized by pancreatic islet dysfunction, insulin resistance, and disruption of blood glucose levels. Genome-wide association studies (GWAS) have identified >400 independent signals that encode genetic predisposition. More than 90% of the associated single nucleotide polymorphisms (SNPs) localize to non-coding regions and are enriched in chromatin-defined islet enhancer elements, indicating a strong transcriptional regulatory component to disease susceptibility. Pancreatic islets are a mixture of cell types that express distinct hormonal programs, so each cell type may contribute differently to the underlying regulatory processes that modulate T2D-associated transcriptional circuits. Existing chromatin profiling methods such as ATAC-seq and DNase-seq, applied to islets in bulk, produce aggregate profiles that mask important cellular and regulatory heterogeneity.
Methods: We present genome-wide single-cell chromatin accessibility profiles in >1,600 cells derived from a human pancreatic islet sample using single-cell combinatorial indexing ATAC-seq (sci-ATAC-seq). We also developed a deep learning model based on the U-Net architecture to accurately predict open chromatin peak calls in rare cell populations.
Results: We show that sci-ATAC-seq profiles allow us to deconvolve alpha, beta, and delta cell populations and identify cell-type-specific regulatory signatures underlying T2D. In particular, we find that T2D GWAS SNPs are significantly enriched in beta cell-specific and cross-cell-type shared islet open chromatin, but not in alpha or delta cell-specific open chromatin. We also demonstrate, using less abundant delta cells, that deep learning models can improve signal recovery and feature reconstruction of rarer cell populations. Finally, we use co-accessibility measures to nominate the cell-specific target genes at 104 non-coding T2D GWAS signals.
Conclusions: Collectively, we identify the islet cell type of action across genetic signals of T2D predisposition and provide higher-resolution mechanistic insights into genetically encoded risk pathways.
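Co-accessibility links a non-coding peak to candidate target genes by asking whether the peak tends to be open in the same cells as a gene's promoter. Dedicated tools such as Cicero fit a regularized model over aggregated cells, and the exact measure used in the study is not specified here; as a simplified stand-in, this sketch uses the plain Pearson correlation of binarized accessibility across cells. The function names, gene names, threshold, and accessibility values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def nominate_targets(snp_peak, promoter_peaks, min_corr=0.5):
    """Genes whose promoter accessibility co-varies with the SNP-containing peak, strongest first."""
    scores = {gene: pearson(snp_peak, acc) for gene, acc in promoter_peaks.items()}
    hits = [gene for gene, r in scores.items() if r >= min_corr]
    return sorted(hits, key=lambda gene: -scores[gene])


# Hypothetical binarized accessibility (1 = open) across eight cells.
snp_peak = [1, 1, 0, 0, 1, 0, 1, 0]
promoters = {
    "GENE_A": [1, 1, 0, 0, 1, 0, 1, 0],
    "GENE_B": [0, 0, 1, 1, 0, 1, 0, 1],
    "GENE_C": [1, 0, 0, 0, 1, 0, 1, 0],
}
print(nominate_targets(snp_peak, promoters))  # ['GENE_A', 'GENE_C']
```

With sparse single-cell data, raw per-cell correlations are noisy, which is one reason dedicated tools aggregate similar cells before estimating co-accessibility.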

13: Single-cell RNA-sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis
Posted to bioRxiv 06 Sep 2019

1,016 downloads genomics

Arun C Habermann, Austin J Gutierrez, Linh T Bui, Stephanie L Yahn, Nichelle I Winters, Carla L Calvi, Lance M Peter, Mei-I Chung, Chase J Taylor, Christopher Jetter, Latha Raju, Jamie Roberson, Guixiao Ding, Lori Wood, Jennifer MS Sucre, Bradley W Richmond, Ana P Serezani, Wyatt J McDonnell, Simon B Mallal, Matthew J Bacchetta, James E Loyd, Ciara M Shaver, Lorraine B. Ware, Ross Bremner, Rajat Walia, Timothy S Blackwell, Nicholas E Banovich, Jonathan A Kropski

Pulmonary fibrosis (PF) is a form of chronic lung disease characterized by pathologic epithelial remodeling and accumulation of extracellular matrix (ECM). To comprehensively define the cell types, mechanisms, and mediators driving fibrotic remodeling in lungs with PF, we performed single-cell RNA-sequencing of single-cell suspensions from 10 non-fibrotic control and 20 PF lungs. Analysis of 114,396 cells identified 31 distinct cell types. We report that a remarkable shift in epithelial cell phenotypes occurs in the peripheral lung in PF, and we identify several previously unrecognized epithelial cell phenotypes, including a KRT5-/KRT17+ pathologic ECM-producing epithelial cell population that was highly enriched in PF lungs. Multiple fibroblast subtypes were observed to contribute to ECM expansion in a spatially discrete manner. Together, these data provide high-resolution insights into the complexity and plasticity of the distal lung epithelium in human disease, and indicate that diverse epithelial and mesenchymal cells contribute to pathologic lung fibrosis.

14: Template plasmid integration in germline genome-edited cattle.
Posted to bioRxiv 28 Jul 2019

1,012 downloads genomics

Alexis Norris, Stella S. Lee, Kevin J. Greenlees, Daniel A. Tadesse, Mayumi F Miller, Heather Lombardi

We analyzed publicly available whole genome sequencing data from cattle that were germline genome-edited to introduce polledness. Our analysis revealed the unintended heterozygous integration of the plasmid and a second copy of the repair template sequence at the target site. This finding underscores the importance of employing screening methods suited to reliably detect the unintended integration of plasmids and multiple template copies.

15: Cardelino: Integrating whole exomes and single-cell transcriptomes to reveal phenotypic impact of somatic variants
Posted to bioRxiv 10 Sep 2018

1,009 downloads genomics

Davis McCarthy, Raghd Rostom, Yuanhua Huang, Daniel J Kunz, Petr Danecek, Marc Jan Bonder, Tzachi Hagai, HipSci Consortium, Wenyi Wang, Daniel J Gaffney, Benjamin D Simons, Oliver Stegle, Sarah A Teichmann

Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA sequencing, using either bulk or single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution.
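Assigning a cell to its clone of origin can be framed as comparing how well each clone's somatic genotype explains the cell's allele-specific read counts. As we understand it, cardelino uses a Bayesian binomial model that also accounts for uncertainty in the clonal tree; this simplified sketch instead assumes known clone genotypes, fixed alternative-allele read fractions for present and absent variants, and a uniform prior over clones. The function name, both fractions, and the example counts are hypothetical.

```python
import math


def log_binom(k, n, p):
    """Log binomial probability of k successes in n trials with success probability p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def clone_posterior(alt, depth, clone_genotypes, theta_present=0.45, theta_absent=0.01):
    """Posterior probability of each clone for one cell, under a uniform prior.

    alt, depth:      alternative-allele and total read counts per variant in this cell
    clone_genotypes: one 0/1 list per clone; 1 means the clone carries the variant
    theta_present:   assumed alt-read fraction where the variant is present (heterozygous)
    theta_absent:    assumed alt-read fraction where it is absent (sequencing errors)
    """
    logliks = []
    for genotype in clone_genotypes:
        ll = 0.0
        for k, n, g in zip(alt, depth, genotype):
            if n == 0:
                continue  # uncovered variant (common in scRNA-seq): uninformative
            ll += log_binom(k, n, theta_present if g else theta_absent)
        logliks.append(ll)
    peak = max(logliks)
    weights = [math.exp(v - peak) for v in logliks]  # rescale to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical clonal tree: variant 0 is shared by all clones; variants 1 and 2 each mark one subclone.
clones = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
# One cell: 3/6 alt reads at variant 0, 4/8 at variant 1, no coverage at variant 2.
posterior = clone_posterior([3, 4, 0], [6, 8, 0], clones)
print([round(p, 4) for p in posterior])
```

Because the cell has no reads at variant 2, clones 0 and 2 receive identical likelihoods; with sparse coverage, some cells will remain ambiguous between clones.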

16: Systematic comparative analysis of single cell RNA-sequencing methods
Posted to bioRxiv 09 May 2019

977 downloads genomics

Jiarui Ding, Xian Adiconis, Sean K Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K Hughes, Marc H Wadsworth, Tyler Burks, Lan T. Nguyen, John Y. H. Kwon, Boaz Barak, William Ge, Amanda J. Kedaigle, Shaina Carroll, Shuqiang Li, Nir Hacohen, Orit Rozenblatt-Rosen, Alex K Shalek, Alexandra-Chloé Villani, Aviv Regev, Joshua Z Levin

A multitude of single-cell RNA sequencing methods have been developed in recent years, with dramatic advances in scale and power that have enabled major discoveries and large-scale cell mapping efforts. However, these methods have not been systematically and comprehensively benchmarked. Here, we directly compare seven methods for single cell and/or single nucleus profiling from three types of samples -- cell lines, peripheral blood mononuclear cells and brain tissue -- generating 36 libraries in six separate experiments in a single center. To analyze these datasets, we developed and applied scumi, a flexible computational pipeline that can be used for any scRNA-seq method. We evaluated the methods for both basic performance and for their ability to recover known biological information in the samples. Our study will help guide experiments with the methods in this study as well as serve as a benchmark for future studies and for computational algorithm development.

17: Benchmarking Single-Cell RNA Sequencing Protocols for Cell Atlas Projects
Posted to bioRxiv 13 May 2019

897 downloads genomics

Elisabetta Mereu, Atefeh Lafzi, Catia Moutinho, Christoph Ziegenhain, Davis J. MacCarthy, Adrian Alvarez, Eduard Batlle, Sagar, Dominic Grün, Julia K. Lau, Stéphane C Boutet, Chad Sanada, Aik Ooi, Robert C. Jones, Kelly Kaihara, Chris Brampton, Yasha Talaga, Yohei Sasagawa, Kaori Tanaka, Tetsutaro Hayashi, Itoshi Nikaido, Cornelius Fischer, Sascha Sauer, Timo Trefzer, Christian Conrad, Xian Adiconis, Lan T. Nguyen, Aviv Regev, Joshua Z Levin, Swati Parekh, Aleksandar Janjic, Lucas E. Wange, Johannes W. Bagnoli, Wolfgang Enard, Ivo G Gut, Rickard Sandberg, Ivo Gut, Oliver Stegle, Holger Heyn

Single-cell RNA sequencing (scRNA-seq) is the leading technique for charting the molecular properties of individual cells. The latest methods are scalable to thousands of cells, enabling in-depth characterization of sample composition without prior knowledge. However, there are important differences between scRNA-seq techniques, and it remains unclear which are the most suitable protocols for drawing cell atlases of tissues, organs and organisms. We have generated benchmark datasets to systematically evaluate techniques in terms of their power to comprehensively describe cell types and states. We performed a multi-center study comparing 13 commonly used single-cell and single-nucleus RNA-seq protocols using a highly heterogeneous reference sample resource. Comparative and integrative analysis at cell type and state level revealed marked differences in protocol performance, highlighting a series of key features for cell atlas projects. These should be considered when defining guidelines and standards for international consortia, such as the Human Cell Atlas project.

18: Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion
Posted to bioRxiv 18 Apr 2019

884 downloads genomics

Ansuman T. Satpathy, Jeffrey M. Granja, Kathryn E Yost, Yanyan Qi, Francesca Meschi, Geoffrey P McDermott, Brett N Olsen, Maxwell R. Mumbach, Sarah E Pierce, M. Ryan Corces, Preyas Shah, Jason C. Bell, Darisha Jhutty, Corey M Nemec, Jean Wang, Li Wang, Yifeng Yin, Paul G Giresi, Anne Lynn S. Chang, Grace X.Y. Zheng, William J. Greenleaf, Howard Y. Chang

Understanding complex tissues requires single-cell deconstruction of gene regulation with precision and scale. Here we present a massively parallel droplet-based platform for mapping transposase-accessible chromatin in tens of thousands of single cells per sample (scATAC-seq). We obtain and analyze chromatin profiles of over 200,000 single cells in two primary human systems. In blood, scATAC-seq allows marker-free identification of cell type-specific cis- and trans-regulatory elements, mapping of disease-associated enhancer activity, and reconstruction of trajectories of differentiation from progenitors to diverse and rare immune cell types. In basal cell carcinoma, scATAC-seq reveals regulatory landscapes of malignant, stromal, and immune cell types in the tumor microenvironment. Moreover, scATAC-seq of serial tumor biopsies before and after PD-1 blockade allows identification of chromatin regulators and differentiation trajectories of therapy-responsive intratumoral T cell subsets, revealing a shared regulatory program driving CD8+ T cell exhaustion and CD4+ T follicular helper cell development. We anticipate that droplet-based single-cell chromatin accessibility profiling will provide a broadly applicable means of identifying the regulatory factors and elements that underlie cell type and function.

19: Architectural RNA is required for heterochromatin organization
Posted to bioRxiv 27 Sep 2019

823 downloads genomics

Jitendra Thakur, He Fang, Trizia Llagas, Christine M. Disteche, Steven Henikoff

In addition to its known roles in protein synthesis and enzyme catalysis, RNA has been proposed to stabilize higher-order chromatin structure. To distinguish presumed architectural roles of RNA from other functions, we applied a ribonuclease digestion strategy to our CUT&RUN in situ chromatin profiling method (CUT&RUN.RNase). We find that depletion of RNA compromises association of the murine nucleolar protein Nucleophosmin with pericentric heterochromatin and alters the chromatin environment of CCCTC-binding factor (CTCF) bound regions. Strikingly, we find that RNA maintains the integrity of both constitutive (H3K9me3 marked) and facultative (H3K27me3 marked) heterochromatic regions as compact domains, but only moderately stabilizes euchromatin. To establish the specificity of heterochromatin stabilization by RNA, we performed CUT&RUN on cells deleted for the Firre long non-coding RNA and observed disruption of H3K27me3 domains on several chromosomes. We conclude that RNA maintains local and global chromatin organization by acting as a structural scaffold for heterochromatic domains.

20: Ancient DNA reconstructs the genetic legacies of pre-contact Puerto Rico communities
Posted to bioRxiv 12 Sep 2019

771 downloads genomics

Maria A. Nieves-Colón, William J. Pestle, Austin W Reynolds, Bastien Llamas, Constanza de la Fuente, Kathleen Fowler, Katherine Skerry, Edwin Crespo-Torres, Carlos D Bustamante, Anne C. Stone

Indigenous peoples have occupied the island of Puerto Rico since at least 3000 B.C. Due to the demographic shifts that occurred after European contact, the origin(s) of these ancient populations, and their genetic relationship to present-day islanders, are unclear. We use ancient DNA to characterize the population history and genetic legacies of pre-contact Indigenous communities from Puerto Rico. Bone, tooth and dental calculus samples were collected from 124 individuals from three pre-contact archaeological sites: Tibes, Punta Candelero and Paso del Indio. Despite poor DNA preservation, we used target enrichment and high-throughput sequencing to obtain complete mitochondrial genomes (mtDNA) from 45 individuals and autosomal genotypes from two individuals. We found a high proportion of Native American mtDNA haplogroups A2 and C1 in the pre-contact Puerto Rico sample (40% and 44%, respectively). This distribution, together with the haplotypes represented, supports a primarily Amazonian South American origin for these populations and mirrors the Native American mtDNA diversity patterns found in present-day islanders. Three mtDNA haplotypes from pre-contact Puerto Rico persist among Puerto Ricans and other Caribbean islanders, indicating that present-day populations are reservoirs of pre-contact mtDNA diversity. Lastly, we find similar autosomal ancestry patterns in pre-contact individuals from Puerto Rico and the Bahamas, suggesting a shared component of Indigenous Caribbean ancestry with close affinity to South American populations. Our findings contribute to a more complete reconstruction of pre-contact Caribbean population history and explore the role of Indigenous peoples in shaping the biocultural diversity of present-day Puerto Ricans and other Caribbean islanders.

