Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 83,779 bioRxiv papers from 360,790 authors.
Most downloaded bioRxiv papers in the genomics category, since the beginning of last month
5,304 results found.
804 downloads genomics
Irma Karabegović, Eliana Portilla-Fernandez, Yang Li, Jiantao Ma, Silvana C.E. Maas, Daokun Sun, Emily A. Hu, Brigitte Kühnel, Yan Zhang, Srikant Ambatipudi, Giovanni Fiorito, Jian Huang, Juan E Castillo-Fernandez, Kerri L. Wiggins, Niek de Klein, Sara Grioni, Brenton R. Swenson, Silvia Polidoro, Jorien L. Treur, Cyrille Cuenin, Pei-Chien Tsai, Ricardo Costeira, Veronique Chajes, Kim Braun, Niek Verweij, Anja Kretschmer, Lude Franke, Joyce B.J. van Meurs, André G. Uitterlinden, Robert J. de Knegt, M. Arfan Ikram, Abbas Dehghan, Annette Peters, Ben Schöttker, Sina A. Gharib, Nona Sotoodehnia, Jordana T. Bell, Paul Elliott, Paolo Vineis, Caroline Relton, Zdenko Herceg, Hermann Brenner, Melanie Waldenberger, Casey M. Rebholz, Trudy Voortman, Qiuwei Pan, Myriam Fornage, Daniel Levy, Manfred Kayser, Mohsen Ghanbari
Coffee and tea are extensively consumed beverages worldwide. Observational studies have shown contradictory findings for the association between consumption of these beverages and different health outcomes. Epigenetics has been suggested as a mechanism mediating the effects of dietary and lifestyle factors on disease onset. We conducted epigenome-wide association studies (EWAS) on coffee and tea consumption in 15,789 participants of European and African-American ancestry from 15 cohorts. EWAS meta-analysis revealed 11 CpG sites significantly associated with coffee consumption (P-value < 1.1 × 10⁻⁷), nine of them annotated to the genes AHRR, F2RL3, FLJ43663, HDAC4, GFI1 and PHGDH, and two CpGs suggestively associated with tea consumption (P-value < 5.0 × 10⁻⁶). Among these, cg14476101 was significantly associated with expression of its annotated gene PHGDH and with risk of fatty liver disease. Knockdown of PHGDH expression in liver cells showed a correlation with expression levels of lipid-associated genes, suggesting a role of PHGDH in hepatic lipid metabolism. Collectively, this study indicates that coffee consumption is associated with differential DNA methylation levels at multiple CpGs, and that coffee-associated epigenetic variations may explain the mechanism of action of coffee consumption in conferring disease risk.
Competing Interest Statement: The authors have declared no competing interest.
797 downloads genomics
The entry of SARS-CoV-2 into host cells is dependent upon angiotensin-converting enzyme 2 (ACE2), which serves as a functional attachment receptor for the viral spike glycoprotein, and the serine protease TMPRSS2 which allows fusion of the viral and host cell membranes. We devised a quantitative measure to estimate genetic determinants of ACE2 and TMPRSS2 expression and applied this measure to >2,500 individuals. Our data show significant variability in genetic determinants of ACE2 and TMPRSS2 expression among individuals and between populations, and demonstrate a genetic predisposition for lower expression levels of both key viral entry genes in African populations. These data suggest that genetic factors might lead to lower susceptibility for SARS-CoV-2 infection in African populations and that host genetics might help explain inter-individual variability in disease susceptibility and severity of COVID-19.
793 downloads genomics
Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells. Here, we introduce Smart-seq3, which combines full-length transcriptome coverage with a 5' unique molecular identifier (UMI) RNA counting strategy that enables in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele resolution, applicable to large-scale characterization of cell types and states across tissues and organisms.
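The core idea behind UMI-based counting can be illustrated with a minimal sketch: reads that share a cell barcode, UMI and gene are treated as copies of one original molecule, so the molecule count is the number of distinct UMIs. This is a generic illustration, not the Smart-seq3 pipeline; the `(cell_barcode, umi, gene)` tuple format, the function name and the example reads are hypothetical, and real pipelines also correct UMI sequencing errors, which this sketch does not.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct RNA molecules per (cell, gene) by collapsing reads that
    share a UMI. `reads` is an iterable of hypothetical
    (cell_barcode, umi, gene) tuples."""
    seen = defaultdict(set)  # (cell, gene) -> set of UMIs observed
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("AAAC", "GATTA", "Actb"),
    ("AAAC", "GATTA", "Actb"),   # PCR duplicate of the read above
    ("AAAC", "CCGTA", "Actb"),
    ("AAAC", "TTGCA", "Gapdh"),
    ("CCGT", "GATTA", "Actb"),   # same UMI but a different cell: separate molecule
]
print(count_molecules(reads))
# prints {('AAAC', 'Actb'): 2, ('AAAC', 'Gapdh'): 1, ('CCGT', 'Actb'): 1}
```

Keying on the cell barcode as well as the UMI matters because short UMIs are reused by chance across cells.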
786 downloads genomics
qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model. Although qpAdm is widely used in studies of population history of human (and non-human) groups, relatively little has been done to assess its performance. We performed a simulation study to assess the behavior of qpAdm under various scenarios in order to identify areas of potential weakness and establish recommended best practices for use. We find that qpAdm is a robust tool that yields accurate results in many cases, including when data coverage is low, there are high rates of missing data or ancient DNA damage, or when diploid calls cannot be made. However, we caution against co-analyzing ancient and present-day data, the inclusion of an extremely large number of reference populations in a single model, and analyzing population histories involving extended periods of gene flow. We provide a user guide suggesting best practices for the use of qpAdm.
Competing Interest Statement: The authors have declared no competing interest.
784 downloads genomics
Daniel Taliun, Daniel N. Harris, Michael D Kessler, Jedidiah Carlson, Zachary A. Szpiech, Raul Torres, Sarah A. Gagliano Taliun, André Corvelo, Stephanie M Gogarten, Hyun Min Kang, Achilleas N Pitsillides, Jonathon LeFaive, Seung-been Lee, Xiaowen Tian, Brian L. Browning, Sayantan Das, Anne-Katrin Emde, Wayne E. Clarke, Douglas P. Loesch, Amol C. Shetty, Thomas W Blackwell, Quenna Wong, François Aguet, Christine Albert, Alvaro Alonso, Kristin G. Ardlie, Stella Aslibekyan, Paul L. Auer, John Barnard, R. Graham Barr, Lewis C. Becker, Rebecca L Beer, Emelia J. Benjamin, Lawrence F Bielak, John Blangero, Michael Boehnke, Donald W Bowden, Jennifer A Brody, Esteban G. Burchard, Brian E. Cade, James F. Casella, Brandon Chalazan, Yii-Der Ida Chen, Michael H Cho, Seung Hoan Choi, Mina K. Chung, Clary B. Clish, Adolfo Correa, Joanne E. Curran, Brian Custer, Dawood Darbar, Michelle Daya, Mariza de Andrade, Dawn L DeMeo, Susan K Dutcher, Patrick T. Ellinor, Leslie S Emery, Diane Fatkin, Lukas Forer, Myriam Fornage, Nora Franceschini, Christian Fuchsberger, Stephanie M Fullerton, Soren Germer, Mark T Gladwin, Daniel J Gottlieb, Xiuqing Guo, Michael E Hall, Jiang He, Nancy L. Heard-Costa, Susan R. Heckbert, Marguerite R Irvin, Jill M Johnsen, Andrew D. Johnson, Sharon LR Kardia, Tanika Kelly, Shannon Kelly, Eimear E Kenny, Douglas P. Kiel, Robert Klemmer, Barbara A Konkle, Charles Kooperberg, Anna Köttgen, Leslie A Lange, Jessica Lasky-Su, Daniel Levy, Xihong Lin, Keng-Han Lin, Chunyu Liu, Ruth J.F. Loos, Lori Garman, Robert Gerszten, Steven A. Lubitz, Kathryn L. Lunetta, Angel C.Y. Mak, Ani Manichaikul, Alisa K Manning, Rasika A. Mathias, David D McManus, Stephen T McGarvey, James B. Meigs, Deborah A Meyers, Julie L Mikulla, Mollie A Minear, Braxton Mitchell, Sanghamitra Mohanty, May E. Montasser, Courtney Montgomery, Alanna C. Morrison, Joanne M Murabito, Andrea Natale, Pradeep Natarajan, Sarah C. Nelson, Kari E. North, Jeffrey R. O’Connell, Nicholette D Palmer, Nathan Pankratz, Gina M Peloso, Patricia A. Peyser, Wendy S. Post, Bruce M. Psaty, DC Rao, Susan Redline, Alexander P. Reiner, Dan Roden, Jerome I. Rotter, Ingo Ruczinski, Chloé Sarnowski, Sebastian Schoenherr, Jeong-Sun Seo, Sudha Seshadri, Vivien A Sheehan, M. Benjamin Shoemaker, Albert V Smith, Nicholas L. Smith, Jennifer A. Smith, Nona Sotoodehnia, Adrienne M. Stilp, Weihong Tang, Kent D Taylor, Marilyn Telen, Timothy A. Thornton, Russell P. Tracy, David J. Van Den Berg, Ramachandran S Vasan, Karine A Viaud-Martinez, Scott Vrieze, Daniel E Weeks, Bruce S. Weir, Scott T Weiss, Lu-Chen Weng, Cristen J. Willer, Yingze Zhang, Xutong Zhao, Donna K. Arnett, Allison E Ashley-Koch, Kathleen C Barnes, Eric Boerwinkle, Stacey Gabriel, Richard Gibbs, Kenneth M Rice, Stephen S. Rich, Edwin Silverman, Pankaj Qasba, Weiniu Gan, Trans-Omics for Precision Medicine (TOPMed) Program, TOPMed Population Genetics Working Group, George J Papanicolaou, Deborah A. Nickerson, Sharon R. Browning, Michael C. Zody, Sebastian Zöllner, James G Wilson, L. Adrienne Cupples, Cathy C Laurie, Cashell E Jaquish, Ryan D Hernandez, Timothy D. O’Connor, Goncalo Abecasis
The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency <1% and 46% are singletons. These rare variants provide insights into mutational processes and recent human evolutionary history. The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and extends the reach of nearly all genome-wide association studies to include variants down to ~0.01% in frequency.
762 downloads genomics
The recent pandemic of coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). COVID-19 was first reported in China (December 2019) and is now prevalent in ~170 countries across the globe. Entry of SARS-CoV-2 into mammalian cells requires the binding of the viral Spike (S) protein to the ACE2 (angiotensin-converting enzyme 2) receptor. The S protein is then primed by a specialised serine protease, TMPRSS2 (Transmembrane Serine Protease 2), in the host cell. Importantly, besides respiratory symptoms, and consistent with other common respiratory virus infections in which patients become viraemic, a significant number of COVID-19 patients also develop liver comorbidities. We explored whether a specific target cell type in the mammalian liver could be implicated in disease pathophysiology beyond the general deleterious response to cytokine storms. Here we employed single-cell RNA-seq (scRNA-seq) to survey the human liver and identified a liver cell type potentially implicated in viral ingress. We report the co-expression of ACE2 and TMPRSS2 in a TROP2+ liver progenitor population. Importantly, we failed to detect the expression of ACE2 in hepatocytes or any other liver (immune and stromal) cell types. These results indicate that in COVID-19-associated liver dysfunction and cell death, viral infection of TROP2+ progenitors in the liver may significantly impair liver regeneration and could lead to pathology.
755 downloads genomics
Cristopher V. Van Hout, Ioanna Tachmazidou, Joshua D Backman, Joshua X Hoffman, Bin Ye, Ashutosh K Pandey, Claudia Gonzaga-Jauregui, Shareef Khalid, Daren Liu, Nilanjana Banerjee, Alexander H Li, O’Dushlaine Colm, Anthony Marcketta, Jeffrey Staples, Claudia Schurmann, Alicia Hawes, Evan Maxwell, Leland Barnard, Alexander Lopez, John Penn, Lukas Habegger, Andrew L Blumenfeld, Ashish Yadav, Kavita Praveen, Marcus Jones, William J Salerno, Wendy K. Chung, Ida Surakka, Cristen J. Willer, Kristian Hveem, Joseph B Leader, David J Carey, David H Ledbetter, Geisinger-Regeneron DiscovEHR Collaboration, Lon Cardon, George D Yancopoulos, Aris Economides, Giovanni Coppola, Alan R. Shuldiner, Suganthi Balasubramanian, Michael Cantor, Matthew R. Nelson, John Whittaker, Jeffrey G Reid, Jonathan Marchini, John D Overton, Robert A Scott, Gonçalo Abecasis, Laura Yerges-Armstrong, Aris Baras, on behalf of the Regeneron Genetics Center
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data include 231,631 predicted loss-of-function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss-of-function carrier, and most genes (>69%) had ≥10 loss-of-function carriers. We illustrate the power of characterizing loss-of-function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss-of-function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.
755 downloads genomics
The respiratory tract constitutes an elaborate line of defense based on a unique cellular ecosystem. Single-cell profiling methods enable the investigation of cell population distributions and transcriptional changes along the airways. We have explored the cellular heterogeneity of the human airway epithelium in 10 healthy living volunteers by single-cell RNA profiling. A total of 77,969 cells were collected by bronchoscopy at 35 distinct locations, from the nose to the 12th division of the airway tree. The resulting atlas is composed of a high percentage of epithelial cells (89.1%), but also immune (6.2%) and stromal (4.7%) cells, with distinct cellular proportions in different sites of the airways. It reveals differential gene expression between identical cell types (suprabasal, secretory, and multiciliated cells) from the nasal (MUC4, PI3, SIX3) and tracheobronchial (SCGB1A1, TFF3) airways. By contrast, cell-type-specific gene expression was stable across all tracheobronchial samples. Our atlas improves the description of ionocytes, pulmonary neuroendocrine cells (PNEC) and brush cells, which are likely derived from a common population of precursor cells. We also report a population of KRT13-positive cells with a high percentage of dividing cells, reminiscent of "hillock" cells previously described in mouse. Robust characterization of this unprecedentedly large single-cell cohort establishes an important resource for future investigations. The precise description of the continuum from the nasal epithelium to successive divisions of the lung airways, and of the stable gene expression profile of these regions, better defines the conditions under which relevant tracheobronchial proxies of human respiratory diseases can be developed.
754 downloads genomics
The effect of the rapid accumulation of non-synonymous mutations on the pathogenesis of SARS-CoV-2 is not yet known. To predict the impact of non-synonymous mutations and polyproline regions identified in ORF3a on the formation of B-cell epitopes and their role in evading the immune response, nucleotide and protein sequences of 537 available SARS-CoV-2 genomes were analyzed for the presence of non-synonymous mutations and polyproline regions. Mutations were correlated with changes in epitope formation. A total of 19 different non-synonymous amino acid substitutions were detected in ORF3a among the 537 SARS-CoV-2 strains. G251V was the most common, identified in 9.9% (n=53) of the strains, and was predicted to lead to the loss of a B-cell like epitope in ORF3a. Polyproline regions were detected in two strains (EPI_ISL_410486, France and EPI_ISL_407079, Finland) and affected epitope formation. The accumulation of non-synonymous mutations and the detected polyproline regions in ORF3a of SARS-CoV-2 could be driving evasion of the host immune response, thus favoring viral spread. Rapid mutations accumulating in ORF3a should be closely monitored throughout the COVID-19 pandemic.
751 downloads genomics
Over 10,000 viral genome sequences of SARS-CoV-2 have been made readily available during the ongoing coronavirus pandemic since the initial genome sequence of the virus was released on the open access Virological website (http://virological.org/) on January 11. We use the published data on the single-stranded RNA genomes of 11,132 SARS-CoV-2 patients in the GISAID database, which contains fully or partially sequenced SARS-CoV-2 samples from laboratories around the world. Among many important research questions currently being investigated, one pertains to the genetic characterization and classification of the virus. Here, we analyze nucleotide sequence data and geographic information for a subset of 2,540 SARS-CoV-2 patients without missing entries in the GISAID database. We apply principal component analysis to a similarity matrix that compares all pairs of the 2,540 SARS-CoV-2 nucleotide sequences at all loci simultaneously, using the Jaccard index. Our analysis of the SARS-CoV-2 genome data illustrates the geographic progression of the virus, from the first cases observed in China to the current wave of cases in Europe and North America. We also observe that, based on their sequence data, the SARS-CoV-2 viruses cluster into distinct genetic subgroups. Whether these genetic subgroups relate to disease outcome, and what that would imply for vaccine development, is the subject of ongoing research.
Competing Interest Statement: The authors have declared no competing interest.
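The Jaccard-plus-PCA step can be sketched as follows, under stated assumptions. The abstract does not say how sequences were encoded, so this sketch assumes each genome is represented as the set of positions where it differs from the reference; the Jaccard index of two such sets is the size of their intersection divided by the size of their union. It then uses one common way of running PCA on a similarity matrix (double-centering the matrix and taking its leading eigenvector by power iteration); whether the authors centered the matrix this way is not stated. All function names and the toy data are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard index of two sets of variant positions (1.0 if both are empty)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def similarity_matrix(samples):
    """All-pairs Jaccard similarity matrix."""
    return [[jaccard(a, b) for b in samples] for a in samples]


def double_center(m):
    """Subtract row and column means and add back the grand mean."""
    n = len(m)
    row = [sum(r) / n for r in m]
    col = [sum(m[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[m[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def first_pc(m, iters=200):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    n = len(m)
    # Non-constant start: the all-ones vector lies in the null space after centering.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: two groups of genomes sharing disjoint sets of variant positions.
samples = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2}, {7, 8, 9}, {7, 8}, {7, 8, 9, 10}]
pc1 = first_pc(double_center(similarity_matrix(samples)))
print(["+" if x > 0 else "-" for x in pc1])
```

On this toy input the first three genomes receive one sign on the first component and the last three the opposite sign, which is the kind of subgroup separation the abstract describes. For thousands of genomes a linear-algebra library would replace the pure-Python loops.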
747 downloads genomics
The recent pandemic of SARS-CoV-2 infection has affected more than 3.0 million people worldwide, with more than 200 thousand reported deaths. The SARS-CoV-2 genome can acquire mutations rapidly as the virus spreads. Whole-genome sequencing data offer a wide range of opportunities to study mutation dynamics. The increasing amount of whole-genome sequence data for SARS-CoV-2 prompted us to explore the mutation profile across the genome, to check genome diversity and to investigate the implications of those mutations for protein stability and viral transmission. Four proteins (surface glycoprotein, nucleocapsid, ORF1ab and ORF8) showed frequent mutations, while the envelope, membrane, ORF6 and ORF7a proteins were conserved in terms of amino acid substitutions. Some of the mutations across different proteins co-occurred, suggesting their functional cooperation in stability, transmission and adaptability. Combined analysis of the frequently mutated residues identified 20 viral variants, among which 12 specific combinations comprised more than 97% of the isolates considered for the analysis. Analysis of the structural stability of surface glycoprotein mutants indicated the viability of specific variants, which are more prone to be temporally and spatially distributed across the globe. Similar empirical analysis of other proteins indicated important functional implications for several variants. Analysis of co-occurring mutants indicated structural and/or functional interactions among different SARS-CoV-2 proteins. Identification of frequently mutated variants among COVID-19 patients might be useful for better clinical management, contact tracing and containment of the disease.
Competing Interest Statement: The authors have declared no competing interest.
702 downloads genomics
The global COVID-19 pandemic has led to an urgent need for scalable methods for clinical diagnostics and viral tracking. Next generation sequencing technologies have enabled large-scale genomic surveillance of SARS-CoV-2 as thousands of isolates are being sequenced around the world and deposited in public data repositories. A number of methods using both short- and long-read technologies are currently being applied for SARS-CoV-2 sequencing, including amplicon approaches, metagenomic methods, and sequence capture or enrichment methods. Given the small genome size, the ability to sequence SARS-CoV-2 at scale is limited by the cost and labor associated with making sequencing libraries. Here we describe a low-cost, streamlined, all amplicon-based method for sequencing SARS-CoV-2, which bypasses costly and time-consuming library preparation steps. We benchmark this tailed amplicon method against both the ARTIC amplicon protocol and sequence capture approaches and show that an optimized tailed amplicon approach achieves comparable amplicon balance, coverage metrics, and variant calls to the ARTIC v3 approach and represents a cost-effective and highly scalable method for SARS-CoV-2 sequencing.
Competing Interest Statement: The authors have declared no competing interest.
690 downloads genomics
Polygenic risk scores (PRS) use the results of genome-wide association studies (GWAS) to predict quantitative phenotypes or disease risk at an individual level. This provides a potential route to the use of genetic data in personalized medical care. However, a major barrier to the use of PRS is that the majority of GWAS come from cohorts of European ancestry. The predictive power of PRS constructed from these studies is substantially lower in non-European ancestry cohorts, although the reasons for this are unclear. To address this question, we investigate the performance of PRS for height in cohorts with admixed African and European ancestry, allowing us to evaluate ancestry-related differences in PRS predictive accuracy while controlling for environment and cohort differences. We first show that the predictive accuracy of height PRS increases linearly with European ancestry and is largely explained by European ancestry segments of the admixed genomes. We show that differences in allele frequencies, recombination rate, and marginal effect sizes across ancestries all contribute to the decrease in predictive power, but none of these effects explains the decrease on its own. Finally, we demonstrate that prediction for admixed individuals can be improved by using a linear combination of PRS that includes ancestry-specific effect sizes, although this approach is at present limited by the small size of non-European ancestry discovery cohorts.
Competing Interest Statement: The authors have declared no competing interest.
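A PRS is a weighted sum of allele dosages, PRS = Σⱼ βⱼ·gⱼ, where gⱼ ∈ {0, 1, 2} counts effect alleles at SNP j and βⱼ is its GWAS effect size. The sketch below shows one simple reading of "a linear combination of PRS that includes ancestry-specific effect sizes": one score built from European effect sizes, one from African effect sizes, mixed with two weights. The exact combination used in the study is not given in the abstract; the function names, weights, effect sizes and genotypes here are hypothetical.

```python
def polygenic_score(genotypes, effects):
    """Weighted sum of allele dosages: PRS = sum_j beta_j * g_j."""
    if len(genotypes) != len(effects):
        raise ValueError("genotype and effect vectors must have equal length")
    return sum(g * b for g, b in zip(genotypes, effects))


def combined_score(genotypes, eur_effects, afr_effects, w_eur, w_afr):
    """Linear combination of two ancestry-specific PRS.

    The weights are hypothetical inputs here; in practice they would be
    estimated, e.g. by regressing the phenotype on both scores.
    """
    return (w_eur * polygenic_score(genotypes, eur_effects)
            + w_afr * polygenic_score(genotypes, afr_effects))


g = [0, 1, 2, 1]                        # allele dosages at four hypothetical SNPs
beta_eur = [0.10, -0.20, 0.05, 0.30]    # hypothetical European-ancestry effect sizes
beta_afr = [0.08, -0.10, 0.00, 0.25]    # hypothetical African-ancestry effect sizes

print(round(combined_score(g, beta_eur, beta_afr, w_eur=0.6, w_afr=0.4), 4))  # prints 0.18
```

Because the abstract notes that non-European discovery cohorts are small, the African-ancestry effect sizes in a real analysis would be noisier, which is one reason to fit the mixing weights rather than fix them.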
689 downloads genomics
The development of DNA-barcoded antibodies to tag cell-surface molecules has enabled the use of droplet-based single cell sequencing (dsc-seq) to profile the surface proteomes of cells. Compared to flow and mass cytometry, the major limitation of current dsc-seq-based workflows is the high cost associated with profiling each cell, thus precluding its use in applications where millions of cells are required. Here, we introduce SCITO-seq, a new workflow that combines combinatorial indexing and commercially available dsc-seq to enable cost-effective cell surface proteomic sequencing of greater than 10⁵ cells per microfluidic reaction. We demonstrate SCITO-seq's feasibility and scalability by profiling mixed species cell lines and mixed human T and B lymphocytes. To further demonstrate its applicability, we show comparable cellular composition estimates in peripheral blood mononuclear cells obtained with SCITO-seq and mass cytometry. SCITO-seq can be extended to include simultaneous profiling of additional modalities such as transcripts and accessible chromatin or tracking of experimental perturbations such as genome edits or extracellular stimuli.
682 downloads genomics
Here, we present CellOracle, a computational tool that integrates single-cell transcriptome and epigenome profiles to infer gene regulatory networks (GRNs), critical regulators of cell identity. Leveraging inferred GRNs, we simulate gene expression changes in response to transcription factor (TF) perturbation, enabling network configurations to be interrogated in silico, facilitating their interpretation. We validate the efficacy of CellOracle to recapitulate known regulatory changes across hematopoiesis, correctly predicting the outcomes of well-characterized TF perturbations. Integrating CellOracle analysis with lineage tracing of direct reprogramming reveals distinct network configurations underlying different reprogramming failure modes. Furthermore, analysis of GRN reconfiguration along successful reprogramming trajectories identifies new factors to enhance target cell yield, uncovering a role for the AP-1 subunit Fos with the Hippo signaling effector Yap1. Together, these results demonstrate the efficacy of CellOracle to infer and interpret cell-type-specific GRN configurations at high resolution, promoting new mechanistic insights into the regulation and reprogramming of cell identity. CellOracle code and documentation are available at https://github.com/morris-lab/CellOracle.
Competing Interest Statement: The authors have declared no competing interest.
680 downloads genomics
Elisabetta Mereu, Atefeh Lafzi, Catia Moutinho, Christoph Ziegenhain, Davis J. MacCarthy, Adrian Alvarez, Eduard Batlle, Sagar, Dominic Grün, Julia K. Lau, Stéphane C Boutet, Chad Sanada, Aik Ooi, Robert C. Jones, Kelly Kaihara, Chris Brampton, Yasha Talaga, Yohei Sasagawa, Kaori Tanaka, Tetsutaro Hayashi, Itoshi Nikaido, Cornelius Fischer, Sascha Sauer, Timo Trefzer, Christian Conrad, Xian Adiconis, Lan T. Nguyen, Aviv Regev, Joshua Z. Levin, Swati Parekh, Aleksandar Janjic, Lucas E. Wange, Johannes W. Bagnoli, Wolfgang Enard, Ivo G. Gut, Rickard Sandberg, Ivo Gut, Oliver Stegle, Holger Heyn
Single-cell RNA sequencing (scRNA-seq) is the leading technique for charting the molecular properties of individual cells. The latest methods are scalable to thousands of cells, enabling in-depth characterization of sample composition without prior knowledge. However, there are important differences between scRNA-seq techniques, and it remains unclear which are the most suitable protocols for drawing cell atlases of tissues, organs and organisms. We have generated benchmark datasets to systematically evaluate techniques in terms of their power to comprehensively describe cell types and states. We performed a multi-center study comparing 13 commonly used single-cell and single-nucleus RNA-seq protocols using a highly heterogeneous reference sample resource. Comparative and integrative analysis at cell type and state level revealed marked differences in protocol performance, highlighting a series of key features for cell atlas projects. These should be considered when defining guidelines and standards for international consortia, such as the Human Cell Atlas project.
662 downloads genomics
Nathan R. Tucker, Mark Chaffin, Stephen J. Fleming, Amelia W. Hall, Victoria A Parsons, Kenneth Bedi, Amer-Denis Akkad, Caroline N Herndon, Alessandro Arduini, Irinna Papangeli, Carolina Roselli, François Aguet, Seung Hoan Choi, Kristin G. Ardlie, Mehrtash Babadi, Kenneth B. Margulies, Christian M Stegmann, Patrick T. Ellinor
Introduction: The human heart requires a complex ensemble of specialized cell types to perform its essential function. A greater knowledge of the intricate cellular milieu of the heart is critical to increase our understanding of cardiac homeostasis and pathology. As recent advances in low-input RNA sequencing have allowed cellular transcriptomes to be defined at single-cell resolution and at scale, here we have applied these approaches to assess the cellular and transcriptional diversity of the non-failing human heart.
Methods: Microfluidic encapsulation and barcoding were used to perform single-nucleus RNA sequencing with samples from seven human donors, selected for their absence of overt cardiac disease. Individual nuclear transcriptomes were then clustered based on the transcriptional profiles of highly variable genes. These clusters were used as the basis for between-chamber and between-sex differential gene expression analyses and for intersection with genetic and pharmacologic data.
Results: We sequenced the transcriptomes of 287,269 single cardiac nuclei, revealing a total of 9 major cell types and 20 subclusters of cell types within the human heart. Cellular subclasses include two distinct groups of resident macrophages, four endothelial subtypes, and two fibroblast subsets. Comparisons of cellular transcriptomes by cardiac chamber or sex reveal diversity not only in cardiomyocyte transcriptional programs, but also in subtypes involved in extracellular matrix remodeling and vascularization. Using genetic association data, we identified strong enrichment for the role of specific cell subtypes in cardiac traits and diseases. Finally, intersection of our dataset with genes on cardiac clinical testing panels and the druggable genome reveals striking patterns of cellular specificity.
Conclusions: Using large-scale single-nucleus RNA sequencing, we have defined the transcriptional and cellular diversity in the normal human heart. Our identification of discrete cell subtypes and differentially expressed genes within the heart will ultimately facilitate the development of new therapeutics for cardiovascular diseases.
658 downloads genomics
Spatial transcriptomics seeks to integrate single-cell transcriptomic data within the 3-dimensional space of multicellular biology. Current methods use glass substrates pre-seeded with matrices of barcodes or fluorescence hybridization of a limited number of probes. We developed an alternative approach, called ZipSeq, that uses patterned illumination and photocaged oligonucleotides to serially print barcodes (Zipcodes) onto live cells within intact tissues, in real time and with on-the-fly selection of patterns. Using ZipSeq, we mapped gene expression in three settings: in vitro wound healing, live lymph node sections and a live tumor microenvironment (TME). In all cases, we discovered new gene expression patterns associated with histological structures. In the TME, this demonstrated a trajectory of myeloid and T cell differentiation, from the periphery inward. A variation of ZipSeq efficiently scales to the level of single cells, providing a pathway for complete mapping of live tissues, subsequent to real-time imaging or perturbation.
649 downloads genomics
How genes with novel cellular functions evolve is a central biological question. Exon shuffling is one mechanism to assemble new protein architectures. Here we show that DNA transposons, which are mobile and pervasive in genomes, have provided a recurrent supply of exons and splice sites to assemble protein-coding genes in vertebrates via exon-shuffling. We find that transposase domains have been captured, primarily via alternative splicing, to form new fusion proteins at least 94 times independently over ~350 million years of tetrapod evolution. Evolution favors fusion of transposase DNA-binding domains to host regulatory domains, especially the Krüppel-associated Box (KRAB), suggesting transposase capture frequently yields new transcriptional repressors. We show that four independently evolved KRAB-transposase fusion proteins repress gene expression in a sequence-specific fashion. Genetic knockout and rescue of the bat-specific KRABINER fusion gene in cells demonstrates that it binds its cognate transposons genome-wide and controls a vast network of genes and cis-regulatory elements. These results illustrate a powerful mechanism by which a transcription factor and its dispersed binding sites emerge at once from a transposon family.
Competing Interest Statement: The authors have declared no competing interest.
621 downloads genomics
Single-cell RNA sequencing is a powerful tool to study developmental biology but does not preserve spatial information about cellular interactions and tissue morphology. Here, we combined single-cell and spatial transcriptomics with new algorithms for data integration to study the early development of the chicken heart. We collected data from four key ventricular development stages, ranging from the early chamber formation stage to the late four-chambered stage. We created an atlas of the diverse cellular lineages in developing hearts, their spatial organization, and their interactions during development. Spatial mapping of differentiation transitions revealed the intricate interplay between cellular differentiation and morphogenesis in cardiac cellular lineages. Using spatially resolved expression analysis, we identified anatomically restricted gene expression programs. Last, we discovered a stage-dependent role for the small secreted peptide, thymosin beta-4, in the coordination of multi-lineage cellular populations. Overall, our study identifies key stage-specific regulatory programs that govern cardiac development.
Competing Interest Statement: The authors have declared no competing interest.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!