Rxivist.org combines preprints from bioRxiv.org with data from Twitter to help you find the papers being discussed in your field.
Currently indexing 65,152 bioRxiv papers from 288,686 authors.

Most downloaded bioRxiv papers, since beginning of last month

Results 1 through 20 out of 4369 in category genomics


1: Single-cell RNA counting at allele- and isoform-resolution using Smart-seq3

Michael Hagemann-Jensen, Christoph Ziegenhain et al.

2,938 downloads (posted 25 Oct 2019)

Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele- and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells. Here, we introduce Smart-seq3 that combines full-length transcriptome coverage with a 5' unique molecular identifier (UMI) RNA counting strategy that enabled in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele-resolution applicable to large-scale characterization of cell types and states across tissues and organisms.


2: In vivo Perturb-Seq reveals neuronal and glial abnormalities associated with Autism risk genes

Xin Jin, Sean K Simmons et al.

2,023 downloads (posted 07 Oct 2019)

The thousands of disease risk genes and loci identified through human genetic studies far outstrip our current capacity to systematically study their functions. New experimental approaches are needed for functional investigations of large panels of genes in a biologically relevant context. Here, we developed a scalable genetic screen approach, in vivo Perturb-Seq, and applied this method to the functional evaluation of 35 autism spectrum disorder (ASD) de novo loss-of-function risk genes. Using CRISPR-Cas9, we introduce...


3: A guide to performing Polygenic Risk Score analyses

Shing Wan Choi, Timothy Mak et al.

1,996 downloads (posted 14 Sep 2018)

The application of polygenic risk scores (PRS) has become routine in genetic epidemiological studies. Among a range of applications, PRS are commonly used to assess shared aetiology among different phenotypes and to evaluate the predictive power of genetic data, while they are also now being exploited as part of study design, in which experiments are performed on individuals, or their biological samples (e.g. tissues, cells), at the tails of the PRS distribution and contrasted. As GWAS sample sizes increase and PRS becom...


4: Sex Chromosome Dosage Effects On Gene Expression In Humans

Armin Raznahan, Neelroop Parikshak et al.

1,831 downloads (posted 14 May 2017)

A fundamental question in the biology of sex-differences has eluded direct study in humans: how does sex chromosome dosage (SCD) shape genome function? To address this, we developed a systematic map of SCD effects on gene function by analyzing genome-wide expression data in humans with diverse sex chromosome aneuploidies (XO, XXX, XXY, XYY, XXYY). For sex chromosomes, we demonstrate a pattern of obligate dosage sensitivity amongst evolutionarily preserved X-Y homologs, and update prevailing theoretical models for SCD co...


5: Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Christoph Hafemeister, Rahul Satija

1,715 downloads (posted 14 Mar 2019)

Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from 'regularized negative binomial regression', where cellular sequencing depth is utilized as a covaria...


6: Comprehensive integration of single cell data

Tim Stuart, Andrew Butler et al.

1,580 downloads (posted 02 Nov 2018)

Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we devel...


7: Predicting 3D genome folding from DNA sequence

Geoffrey Fudenberg, David R Kelley et al.

1,457 downloads (posted 10 Oct 2019)

In interphase, the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Here we present a deep convolutional neural network, Akita, that accurately predicts genome folding from DNA sequence alone. Representations learned by Akita underscore the importance of CTCF and reveal a complex grammar underlying genome folding. Akita enables rapid in silico predictions for sequence mutagenesis, genome folding across species, and genetic variants.


8: Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes

Konrad Karczewski, Laurent C Francioli et al.

1,411 downloads (posted 28 Jan 2019)

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample size...


9: The Genomic Formation of South and Central Asia

Vagheesh M Narasimhan, Nick Patterson et al.

1,400 downloads (posted 31 Mar 2018)

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, corre...


10: A portable and cost-effective microfluidic system for massively parallel single-cell transcriptome profiling

Chuanyu Liu, Tao Wu et al.

1,336 downloads (posted 25 Oct 2019)

Single-cell technologies are becoming increasingly widespread and have been revolutionizing our understanding of cell identity, state, diversity and function. However, current platforms can be slow to apply to large-scale studies and resource-limited clinical arenas due to a variety of reasons including cost, infrastructure, sample quality and requirements. Here we report DNBelab C4 (C4), a negative pressure orchestrated, portable and cost-effective device that enables high-throughput single-cell transcriptional profili...


11: High-Spatial-Resolution Multi-Omics Atlas Sequencing of Mouse Embryos via Deterministic Barcoding in Tissue

Yang Liu, Mingyu Yang et al.

1,282 downloads (posted 01 Oct 2019)

Spatial gene expression heterogeneity plays an essential role in a range of biological, physiological and pathological processes but it remains a challenge to conduct high spatial resolution, genome wide, unbiased biomolecular profiling over a large tissue area. Herein, we present a fundamentally new approach called microfluidic Deterministic Barcoding in Tissue for spatial omics sequencing (DBiTseq). It permits simultaneous barcoding of mRNAs, proteins, or even other omics on a fixed tissue slide to construct high spat...


12: Integrative analysis of 10,000 epigenomic maps across 800 samples for regulatory genomics and disease dissection

Carles Boix Adsera, Yongjin Park et al.

1,049 downloads (posted 18 Oct 2019)

To help elucidate genetic variants underlying complex traits, we develop EpiMap, a compendium of 833 reference epigenomes across 18 uniformly-processed and computationally-completed assays. We define chromatin states, high-resolution enhancers, activity patterns, enhancer modules, upstream regulators, and downstream target gene functions. We annotate 30,247 genetic variants associated with 534 traits, recognize principal and partner tissues underlying each trait, infer trait-tissue, tissue-tissue and trait-trait relatio...


13: Attacks on genetic privacy via uploads to genealogical databases

Michael D. Edge, Graham Coop

1,043 downloads (posted 22 Oct 2019)

Direct-to-consumer (DTC) genetics services are increasingly popular for genetic genealogy, with tens of millions of customers as of 2019. Several DTC genealogy services allow users to upload their own genetic datasets in order to search for genetic relatives. A user and a target person in the database are identified as genetic relatives if the user's uploaded genome shares one or more sufficiently long segments in common with that of the target person; that is, if the two genomes share one or more long regions identical ...


14: Next generation sequencing reveals NRAP as a candidate gene for hypertrophic cardiomyopathy in elderly patients

Ankit Sharma, Rakesh Koranchery et al.

985 downloads (posted 02 Oct 2019)

Hypertrophic cardiomyopathy (HCM) is a heterogeneous heart muscle disease predominantly caused by sarcomeric protein encoding genes. However, the cause for a significant number of elderly patients remains unclear. Here, we performed whole-exome sequencing in a South Indian family with an elderly HCM proband. We identified a heterozygous missense variant in the Nebulin-Related-Anchoring Protein encoding gene NRAP (NM_001261463, c.1259A>G, p.Y420C) in the proband. NRAP is a multi-domain scaffolding protein involved in card...


15: Inference and effects of barcode multiplets in droplet-based single-cell assays

Caleb Lareau, Sai Ma et al.

975 downloads (posted 30 Oct 2019)

A widespread assumption for single-cell analyses specifies that one cell’s nucleic acids are predominantly captured by one oligonucleotide barcode. However, we show that ∼13-21% of cell barcodes from the 10x Chromium scATAC-seq assay may have been derived from a droplet with more than one oligonucleotide sequence, which we call “barcode multiplets”. We demonstrate that barcode multiplets can be derived from at least two different sources. First, we confirm that ∼4% of droplets from the 10x platform may contain multiple ...


16: A molecular cell atlas of the human lung from single cell RNA sequencing

Kyle J Travaglini, Ahmad N Nabhan et al.

972 downloads (posted 27 Aug 2019)

Although single cell RNA sequencing studies have begun providing compendia of cell expression profiles, it has proven more difficult to systematically identify and localize all molecular cell types in individual organs to create a full molecular cell atlas. Here we describe droplet- and plate-based single cell RNA sequencing applied to ~70,000 human lung and blood cells, combined with a multi-pronged cell annotation approach, which have allowed us to define the gene expression profiles and anatomical locations of 58 cel...


17: Integrating healthcare and research genetic data empowers the discovery of 49 novel developmental disorders

Joanna Kaplanis, Kaitlin E. Samocha et al.

965 downloads (posted 16 Oct 2019)

De novo mutations (DNMs) in protein-coding genes are a well-established cause of developmental disorders (DD). However, known DD-associated genes only account for a minority of the observed excess of such DNMs. To identify novel DD-associated genes, we integrated healthcare and research exome sequences on 31,058 DD parent-offspring trios, and developed a simulation-based statistical test to identify gene-specific enrichments of DNMs. We identified 299 significantly DD-associated genes, including 49 not previously robust...


18: L1 and B1 repeats blueprint the spatial organization of chromatin

J. Yuyang Lu, Lei Chang et al.

951 downloads (posted 13 Oct 2019)

Despite extensive mapping of three-dimensional (3D) chromatin structures, the basic principles underlying genome folding remain unknown. Here, we report a fundamental role for L1 and B1 retrotransposons in shaping the macroscopic 3D genome structure. Homotypic clustering of B1 and L1 repeats in the nuclear interior or at the nuclear and nucleolar peripheries, respectively, segregates the genome into mutually exclusive nuclear compartments. This spatial segregation of L1 and B1 is conserved in mouse and human cells, and ...


19: Insights into human genetic variation and population history from 929 diverse genomes

Anders Bergström, Shane A McCarthy et al.

937 downloads (posted 27 Jun 2019)

Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented private genetic variation in southern and central Africa and in Oceania and the Americas, but an absence of fixed, private...


20: A single cell framework for multi-omic analysis of disease identifies malignant regulatory signatures in mixed phenotype acute leukemia

Jeffrey M. Granja, Sandy Klemm et al.

932 downloads (posted 09 Jul 2019)

In order to identify the molecular determinants of human diseases, such as cancer, that arise from a diverse range of tissue, it is necessary to accurately distinguish normal and pathogenic cellular programs. Here we present a novel approach for single-cell multi-omic deconvolution of healthy and pathological molecular signatures within phenotypically heterogeneous malignant cells. By first creating immunophenotypic, transcriptomic and epigenetic single-cell maps of hematopoietic development from healthy peripheral bloo...