Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,232 bioRxiv papers from 276,288 authors.
Most downloaded bioRxiv papers, since beginning of last month
49,529 results found.
1,384 downloads genomics
Konrad Karczewski, Laurent C Francioli, Grace Tiao, Beryl B Cummings, Jessica Alföldi, Qingbo Wang, Ryan L Collins, Kristen M Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A Watts, Daniel Rhodes, Moriel Singer-Berk, Eleina M England, Eleanor G Seaby, Jack A. Kosmicki, Raymond K Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S Ware, Christopher Vittal, Irina M Armean, Louis Bergelson, Kristian Cibulskis, Kristen M Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, The Genome Aggregation Database Consortium, Benjamin M Neale, Mark J. Daly, Daniel G. MacArthur
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
1,232 downloads genomics
Taylor Sterling Adams, Jonas Christian Schupp, Sergio Poli, Ehab A Ayaub, Nir Neumark, Farida Ahangari, Sarah G Chu, Benjamin A. Raby, Giuseppe DeIuliis, Michael Januszyk, Qiaonan Duan, Heather A Arnett, Asim Siddiqui, George R Washko, Robert Homer, Xiting Yan, Ivan O Rosas, Naftali Kaminski
We provide a single cell atlas of Idiopathic Pulmonary Fibrosis (IPF), a fatal interstitial lung disease, focusing on resident lung cell populations. By profiling 312,928 cells from 32 IPF, 29 healthy control and 18 chronic obstructive pulmonary disease (COPD) lungs, we demonstrate that IPF is characterized by changes in discrete subpopulations of cells in the three major parenchymal compartments: the epithelium, endothelium and stroma. Among epithelial cells, we identify a novel population of IPF-enriched aberrant basaloid cells that co-express basal epithelial markers, mesenchymal markers, senescence markers and developmental transcription factors, and are located at the edge of myofibroblast foci in the IPF lung. Among vascular endothelial cells in the IPF lung parenchyma, we identify an expanded cell population transcriptomically identical to vascular endothelial cells normally restricted to the bronchial circulation. We confirm the presence of both populations by immunohistochemistry and independent datasets. Among stromal cells, we identify fibroblasts and myofibroblasts in both control and IPF lungs and leverage manifold-based algorithms (diffusion maps and diffusion pseudotime) to infer the origins of the activated IPF myofibroblast. Our work provides a comprehensive catalogue of the aberrant cellular transcriptional programs in IPF, demonstrates a new framework for analyzing complex disease with scRNAseq, and provides the largest lung disease single-cell atlas to date.
1,183 downloads genomics
We developed Hackflex, a low-cost method for the production of Illumina-compatible sequencing libraries that allows up to 11 times more libraries for high-throughput Illumina sequencing to be generated at a fixed cost. Quality of library preparation was tested by constructing libraries from E. coli MG1655 genomic DNA using either Hackflex, standard Nextera Flex or a variation of standard Nextera Flex in which the bead-linked transposase is diluted prior to use. We demonstrated that Hackflex can produce high quality libraries and yields a highly uniform coverage, equivalent to the standard Nextera Flex kit. Using Hackflex, we were able to achieve a per sample reagent cost of library prep of A$8.66, which is 8.23 times lower than the standard Nextera Flex protocol at advertised retail price. An additional simple modification to the protocol enables a further price reduction of up to 11 fold or about A$6.50/sample. This method will allow researchers to construct more libraries within a given budget, thereby yielding more data and facilitating research programs where sequencing large numbers of libraries is beneficial.
1,182 downloads neuroscience
Machine learning-based analysis of human functional magnetic resonance imaging (fMRI) patterns has enabled the visualization of perceptual content. However, it has been limited to the reconstruction with low-level image bases or to the matching to exemplars. Recent work showed that visual cortical activity can be decoded (translated) into hierarchical features of a deep neural network (DNN) for the same input image, providing a way to make use of the information from hierarchical visual features. Here, we present a novel image reconstruction method, in which the pixel values of an image are optimized to make its DNN features similar to those decoded from human brain activity at multiple layers. We found that the generated images resembled the stimulus images (both natural images and artificial shapes) and the subjective visual content during imagery. While our model was solely trained with natural images, our method successfully generalized the reconstruction to artificial shapes, indicating that our model indeed reconstructs or generates images from brain activity, not simply matches to exemplars. A natural image prior introduced by another deep neural network effectively rendered semantically meaningful details to reconstructions by constraining reconstructed images to be similar to natural images. Furthermore, human judgment of reconstructions suggests the effectiveness of combining multiple DNN layers to enhance visual quality of generated images. The results suggest that hierarchical visual information in the brain can be effectively combined to reconstruct perceptual and subjective images.
1,153 downloads scientific communication and education
Every year for three years (2016 to 2018), I tried to identify every single person hired as a tenure track prof in ecology or an allied field (e.g., fish & wildlife) in N. America. I identified a total of 566 hires. I used public sources to compile various data on the new hires and the institutions that hired them (e.g., number of publications, teaching experience, hiring institution Carnegie class). I also compiled data provided by anonymous ecology faculty job seekers on ecoevojobs.net (e.g., number of positions applied for, number of publications, numbers of interviews and offers). And I polled readers of the Dynamic Ecology blog to get information about applicant and search committee behavior (e.g., regarding customization of applications to the hiring institution). These data address some widespread anxieties and misunderstandings about the ecology faculty job market, and also speak to gender diversity and equity in recent ecology faculty hiring. They complement, and in some cases improve on, other sources of information, such as anecdotal personal experiences.
1,077 downloads evolutionary biology
Ekaterina Khrameeva, Ilia Kurochkin, Dingding Han, Patricia Guijarro, Sabina Kanton, Malgorzata Santel, Zhengzong Qian, Shen Rong, Pavel Mazin, Matvei Bulat, Olga Efimova, Anna Tkachev, Song Guo, Chet Sherwood, Gray Camp, Svante Paabo, Barbara Treutlein, Philipp Khaitovich
Identification of gene expression traits unique to the human brain sheds light on the mechanisms of human cognition. Here we searched for gene expression traits separating humans from other primates by analyzing 88,047 cell nuclei and 422 tissue samples representing 33 brain regions of humans, chimpanzees, bonobos, and macaques. We show that gene expression evolves rapidly within cell types, with more than two-thirds of cell type-specific differences not detected using conventional RNA sequencing of tissue samples. Neurons tend to evolve faster in all hominids, but non-neuronal cell types, such as astrocytes and oligodendrocyte progenitors, show more differences on the human lineage, including alterations of spatial distribution across neocortical layers.
1,073 downloads genomics
Objective: Type 2 diabetes (T2D) is a complex disease characterized by pancreatic islet dysfunction, insulin resistance, and disruption of blood glucose levels. Genome wide association studies (GWAS) have identified >400 independent signals that encode genetic predisposition. More than 90% of the associated single nucleotide polymorphisms (SNPs) localize to non-coding regions and are enriched in chromatin-defined islet enhancer elements, indicating a strong transcriptional regulatory component to disease susceptibility. Pancreatic islets are a mixture of cell types that express distinct hormonal programs, and so each cell type may contribute differentially to the underlying regulatory processes that modulate T2D-associated transcriptional circuits. Existing chromatin profiling methods such as ATAC-seq and DNase-seq, applied to islets in bulk, produce aggregate profiles that mask important cellular and regulatory heterogeneity. Methods: We present genome-wide single cell chromatin accessibility profiles in >1,600 cells derived from a human pancreatic islet sample using single-cell-combinatorial-indexing ATAC-seq (sci-ATAC-seq). We also developed a deep learning model based on the U-Net architecture to accurately predict open chromatin peak calls in rare cell populations. Results: We show that sci-ATAC-seq profiles allow us to deconvolve alpha, beta, and delta cell populations and identify cell-type-specific regulatory signatures underlying T2D. Particularly, we find that T2D GWAS SNPs are significantly enriched in beta cell-specific and cross cell-type shared islet open chromatin, but not in alpha or delta cell-specific open chromatin. We also demonstrate, using less abundant delta cells, that deep-learning models can improve signal recovery and feature reconstruction of rarer cell populations. Finally, we use co-accessibility measures to nominate the cell-specific target genes at 104 non-coding T2D GWAS signals. Conclusions: Collectively, we identify the islet cell type of action across genetic signals of T2D predisposition and provide higher-resolution mechanistic insights into genetically encoded risk pathways.
1,067 downloads neuroscience
Eric M. Trautmann, Daniel J. O’Shea, Xulu Sun, James H Marshel, Ailey Crow, Brian Hsueh, Sam Vesuna, Lucas Cofer, Gergő Bohner, Will Allen, Isaac Kauvar, Sean Quirin, Matthew MacDougall, Yuzhi Chen, Matthew P. Whitmire, Charu Ramakrishnan, Maneesh Sahani, Eyal Seidemann, Stephen I Ryu, Karl Deisseroth, Krishna V Shenoy
Calcium imaging has rapidly developed into a powerful tool for recording from large populations of neurons in vivo. Imaging in rhesus macaque motor cortex can enable the discovery of new principles of motor cortical function and can inform the design of next generation brain-computer interfaces (BCIs). Surface two-photon (2P) imaging, however, cannot presently access somatic calcium signals of neurons from all layers of macaque motor cortex due to photon scattering. Here, we demonstrate an implant and imaging system capable of chronic, motion-stabilized 2P imaging of calcium signals in macaques engaged in a motor task. By imaging apical dendrites, some of which originated from deep layer 5 neurons, as well as superficial cell bodies, we achieved optical access to large populations of deep and superficial cortical neurons across dorsal premotor (PMd) and gyral primary motor (M1) cortices. Dendritic signals from individual neurons displayed tuning for different directions of arm movement, which was stable across many weeks. Combining several technical advances, we developed an optical BCI (oBCI) driven by these dendritic signals and successfully decoded movement direction online. By fusing 2P functional imaging with CLARITY volumetric imaging, we verify that an imaged dendrite, which contributed to oBCI decoding, originated from a putative Betz cell in motor cortical layer 5. This approach establishes new opportunities for studying motor control and designing BCIs.
1,041 downloads bioinformatics
Kishwar Shafin, Trevor Pesout, Ryan Lorig-Roach, Marina Haukness, Hugh E Olsen, Colleen Bosworth, Joel Armstrong, Kristof Tigyi, Nicholas Maurer, Sergey Koren, Fritz J. Sedlazeck, Tobias Marschall, Simon Mayes, Vania Costa, Justin M Zook, Kelvin J Liu, Duncan Kilburn, Melanie Sorensen, Katy M Munson, Mitchell R. Vollger, Evan E Eichler, Sofie Salama, David Haussler, Richard E. Green, Mark Akeson, Adam Phillippy, Karen H Miga, Paolo Carnevali, Miten Jain, Benedict Paten
Present workflows for producing human genome assemblies from long-read technologies have cost and production time bottlenecks that prohibit efficient scaling to large cohorts. We demonstrate an optimized PromethION nanopore sequencing method for eleven human genomes. The sequencing, performed on one machine in nine days, achieved an average 63x coverage, 42 Kb read N50, 90% median read identity and 6.5x coverage in 100 Kb+ reads using just three flow cells per sample. To assemble these data we introduce new computational tools: Shasta - a de novo long read assembler, and MarginPolish & HELEN - a suite of nanopore assembly polishing algorithms. On a single commercial compute node Shasta can produce a complete human genome assembly in under six hours, and MarginPolish & HELEN can polish the result in just over a day, achieving 99.9% identity (QV30) for haploid samples from nanopore reads alone. We evaluate assembly performance for diploid, haploid and trio-binned human samples in terms of accuracy, cost, and time and demonstrate improvements relative to current state-of-the-art methods in all areas. We further show that addition of proximity ligation (Hi-C) sequencing yields near chromosome-level scaffolds for all eleven genomes.
1,019 downloads animal behavior and cognition
This tutorial introduces the reader to Gaussian process regression as an expressive tool to model, actively explore and exploit unknown functions. Gaussian process regression is a powerful, non-parametric Bayesian approach towards regression problems that can be utilized in exploration and exploitation scenarios. This tutorial aims to provide an accessible introduction to these techniques. We will introduce Gaussian processes which generate distributions over functions used for Bayesian non-parametric regression, and demonstrate their use in applications and didactic examples including simple regression problems, a demonstration of kernel-encoded prior assumptions and compositions, a pure exploration scenario within an optimal design framework, and a bandit-like exploration-exploitation scenario where the goal is to recommend movies. Beyond that, we describe a situation modelling risk-averse exploration in which an additional constraint (not to sample below a certain threshold) needs to be accounted for. Lastly, we summarize recent psychological experiments utilizing Gaussian processes. Software and literature pointers are also provided.
1,018 downloads immunology
Antibody recognition of antigen relies on the specific interaction of amino acids at the paratope-epitope interface. A long-standing question in the fields of immunology and structural biology is whether paratope-epitope interaction is predictable. A fundamental premise for the predictability of paratope-epitope binding is the existence of structural units that are universally shared among antibody-antigen binding complexes. Here, we identified structural interaction motifs, which together compose a vocabulary of paratope-epitope binding that is shared among investigated antibody-antigen complexes. The vocabulary (i) is finite with fewer than 10^4 motifs, (ii) mediates specific and non-redundant interactions between paratope-epitope pairs, (iii) is immunity-specific (distinct from the motif vocabulary used by non-immune protein-protein interactions), and (iv) enables the machine learning prediction of paratope or epitope. The discovery of a vocabulary of paratope-epitope interaction demonstrates the learnability and predictability of paratope-epitope interaction.
1,012 downloads genomics
We analyzed publicly available whole genome sequencing data from cattle which were germline genome-edited to introduce polledness. Our analysis discovered the unintended heterozygous integration of the plasmid and a second copy of the repair template sequence, at the target site. Our finding underscores the importance of employing screening methods suited to reliably detect the unintended integration of plasmids and multiple template copies.
1,009 downloads genomics
Davis McCarthy, Raghd Rostom, Yuanhua Huang, Daniel J Kunz, Petr Danecek, Marc Jan Bonder, Tzachi Hagai, HipSci Consortium, Wenyi Wang, Daniel J Gaffney, Benjamin D Simons, Oliver Stegle, Sarah A Teichmann
Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA-sequencing, either using bulk or using single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution.
996 downloads neuroscience
Neurons undergo nanometer-scale deformations during action potentials, and the underlying mechanism has been actively debated for decades. Previous observations were limited to a single spot or the cell boundary, while movement across the entire neuron during the action potential remained unclear. We report full-field imaging of cellular deformations accompanying the action potential in mammalian neuron somas (-1.8nm~1.3nm) and neurites (-0.7nm~0.9nm), using fast quantitative phase imaging with a temporal resolution of 0.1ms and an optical pathlength sensitivity of <4pm per pixel. Spike-triggered average, synchronized to electrical recording, demonstrates that the time course of the optical phase changes matches the dynamics of the electrical signal, with the optical signal revealing the intracellular potential rather than its time derivative detected via extracellular electrodes. Using 3D cellular morphology extracted via confocal microscopy, we demonstrate that the voltage-dependent changes in the membrane tension induced by ionic repulsion can explain the magnitude, time course and spatial features of the phase imaging. Our full-field observations of the spike-induced deformations in mammalian neurons open the door to non-invasive label-free imaging of neural signaling.
977 downloads genomics
Jiarui Ding, Xian Adiconis, Sean K Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K Hughes, Marc H Wadsworth, Tyler Burks, Lan T. Nguyen, John Y. H. Kwon, Boaz Barak, William Ge, Amanda J. Kedaigle, Shaina Carroll, Shuqiang Li, Nir Hacohen, Orit Rozenblatt-Rosen, Alex K Shalek, Alexandra-Chloé Villani, Aviv Regev, Joshua Z Levin
A multitude of single-cell RNA sequencing methods have been developed in recent years, with dramatic advances in scale and power, enabling major discoveries and large-scale cell mapping efforts. However, these methods have not been systematically and comprehensively benchmarked. Here, we directly compare seven methods for single cell and/or single nucleus profiling from three types of samples (cell lines, peripheral blood mononuclear cells and brain tissue), generating 36 libraries in six separate experiments in a single center. To analyze these datasets, we developed and applied scumi, a flexible computational pipeline that can be used for any scRNA-seq method. We evaluated the methods for both basic performance and for their ability to recover known biological information in the samples. Our study will help guide experiments with the methods in this study as well as serve as a benchmark for future studies and for computational algorithm development.
975 downloads biophysics
HIV-1 Gag protein self-assembles at the plasma membrane of infected cells for viral particle formation. Gag targets lipids, mainly phosphatidylinositol (4,5) bisphosphate, at the inner leaflet of this membrane. Here, we address the question of whether Gag is able to specifically trap PI(4,5)P2 or other lipids during HIV-1 assembly in host CD4+ T lymphocytes. Lipid dynamics within and away from HIV-1 assembly sites were determined using super-resolution STED microscopy coupled with scanning Fluorescence Correlation Spectroscopy in living T cells. Analysis of HIV-1 infected cells revealed that, upon assembly, HIV-1 is able to specifically trap PI(4,5)P2 and cholesterol, but not phosphatidylethanolamine or sphingomyelin. Furthermore, our data show that Gag is the main driving force to restrict PI(4,5)P2 and cholesterol mobility at the cell plasma membrane. This is the first direct evidence showing that HIV-1 creates its own specific lipid environment by selectively recruiting PI(4,5)P2 and cholesterol, as a membrane nano-platform for virus assembly.
961 downloads plant biology
Conditional manipulation of gene expression is a key approach to investigating the primary function of a gene in a biological process. While conditional and cell-type specific overexpression systems exist for plants, there are currently no systems available to disable a gene completely and conditionally. Here, we present a novel tool with which target genes can be efficiently and conditionally knocked out at any developmental stage. The target gene is manipulated using the CRISPR-Cas9 genome editing technology, and conditionality is achieved with the well-established estrogen-inducible XVE system. Target genes can also be knocked out in a cell-type specific manner. Our tool is easy to construct and will be particularly useful for studying genes which have null alleles that are non-viable or show strong developmental defects.
955 downloads neuroscience
Serial and parallel processing in visual search have long been debated in psychology, but the processing mechanism remains an open issue. Serial processing allows only one object at a time to be processed, whereas parallel processing assumes that various objects are processed simultaneously. Here we present novel neural models for the two types of processing mechanisms based on analysis of simultaneously recorded spike trains using electrophysiological data from prefrontal cortex of rhesus monkeys while processing task-relevant visual displays. We combine mathematical models describing neuronal attention and point process models for spike trains. The same model can explain both serial and parallel processing by adopting different parameter regimes. We present statistical methods to distinguish between serial and parallel processing based on both maximum likelihood estimates and decoding analysis of the attention when two stimuli are presented simultaneously. Results show that both processing mechanisms are in play for the simultaneously recorded neurons, but neurons tend to follow parallel processing in the beginning after the onset of the stimulus pair, whereas they tend toward serial processing later on. This could be explained by parallel processing being related to sensory bottom-up signals or feedforward processing, which typically occur in the beginning after stimulus onset, whereas top-down signals related to cognitive modulatory influences guiding attentional effects in recurrent feedback connections occur after a small delay and are related to serial processing, where all processing capacities are directed towards the attended object.
942 downloads cancer biology
Chloe Chong, Markus Muller, HuiSong Pak, Dermot Harnett, Florian Huber, Delphine Grun, Marion Leleu, Aymeric Auger, Marion Arnaud, Brian J Stevenson, Justine Michaux, Ilija Bilic, Antje Hirsekorn, Lorenzo Calviello, Laia Simo-Riudalbas, Evarist Planet, Jan Lubinski, Marta Bryskiewicz, Maciej Wiznerowicz, Ioannis Xenarios, Lin Zhang, Didier Trono, Alexandre Harari, Uwe Ohler, George Coukos, Michal Bassani-Sternberg
Efforts to precisely identify tumor human leukocyte antigen (HLA) bound peptides capable of mediating T cell-based tumor rejection still face important challenges. Recent studies suggest that non-canonical tumor-specific HLA peptides that derive from annotated non-coding regions could elicit anti-tumor immune responses. However, sensitive and accurate mass-spectrometry (MS)-based proteogenomics approaches are required to robustly identify these non-canonical peptides. We present an MS-based analytical approach that characterizes the non-canonical tumor HLA peptide repertoire, by incorporating whole exome sequencing, bulk and single cell transcriptomics, ribosome profiling, and a combination of two MS/MS search tools. This approach results in the accurate identification of hundreds of shared and tumor-specific non-canonical HLA peptides and of an immunogenic peptide from a downstream reading frame in the melanoma stem cell marker gene ABCB5. It holds great promise for the discovery of novel cancer antigens for cancer immunotherapy.
937 downloads bioinformatics
Analysis of single-cell RNA-seq data begins with pre-processing of sequencing reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto (<https://pachterlab.github.io/kallisto/>) and bustools (<https://bustools.github.io/>) programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses. Documentation and tutorials for using the kallisto | bus workflow are available at <https://www.kallistobus.tools/>.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!