Rxivist uses download data on preprints from bioRxiv to help you find the papers being discussed in your field. Currently indexing 100,745 bioRxiv papers from 425,630 authors.
Most downloaded bioRxiv papers, all time
in category systems biology
2,544 results found.
3,275 downloads systems biology
In embryology, image processing methods such as segmentation are applied to acquiring quantitative criteria from time-series three-dimensional microscopic images. When used to segment cells or intracellular organelles, several current deep learning techniques outperform traditional image processing algorithms. However, segmentation algorithms still have unsolved problems, especially in bioimage processing. The most critical issue is that the existing deep learning-based algorithms for bioimages can perform only semantic segmentation, which distinguishes whether a pixel is within an object (for example, nucleus) or not. In this study, we implemented a novel segmentation algorithm, based on deep learning, which segments each nucleus and adds different labels to the detected objects. This segmentation algorithm is called instance segmentation. Our instance segmentation algorithm, implemented as a neural network, which we named QCA Net, substantially outperformed 3D U-Net, which is the best semantic segmentation algorithm that uses deep learning. Using QCA Net, we quantified the nuclear number, volume, surface area, and center of gravity coordinates during the development of mouse embryos. In particular, QCA Net distinguished nuclei of embryonic cells from those of polar bodies formed in meiosis. We consider that QCA Net can greatly contribute to bioimage segmentation in embryology by generating quantitative criteria from segmented images.
3,261 downloads systems biology
RNA profiling is an excellent phenotype of cellular responses and tissue states, but can be costly to generate at the massive scale required for studies of regulatory circuits, genetic states or perturbation screens. Here, we draw on a series of advances over the last decade in the field of mathematics to establish a rigorous link between biological structure, data compressibility, and efficient data acquisition. We propose that very few random composite measurements - in which gene abundances are combined in a random linear combination - are needed to approximate the high-dimensional similarity between any pair of gene abundance profiles. We then show how finding latent, sparse representations of gene expression data would enable us to 'decompress' a small number of random composite measurements and recover high-dimensional gene expression levels that were not measured (unobserved). We present a new algorithm for finding sparse, modular structure, which improves the ability to interpret samples in terms of small numbers of active modules, and show that the modular structure we find is sufficient to recover gene expression profiles from composite measurements (with ~100-fold fewer composite measurements than genes). Moreover, the knowledge that sparse, modular structures exist allows us to recover expression profiles from composite measurements, even without access to any training data. Finally, we present a proof-of-concept experiment for making composite measurements in the laboratory, involving the measurement of linear combinations of RNA abundances. Altogether, our results suggest new compressive modalities in experimental biology that can form a foundation for massive scaling in high-throughput measurements, while also offering new insights into the interpretation of high-dimensional data. A recorded seminar presentation of this work is available at: https://www.youtube.com/watch?v=2dBZEOXqKHs
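The claim that a few random composite measurements can approximate the similarity between two expression profiles can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it assumes Gaussian random weights scaled by 1/sqrt(m), uses synthetic uniform "abundances", and covers only distance preservation, not the sparse-module decompression step. All function names are hypothetical.

```python
import math
import random


def random_weights(num_measurements, num_genes, rng):
    """Draw one row of Gaussian weights per composite measurement (assumed design)."""
    scale = 1.0 / math.sqrt(num_measurements)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(num_genes)]
            for _ in range(num_measurements)]


def composite_measurements(weights, profile):
    """Each measurement is a weighted sum (linear combination) of all gene abundances."""
    return [sum(w * x for w, x in zip(row, profile)) for row in weights]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
num_genes, num_measurements = 2000, 200  # 10-fold fewer measurements than genes

# Synthetic profiles; real expression data would replace these.
profile_a = [rng.random() for _ in range(num_genes)]
profile_b = [rng.random() for _ in range(num_genes)]
weights = random_weights(num_measurements, num_genes, rng)

full_distance = euclidean(profile_a, profile_b)
compressed_distance = euclidean(composite_measurements(weights, profile_a),
                                composite_measurements(weights, profile_b))
relative_error = abs(compressed_distance - full_distance) / full_distance
print(f"full={full_distance:.2f} compressed={compressed_distance:.2f} "
      f"relative error={relative_error:.1%}")
```

The distance computed from 200 composite values should land close to the distance computed from all 2,000 genes. Reaching the roughly 100-fold reduction described in the abstract additionally relies on the sparse, modular structure, which this sketch does not model.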
3,247 downloads systems biology
Recent studies using single cell RNA-seq (scRNA-seq) data derived from differentiating systems have raised fundamental questions regarding the discrete vs continuous nature of both differentiation and cell fate. Here we present Palantir, an algorithm that models trajectories of differentiating cells, which treats cell-fate as a probabilistic process, and leverages entropy to measure the changing nature of cell plasticity along the differentiation trajectory. Palantir generates a high resolution pseudotime ordering of cells, and assigns each cell state with its probability to differentiate into each terminal state. We apply Palantir to human bone marrow scRNA-seq data and detect key landmarks of hematopoietic differentiation. Palantir's resolution enables identification of key transcription factors driving lineage fate choices, as these TFs closely track when cells lose plasticity. We demonstrate that Palantir is generalizable to diverse tissue types and well-suited to resolve less studied differentiating systems.
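The use of entropy as a measure of plasticity can be sketched as follows. This is a minimal illustration, assuming each cell already has a vector of probabilities of reaching each terminal state; how Palantir derives those probabilities is not shown, and the function name and example values are hypothetical.

```python
import math


def fate_entropy(fate_probabilities):
    """Shannon entropy (in nats) of a cell's terminal-state probabilities."""
    total = sum(fate_probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in fate_probabilities:
        q = p / total  # normalise defensively in case the input does not sum to 1
        if q > 0:      # the limit of q*log(q) as q -> 0 is 0, so skip zero entries
            entropy -= q * math.log(q)
    return entropy


# Made-up cells ordered along pseudotime, with made-up fate probabilities.
cells = {
    "early progenitor": [0.34, 0.33, 0.33],
    "biased progenitor": [0.80, 0.15, 0.05],
    "committed cell": [1.00, 0.00, 0.00],
}
for name, probs in cells.items():
    print(f"{name}: entropy = {fate_entropy(probs):.3f}")
```

Entropy is highest when all fates are equally likely and falls to zero once a cell is committed to a single fate, which is the sense in which it tracks the loss of plasticity.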
3,227 downloads systems biology
Single-cell, spatially resolved 'omics analysis of tissues is poised to transform biomedical research and clinical practice. We have developed a computational histology topography cytometry analysis toolbox (histoCAT) to enable the interactive, quantitative, and comprehensive exploration of phenotypes of individual cells, cell-to-cell interactions, microenvironments, and morphological structures within intact tissues. histoCAT will be useful in all areas of tissue-based research. We highlight the unique abilities of histoCAT by analysis of highly multiplexed mass cytometry images of human breast cancer tissues.
3,221 downloads systems biology
Molecular differences between individual cells can lead to dramatic differences in cell fate, such as the difference between death versus survival of cancer cells upon treatment with anti-cancer drugs. These originating differences have remained hidden, however, due to our inability to precisely determine what variable molecular features lead to what cellular fates. Here, we trace drug-resistant cell fates back to differences in the molecular profiles of their drug-naive melanoma precursors, revealing a rich substructure of variability underlying a number of resistant phenotypes at the single cell level. We make these connections using Rewind, a methodology that combines genetic barcoding with an RNA-based readout to directly capture rare cells that give rise to cellular behaviors of interest. We performed extensive single cell analysis to identify differences in gene expression and MAP-kinase signaling that mark a rare population of drug-naive cells (initial frequency of ~1:1000-1:10,000 cells) that ultimately gives rise to drug resistant clones. We demonstrate that this rare subpopulation has rich substructure and is composed of several distinct subpopulations, and the molecular differences between these subpopulations predict future differences in phenotypic behavior, such as the ultimate proliferative capacity of drug resistant cells. Similarly, we show that treatments that modify the frequency of resistance can allow otherwise non-resistant cells in the drug-naive population to become resistant, and that these new populations are marked by the variable expression of distinct genes. Together, our results reveal the presence of hidden, rare-cell variability that can underlie a range of latent phenotypic outcomes upon drug exposure. Competing Interest Statement: AR receives consulting income and AR and SMS receive royalties related to Stellaris RNA FISH probes.
3,210 downloads systems biology
Determining protein levels in each tissue and how they compare with RNA levels is important for understanding human biology and disease as well as regulatory processes that control protein levels. We quantified the relative protein levels from 12,627 genes across 32 normal human tissue types prepared by the GTEx project. Known and new tissue specific or enriched proteins (5,499) were identified and compared to transcriptome data. Many ubiquitous transcripts are found to encode highly tissue specific proteins. Discordance in the sites of RNA expression and protein detection also revealed potential sites of synthesis and action of protein signaling molecules. Overall, these results provide an extraordinary resource, and demonstrate that understanding protein levels can provide insights into metabolism, regulation, secretome, and human diseases. Summary: Quantitative proteome study of 32 human tissues and integrated analysis with transcriptome data revealed that understanding protein levels could provide in-depth knowledge to post transcriptional or translational regulations, human metabolism, secretome, and diseases.
3,193 downloads systems biology
Mass spectrometry is the method of choice for deep and comprehensive analysis of proteomes and has become a key technology to support the progress in life science and biomedicine. However, sample preparation in proteomics is not standardized and contributes to a lack of reproducibility. The main challenge is to extract all proteins in a manner that enables efficient digestion into peptides and is compatible with subsequent mass spectrometric analysis. Current methods are based on the idea of removing detergents or chaotropic agents during sample processing, which are essential for protein extraction but interfere with digestion and LC-MS. These multi-step preparations are prone to losses, biases and contaminations, while being time-consuming and labor-intensive. We report a universal detergent-free method, named Sample Preparation by Easy Extraction and Digestion (SPEED), which is based on a simple three-step procedure: acidification, neutralization and digestion. SPEED is a one-pot method for peptide generation from various sources and is easily applicable even for lysis-resistant sample types as pure trifluoroacetic acid (TFA) is used for highly efficient protein extraction. SPEED-based sample processing is highly reproducible, provides exceptional peptide yields and enables preparation even of tissue samples with less than 15 min hands-on time and without any special equipment. Evaluation of SPEED performance revealed that the number of quantified proteins and the quantitative reproducibility are superior compared to the well-established sample processing protocols FASP, ISD-Urea and SP3 for various sample types, including human cells, bacteria and tissue, even at low protein starting amounts.
3,187 downloads systems biology
Technologies that visualize multiple biomolecules at the nanometer scale in cells will enable deeper understanding of biological processes that proceed at the molecular scale. Current fluorescence-based methods for microscopy are constrained by a combination of spatial resolution limitations, limited parameters per experiment, and detector systems for the wide variety of biomolecules found in cells. We present here super-resolution ion beam imaging (srIBI), a secondary ion mass spectrometry approach capable of high-parameter imaging in 3D of targeted biological entities and exogenously added small molecules. Uniquely, the atomic constituents of the biomolecules themselves can often be used in our system as the "tag". We visualized the subcellular localization of the chemotherapy drug cisplatin simultaneously with localization of five other nuclear structures, with further carbon elemental mapping and secondary electron visualization, down to ~30 nm lateral resolution. Cisplatin was preferentially enriched in nuclear speckles and excluded from closed-chromatin regions, indicative of a role for cisplatin in active regions of chromatin. These data highlight how multiplexed super-resolution techniques, such as srIBI, will enable studies of biomolecule distributions in biologically relevant subcellular microenvironments.
3,172 downloads systems biology
Motivation: Parameter estimation methods for ordinary differential equation (ODE) models of biological processes can exploit gradients and Hessians of objective functions to achieve convergence and computational efficiency. However, the computational complexity of established methods to evaluate the Hessian scales linearly with the number of state variables and quadratically with the number of parameters. This limits their application to low-dimensional problems. Results: We introduce second order adjoint sensitivity analysis for the computation of Hessians and a hybrid optimization-integration based approach for profile likelihood computation. Second order adjoint sensitivity analysis scales linearly with the number of parameters and state variables. The Hessians are effectively exploited by the proposed profile likelihood computation approach. We evaluate our approaches on published biological models with real measurement data. Our study reveals an improved computational efficiency and robustness of optimization compared to established approaches, when using Hessians computed with adjoint sensitivity analysis. The hybrid computation method was more than two-fold faster than the best competitor. Thus, the proposed methods and implemented algorithms allow for the improvement of parameter estimation for medium and large scale ODE models. Availability: The algorithms for second order adjoint sensitivity analysis are implemented in the Advanced MATLAB Interface for CVODES and IDAS (AMICI, https://github.com/ICB-DCM/AMICI/). The algorithm for hybrid profile likelihood computation is implemented in the parameter estimation toolbox (PESTO, https://github.com/ICB-DCM/PESTO/). Both toolboxes are freely available under the BSD license.
3,087 downloads systems biology
The observations of phenotypic plasticity have stimulated the revival of 'epigenetics'. Over the past 70 years the term has come in many colors and flavors, depending on the biological discipline and time period. The meanings span from Waddington's "epigenotype" and "epigenetic landscape" to the molecular biologists' "epigenetic marks" embodied by DNA methylation and histone modifications. Here we seek to quell the ambiguity of the name. First we place "epigenetic" in the various historical contexts. Then, by presenting the formal concepts of dynamical systems theory we show that the "epigenetic landscape" is more than a metaphor: it has specific mathematical foundations. The latter explains how gene regulatory networks produce multiple attractor states, the self-stabilizing patterns of gene activation across the genome that account for "epigenetic memory". This network dynamics approach replaces the reductionist correspondence of molecular epigenetic modifications with the concept of the epigenetic landscape, by providing a concrete and crisp correspondence.
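The idea that a regulatory circuit can have several self-stabilizing attractor states can be shown with a deliberately tiny model. The sketch below is an assumption for illustration only, not a model from the paper: a single self-activating gene with Hill-type production (coefficient 2) and linear decay, integrated with forward Euler. With the chosen maximum rate of 2.5, the steady states are 0 and 2 (stable) and 0.5 (unstable).

```python
def expression_rate(x, max_rate=2.5):
    """dx/dt for a self-activating gene: Hill-type production minus linear decay."""
    return max_rate * x**2 / (1.0 + x**2) - x


def settle(x0, dt=0.01, steps=10000):
    """Integrate with forward Euler and return the final expression level."""
    x = x0
    for _ in range(steps):
        x += dt * expression_rate(x)
    return x


# Starting levels below 0.5 fall to the "off" state, those above rise to "on".
for start in (0.1, 0.4, 0.6, 3.0):
    print(f"start={start:.1f} -> settles near {settle(start):.3f}")
```

The same circuit, with the same parameters, ends in different stable states depending only on its starting point. That history dependence is the kind of "memory" the landscape picture describes.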
3,077 downloads systems biology
Integrated -omics approaches are quickly spreading across microbiology research labs, leading to i) the possibility of detecting previously hidden features of microbial cells like multi-scale spatial organisation and ii) tracing molecular components across multiple cellular functional states. This promises to reduce the knowledge gap between genotype and phenotype and poses new challenges for computational microbiologists. We underline how the capability to unravel the complexity of microbial life will strongly depend on the integration of the huge and diverse amount of information that can be derived today from -omics experiments. In this work, we present opportunities and challenges of multi-omics data integration in current systems biology pipelines. We here discuss which layers of biological information are important for biotechnological and clinical purposes, with a special focus on bacterial metabolism and modelling procedures. A general review of the most recent computational tools for performing large-scale dataset integration is also presented, together with a possible framework to guide the design of systems biology experiments by microbiologists.
3,068 downloads systems biology
Omics data contains signal from the molecular, physical, and kinetic inter- and intra-cellular interactions that control biological systems. Matrix factorization techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in topics ranging from pathway discovery to time course analysis. We review exemplary applications of matrix factorization for systems-level analyses. We discuss appropriate application of these methods, their limitations, and focus on analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with matrix factorization enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
2,981 downloads systems biology
Edward L. Huttlin, Raphael J. Bruckner, Jose Navarrete-Perea, Joe R. Cannon, Kurt Baltier, Fana Gebreab, Melanie P. Gygi, Alexandra Thornock, Gabriela Zarraga, Stanley Tam, John Szpyt, Alexandra Panov, Hannah Parzen, Sipei Fu, Arvene Golbazi, Eila Maenpaa, Keegan Stricker, Sanjukta Guha Thakurta, Ramin Rad, Joshua Pan, David P. Nusinow, Joao A. Paulo, Devin K. Schweppe, Laura Pontano Vaites, J. Wade Harper, Steven P. Gygi
Thousands of interactions assemble proteins into modules that impart spatial and functional organization to the cellular proteome. Through affinity-purification mass spectrometry, we have created two proteome-scale, cell-line-specific interaction networks. The first, BioPlex 3.0, results from affinity purification of 10,128 human proteins - half the proteome - in 293T cells and includes 118,162 interactions among 14,586 proteins; the second results from 5,522 immunoprecipitations in HCT116 cells. These networks model the interactome at unprecedented scale, encoding protein function, localization, and complex membership. Their comparison validates thousands of interactions and reveals extensive customization of each network. While shared interactions reside in core complexes and involve essential proteins, cell-specific interactions bridge conserved complexes, likely 'rewiring' each cell's interactome. Interactions are gained and lost in tandem among proteins of shared function as the proteome remodels to produce each cell's phenotype. Viewable interactively online through BioPlexExplorer, these networks define principles of proteome organization and enable unknown protein characterization.
2,973 downloads systems biology
A major biomedical challenge is the interpretation of genetic variation and the ability to design functional novel sequences. Since the space of all possible genetic variation is enormous, there is a concerted effort to develop reliable methods that can capture genotype to phenotype maps. State-of-art computational methods rely on models that leverage evolutionary information and capture complex interactions between residues. However, current methods are not suitable for a large number of important applications because they depend on robust protein or RNA alignments. Such applications include genetic variants with insertions and deletions, disordered proteins, and functional antibodies. Ideally, we need models that do not rely on assumptions made by multiple sequence alignments. Here we borrow from recent advances in natural language processing and speech synthesis to develop a generative deep neural network-powered autoregressive model for biological sequences that captures functional constraints without relying on an explicit alignment structure. Application to unseen experimental measurements of 43 deep mutational scans predicts the effect of insertions and deletions while matching state-of-art missense mutation prediction accuracies. We then test the model on single domain antibodies, or nanobodies, a complex target for alignment-based models due to the highly variable complementarity determining regions. We fit the model to a naïve llama immune repertoire and generate a diverse, optimized library of 10^5 nanobody sequences for experimental validation. Our results demonstrate the power of the 'alignment-free' autoregressive model in mutation effect prediction and design of traditionally challenging sequence families.
2,962 downloads systems biology
Nucleosomes restrict DNA accessibility throughout eukaryotic genomes, with repercussions for replication, transcription, and other DNA-templated processes. How this globally restrictive organization emerged from a presumably more open ancestral state remains poorly understood. Here, to better understand the challenges associated with establishing globally restrictive chromatin, we express histones in a naïve bacterial system that has not evolved to deal with nucleosomal structures: Escherichia coli. We find that histone proteins from the archaeon Methanothermus fervidus assemble on the E. coli chromosome in vivo and protect DNA from micrococcal nuclease digestion, allowing us to map binding footprints genome-wide. We provide evidence that nucleosome occupancy along the E. coli genome tracks intrinsic sequence preferences but is disturbed by ongoing transcription and replication. Notably, we show that higher nucleosome occupancy at promoters and across gene bodies is associated with lower transcript levels, consistent with local repressive effects. Surprisingly, however, this sudden enforced chromatinization has only mild repercussions for growth, suggesting that histones can become established as ubiquitous chromatin proteins without interfering critically with key DNA-templated processes. Our results have implications for the evolvability of transcriptional ground states and highlight chromatinization by archaeal histones as a potential avenue for controlling genome accessibility in synthetic prokaryotic systems.
2,927 downloads systems biology
Despite longstanding appreciation of gene expression heterogeneity in isogenic bacterial populations, affordable and scalable technologies for studying single bacterial cells have been limited. While single-cell RNA sequencing (scRNA-seq) has revolutionized studies of transcriptional heterogeneity in diverse eukaryotic systems, application of scRNA-seq to prokaryotes has been hindered by their extremely low mRNA abundance, lack of mRNA polyadenylation, and thick cell walls. Here, we present Prokaryotic Expression-profiling by Tagging RNA In Situ and sequencing (PETRI-seq), a low-cost, high-throughput, prokaryotic scRNA-seq pipeline that overcomes these technical obstacles. PETRI-seq uses in situ combinatorial indexing to barcode transcripts from tens of thousands of cells in a single experiment. PETRI-seq captures single cell transcriptomes of Gram-negative and Gram-positive bacteria with high purity and low bias, with median capture rates >200 mRNAs/cell for exponentially growing E. coli. These characteristics enable robust discrimination of cell-states corresponding to different phases of growth. When applied to wild-type S. aureus, PETRI-seq revealed a rare sub-population of cells undergoing prophage induction. We anticipate broad utility of PETRI-seq in defining single-cell states and their dynamics in complex microbial communities.
2,916 downloads systems biology
Bridging genotype to phenotype, the proteome has increasingly become of major importance to generate large, longitudinal sample series for data-driven biology and personalized medicine. Major improvements in laboratory automation, chromatography and software have increased the scale and precision of proteomics. Still missing, however, are mass spectrometric acquisition techniques that could deal with very fast chromatographic gradients. Here we present scanning SWATH, a data-independent acquisition (DIA) method, in which the DIA-typical stepwise windowed acquisition is replaced by a continuous movement of the precursor isolation window. Scanning SWATH accelerates the duty cycles to a few hundreds of milliseconds, and enables precursor mass assignment to the MS2 fragment traces for improving true positive precursor identification in fast proteome experiments. In combination with 800 µL/min high-flow chromatography, we report the quantification of 270 precursors per second, increasing the precursor identifications by 70% or more compared to previous methods. Scanning SWATH quantified 1,410 human protein groups in conjunction with chromatographic gradients as fast as 30 seconds, 2,250 with 60-second gradients, and 4,586 in conjunction with 5-minute gradients. At high quantitative precision, our method hence increases the proteomic throughput to hundreds of samples per day per mass spectrometer. Scanning SWATH hence enables a broad range of new proteomic applications that depend on large numbers of cheap yet quantitatively precise proteomes. Competing Interest Statement: N.B, G.I., F.W and S.T. are employees of SCIEX
2,877 downloads systems biology
Tapio Lönnberg, Valentine Svensson, Kylie R James, Daniel Fernandez-Ruiz, Ismail Sebina, Ruddy Montandon, Megan S F Soon, Lily G Fogg, Michael J.T. Stubbington, Frederik Otzen Bagger, Max Zwiessele, Neil Lawrence, Fernando Souza-Fonseca-Guimaraes, William R Heath, Oliver Billker, O Stegle, Ashraful Haque, Sarah A. Teichmann
Differentiation of naïve CD4+ T cells into functionally distinct T helper subsets is crucial for the orchestration of immune responses. Due to multiple levels of heterogeneity and multiple overlapping transcriptional programs in differentiating T cell populations, this process has remained a challenge for systematic dissection in vivo. By using single-cell RNA transcriptomics and computational modelling of temporal mixtures, we reconstructed the developmental trajectories of Th1 and Tfh cell populations during Plasmodium infection in mice at single-cell resolution. These cell fates emerged from a common, highly proliferative and metabolically active precursor. Moreover, by tracking clonality from T cell receptor sequences, we infer that ancestors derived from the same naïve CD4+ T cell can concurrently populate both Th1 and Tfh subsets. We further found that precursor T cells were coached towards a Th1 but not a Tfh fate by monocytes/macrophages. The integrated genomic and computational approach we describe is applicable for analysis of any cellular system characterized by differentiation towards multiple fates.
2,875 downloads systems biology
Background: Numerous centrality measures have been introduced to identify "central" nodes in large networks. The availability of a wide range of measures for ranking influential nodes leaves the user to decide which measure may best suit the analysis of a given network. The choice of a suitable measure is furthermore complicated by the impact of the network topology on ranking influential nodes by centrality measures. To approach this problem systematically, we examined the centrality profile of nodes of yeast protein-protein interaction networks (PPINs) in order to detect which centrality measure succeeds in predicting influential proteins. We studied how different topological network features are reflected in a large set of commonly used centrality measures. Results: We used yeast PPINs to compare 27 common centrality measures. The measures characterize and assort influential nodes of the networks. We applied principal component analysis (PCA) and hierarchical clustering and found that the most informative measures depend on the network's topology. Interestingly, some measures had a high level of contribution in comparison to others in all PPINs, namely Latora closeness, Decay, Lin, Freeman closeness, Diffusion, Residual closeness and Average distance centralities. Conclusions: The choice of a suitable set of centrality measures is crucial for inferring important functional properties of a network. We concluded that undertaking data reduction using unsupervised machine learning methods helps to choose appropriate variables (centrality measures). Hence, we proposed identifying the contribution proportions of the centrality measures with PCA as a prerequisite step of network analysis before inferring functional consequences, e.g., essentiality of a node.
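Why the choice of centrality measure matters can be seen on a toy network where two measures rank the nodes differently. The sketch below computes only degree and Freeman closeness (one of the measures named above) on a made-up nine-node graph. It does not reproduce the study's 27 measures or its PCA step, and the node names are hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness_centrality(graph):
    """Freeman closeness: (n - 1) / sum of shortest-path distances.

    Assumes a connected, undirected, unweighted graph.
    """
    scores = {}
    for source in graph:
        distances = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distances:
                    distances[neighbour] = distances[node] + 1
                    queue.append(neighbour)
        scores[source] = (len(graph) - 1) / sum(distances.values())
    return scores


# Two hubs (H, K), each with three leaves, joined through a bridge node B.
edges = [("H", "h1"), ("H", "h2"), ("H", "h3"), ("H", "B"),
         ("B", "K"), ("K", "k1"), ("K", "k2"), ("K", "k3")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
for node in ("H", "B", "K"):
    print(f"{node}: degree={degree[node]} closeness={closeness[node]:.3f}")
```

Here the hubs H and K have the highest degree, while the bridge B has the highest closeness, so the "most central" node depends on the measure chosen.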
2,826 downloads systems biology
The intestinal epithelium is a highly structured tissue composed of repeating crypt-villus units. Enterocytes, which constitute the most abundant cell type, perform the diverse tasks of absorbing a wide range of nutrients while protecting the body from the harsh bacterial-rich environment. It is unknown if these tasks are equally performed by all enterocytes or whether they are spatially zonated along the villus axis. Here, we performed whole-transcriptome measurements of laser-capture-microdissected villus segments to extract a large panel of landmark genes, expressed in a zonated manner. We used these genes to localize single sequenced enterocytes along the villus axis, thus reconstructing a global spatial expression map. We found that most enterocyte genes were zonated. Enterocytes at villi bottoms expressed an anti-bacterial Reg gene program in a microbiome-dependent manner, potentially reducing the crypt pathogen exposure. Translation, splicing and respiration genes steadily decreased in expression towards the villi tops, whereas distinct mid-top villus zones sub-specialized in the absorption of carbohydrates, peptides and fat. Enterocytes at the villi tips exhibited a unique gene-expression signature consisting of Klf4, Egfr, Neat1, Malat1, cell adhesion and purine metabolism genes. Our study exposes broad spatial heterogeneity of enterocytes, which could be important for achieving their diverse tasks.
- 20 Oct 2020: Support for sorting preprints using Twitter activity has been removed, at least temporarily, until a new source of social media activity data becomes available.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!