Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 92,017 bioRxiv papers from 393,169 authors.
Most downloaded bioRxiv preprints of 2018
There were 20,050 manuscripts posted to biorxiv.org in 2018, downloaded 5,730,324 times. Below are the 25 preprints posted in 2018 that got the most downloads in that year.
1: The Genomic Formation of South and Central Asia
Vagheesh Narasimhan, Nick Patterson et al.
Posted to bioRxiv 31 Mar 2018
Abstract: The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia (consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC), and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identify the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.
2: Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula
Clare Bycroft, Ceres Fernandez-Rozadilla et al.
Posted to bioRxiv 12 Mar 2018
Abstract: Genetic differences within or between human populations (population structure) have been studied using a variety of approaches over many years. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unique among European regions in having a centuries-long period of Muslim rule. Previous genetic studies of Spain have examined either a small fraction of the genome or only a few Spanish regions. Thus, the overall pattern of fine-scale population structure within Spain remains uncharacterised. Here we analyse genome-wide genotyping array data for 1,413 Spanish individuals sampled from all regions of Spain. We identify extensive fine-scale structure, down to unprecedented scales, smaller than 10 km in some places. We observe a major axis of genetic differentiation that runs from east to west of the peninsula. In contrast, we observe remarkable genetic similarity in the north-south direction, and evidence of historical north-south population movement. Finally, without making particular prior assumptions about source populations, we show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north Moroccans. The north African ancestry results from an admixture event, which we date to 860-1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.
3: Identification of Pre-Existing Adaptive Immunity to Cas9 Proteins in Humans
Carsten T. Charlesworth, Priyanka S Deshpande et al.
Posted to bioRxiv 05 Jan 2018
Abstract: The CRISPR-Cas9 system has proven to be a powerful tool for genome editing, allowing for the precise modification of specific DNA sequences within a cell. Many efforts are currently underway to use the CRISPR-Cas9 system for the therapeutic correction of human genetic diseases. The most widely used homologs of the Cas9 protein are derived from the bacteria Staphylococcus aureus (S. aureus) and Streptococcus pyogenes (S. pyogenes). Based on the fact that these two bacterial species cause infections in the human population at high frequencies, we looked for the presence of pre-existing adaptive immune responses to their respective Cas9 homologs, SaCas9 (S. aureus homolog of Cas9) and SpCas9 (S. pyogenes homolog of Cas9). To determine the presence of anti-Cas9 antibodies, we probed for the two homologs using human serum and were able to detect antibodies against both, with 79% of donors staining against SaCas9 and 65% of donors staining against SpCas9. Upon investigating the presence of antigen-specific T-cells against the two homologs in human peripheral blood, we found anti-SaCas9 T-cells in 46% of donors. Upon isolating, expanding, and conducting antigen re-stimulation experiments on several of these donors' anti-SaCas9 T-cells, we observed a SaCas9-specific response confirming that these T-cells were antigen-specific. We were unable to detect antigen-specific T-cells against SpCas9, although the sensitivity of the assay precludes us from concluding that such T-cells do not exist. Together, these data demonstrate that there are pre-existing humoral and cell-mediated adaptive immune responses to Cas9 in humans, a factor which must be taken into account as the CRISPR-Cas9 system moves forward into clinical trials.
4: Prefrontal cortex as a meta-reinforcement learning system
Jane X Wang, Zeb Kurth-Nelson et al.
Posted to bioRxiv 06 Apr 2018
Abstract: Over the past twenty years, neuroscience research on reward-based learning has converged on a canonical model, under which the neurotransmitter dopamine 'stamps in' associations between situations, actions and rewards by modulating the strength of synaptic connections between neurons. However, a growing number of recent findings have placed this standard model under strain. In the present work, we draw on recent advances in artificial intelligence to introduce a new theory of reward-based learning. Here, the dopamine system trains another part of the brain, the prefrontal cortex, to operate as its own free-standing learning system. This new perspective accommodates the findings that motivated the standard model, but also deals gracefully with a wider range of observations, providing a fresh foundation for future research.
5: Observing the Cell in Its Native State: Imaging Subcellular Dynamics in Multicellular Organisms
Tsung-Li Liu, Srigokul Upadhyayula et al.
Posted to bioRxiv 08 Jan 2018
Abstract: True physiological imaging of subcellular dynamics requires studying cells within their parent organisms, where all the environmental cues that drive gene expression, and hence the phenotypes we actually observe, are present. A complete understanding also requires volumetric imaging of the cell and its surroundings at high spatiotemporal resolution without inducing undue stress on either. We combined lattice light sheet microscopy with two-channel adaptive optics to achieve, across large multicellular volumes, noninvasive aberration-free imaging of subcellular processes, including endocytosis, organelle remodeling during mitosis, and the migration of axons, immune cells, and metastatic cancer cells in vivo. The technology reveals the phenotypic diversity within cells across different organisms and developmental stages, and may offer insights into how cells harness their intrinsic variability to adapt to different physiological environments.
6: A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex
Jeff Hawkins, Marcus Lewis et al.
Posted to bioRxiv 13 Oct 2018
Abstract: How the neocortex works is a mystery. In this paper we propose a novel framework for understanding its function. Grid cells are neurons in the entorhinal cortex that represent the location of an animal in its environment. Recent evidence suggests that grid cell-like neurons may also be present in the neocortex. We propose that grid cells exist throughout the neocortex, in every region and in every cortical column. They define a location-based framework for how the neocortex functions. Whereas grid cells in the entorhinal cortex represent the location of one thing, the body relative to its environment, we propose that cortical grid cells simultaneously represent the location of many things. Cortical columns in somatosensory cortex track the location of tactile features relative to the object being touched and cortical columns in visual cortex track the location of visual features relative to the object being viewed. We propose that mechanisms in the entorhinal cortex and hippocampus that evolved for learning the structure of environments are now used by the neocortex to learn the structure of objects. Having a representation of location in each cortical column suggests mechanisms for how the neocortex represents object compositionality and object behaviors. It leads to the hypothesis that every part of the neocortex learns complete models of objects and that there are many models of each object distributed throughout the neocortex. The similarity of circuitry observed in all cortical regions is strong evidence that even high-level cognitive tasks are learned and represented in a location-based framework.
7: Panoptic vDISCO imaging reveals neuronal connectivity, remote trauma effects and meningeal vessels in intact transparent mice
Ruiyao Cai, Chenchen Pan et al.
Posted to bioRxiv 23 Jul 2018
Abstract: Analysis of entire transparent rodent bodies could provide holistic information on biological systems in health and disease. However, it has been challenging to reliably image and quantify signal from endogenously expressed fluorescent proteins in large cleared mouse bodies due to the low signal contrast. Here, we devised a pressure-driven, nanobody-based whole-body immunolabeling technology to enhance the signal of fluorescent proteins by up to two orders of magnitude. This allowed us to image subcellular details in transparent mouse bodies through bones and highly autofluorescent tissues, and perform quantifications. We visualized, for the first time, the whole-body neuronal connectivity of an entire adult mouse and discovered that brain trauma induces degeneration of peripheral axons. We also imaged meningeal lymphatic vessels and immune cells through the intact skull and vertebra in naive animals and trauma models. Thus, our new approach can provide an unbiased holistic view of biological events affecting the nervous system and the rest of the body.
8: Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe
Thiseas C. Lamnidis, Kerttu Majander et al.
Posted to bioRxiv 22 Mar 2018
Abstract: European history has been shaped by migrations of people, and their subsequent admixture. Recently, evidence from ancient DNA has brought new insights into migration events that could be linked to the advent of agriculture, and possibly to the spread of Indo-European languages. However, little is known so far about the ancient population history of north-eastern Europe, in particular about populations speaking Uralic languages, such as Finns and Saami. Here we analyse ancient genomic data from 11 individuals from Finland and Northwest Russia. We show that the specific genetic makeup of northern Europe traces back to migrations from Siberia that began at least 3,500 years ago. This ancestry was subsequently admixed into many modern populations in the region, in particular populations speaking Uralic languages today. In addition, we show that ancestors of modern Saami inhabited a larger territory during the Iron Age than today, which adds to historical and linguistic evidence for the population history of Finland.
9: Moving beyond P values: Everyday data analysis with estimation plots
Joses Ho, Tayfun Tumkaya et al.
Posted to bioRxiv 26 Jul 2018
Abstract: Over the past 75 years, a number of statisticians have advised that the data-analysis method known as null-hypothesis significance testing (NHST) should be deprecated (Berkson, 1942; Halsey et al., 2015; Wasserstein et al., 2019). The limitations of NHST have been extensively discussed, with a broad consensus that current statistical practice in the biological sciences needs reform. However, there is less agreement on reform’s specific nature, with vigorous debate surrounding what would constitute a suitable alternative (Altman et al., 2000; Benjamin et al., 2017; Cumming and Calin-Jageman, 2016). An emerging view is that a more complete analytic technique would use statistical graphics to estimate effect sizes and evaluate their uncertainty (Cohen, 1994; Cumming and Calin-Jageman, 2016). As these estimation methods require only minimal statistical retraining, they have great potential to shift the current data-analysis culture away from dichotomous thinking towards quantitative reasoning (Claridge-Chang and Assam, 2016). The evolution of statistics has been inextricably linked to the development of quantitative displays that support complex visual reasoning (Tufte, 2001). We consider that the graphic we describe here as an estimation plot is the most intuitive way to display the complete statistical information about experimental data sets. However, a major obstacle to adopting estimation plots is access to suitable software. To lower this hurdle, we have developed free software that makes high-quality estimation plotting available to all. Here, we explain the rationale for estimation plots by contrasting them with conventional charts used to display data with NHST results, and describe how the use of these graphs affords five major analytical advantages.
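To make "estimate effect sizes and evaluate their uncertainty" concrete, here is a minimal sketch of the number an estimation plot typically reports: a mean difference between two groups with a bootstrap confidence interval. This is not the authors' software. The function name `bootstrap_mean_diff`, the sample values, and the choice of a plain percentile bootstrap are all assumptions made for illustration, and the preprint's own tool may compute its intervals differently.

```python
import random
import statistics


def bootstrap_mean_diff(control, test, n_resamples=5000, alpha=0.05, seed=0):
    """Return (observed, lower, upper) for mean(test) - mean(control).

    Uses a plain percentile bootstrap: resample each group with
    replacement, recompute the mean difference, and read the interval
    off the sorted resampled differences.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.fmean(test) - statistics.fmean(control)
    diffs = []
    for _ in range(n_resamples):
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return observed, diffs[lo_idx], diffs[hi_idx]


# Hypothetical measurements, invented purely for illustration.
control = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.4, 4.8]
test = [5.6, 6.1, 5.9, 6.4, 5.2, 6.0, 5.8, 6.3]

diff, lower, upper = bootstrap_mean_diff(control, test)
print(f"mean difference = {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Reporting the difference together with its interval, rather than a single P value, is the shift from dichotomous to quantitative reasoning that the abstract describes.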
10: A comparison of single-cell trajectory inference methods: towards more accurate and robust tools
Wouter Saelens, Robrecht Cannoodt et al.
Posted to bioRxiv 05 Mar 2018
Abstract: Using single-cell -omics data, it is now possible to computationally order cells along trajectories, allowing the unbiased study of cellular dynamic processes. Since 2014, more than 50 trajectory inference methods have been developed, each with its own set of methodological characteristics. As a result, choosing a method to infer trajectories is often challenging, since a comprehensive assessment of the performance and robustness of each method is still lacking. In order to facilitate the comparison of the results of these methods to each other and to a gold standard, we developed a global framework to benchmark trajectory inference tools. Using this framework, we compared the trajectories from a total of 29 trajectory inference methods, on a large collection of real and synthetic datasets. We evaluate methods using several metrics, including accuracy of the inferred ordering, correctness of the network topology, code quality and user friendliness. We found that some methods, including Slingshot, TSCAN and Monocle DDRTree, clearly outperform other methods, although their performance depended on the type of trajectory present in the data. Based on our benchmarking results, we therefore developed a set of guidelines for method users. However, our analysis also indicated that there is still a lot of room for improvement, especially for methods detecting complex trajectory topologies. Our evaluation pipeline can therefore be used to spearhead the development of new scalable and more accurate methods, and is available at github.com/dynverse/dynverse. To our knowledge, this is the first comprehensive assessment of trajectory inference methods. For now, we exclusively evaluated the methods on their default parameters, but plan to add a detailed parameter tuning procedure in the future. 
We gladly welcome any discussion and feedback on key decisions made as part of this study, including the metrics used in the benchmark, the quality control checklist, and the implementation of the method wrappers. These discussions can be held at github.com/dynverse/dynverse/issues.
11: Molecular architecture of the mouse nervous system
Amit Zeisel, Hannah Hochgerner et al.
Posted to bioRxiv 05 Apr 2018
Abstract: The mammalian nervous system executes complex behaviors controlled by specialised, precisely positioned and interacting cell types. Here, we used RNA sequencing of half a million single cells to create a detailed census of cell types in the mouse nervous system. We mapped cell types spatially and derived a hierarchical, data-driven taxonomy. Neurons were the most diverse, and were grouped by developmental anatomical units, and by the expression of neurotransmitters and neuropeptides. Neuronal diversity was driven by genes encoding cell identity, synaptic connectivity, neurotransmission and membrane conductance. We discovered several distinct, regionally restricted astrocyte types, which obeyed developmental boundaries and correlated with the spatial distribution of key glutamate and glycine neurotransmitters. In contrast, oligodendrocytes showed a loss of regional identity, followed by a secondary diversification. The resource presented here lays a solid foundation for understanding the molecular architecture of the mammalian nervous system, and enables genetic manipulation of specific cell types.
12: Statistical physics of liquid brains
Jordi Piñero, Ricard Solé
Posted to bioRxiv 26 Nov 2018
Abstract: Liquid neural networks (or "liquid brains") are a widespread class of cognitive living networks characterised by a common feature: the agents (ants or immune cells, for example) move in space. Thus, no fixed, long-term agent-agent connections are maintained, in contrast with standard neural systems. How is this class of systems capable of displaying cognitive abilities, from learning to decision-making? In this paper, the collective dynamics, memory and learning properties of liquid brains are explored from the perspective of statistical physics. Using a comparative approach, we review the generic properties of three large classes of systems, namely: standard neural networks ("solid brains"), ant colonies and the immune system. It is shown that, despite their intrinsic physical differences, these systems share key properties with standard neural systems in terms of formal descriptions, but strongly depart in other ways. On one hand, the attractors found in liquid brains are not always based on connection weights but instead on population abundances. However, some liquid systems use fluctuations in ways similar to those found in cortical networks, suggesting a relevant role of criticality as a way of rapidly reacting to external signals.
13: Genome-wide association analysis of lifetime cannabis use (N=184,765) identifies new risk loci, genetic overlap with mental health, and a causal influence of schizophrenia on cannabis use
Joëlle A. Pasman, Karin J.H. Verweij et al.
Posted to bioRxiv 08 Jan 2018
Abstract: Cannabis use is a heritable trait that has been associated with adverse mental health outcomes. To identify risk variants and improve our knowledge of the genetic etiology of cannabis use, we performed the largest genome-wide association study (GWAS) meta-analysis for lifetime cannabis use (N=184,765) to date. We identified 4 independent loci containing genome-wide significant SNP associations. Gene-based tests revealed 29 genome-wide significant genes located in these 4 loci and 8 additional regions. All SNPs combined explained 10% of the variance in lifetime cannabis use. The most significantly associated gene, CADM2, has previously been associated with substance use and risk-taking phenotypes [2-4]. We used S-PrediXcan to explore gene expression levels and found 11 unique eGenes. LD-score regression uncovered genetic correlations with smoking, alcohol use and mental health outcomes, including schizophrenia and bipolar disorder. Mendelian randomisation analysis provided evidence for a causal positive influence of schizophrenia risk on lifetime cannabis use.
14: Evaluation of UMAP as an alternative to t-SNE for single-cell data
Etienne Becht, Charles-Antoine Dutertre et al.
Posted to bioRxiv 10 Apr 2018
Abstract: Uniform Manifold Approximation and Projection (UMAP) is a recently published non-linear dimensionality reduction technique. Another such algorithm, t-SNE, has been the default method for such tasks in recent years. Herein we comment on the usefulness of UMAP for high-dimensional cytometry and single-cell RNA sequencing, notably highlighting faster runtime and consistency, meaningful organization of cell clusters and preservation of continuums in UMAP compared to t-SNE.
15: End-to-end differentiable learning of protein structure
Mohammed AlQuraishi
Posted to bioRxiv 14 Feb 2018
Abstract: Accurate prediction of protein structure is one of the central challenges of biochemistry. Despite significant progress made by co-evolution methods to predict protein structure from signatures of residue-residue coupling found in the evolutionary record, a direct and explicit mapping between protein sequence and structure remains elusive, with no substantial recent progress. Meanwhile, rapid developments in deep learning, which have found remarkable success in computer vision, natural language processing, and quantum chemistry raise the question of whether a deep learning based approach to protein structure could yield similar advancements. A key ingredient of the success of deep learning is the reformulation of complex, human-designed, multi-stage pipelines with differentiable models that can be jointly optimized end-to-end. We report the development of such a model, which reformulates the entire structure prediction pipeline using differentiable primitives. Achieving this required combining four technical ideas: (1) the adoption of a recurrent neural architecture to encode the internal representation of protein sequence, (2) the parameterization of (local) protein structure by torsional angles, which provides a way to reason over protein conformations without violating the covalent chemistry of protein chains, (3) the coupling of local protein structure to its global representation via recurrent geometric units, and (4) the use of a differentiable loss function to capture deviations between predicted and experimental structures. To our knowledge this is the first end-to-end differentiable model for learning of protein structure. We test the effectiveness of this approach using two challenging tasks: the prediction of novel protein folds without the use of co-evolutionary information, and the prediction of known protein folds without the use of structural templates. 
On the first task the model achieves state-of-the-art performance, even when compared to methods that rely on co-evolutionary data. On the second task the model is competitive with methods that use experimental protein structures as templates, achieving 3-7Å accuracy despite being template-free. Beyond protein structure prediction, end-to-end differentiable models of proteins represent a new paradigm for learning and modeling protein structure, with potential applications in docking, molecular dynamics, and protein design.
16: Spontaneous behaviors drive multidimensional, brain-wide population activity
Carsen Stringer, Marius Pachitariu et al.
Posted to bioRxiv 22 Apr 2018
Abstract: Cortical responses to sensory stimuli are highly variable, and sensory cortex exhibits intricate spontaneous activity even without external sensory input. Cortical variability and spontaneous activity have been variously proposed to represent random noise, recall of prior experience, or encoding of ongoing behavioral and cognitive variables. Here, by recording over 10,000 neurons in mouse visual cortex, we show that spontaneous activity reliably encodes a high-dimensional latent state, which is partially related to the ongoing behavior of the mouse and is represented not just in visual cortex but across the forebrain. Sensory inputs do not interrupt this ongoing signal, but add onto it a representation of visual stimuli in orthogonal dimensions. Thus, visual cortical population activity, despite its apparently noisy structure, reliably encodes an orthogonal fusion of sensory and multidimensional behavioral information.
17: A Single-Cell Atlas of Cell Types, States, and Other Transcriptional Patterns from Nine Regions of the Adult Mouse Brain
Arpiar Saunders, Evan Macosko et al.
Posted to bioRxiv 10 Apr 2018
Abstract: The mammalian brain is composed of diverse, specialized cell populations, few of which we fully understand. To more systematically ascertain and learn from cellular specializations in the brain, we used Drop-seq to perform single-cell RNA sequencing of 690,000 cells sampled from nine regions of the adult mouse brain: frontal and posterior cortex (156,000 and 99,000 cells, respectively), hippocampus (113,000), thalamus (89,000), cerebellum (26,000), and all of the basal ganglia - the striatum (77,000), globus pallidus externus/nucleus basalis (66,000), entopeduncular/subthalamic nuclei (19,000), and the substantia nigra/ventral tegmental area (44,000). We developed computational approaches to distinguish biological from technical signals in single-cell data, then identified 565 transcriptionally distinct groups of cells, which we annotate and present through interactive online software we developed for visualizing and re-analyzing these data (DropViz). Comparison of cell classes and types across regions revealed features of brain organization. These included a neuronal gene-expression module for synthesizing axonal and presynaptic components; widely shared patterns in the combinatorial co-deployment of voltage-gated ion channels by diverse neuronal populations; functional distinctions among cells of the brain vasculature; and specialization of glutamatergic neurons across cortical regions to a degree not observed in other neuronal or non-neuronal populations. We describe systematic neuronal classifications for two complex, understudied regions of the basal ganglia, the globus pallidus externus and substantia nigra reticulata. In the striatum, where neuron types have been intensely researched, our data reveal a previously undescribed population of striatal spiny projection neurons (SPNs) comprising 4% of SPNs. The adult mouse brain cell atlas can serve as a reference for analyses of development, disease, and evolution.
18: The genetic prehistory of the Greater Caucasus
Chuan-Chao Wang, Sabine Reinhold et al.
Posted to bioRxiv 16 May 2018
Abstract: Archaeogenetic studies have described the formation of Eurasian 'steppe ancestry' as a mixture of Eastern and Caucasus hunter-gatherers. However, it remains unclear when and where this ancestry arose and whether it was related to a horizon of cultural innovations in the 4th millennium BCE that subsequently facilitated the advance of pastoral societies likely linked to the dispersal of Indo-European languages. To address this, we generated genome-wide SNP data from 45 prehistoric individuals along a 3000-year temporal transect in the North Caucasus. We observe a genetic separation between the groups of the Caucasus and those of the adjacent steppe. The Caucasus groups are genetically similar to contemporaneous populations south of it, suggesting that - unlike today - the Caucasus acted as a bridge rather than an insurmountable barrier to human movement. The steppe groups from Yamnaya and subsequent pastoralist cultures show evidence for previously undetected Anatolian farmer-related ancestry from different contact zones, while Steppe Maykop individuals harbour additional Upper Palaeolithic Siberian and Native American related ancestry.
19: Highly Multiplexed Single-Cell RNA-seq for Defining Cell Population and Transcriptional Spaces
Jase Gehring, Jong Hwee Park et al.
Posted to bioRxiv 05 May 2018
Abstract: We describe a universal sample multiplexing method for single-cell RNA-seq in which cells are chemically labeled with identifying DNA oligonucleotides. Analysis of a 96-plex perturbation experiment revealed changes in cell population structure and transcriptional states that cannot be discerned from bulk measurements, establishing a cost-effective means to survey cell populations from large experiments and clinical samples with the depth and resolution of single-cell RNA-seq.
20: Population Replacement in Early Neolithic Britain
Selina Brace, Yoan Diekmann et al.
Posted to bioRxiv 18 Feb 2018
Abstract: The roles of migration, admixture and acculturation in the European transition to farming have been debated for over 100 years. Genome-wide ancient DNA studies indicate predominantly Anatolian ancestry for continental Neolithic farmers, but also variable admixture with local Mesolithic hunter-gatherers. Neolithic cultures first appear in Britain c. 6000 years ago (kBP), a millennium after they appear in adjacent areas of northwestern continental Europe. However, the pattern and process of the British Neolithic transition remain unclear. We assembled genome-wide data from six Mesolithic and 67 Neolithic individuals found in Britain, dating from 10.5-4.5 kBP, a dataset that includes 22 newly reported individuals and the first genomic data from British Mesolithic hunter-gatherers. Our analyses reveal persistent genetic affinities between Mesolithic British and Western European hunter-gatherers over a period spanning Britain's separation from continental Europe. We find overwhelming support for agriculture being introduced by incoming continental farmers, with small and geographically structured levels of additional hunter-gatherer introgression. We find genetic affinity between British and Iberian Neolithic populations indicating that British Neolithic people derived much of their ancestry from Anatolian farmers who originally followed the Mediterranean route of dispersal and likely entered Britain from northwestern mainland Europe.
21: Phenotypic Age: a novel signature of mortality and morbidity risk
Zuyun Liu, Pei-Lun Kuo et al.
Posted to bioRxiv 05 Jul 2018
Abstract: Background: A person's rate of aging has important implications for his/her risk of death and disease; thus, quantifying aging using observable characteristics has important applications for clinical, basic, and observational research. We aimed to validate a novel aging measure, 'Phenotypic Age', constructed based on routine clinical chemistry measures, by assessing its applicability for differentiating risk for morbidity and mortality in both healthy and unhealthy populations of various ages. Methods: A nationally representative US sample, NHANES III, was used to derive 'Phenotypic Age' based on a linear combination of chronological age and nine multi-system clinical chemistry measures, selected via Cox proportional elastic net. Mortality predictions were validated using an independent sample (NHANES IV), consisting of 11,432 participants, for whom we observed a total of 871 deaths, ascertained over 12.6 years of follow-up. Proportional hazard models and ROC curves were used to evaluate predictions. Results: Phenotypic Age was significantly associated with all-cause mortality and cause-specific mortality. These results were robust to age and sex stratification, and remained even when excluding short-term mortality. Similarly, Phenotypic Age was associated with mortality among seemingly 'healthy' participants, defined as those who were disease-free and had normal BMI at baseline, as well as the oldest-old (aged 85+), a group with high disease burden. Conclusions: Phenotypic Age is a reliable predictor of all-cause and cause-specific mortality in multiple subgroups of the population. Risk stratification by this composite measure is far superior to that of the individual measures that go into it, as well as traditional measures of health. It is able to differentiate individuals who appear healthy, who may have otherwise been missed using traditional health assessments. Further, it can differentiate risk among persons with shared disease burden.
Overall, this easily measured metric may be useful in the clinical setting and facilitate secondary and tertiary prevention strategies.
22: Bayesian Inference for a Generative Model of Transcriptome Profiles from Single-cell RNA Sequencing
Romain Lopez, Jeffrey Regier et al.
Posted to bioRxiv 30 Mar 2018
Abstract: Transcriptome profiles of individual cells reflect true and often unexplored biological diversity, but are also affected by noise of biological and technical nature. This raises the need to explicitly model the resulting uncertainty and take it into account in any downstream analysis, such as dimensionality reduction, clustering, and differential expression. Here, we introduce Single-cell Variational Inference (scVI), a scalable framework for probabilistic representation and analysis of gene expression in single cells. Our model uses variational inference and stochastic optimization of deep neural networks to approximate the parameters that govern the distribution of expression values of each gene in every cell, using a non-linear mapping between the observations and a low-dimensional latent space. By doing so, scVI pools information between similar cells or genes while taking nuisance factors of variation such as batch effects and limited sensitivity into account. To evaluate scVI, we conducted a comprehensive comparative analysis to existing methods for distributional modeling and dimensionality reduction, all of which rely on generalized linear models. We first show that scVI scales to over one million cells, whereas competing algorithms can process at most tens of thousands of cells. Next, we show that scVI fits unseen data more closely and can impute missing data more accurately, both indicative of a better generalization capacity. We then utilize scVI to conduct a set of fundamental analysis tasks -- including batch correction, visualization, clustering and differential expression -- and demonstrate its accuracy in comparison to the state-of-the-art tools in each task. scVI is publicly available, and can be readily used as a principled and inclusive solution for multiple tasks of single-cell RNA sequencing data analysis.
23: Cortical Column and Whole Brain Imaging of Neural Circuits with Molecular Contrast and Nanoscale Resolution
Ruixuan Gao, Shoh M Asano et al.
Posted to bioRxiv 23 Jul 2018
Abstract: Optical and electron microscopy have made tremendous inroads in understanding the complexity of the brain, but the former offers insufficient resolution to reveal subcellular details and the latter lacks the throughput and molecular contrast to visualize specific molecular constituents over mm-scale or larger dimensions. We combined expansion microscopy and lattice light sheet microscopy to image the nanoscale spatial relationships between proteins across the thickness of the mouse cortex or the entire Drosophila brain, including synaptic proteins at dendritic spines, myelination along axons, and presynaptic densities at dopaminergic neurons in every fly neuropil domain. The technology should enable statistically rich, large scale studies of neural development, sexual dimorphism, degree of stereotypy, and structural correlations to behavior or neural activity, all with molecular contrast.
24: The Repertoire of Mutational Signatures in Human Cancer
Ludmil B. Alexandrov, Jaegil Kim et al.
Posted to bioRxiv 15 May 2018
Abstract: Somatic mutations in cancer genomes are caused by multiple mutational processes each of which generates a characteristic mutational signature. Using 84,729,690 somatic mutations from 4,645 whole cancer genome and 19,184 exome sequences encompassing most cancer types we characterised 49 single base substitution, 11 doublet base substitution, four clustered base substitution, and 17 small insertion and deletion mutational signatures. The substantial dataset size compared to previous analyses enabled discovery of new signatures, separation of overlapping signatures and decomposition of signatures into components that may represent associated, but distinct, DNA damage, repair and/or replication mechanisms. Estimation of the contribution of each signature to the mutational catalogues of individual cancer genomes revealed associations with exogenous and endogenous exposures and defective DNA maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes contributing to the development of human cancer including a comprehensive reference set of mutational signatures in human cancer.
25: Integrating single-cell RNA-Seq with spatial transcriptomics in pancreatic ductal adenocarcinoma using multimodal intersection analysis
Reuben Moncada, Florian Wagner et al.
Posted to bioRxiv 26 Jan 2018
Abstract: To understand the architecture of a tissue it is necessary to know both the cell populations and their physical relationships to one another. Single-cell RNA-Seq (scRNA-Seq) has made significant progress towards the unbiased and systematic characterization of the cell populations within a tissue, as well as their cellular states, by studying hundreds and thousands of cells in a single experiment. However, the characterization of the spatial organization of individual cells within a tissue has been more elusive. The recently introduced "spatial transcriptomics" method (ST) reveals the spatial pattern of gene expression within a tissue section at a resolution of one thousand 100 µm spots, each capturing the transcriptomes of ~10-20 cells. Here, we present an approach for the integration of scRNA-Seq and ST data generated from the same sample of pancreatic cancer tissue. Using markers for cell-types identified by scRNA-Seq, we robustly deconvolved the cell-type composition of each ST spot, to generate a spatial atlas of cell proportions across the tissue. Studying this atlas, we found that distinct spatial localizations accompany each of the three cancer cell populations that we identified. Strikingly, we find that subpopulations defined in the scRNA-Seq data also exhibit spatial segregation in the atlas, suggesting such an atlas may be used to study the functional attributes of subpopulations. Our results provide a framework for creating a tumor atlas by mapping single-cell populations to their spatial region, as well as the inference of cell architecture in any tissue.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!