Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 84,712 bioRxiv papers from 364,450 authors.
Most downloaded bioRxiv papers, since beginning of last month
in category genomics
5,337 results found.
15,145 downloads genomics
Fabiana Gámbaro, Sylvie Behillil, Artem Baidaliuk, Flora Donati, Mélanie Albert, Andreea Alexandru, Maud Vanpeene, Méline Bizard, Angela Brisebarre, Marion Barbet, Fawzi Derrar, Sylvie van der Werf, Vincent Enouf, Etienne Simon-Loriere
Following the emergence of coronavirus disease (COVID-19) in Wuhan, China in December 2019, specific COVID-19 surveillance was launched in France on January 10, 2020. Two weeks later, the first three imported cases of COVID-19 into Europe were diagnosed in France. We sequenced 97 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from samples collected between January 24 and March 24, 2020 from infected patients in France. Phylogenetic analysis identified several early independent SARS-CoV-2 introductions without local transmission, highlighting the efficacy of the measures taken to prevent virus spread from symptomatic cases. In parallel, our genomic data reveals the later predominant circulation of a major clade in many French regions, and implies local circulation of the virus in undocumented infections prior to the wave of COVID-19 cases. This study emphasizes the importance of continuous and geographically broad genomic sequencing and calls for further efforts with inclusion of asymptomatic infections. ### Competing Interest Statement The authors have declared no competing interest.
4,887 downloads genomics
COVID-19 pandemic is a major human tragedy. Worldwide, SARS-CoV-2 has already infected over 3 million and has killed about 230,000 people. SARS-CoV-2 originated in China and, within three months, has evolved to an additional 10 subtypes. One particular subtype with a non-silent (Aspartate to Glycine) mutation at 614th position of the Spike protein (D614G) rapidly outcompeted other pre-existing subtypes, including the ancestral. We assessed that D614G mutation generates an additional serine protease (Elastase) cleavage site near the S1-S2 junction of the Spike protein. We also identified that a single nucleotide deletion (delC) at a known variant site (rs35074065) in a cis-eQTL of TMPRSS2, is extremely rare in East Asians but is common in Europeans and North Americans. The delC allele facilitates entry of the 614G subtype into host cells, thus accelerating the spread of 614G subtype in Europe and North America where the delC allele is common. The delC allele at the cis-eQTL locus rs35074065 of TMPRSS2 leads to overexpression of both TMPRSS2 and a nearby gene MX1. The cis-eQTL site, rs35074065 overlaps with a transcription factor binding site of an activator (IRF1) and a repressor (IRF2). IRF1 activator can bind to variant delC allele, but IRF2 repressor fails to bind. Thus, in an individual carrying the delC allele, there is only activation, but no repression. On viral entry, IRF1 mediated upregulation of MX1 leads to neutrophil infiltration and processing of 614G mutated Spike protein by neutrophil Elastase. The simultaneous processing of 614G spike protein by TMPRSS2 and Elastase serine proteases facilitates the entry of the 614G subtype into host cells. Thus, SARS-CoV-2, particularly the 614G subtype, has spread more easily and with higher frequency to Europe and North America where the delC allele regulating expression of TMPRSS2 and MX1 host proteins is common, but not to East Asia where this allele is rare. 
### Competing Interest Statement The authors have declared no competing interest.
4,213 downloads genomics
Background: SARS-CoV-2 most likely evolved from a bat beta-coronavirus and started infecting humans in December 2019. Since then it has rapidly infected people around the world, with more than 3 million confirmed cases by the end of April 2020. Early genome sequencing of the virus has enabled the development of molecular diagnostics and the commencement of therapy and vaccine development. The analysis of the early sequences showed relatively few evolutionary selection pressures. However, with the rapid worldwide expansion into diverse human populations, significant genetic variations are becoming increasingly likely. The current limitations on social movement between countries also offers the opportunity for these viral variants to become distinct strains with potential implications for diagnostics, therapies and vaccines. Methods: We used the current sequencing archives (NCBI and GISAID) to investigate 5,349 whole genomes, looking for evidence of strain diversification and selective pressure. Results: We used 3,958 SNPs to build a phylogenetic tree of SARS-CoV-2 diversity and noted strong evidence for the existence of two major clades and six sub-clades, unevenly distributed across the world. We also noted that convergent evolution has potentially occurred across several locations in the genome, showing selection pressures, including on the spike glycoprotein where we noted a potentially critical mutation that could affect its binding to the ACE2 receptor. We also report on mutations that could prevent current molecular diagnostics from detecting some of the sub-clades. Conclusions: The worldwide whole genome sequencing effort is revealing the challenge of developing SARS-CoV-2 containment tools suitable for everyone and the need for data to be continually evaluated to ensure accuracy in outbreak estimations. ### Competing Interest Statement The authors have declared no competing interest.
3,744 downloads genomics
Shahar Alon, Daniel R Goodwin, Anubhav Sinha, Asmamaw T. Wassie, Fei Chen, Evan R Daugharthy, Yosuke Bando, Atsushi Kajita, Andrew G Xue, Karl Marrett, Robert Prior, Yi Cui, Andrew C Payne, Chun-Chen Yao, Ho-Jun Suk, Ru Wang, Chih-Chieh (Jay) Yu, Paul Tillberg, Paul Reginato, Nikita Pak, Songlei Liu, Sukanya Punthambaker, Eswar P. R. Iyer, Richie E. Kohman, Jeremy A. Miller, Ed S Lein, Ana Lako, Nicole Cullen, Scott Rodig, Karla Helvie, Daniel L Abravanel, Nikhil Wagle, Bruce E. Johnson, Johanna Klughammer, Michal Slyper, Julia Waldman, Judit Jané-Valbuena, Orit Rozenblatt-Rosen, Aviv Regev, IMAXT Consortium, George M. Church, Adam H Marblestone, Edward S. Boyden
Methods for highly multiplexed RNA imaging are limited in spatial resolution, and thus in their ability to localize transcripts to nanoscale and subcellular compartments. We adapt expansion microscopy, which physically expands biological specimens, for long-read untargeted and targeted in situ RNA sequencing. We applied untargeted expansion sequencing (ExSeq) to mouse brain, yielding readout of thousands of genes, including splice variants and novel transcripts. Targeted ExSeq yielded nanoscale-resolution maps of RNAs throughout dendrites and spines in neurons of the mouse hippocampus, revealing patterns across multiple cell types; layer-specific cell types across mouse visual cortex; and the organization and position-dependent states of tumor and immune cells in a human metastatic breast cancer biopsy. Thus ExSeq enables highly multiplexed mapping of RNAs, from nanoscale to system scale. ### Competing Interest Statement The authors have declared no competing interest.
3,281 downloads genomics
Meriem Laamarti, Tarek Alouane, Souad Kartti, M.W. Chemao-Elfihri, Mohammed Hakmi, Abdelomunim Essabbar, Mohamed Laamarti, Haitam Hlali, Loubna Allam, Naima El Hafidi, Rachid El Jaoudi, Imane Allali, Nabila Marchoudi, Jamal Fekkak, Houda Benrahma, Chakib Nejjari, Saaid Amzazi, Lahcen Belyamani, Azeddine Ibrahimi
In late December 2019, an emerging viral infection COVID-19 was identified in Wuhan, China, and became a global pandemic. Characterization of the genetic variants of SARS-CoV-2 is crucial in following and evaluating its spread across countries. In this study, we collected and analyzed 3,067 SARS-CoV-2 genomes isolated from 55 countries during the first three months after the onset of this virus. Using comparative genomics analysis, we traced the profiles of the whole-genome mutations and compared the frequency of each mutation in the studied population. The accumulation of mutations during the epidemic period with their geographic locations was also monitored. The results showed 782 variant sites, of which 512 (65.47%) had a non-synonymous effect. Frequencies of mutated alleles revealed the presence of 38 recurrent non-synonymous mutations, including ten hotspot mutations with a prevalence higher than 0.10 in this population and distributed in six SARS-CoV-2 genes. The distribution of these recurrent mutations on the world map revealed certain genotypes specific to the geographic location. We also found co-occurring mutations resulting in the presence of several haplotypes. Moreover, evolution over time has shown a mechanism of mutation co-accumulation which might affect the severity and spread of SARS-CoV-2. On the other hand, analysis of the selective pressure revealed the presence of negatively selected residues that could be taken into consideration as therapeutic targets. We have also created an inclusive unified database (http://genoma.ma/covid-19/) that lists all of the genetic variants of the SARS-CoV-2 genomes found in this study with phylogeographic analysis around the world. ### Competing Interest Statement The authors have declared no competing interest.
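As a quick arithmetic check on the proportion quoted in the abstract above (512 of 782 variant sites being non-synonymous), here is a minimal sketch. The helper name `percent` is hypothetical and is not taken from the preprint.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage (hypothetical helper for illustration)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Counts quoted in the abstract: 512 non-synonymous sites out of 782 variant sites.
non_synonymous_share = percent(512, 782)
print(f"{non_synonymous_share:.2f}%")
```

Rounded to two decimals, the result matches the 65.47% reported in the abstract.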
2,427 downloads genomics
COVID-19 has effectively spread worldwide. As of May 2020, Turkey is among the top ten countries with the most cases. A comprehensive genomic characterization of the virus isolates in Turkey is yet to be carried out. Here, we built a phylogenetic tree with 15,277 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes. We identified the subtypes based on the phylogenetic clustering in comparison with the previously annotated classifications. We performed a phylogenetic analysis of the first thirty SARS-CoV-2 genomes isolated and sequenced in Turkey. Our results suggest that the first introduction of the virus to the country is earlier than the first reported case of infection. Virus genomes isolated from Turkey are dispersed among most types in the phylogenetic tree. Two of the seventeen sub-clusters were found enriched with the isolates of Turkey, which likely have spread expansively in the country. Finally, we traced virus genomes based on their phylogenetic placements. This analysis suggested multiple independent international introductions of the virus and revealed a hub for the inland transmission. We released a web application to track the global and interprovincial virus spread of the isolates from Turkey in comparison to thousands of genomes worldwide. ### Competing Interest Statement The authors have declared no competing interest.
2,098 downloads genomics
Brian Glenn St Hilaire, Neva C. Durand, Namita Mitra, Saul Godinez Pulido, Ragini Mahajan, Alyssa Blackburn, Zane L. Colaric, Joshua W. M. Theisen, David Weisz, Olga Dudchenko, Andreas Gnirke, Suhas S.P. Rao, Parwinder Kaur, Erez Lieberman Aiden, Aviva Presser Aiden
Early detection of infection with SARS-CoV-2 is key to managing the current global pandemic, as evidence shows the virus is most contagious on or before symptom onset. Here, we introduce a low-cost, high-throughput method for diagnosis of SARS-CoV-2 infection, dubbed Pathogen-Oriented Low-Cost Assembly & Re-Sequencing (POLAR), that enhances sensitivity by aiming to amplify the entire SARS-CoV-2 genome rather than targeting particular viral loci, as in typical RT-PCR assays. To achieve this goal, we combine a SARS-CoV-2 enrichment method developed by the ARTIC Network (https://artic.network/) with short-read DNA sequencing and de novo genome assembly. We are able to reliably (>95% accuracy) detect SARS-CoV-2 at concentrations of 84 genome equivalents per milliliter, better than the reported limits of detection of almost all diagnostic methods currently approved by the US Food and Drug Administration. At higher concentrations, we are able to reliably assemble the SARS-CoV-2 genome in the sample, often with no gaps and perfect accuracy. Such genome assemblies enable the spread of the disease to be analyzed much more effectively than would be possible with an ordinary yes/no diagnostic, and can help identify vaccine and drug targets. Finally, we show that POLAR diagnoses on 10 of 10 clinical nasopharyngeal swab samples (half positive, half negative) match those obtained in a CLIA-certified lab using the Center for Disease Control's 2019-Novel Coronavirus test. Using POLAR, a single person can process 192 samples over the course of an 8-hour experiment, at a cost of ~$30/patient, enabling a 24-hour turnaround with sequencing and data analysis time included. Further testing and refinement will likely enable greater enhancements in the sensitivity of the above approach. ### Competing Interest Statement The authors have declared no competing interest.
1,903 downloads genomics
Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to "anchor" diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat
1,873 downloads genomics
Joana Damas, Graham M. Hughes, Kathleen C. Keough, Corrie A. Painter, Nicole S. Persky, Marco Corbo, Michael Hiller, Klaus-Peter Koepfli, Andreas R. Pfenning, Huabin Zhao, Diane P. Genereux, Ross Swofford, Katherine S. Pollard, Oliver A. Ryder, Martin T. Nweeia, Kerstin Lindblad-Toh, Emma C. Teeling, Elinor K. Karlsson, Harris A. Lewin
The novel coronavirus SARS-CoV-2 is the cause of Coronavirus Disease-2019 (COVID-19). As for other coronaviruses, there is transmission between animals and humans. The main receptor of SARS-CoV-2, angiotensin I converting enzyme-2 (ACE2), is now undergoing extensive scrutiny to understand the routes of transmission and sensitivity in different species. Here, we utilized a unique dataset of 410 vertebrates, including 252 mammals, to study cross-species conservation of ACE2 and its likelihood to function as a SARS-CoV-2 receptor. We designed a five-category ranking scheme based on the conservation properties of 25 amino acids important for the binding between receptor and virus, classifying all species from very high to very low. Only mammals fell into the medium to very high categories, and only catarrhine primates in the very high category, suggesting that they are at high risk for SARS-CoV-2 infection. We employed a structural analysis to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 binding, and found the number of predicted unfavorable changes significantly correlated with the risk classification. Extending this analysis to human population data, we found only rare (<0.1%) variants in 10/25 binding sites. In addition, we observed evidence of positive selection in ACE2 in multiple species, including bats. Utilized appropriately, our results may lead to the identification of intermediate host species for SARS-CoV-2, justify the selection of animal models of COVID-19, and assist the conservation of animals both in native habitats and in human care. ### Competing Interest Statement The authors have declared no competing interest.
1,557 downloads genomics
Vagheesh M. Narasimhan, Nick Patterson, Priya Moorjani, Iosif Lazaridis, Mark Lipson, Swapan Mallick, Nadin Rohland, Rebecca Bernardos, Alexander M Kim, Nathan Nakatsuka, Iñigo Olalde, Alfredo Coppa, James Mallory, Vyacheslav Moiseyev, Janet Monge, Luca M Olivieri, Nicole Adamski, Nasreen Broomandkhoshbacht, Francesca Candilio, Olivia Cheronet, Brendan J Culleton, Matthew Ferry, Daniel Fernandes, Beatriz Gamarra, Daniel Gaudio, Mateja Hajdinjak, Éadaoin Harney, Thomas K Harper, Denise Keating, Ann Marie Lawson, Megan Michel, Mario Novak, Jonas Oppenheimer, Niraj Rai, Kendra Sirak, Viviane Slon, Kristin Stewardson, Zhao Zhang, Gaziz Akhatov, Anatoly N Bagashev, Bauryzhan Baitanayev, Gian Luca Bonora, Tatiana Chikisheva, Anatoly Derevianko, Enshin Dmitry, Katerina Douka, Nadezhda Dubova, Andrey Epimakhov, Suzanne Freilich, Dorian Fuller, Alexander Goryachev, Andrey Gromov, Bryan Hanks, Margaret Judd, Erlan Kazizov, Aleksander Khokhlov, Egor Kitov, Elena Kupriyanova, Pavel Kuznetsov, Donata Luiselli, Farhod Maksudov, Christopher Meiklejohn, Deborah Merrett, Roberto Micheli, Oleg Mochalov, Zahir Muhammed, Samariddin Mustafokulov, Ayushi Nayak, Rykun M Petrovna, Davide Pettener, Richard Potts, Dmitry Razhev, Stefania Sarno, Kulyan Sikhymbaeva, Sergey M Slepchenko, Nadezhda Stepanova, Svetlana Svyatko, Sergey Vasilyev, Massimo Vidale, Dmitriy Voyakin, Antonina Yermolayeva, Alisa Zubova, Vasant S Shinde, Carles Lalueza-Fox, Matthias Meyer, David Anthony, Nicole Boivin, Kumarasamy Thangaraj, Douglas J. Kennett, Michael Frachetti, Ron Pinhasi, David Reich
The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia (consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC), and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identify the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.
1,550 downloads genomics
SARS-CoV-2 is a betacoronavirus that is responsible for the COVID-19 pandemic. The genome of SARS-CoV-2 was reported recently, but its transcriptomic architecture is unknown. Utilizing two complementary sequencing techniques, we here present a high-resolution map of the SARS-CoV-2 transcriptome and epitranscriptome. DNA nanoball sequencing shows that the transcriptome is highly complex owing to numerous recombination events, both canonical and noncanonical. In addition to the genomic RNA and subgenomic RNAs common in all coronaviruses, SARS-CoV-2 produces a large number of transcripts encoding unknown ORFs with fusion, deletion, and/or frameshift. Using nanopore direct RNA sequencing, we further find at least 41 RNA modification sites on viral transcripts, with the most frequent motif being AAGAA. Modified RNAs have shorter poly(A) tails than unmodified RNAs, suggesting a link between the internal modification and the 3′ tail. Functional investigation of the unknown ORFs and RNA modifications discovered in this study will open new directions to our understanding of the life cycle and pathogenicity of SARS-CoV-2.
1,458 downloads genomics
Hanqing Liu, Jingtian Zhou, Wei Tian, Chongyuan Luo, Anna Bartlett, Andrew I. Aldridge, Jacinta D. Lucero, Julia K. Osteen, Joseph R. Nery, Huaming Chen, Angeline Rivkin, Rosa G. Castanon, Ben Clock, Yang Eric Li, Xiaomeng Hou, Olivier Poirion, Sebastian Preissl, Carolyn O’Connor, Lara Boggeman, Conor Fitzpatrick, Michael Nunn, Eran A. Mukamel, Zhuzhu Zhang, Edward M. Callaway, Bing Ren, Jesse R. Dixon, M. Margarita Behrens, J. R. Ecker
Mammalian brain cells are remarkably diverse in gene expression, anatomy, and function, yet the regulatory DNA landscape underlying this extensive heterogeneity is poorly understood. We carried out a comprehensive assessment of the epigenomes of mouse brain cell types by applying single nucleus DNA methylation sequencing to profile 110,294 nuclei from 45 regions of the mouse cortex, hippocampus, striatum, pallidum, and olfactory areas. We identified 161 cell clusters with distinct spatial locations and projection targets. We constructed taxonomies of these epigenetic types, annotated with signature genes, regulatory elements, and transcription factors. These features indicate the potential regulatory landscape supporting the assignment of putative cell types, and reveal repetitive usage of regulators in excitatory and inhibitory cells for determining subtypes. The DNA methylation landscape of excitatory neurons in the cortex and hippocampus varied continuously along spatial gradients. Using this deep dataset, an artificial neural network model was constructed that precisely predicts single neuron cell-type identity and brain area spatial location. Integration of high-resolution DNA methylomes with single-nucleus chromatin accessibility data allowed prediction of high-confidence enhancer-gene interactions for all identified cell types, which were subsequently validated by cell-type-specific chromatin conformation capture experiments. By combining multi-omic datasets (DNA methylation, chromatin contacts, and open chromatin) from single nuclei and annotating the regulatory genome of hundreds of cell types in the mouse brain, our DNA methylation atlas establishes the epigenetic basis for neuronal diversity and spatial organization throughout the mouse brain. ### Competing Interest Statement J.R.E serves on the scientific advisory board of Zymo Research Inc. B.R. is a share holder of Arima Genomics.
1,363 downloads genomics
Kyle J. Travaglini, Ahmad N. Nabhan, Lolita Penland, Rahul Sinha, Astrid Gillich, Rene V Sit, Stephen Chang, Stephanie D Conley, Yasuo Mori, Jun Seita, Gerald J. Berry, Joseph B Shrager, Ross J Metzger, Christin S Kuo, Norma Neff, Irving L. Weissman, Stephen R. Quake, Mark A Krasnow
Although single cell RNA sequencing studies have begun providing compendia of cell expression profiles, it has proven more difficult to systematically identify and localize all molecular cell types in individual organs to create a full molecular cell atlas. Here we describe droplet- and plate-based single cell RNA sequencing applied to ∼75,000 human lung and blood cells, combined with a multi-pronged cell annotation approach, which have allowed us to define the gene expression profiles and anatomical locations of 58 cell populations in the human lung, including 41 of 45 previously known cell types or subtypes and 14 new ones. This comprehensive molecular atlas elucidates the biochemical functions of lung cell types and the cell-selective transcription factors and optimal markers for making and monitoring them; defines the cell targets of circulating hormones and predicts local signaling interactions including sources and targets of chemokines in immune cell trafficking and expression changes on lung homing; and identifies the cell types directly affected by lung disease genes and respiratory viruses. Comparison to mouse identified 17 molecular types that appear to have been gained or lost during lung evolution and others whose expression profiles have been substantially altered, revealing extensive plasticity of cell types and cell-type-specific gene expression during organ evolution including expression switches between cell types. This atlas provides the molecular foundation for investigating how lung cell identities, functions, and interactions are achieved in development and tissue engineering and altered in disease and evolution.
1,350 downloads genomics
The human pathogen severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the major pandemic of the 21st century. We analyzed >4,700 SARS-CoV-2 genomes and associated meta-data retrieved from public repositories. SARS-CoV-2 sequences have a high sequence identity (>99.9%), which drops to >96% when compared to bat coronavirus. We built a mutation-annotated reference SARS-CoV-2 phylogeny with two main macro-haplogroups, A and B, both of Asian origin, and >160 sub-branches representing virus strains of variable geographical origins worldwide, revealing a uniform mutation occurrence along branches that could complicate the design of future vaccines. The root of SARS-CoV-2 genomes locates at the Chinese haplogroup B1, with a TMRCA dating to 12 November 2019 - thus matching epidemiological records. Sub-haplogroup A2a originates in China and represents the major non-Asian outbreak. Multiple bottleneck episodes, most likely associated with super-spreader hosts, explain COVID-19 pandemic to a large extent. ### Competing Interest Statement The authors have declared no competing interest.
1,261 downloads genomics
Konrad J. Karczewski, Laurent C Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A Watts, Daniel Rhodes, Moriel Singer-Berk, Eleina M England, Eleanor G Seaby, Jack A. Kosmicki, Raymond K Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric V Minikel, Ben Weisburd, Monkol Lek, James S Ware, Christopher Vittal, Irina M Armean, Louis Bergelson, Kristian Cibulskis, Kristen M Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Genome Aggregation Database (gnomAD) Consortium, Benjamin M Neale, Mark J. Daly, Daniel MacArthur
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
1,200 downloads genomics
The application of polygenic risk scores (PRS) has become routine in genetic epidemiological studies. Among a range of applications, PRS are commonly used to assess shared aetiology among different phenotypes and to evaluate the predictive power of genetic data, while they are also now being exploited as part of study design, in which experiments are performed on individuals, or their biological samples (eg. tissues, cells), at the tails of the PRS distribution and contrasted. As GWAS sample sizes increase and PRS become more powerful, they are also set to play a key role in personalised medicine. Despite their growing application and importance, there are limited guidelines for performing PRS analyses, which can lead to inconsistency between studies and misinterpretation of results. Here we provide detailed guidelines for performing polygenic risk score analyses relevant to different methods for their calculation, outlining standard quality control steps and offering recommendations for best-practice. We also discuss different methods for the calculation of PRS, common misconceptions regarding the interpretation of results and future challenges.
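For readers unfamiliar with the quantity discussed in the abstract above: in its simplest form, a polygenic risk score is a weighted sum of an individual's risk-allele counts, with the weights taken from GWAS effect-size estimates. The sketch below shows only that basic sum. It is not the guideline procedure from the preprint, it omits quality control, clumping, and thresholding entirely, and the function name and example values are hypothetical.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant).

    `dosages` and `weights` are hypothetical inputs: one entry per variant,
    in the same variant order. Weights stand in for GWAS effect sizes.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Made-up example: three variants with illustrative effect sizes.
example_dosages = [0, 1, 2]
example_weights = [0.10, -0.20, 0.05]
score = polygenic_risk_score(example_dosages, example_weights)
print(score)
```

In practice the score is usually compared across individuals (for example, by its position in the cohort's distribution) rather than interpreted as an absolute value.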
1,184 downloads genomics
The severe acute respiratory syndrome virus, SARS-CoV-2 (hereafter COVID-19), rapidly achieved global pandemic status, provoking large-scale screening programs in many nations. Their activation makes it imperative to identify methods that can deliver a diagnostic result at low cost. This paper describes an approach which employs sequence variation in the gene coding for its envelope protein as the basis for a scalable, inexpensive test for COVID-19. It achieves this by coupling a simple RNA extraction protocol with low-volume RT-PCR, followed by E-Gel screening and sequencing on high-throughput platforms to analyze 10,000 samples in a run. Slight modifications to the protocol could support screening programs for other known viruses and for viral discovery. Just as the $1,000 genome is transforming medicine, a $1 diagnostic test for viral and bacterial pathogens would represent a major advance for public health. ### Competing Interest Statement The authors have declared no competing interest.
1,131 downloads genomics
Ahmad Abou Tayoun, Tom Loney, Hamda Khansaheb, Sathishkumar Ramaswamy, Divinlal Harilal, Zulfa Omar Deesi, Rupa Murthy Varghese, Hanan Al Suwaidi, Abdulmajeed Alkhajeh, Laila Mohamed AlDabal, Mohammed Uddin, Rifat Hamoudi, Rabih Halwani, Abiola Senok, Qutayba Hamid, Norbert Nowotny, Alawi Alsheikh-Ali
Whole genome sequencing and phylogenetic analysis of SARS-CoV-2 strains from the index and early patients with COVID-19 in Dubai (United Arab Emirates; UAE) showed multiple spatiotemporal introductions from Asia, Europe, and the Middle East. We combine genetic, demographic, and clinical information from early patients to show that the majority of introductions were from Europe and the Middle East/Iran, and to provide evidence for early community-based transmission. We further catalogue new mutations in SARS-CoV-2 isolates in the UAE. Our findings can be used to further understand the global transmission network of SARS-CoV-2. ### Competing Interest Statement The authors have declared no competing interest.
1,073 downloads genomics
The advent of large-scale single-cell chromatin accessibility profiling has accelerated our ability to map gene regulatory landscapes, but has outpaced the development of robust, scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; www.ArchRProject.com) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses including doublet removal, single-cell clustering and cell type identification, robust peak set generation, cellular trajectory identification, DNA element to gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility, and multi-omic integration with scRNA-seq. Enabling the analysis of over 1.2 million single cells within 8 hours on a standard Unix laptop, ArchR is a comprehensive analytical suite for end-to-end analysis of single-cell chromatin accessibility data that will accelerate the understanding of gene regulation at the resolution of individual cells. ### Competing Interest Statement W.J.G. and H.Y.C. are consultants for 10x Genomics who has licensed IP associated with ATAC-seq. W.J.G. has additional affiliations with Guardant Health (consultant) and Protillion Biosciences (co-founder and consultant). H.Y.C. is a co-founder of Accent Therapeutics, Boundless Bio, and a consultant for Arsenal Biosciences and Spring Discovery.
1,041 downloads genomics
There is a pressing urgency to understand the entry route of SARS-CoV-2 viruses into the human body. SARS-CoV-2 viruses enter through ACE2 receptors after the S proteins of the virus are primed by proteases such as TMPRSS2. Most studies focused on the airway epithelial and lung alveolar cells as the route of infection, while the mode of transmission through the ocular route is not well established. Here, we profiled the presence of SARS-CoV-2 receptors and receptor-associated enzymes at single-cell resolution of thirty-three human ocular cell types. We identified unique populations of corneal cells with high ACE2 expression, among which the conjunctival cells co-expressed both ACE2 and TMPRSS2, suggesting that they could serve as the entry points for the virus. Integrative analysis further models the signaling and transcription regulon networks involved in the infection of distinct corneal cells. Our work constitutes a unique resource for the development of new treatments and management of COVID-19. ### Competing Interest Statement The authors have declared no competing interest.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!