
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 89,194 bioRxiv papers from 382,298 authors.

Most downloaded bioRxiv papers, since beginning of last month

in category genomics

5,570 results found.

5321: Phenotypic and Genotypic Antiviral Resistance Testing of HSV-1 Causing Recurrent Cutaneous Lesions in a Patient with DOCK8 Deficiency

Posted to bioRxiv 17 Oct 2019

3 downloads genomics

Amanda M. Casto, Sean C. Stout, Rangaraj Selvarangan, Alexandra F. Freeman, Brandon D. Newell, Erin D. Stahl, Alexander L. Greninger, Dwight E. Yin

Antiviral resistance frequently complicates treatment of herpes simplex virus (HSV) infections in immunocompromised patients. Here we review the case of an adolescent boy with dedicator of cytokinesis 8 (DOCK8) deficiency, who experienced recurrent infections with resistant HSV-1. We used both phenotypic and genotypic methodologies to characterize the resistance profile of HSV-1 in the patient and conclude that genotypic testing outperformed phenotypic testing. We also present the first analysis of intrahost HSV-1 evolution in an immunocompromised patient. While HSV-1 can remain static in an immunocompetent individual for decades, the virus from this patient rapidly acquired genetic changes throughout its genome.

5322: Comparison of two African rice species through a new pan-genomic approach on massive data

Posted to bioRxiv 09 Jan 2018

3 downloads genomics

Cécile Monat, Christine Tranchant-Dubreuil, Stefan Engelen, Karine Labadie, Emmanuel Paradis, Ndomassi Tando, François Sabot

Pangenome theory implies that individuals from a given group/species share only a given part of their genome (core-genome), the remaining part being the dispensable one. The domestication process implies a small number of founder individuals, and thus a large core-genome compared to the dispensable genome at the first steps of domestication. We sequenced at high depth 120 cultivated African rice Oryza glaberrima and 74 wild relatives O. barthii, and mapped them on the external reference from Asian rice O. sativa. We then used a novel DepthOfCoverage approach to identify missing genes. After comparing the two species, we show that the cultivated species has a smaller core-genome than the wild one, as well as an expected smaller dispensable one. This unexpected output, however, puts into perspective the inadequacy of cultivated crops to wilderness.
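
The abstract does not describe the DepthOfCoverage procedure in detail, so the following is only a minimal sketch of the general idea behind depth-based presence/absence calling. It assumes a gene is called missing when its mean read depth falls below a fixed fraction of the sample's genome-wide mean depth. The function name, the 0.1 cutoff, and the toy depths are hypothetical and are not taken from the paper.

```python
from statistics import mean

def call_missing_genes(gene_depths, genome_mean_depth, min_fraction=0.1):
    """Return the sorted IDs of genes whose mean depth is below the cutoff.

    gene_depths: dict mapping gene ID -> list of per-base read depths
                 (hypothetical input format).
    min_fraction: illustrative cutoff, not a value from the paper.
    """
    cutoff = genome_mean_depth * min_fraction
    return sorted(g for g, depths in gene_depths.items() if mean(depths) < cutoff)

# Toy data: geneB has almost no mapped reads, so it is called missing.
toy = {"geneA": [30, 28, 35], "geneB": [0, 1, 0], "geneC": [2, 40, 33]}
missing = call_missing_genes(toy, genome_mean_depth=30.0)
print(missing)
```

A real analysis would likely also consider the fraction of the gene covered and normalize per sample; those choices are left out here to keep the sketch short.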

5323: Global transcriptome profiling uncovers footprints of root and shoot development in crop models barley and tomato

Posted to bioRxiv 30 Oct 2019

3 downloads genomics

Ali Ahmad Naz, Michael Schneider, Lucia Vedder, Bobby Mathew, Jens Léon

Land plants establish their forms at two developmental hotspots: the root and shoot apices. In this study, we dissected and compared the global transcriptome of these developmental zones in crop models barley and tomato. We employed a state-of-the-art transcriptome analysis technique for deep profiling of expressed genes. This analysis resulted in highly reproducible quantitative expression profiles of 19,441 and 23,388 genes in barley and tomato, respectively. In barley, 16,834 genes were expressed both in root and shoot apices, whereas 1,081 genes were specific to root apex and 1,526 genes were active in shoot apex. With significant variations, 20,154 genes were expressed in root and shoot apices of tomato, of which 1,858 and 1,376 genes showed root and shoot specificities. Systematic analyses of these genes revealed distinct commonalities and divergence among the active genes for root and shoot development. A deeper look into these data uncovers tissue- and species-specific genes, unique footprints of gene ontologies and divergence of auxin pathway genes in root and shoot apices of barley and tomato. These data provide a primary resource to understand intra- and inter-species genetic networks of root and shoot development as well as the evolution of genes in crop plants.
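
As a quick consistency check on the counts quoted in this abstract, the short sketch below adds the shared and tissue-specific gene counts for each species. It assumes the tomato figures are read the same way as the barley ones, that is, 20,154 genes shared between root and shoot apices plus 1,858 root-specific and 1,376 shoot-specific genes. That reading is an interpretation of the abstract's wording, not something it states explicitly.

```python
# Gene counts as quoted in the abstract; the dictionary keys are our own labels.
barley = {"shared": 16_834, "root_only": 1_081, "shoot_only": 1_526}
tomato = {"shared": 20_154, "root_only": 1_858, "shoot_only": 1_376}

barley_total = sum(barley.values())
tomato_total = sum(tomato.values())

# Compare against the totals of expressed genes reported for each species.
print(barley_total == 19_441, tomato_total == 23_388)
```

Under this reading both sums match the reported totals of 19,441 and 23,388 expressed genes.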

5324: Germline DNA replication timing shapes mammalian genome composition

Posted to bioRxiv 01 Feb 2018

3 downloads genomics

Yishai Yehuda, Britny Blumenfeld, Nina Mayorek, Kirill Makedonski, Oriya Vardi, Yousef Mansour, Hagit Masika, Marganit Farago, Shulamit Baror-Sebban, Yosef Buganim, Amnon Koren, Itamar Simon

Mammalian DNA is replicated in a highly organized and regulated manner. Large, Mb-sized regions are replicated at defined times along S phase. DNA Replication Timing (RT) has been suggested to play an important role in shaping the mammalian genome by affecting mutation rates. Previous analyses relied on somatic DNA RT profiles, while to fully understand the influences of RT on the mammalian genome, germ cell RT information is necessary, as only germline mutations are passed to offspring and thus affect genomic composition. Using an improved RT mapping technique that allows mapping the RT from limited amounts of cells, we measured RT from two stages in the mouse germline - primordial germ cells (PGCs) and spermatogonial stem cells (SSCs). The germ cell RT profiles were distinct from those of both somatic and embryonic tissues. The correlations between RT and both mutation rate and recombination hotspots were not only confirmed in the germline tissues, but were shown to be stronger compared to correlations with RT of somatic tissues, emphasizing the importance of using RT profiles from the correct tissue of origin. Expanding the analysis to additional genetic features such as GC content, transposable elements (SINEs and LINEs) and gene density, also revealed a stronger correlation with the germ cell RT maps. GC content stratification along with multiple regression analysis revealed the independent contribution of RT to SINE, gene, mutation and recombination hotspot densities. Taken together, our results point to the centrality of RT in shaping multiple levels of mammalian genome composition.

5325: A chromosome-level assembly of the Atlantic herring – detection of a supergene and other signals of selection

Posted to bioRxiv 11 Jun 2019

3 downloads genomics

Mats E. Pettersson, Christina M Rochus, Fan Han, Junfeng Chen, Jason Hill, Ola Wallerman, Guangyi Fan, Xiaoning Hong, Qiwu Xu, He Zhang, Shanshan Liu, Xin Liu, Leanne Haggerty, Toby Hunt, Fergal J. Martin, Paul Flicek, Ignas Bunikis, Arild Folkvord, Leif Andersson

The Atlantic herring is a model species for exploring the genetic basis for ecological adaptation, due to its huge population size and extremely low genetic differentiation at selectively neutral loci. However, such studies have so far been hampered because of a highly fragmented genome assembly. Here, we deliver a chromosome-level genome assembly based on a hybrid approach combining a de novo PacBio assembly with Hi-C-supported scaffolding. The assembly comprises 26 autosomes with sizes ranging from 12.4 to 33.1 Mb and a total size, in chromosomes, of 726 Mb. The development of a high-resolution linkage map confirmed the global chromosome organization and the linear order of genomic segments along the chromosomes. A comparison between the herring genome assembly with other high-quality assemblies from bony fishes revealed few interchromosomal but frequent intrachromosomal rearrangements. The improved assembly makes the analysis of previously intractable large-scale structural variation more feasible; allowing, for example, the detection of a 7.8 Mb inversion on chromosome 12 underlying ecological adaptation. This supergene shows strong genetic differentiation between populations from the northern and southern parts of the species distribution. The chromosome-based assembly also markedly improves the interpretation of previously detected signals of selection, allowing us to reveal hundreds of independent loci associated with ecological adaptation in the Atlantic herring.

5326: A classification framework for Bacillus anthracis defined by global genomic structure

Posted to bioRxiv 20 Jun 2019

3 downloads genomics

Spencer A. Bruce, Nicholas J Schiraldi, Pauline L. Kamath, W. Ryan Easterday, Wendy C. Turner

Bacillus anthracis, the causative agent of anthrax, is a considerable global health threat affecting wildlife, livestock, and the general public. In this study whole-genome sequence analysis of over 350 B. anthracis isolates was used to establish a new high-resolution global genotyping framework that is both biogeographically informative, and compatible with multiple genomic assays. The data presented in this study shed new light on the diverse global dissemination of this species and indicate that many lineages may be uniquely suited to the geographic regions in which they are found. In addition, we demonstrate that plasmid genomic structure for this species is largely consistent with chromosomal population structure, suggesting vertical inheritance in this bacterium has contributed to its evolutionary persistence. This classification methodology is the first based on population genomic structure for this species and has potential use for local and broader institutions seeking to understand both disease outbreak origins and recent introductions. In addition, we provide access to a newly developed genotyping script as well as the full whole genome sequence analyses output for this study, allowing future studies to rapidly employ and append their data in the context of this global collection. This framework may act as a powerful tool for public health agencies, wildlife disease laboratories, and researchers seeking to utilize and expand this classification scheme for further investigations into B. anthracis evolution.

5327: Dysregulation of EMT Drives the Progression to Clinically Aggressive Sarcomatoid Bladder Cancer

Posted to bioRxiv 09 Aug 2018

3 downloads genomics

Charles C Guo, Tadeusz Majewski, Li Zhang, Hui Yao, Jolanta Bondaruk, Yan Wang, Shizhen Zhang, Ziqiao Wang, June Goo Lee, Sangkyou Lee, David Cogdell, Miao Zhang, Peng Wei, H. Barton Grossman, Ashish Kamat, Jonathan James Duplisea, James Edward Ferguson, He Huang, Vipulkumar Dadhania, Colin Dinney, John N. Weinstein, Keith Baggerly, David McConkey, Bogdan Czerniak

The sarcomatoid variant of urothelial bladder cancer (SARC) displays a high propensity for distant metastasis and is associated with short survival. We report a comprehensive genomic analysis of 28 cases of SARCs and 84 cases of conventional urothelial carcinomas (UCs), with the TCGA cohort of 408 muscle-invasive bladder cancers serving as the reference. SARCs showed a distinct mutational landscape with enrichment of TP53, RB1, and PIK3CA mutations. They were related to the basal molecular subtype of conventional UCs and could be divided into epithelial/basal and more clinically aggressive mesenchymal subsets based on TP63 and its target genes expression levels. Other analyses revealed that SARCs are driven by downregulation of homotypic adherence genes and dysregulation of cell cycle and EMT networks, and nearly half exhibited a heavily infiltrated immune phenotype. Our observations have important implications for prognostication and the development of more effective therapies for this highly lethal variant of bladder cancer.

5328: Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome evolution between two wheat cultivars

Posted to bioRxiv 05 Feb 2018

3 downloads genomics

Anupriya Kaur Thind, Thomas Wicker, Thomas Müller, Patrick M Ackermann, Burkhard Steuernagel, Brande B.H. Wulff, Manuel Spannagl, Sven O. Twardziok, Marius Felder, Thomas Lux, Klaus FX Mayer, International Wheat Genome Sequencing Consortium, Beat Keller, Simon G. Krattinger

Background: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the evolutionary dynamics of wheat genomes on a megabase-scale. Results: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line CH Campala Lr22a. There was a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations revealed four large insertions/deletions (InDels) of >100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the evolutionary mechanisms that caused these InDels. Three of the large InDels affected copy number of NLRs, a gene family involved in plant immunity. Analysis of single nucleotide polymorphism (SNP) density revealed three haploblocks of 8 Mb, 9 Mb and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Conclusions: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.

5329: Sequencing and comparative analysis of three Chlorella genomes provide insights into strain-specific adaptation to wastewater

Posted to bioRxiv 13 May 2019

3 downloads genomics

Tian Wu, Linzhou Li, Xiaosen Jiang, Yong Yang, Yanzi Song, Liang Chen, Xun Xu, Yue Shen, Ying Gu

Microalgal Chlorella has been demonstrated to process wastewater from the piggery industry efficiently, yet optimization of such a bio-treatment through genetic engineering is currently challenging, largely due to the limited data and knowledge in genomics. In this study, we first investigated the differential growth rates among three wastewater-processing Chlorella strains: Chlorella sorokiniana BD09, Chlorella sorokiniana BD08 and Chlorella sp. Dachan, and the previously published Chlorella sorokiniana UTEX 1602, showing that BD09 maintains the best tolerance in synthetic wastewater. We then performed genome sequencing and analysis, resulting in a high-quality assembly for each genome with scaffold N50 >2 Mb and genomic completeness ≥ 91%, as well as genome annotation with 9,668, 10,240, and 9,821 high-confidence gene models predicted for BD09, BD08, and Dachan, respectively. A comparative genomics study reveals that metabolic pathways involved in nitrogen and phosphorus assimilation were enriched in the faster-growing strains. We found that gene structural variation and genomic rearrangement might contribute to differential capabilities in wastewater tolerance among the strains, as indicated by gene copy number variation, domain reshuffling of the orthologs involved, as well as a ~1 Mb-length chromosomal inversion we observed in BD08 and Dachan. In addition, we speculate that an associated bacterium, Microbacterium chocolatum, which was identified within Dachan, plays a possible role in synergizing nutrient removal. Our three newly sequenced Chlorella genomes provide a foundation to understand the molecular basis of abiotic stress tolerance in wastewater treatment, which is essential for future genetic engineering and strain improvement.

5330: Accelerated DNA methylation aging and increased resilience in veterans: the biological cost for soldiering on

Posted to bioRxiv 23 Oct 2017

3 downloads genomics

Divya Mehta, Dagmar Bruenig, Bruce Lawford, Wendy Harvey, Tania Carrillo-Roa, Charles P. Morris, Tanja Jovanovic, Ross McD. Young, Elisabeth B. Binder, Joanne Voisey

Accelerated epigenetic aging, the difference between the DNA methylation-predicted age (DNAm age) and the chronological age, is associated with a myriad of diseases. This study investigates the relationship between epigenetic aging and risk and protective factors of PTSD. Genome-wide DNA methylation analysis was performed in 211 individuals including combat-exposed Australian veterans (discovery cohort, n = 96 males) and trauma-exposed civilian males from the Grady Trauma Project (replication cohort, n = 115 males). Primary measures included the Clinician Administered PTSD Scale for DSM-5 and the Connor-Davidson Resilience Scale (CDRISC). DNAm age prediction was performed using the validated epigenetic clock calculator. Veterans with PTSD had increased PTSD symptom severity (P-value = 3.75 × 10^-34) and lower CDRISC scores (P-value = 7.5 × 10^-8) than veterans without PTSD. DNAm age was significantly correlated with the chronological age (P-value = 3.3 × 10^-6), but DNAm age acceleration was not different between the PTSD and non-PTSD groups (P-value = 0.24). Evaluating potential protective factors, we found that DNAm age acceleration was significantly associated with CDRISC resilience scores in veterans with PTSD; these results remained significant after multiple testing correction (P-value = 0.023; r = 0.32). This finding was also replicated in an independent trauma-exposed civilian cohort (P-value = 0.02; r = 0.23). Post-hoc factor analyses revealed that this association was driven by 'self-efficacy' items within the CDRISC (P-value = 0.015). These results suggest that among individuals already suffering from PTSD, some aspects of increased resilience might come at a biological cost.
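
The definition given in the first sentence of this abstract (age acceleration as DNAm age minus chronological age) can be written as a one-line computation. The sketch below uses that simple difference; the sample IDs and ages are made-up toy values, and the study itself may have used a different formulation (for example, regression residuals), which the abstract does not specify.

```python
def age_acceleration(dnam_age, chronological_age):
    """Epigenetic age acceleration as a plain difference, in years.

    Positive values mean the methylation-predicted age exceeds the
    chronological age.
    """
    return dnam_age - chronological_age

# Toy samples: (sample ID, DNAm age, chronological age); all values are invented.
samples = [("s1", 52.5, 48.0), ("s2", 60.0, 63.0)]
acceleration = {sid: age_acceleration(dnam, chrono) for sid, dnam, chrono in samples}
print(acceleration)
```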

5331: GPhase: Greedy Approach for Accurate Haplotype Inferencing

Posted to bioRxiv 04 Sep 2016

3 downloads genomics

Kshitij Tayal, Naveen Sivadasan, Rajgopal Srinivasan

We consider the computational problem of phasing an individual genotype sample given a collection of known haplotypes in the population. We give a fast and accurate algorithm, GPhase, for reconstructing a haplotype pair consistent with the input genotype. It uses the coalescent-based mutation model of Stephens and Donnelly (2000). Computing an optimal solution under this model is expensive, and our algorithm uses a greedy approximation for fast and accurate estimation. Our algorithm is simple and efficient, and has linear time and space complexity. Experiments on real datasets revealed improved gene-level phasing accuracy for the GPhase tool compared to other widely used tools such as SHAPEIT, Beagle, MaCH and Impute2. On simulated data, the GPhase tool was able to phase samples each containing more than 1700 markers with high accuracy. GPhase can be used for gene-level phasing of individual samples using publicly available haplotype datasets such as HapMap data or 1000 genome data. This finds applications in studies on recessive Mendelian disorders where parent data is lacking. GPhase is freely available for download and use from https://github.com/kshitijtayal/GPhase/.
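
To make the phasing problem itself concrete, the sketch below enumerates pairs of reference haplotypes whose allele counts add up to a given genotype. This is a brute-force illustration of what "a haplotype pair consistent with the input genotype" means. It is not the GPhase algorithm, which according to the abstract uses a greedy approximation under a coalescent-based mutation model. The 0/1/2 genotype coding, the function name, and the toy panel are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

def consistent_pairs(genotype, panel):
    """Return all unordered haplotype pairs from panel that explain genotype.

    genotype: tuple of 0/1/2 alternate-allele counts per marker (assumed coding).
    panel: list of haplotypes, each a tuple of 0/1 alleles per marker.
    """
    pairs = []
    for h1, h2 in combinations_with_replacement(panel, 2):
        # A pair is consistent when the allele counts sum to the genotype
        # at every marker.
        if all(a + b == g for a, b, g in zip(h1, h2, genotype)):
            pairs.append((h1, h2))
    return pairs

# Toy reference panel of four haplotypes over three markers.
panel = [(0, 1, 0), (1, 0, 1), (1, 1, 0), (0, 0, 1)]
genotype = (1, 1, 1)  # heterozygous at all three markers
pairs = consistent_pairs(genotype, panel)
print(pairs)
```

In this toy case two different pairs explain the same genotype, which is the ambiguity a phasing model has to resolve. Exhaustive enumeration is quadratic in panel size, so a practical tool needs a faster strategy.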

5332: Oropouche virus cases identified in Ecuador using an optimised rRT-PCR informed by metagenomic sequencing

Posted to bioRxiv 27 Jun 2019

3 downloads genomics

Emma L. Wise, Sully Márquez, Jack Mellors, Verónica Paz, Barry Atkinson, Bernardo Gutierrez, Sonia Zapata, Josefina Coloma, Oliver G. Pybus, Simon K Jackson, Gabriel Trueba, Gyorgy Fejer, Christopher H Logue, Steven T Pullan

Oropouche virus (OROV) is responsible for outbreaks of Oropouche fever in parts of South America. We recently identified and isolated OROV from a febrile Ecuadorian patient; however, a previously published rRT-PCR assay did not detect OROV in the patient sample. A primer mismatch to the Ecuadorian OROV lineage was identified from metagenomic sequencing data. We report the optimisation of an rRT-PCR assay for the Ecuadorian OROV lineage, which subsequently identified a further five cases in a cohort of 196 febrile patients. We isolated OROV via cell culture and developed an algorithmically designed primer set for whole-genome amplification of the virus. Metagenomic sequencing of the patient samples provided OROV genome coverage ranging from 68% to 99%. The additional cases formed a single phylogenetic cluster together with the initial case. OROV should be considered as a differential diagnosis for Ecuadorian patients with febrile illness to avoid misdiagnosis with other circulating pathogens.

5333: Robustness of Transposable Element regulation but no genomic shock observed in interspecific Arabidopsis hybrids

Posted to bioRxiv 01 Feb 2018

3 downloads genomics

Ulrike Göbel, Agustin Arce, Fei He, Alain Rico, Gregor Schmitz, Juliette de Meaux

The merging of two divergent genomes in a hybrid is believed to trigger a genomic shock, disrupting gene regulation and transposable element (TE) silencing. Here, we tested this expectation by comparing the pattern of expression of transposable elements in their native and hybrid genomic context. For this, we sequenced the transcriptome of the Arabidopsis thaliana genotype Col-0, the A. lyrata genotype MN47 and their F1 hybrid. Contrary to expectations, we observe that the level of TE expression in the hybrid is strongly correlated to levels in the parental species. We detect that at most 1.1% of expressed transposable elements belonging to two specific subfamilies change their expression level upon hybridization. Most of these changes, however, are of small magnitude. We observe that the few hybrid-specific modifications in TE expression are more likely to occur when TE insertions are close to genes. In addition, changes in epigenetic histone marks H3K9me2 and H3K27me3 following hybridization do not coincide with TEs with changed expression. Finally, we further examined TE expression in parents and hybrids exposed to severe dehydration stress. Despite the major reorganization of gene and TE expression by stress, we observe that hybridization does not lead to increased disorganization of TE expression in the hybrid. We conclude that TE expression is globally robust to hybridization and that the term genomic shock is no longer appropriate to describe the anticipated consequences of merging divergent genomes in a hybrid.

5334: Identification of potential biomarkers associated with pathogenesis of primary prostate cancer based on meta-analysis approaches

Posted to bioRxiv 05 Mar 2020

3 downloads genomics

Neda Sepahi, Mehrdad Piran, Mehran Piran, Ali Ghanbariasad

Prostate cancer (PCa) is the second most commonly diagnosed cancer and the fifth leading cause of cancer death among men worldwide. Rising incidence rates of PCa have been observed over the last few decades. It is necessary to improve prostate cancer detection, diagnosis, treatment and survival. However, there are few reliable biomarkers for early prostate cancer diagnosis and prognosis. In the current study, a systems biology method was applied to transcriptomic data analysis to identify potential biomarkers for primary PCa. We first identified differentially expressed genes (DEGs) between primary PCa and normal samples. The DEGs were then mapped to the Wikipathways and gene ontology databases to conduct functional category enrichment analysis. 1575 unique DEGs with adjusted p-value < 0.05 were obtained from two sets of DEGs. 132 common DEGs between the two sets were retrieved. The final DEGs were selected from 60 common upregulated and 72 common downregulated genes between datasets. In conclusion, we identified several potential biomarkers (FOXA1, AGR2, EPCAM, CLDN3, ERBB3, GDF15, FHL1, NPY, DPP4, and GADD45A), with HIST2H2BE as an additional candidate, that are tightly correlated with the pathogenesis of PCa.
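
The step of retrieving DEGs common to two datasets amounts to a set intersection carried out separately for upregulated and downregulated genes. The sketch below illustrates that idea with placeholder gene IDs (g1, g2, ...); the function name and the data are hypothetical, and the abstract does not describe how the authors implemented this step.

```python
def common_degs(set_a, set_b):
    """Intersect two DEG sets direction by direction.

    Each argument is a dict with 'up' and 'down' keys holding sets of gene IDs.
    Returns (common_up, common_down).
    """
    return set_a["up"] & set_b["up"], set_a["down"] & set_b["down"]

# Placeholder gene IDs, not genes from the study.
dataset_1 = {"up": {"g1", "g2", "g3"}, "down": {"g4", "g5"}}
dataset_2 = {"up": {"g2", "g3", "g6"}, "down": {"g5", "g7"}}

up, down = common_degs(dataset_1, dataset_2)
print(sorted(up), sorted(down), len(up) + len(down))
```

The same bookkeeping applies to the reported figures: 60 common upregulated plus 72 common downregulated genes give the 132 common DEGs mentioned in the abstract.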

5335: Genome-wide detection of genes under positive selection in worldwide populations of the barley scald pathogen

Posted to bioRxiv 22 Nov 2017

3 downloads genomics

Norfarhan Mohd-Assaad, Bruce A. McDonald, Daniel Croll

The coevolution between hosts and pathogens generates strong selection pressures to maintain resistance and infectivity, respectively. Genomes of plant pathogens often encode major effect loci for the ability to successfully infect a specific host. Hence, heterogeneity in the host genotypes and abiotic factors leads to locally adapted pathogen populations. However, the genetic basis of local adaptation is poorly understood. We analyzed global field populations of Rhynchosporium commune, the pathogen causing barley scald disease, to identify candidate genes for local adaptation. Whole genome sequencing data generated for 125 isolates showed that the pathogen is subdivided into three genetic clusters associated with distinct geographic and climatic regions. Using haplotype-based selection scans applied independently to each genetic cluster, we found strong evidence for selective sweeps throughout the genome. Comparisons of loci under selection among clusters revealed little overlap, suggesting that ecological differences associated with each cluster led to variable selection regimes. The strongest signals of selection were found predominantly in the two clusters composed of isolates from Central Europe and Ethiopia. The strongest selective sweep regions encoded proteins with functions related to both biotic and abiotic stresses. We found that selective sweep regions were enriched in genes encoding functions in cellular localization, protein transport activity, and DNA damage responses. In contrast to the prevailing view that a small number of gene-for-gene interactions govern plant pathogen evolution, our analyses suggest that the evolutionary trajectory is largely determined by spatially heterogeneous biotic and abiotic selection pressures.

5336: Mapping eQTL by leveraging multiple tissues and DNA methylation

Posted to bioRxiv 14 Aug 2016

3 downloads genomics

Chaitanya R Acharya, Kouros Owzar, Andrew S. Allen

Background: DNA methylation is an important tissue-specific epigenetic event that influences transcriptional regulation of gene expression. Differentially methylated CpG sites may act as mediators between genetic variation and gene expression, and this relationship can be exploited while mapping multi-tissue expression quantitative trait loci (eQTL). Current multi-tissue eQTL mapping techniques are limited to only exploiting gene expression patterns across multiple tissues either in a joint tissue or tissue-by-tissue frameworks. We present a new statistical approach that enables us to model the effect of germ-line variation on tissue-specific gene expression in the presence of effects due to DNA methylation. Results: Our method efficiently models genetic and epigenetic variation to identify genomic regions of interest containing combinations of mRNA transcripts, CpG sites, and SNPs by jointly testing for genotypic effect and higher order interaction effects between genotype, methylation and tissues. We demonstrate using Monte Carlo simulations that our approach, in the presence of both genetic and DNA methylation effects, gives an improved performance (in terms of statistical power) to detect eQTLs over the current eQTL mapping approaches. When applied to an array-based dataset from 150 neuropathologically normal adult human brains, our method identifies eQTLs that were undetected using standard tissue-by-tissue or joint tissue eQTL mapping techniques. As an example, our method identifies eQTLs in a BAX inhibiting gene (TMBIM1), which may have a role in the pathogenesis of Alzheimer disease. Conclusions: Our score test-based approach does not need parameter estimation under the alternative hypothesis. As a result, our model parameters are estimated only once for each mRNA - CpG pair. Our model specifically studies the effects of non-coding regions of DNA (in this case, CpG sites) on mapping eQTLs. However, we can easily model micro-RNAs instead of CpG sites to study the effects of post-transcriptional events in mapping eQTL. Our model's flexible framework also allows us to investigate other genomic events such as alternative gene splicing by extending our model to include gene isoform-specific data.

5337: Correcting values of DNA sequence similarity for errors in sequencing

Posted to bioRxiv 22 Dec 2017

3 downloads genomics

Timothy J. Hackmann

The similarity between two DNA sequences is one of the most important measures in bioinformatics, but errors introduced during sequencing make values of similarity lower than they should be. Here we develop a method to correct raw sequence similarity for sequencing errors and estimate the original sequence similarity. Our method is simple and consists of a single equation with terms for 1) raw sequence similarity and 2) error rates (e.g., from Phred quality scores). We show the importance of this correction for 16S ribosomal DNA sequences from bacterial communities, where 97% similarity is a frequent threshold for clustering sequences for analysis. At that threshold and typical error rate of 0.2%, correcting for error increases similarity by 0.36 percentage points. This result shows that, if uncorrected, sequencing error would increase similarity thresholds and generate false clusters for analysis. Our method could be used to adjust thresholds for cluster-based analyses. Alternatively, because it requires no clustering to correct sequence similarity, it could usher in a new age of analyzing ribosomal DNA sequences without clustering.

5338: Functional characterization of 3D-protein structures informed by human genetic diversity

Posted to bioRxiv 29 Aug 2017

3 downloads genomics

Michael Hicks, Istvan Bartha, Julia di Iulio, Ruben Abagyan, J. Craig Venter, Amalio Telenti

Sequence variation data of the human proteome can be used to analyze 3-dimensional (3D) protein structures to derive functional insights. We used genetic variant data from nearly 150,000 individuals to analyze 3D positional conservation in 4,390 protein structures using 481,708 missense and 264,257 synonymous variants. Sixty percent of protein structures harbor at least one intolerant 3D site as defined by significant depletion of observed over expected missense variation. We established an Angstrom-scale distribution of annotated pathogenic missense variants and showed that they accumulate in proximity to the most intolerant 3D sites. Structural intolerance data correlated with experimental functional read-outs in vitro. The 3D structural intolerance analysis revealed characteristic features of ligand binding pockets, orthosteric and allosteric sites. The identification of novel functional 3D sites based on human genetic data helps to validate, rank or predict drug target binding sites in vivo.

5339: A High-Resolution Landscape of Mutations in the BCL6 Super-Enhancer in Normal Human B-Cells

Posted to bioRxiv 26 Sep 2019

3 downloads genomics

Jiang-Cheng Shen, Ashwini S Kamath-Loeb, Brendan F. Kohrn, Keith R. Loeb, Bradley D Preston, Lawrence A Loeb

The super-enhancers (SEs) of lineage-specific genes in B-cells are off-target sites of somatic hypermutation. However, the inability to detect sufficient numbers of mutations in normal human B-cells has precluded the generation of a high-resolution mutational landscape of SEs. Here, we captured and sequenced 12 B-cell SEs at single-nucleotide resolution from ten healthy individuals across diverse ethnicities. We detected a total of ~9000 subclonal mutations (allele frequencies <0.1%); of these, ~8000 are present in the BCL6 SE alone. Within the BCL6 SE, we identified three regions of clustered mutations where the mutation frequency is ~7 × 10⁻⁴. Mutational spectra show a predominance of C>T/G>A and A>G/T>C substitutions, consistent with the activities of activation-induced cytidine deaminase (AID) and the A-T mutator, DNA polymerase η, respectively, in mutagenesis in normal B-cells. Analyses of mutational signatures further corroborate the participation of these factors in this process. Single base substitution signatures SBS85, SBS37, and SBS39 were found in the BCL6 SE. While SBS85 is a denoted signature of AID in lymphoid cells, the etiologies of SBS37 and SBS39 are still unknown. Our analysis suggests the contribution of error-prone DNA polymerases to the latter signatures. The high-resolution mutation landscape has enabled accurate profiling of subclonal mutations in B-cell SEs in normal individuals. Because subclonal SE mutations are clonally expanded in B-cell lymphomas, our studies also offer the potential for early detection of neoplastic alterations.

5340: Transcription initiation RNAs are associated with chromatin activation mark H3K4me3

Posted to bioRxiv 15 Feb 2018

3 downloads genomics

Matthew Hobbs, Christine Ender, Gregory J. Baillie, Joanna Crawford, Kelin Ru, Ryan J Taft, John S Mattick

Transcription initiation RNAs (tiRNAs) are small, predominantly 18 nt, RNAs whose biogenesis is associated with nucleosomes adjacent to active transcription initiation sites. These loci usually contain modified histones associated with transcription initiation, including histone H3 trimethylated at lysine 4 (H3K4me3). To further characterize the relationship between tiRNAs and H3K4me3-marked nucleosomes, H3K4me3-targeted RNA:chromatin immunoprecipitations were performed in a murine macrophage cell line, and small RNA sequence libraries were constructed and subjected to deep sequencing. The H3K4me3 libraries exhibited a distinct profile of read lengths with a noticeable enrichment of sequences 17-26 nt in length, with a peak at ~18 nt that included tiRNAs. These RNAs show clear enrichment of sequences that map to genomic features known to be associated with transcription initiation, including CAGE transcription initiation sites (TSSs), sites of RNAPII occupancy, and H3K4me3 sites. The distribution of sequences that map in the vicinity of TSSs is consistent with previous descriptions of tiRNAs, viz. a major peak approximately 40 nt downstream of the TSS, and a minor, broader peak approximately 150-200 nt upstream of, and on the opposite strand to, the TSS. These results show that tiRNAs are physically associated with H3K4me3-marked chromatin. tiRNAs may be markers of RNAPII pausing, and it remains possible that their association with H3K4me3 is part of an epigenetic signaling system.
