Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,745 bioRxiv papers from 278,406 authors.
Most downloaded bioRxiv papers, all time
in category genomics
4,231 results found.
102 downloads genomics
Tetraselmis desikacharyi is a marine alga known as an important plankton species used as a feed organism in aquaculture. However, genomic studies of this class are rare. Here, we present the complete chloroplast genome of T. desikacharyi, a member of the Chlorodendrophyceae, with a full length of 149,934 bp, characterized by a very small single-copy (SSC) region without any genes and a large inverted repeat (IR) region. A maximum-likelihood (ML) phylogenetic analysis was performed using three kinds of data comprising 50 protein-coding genes, which placed the Chlorodendrophyceae as a deep-diverging lineage of the core Chlorophyta.
102 downloads genomics
We report a novel sequencing technology, RepSeq (Repetitive Sequence), that has high sensitivity, high specificity and a quick turn-around time. The technology was developed by modifying traditional Sanger sequencing in several respects. First, a homopolymer tail is added to the PCR primer(s), which makes electropherograms much easier to interpret than in traditional Sanger sequencing. Second, an indicator nucleotide is added at the 5' end of the homopolymer tail. In the presence of a deletion, the position of the indicator nucleotide in relation to the wild type confirms the deletion. At the same time, the indicator of the wild type serves as the internal control. Furthermore, the specific design of the PCR and/or sequencing primers enriches or selects for mutant alleles, which increases sensitivity and specificity significantly. Based on serial dilution studies, the analytical lower limit of detection was 1.47 copies. A total of 89 samples were tested for EGFR exon 19 deletion, of which 21 were normal blood samples and 68 were samples previously tested by either pyrosequencing or TruSeq Next Generation Sequencing Cancer Panel. There was 100% concordance among all the samples tested. RepSeq technology overcomes the shortcomings of Sanger sequencing and offers an easy-to-use novel sequencing method for personalized precision medicine.
101 downloads genomics
The long-spine porcupinefish, Diodon holocanthus (Diodontidae, Tetraodontiformes, Actinopterygii), also known as the freckled porcupinefish, is of great ecological and economic interest. However, its distinctive characteristics, including the inflation reaction, spiny skin and tetrodotoxin, have not been fully studied for lack of a complete genome assembly. In this study, the whole genome of a single individual was sequenced using single tube-Long Fragment Read co-barcode reads, generating 154.3 Gb of paired-end data (219.8× depth). Gaps were further filled using a small Oxford Nanopore MinION long-read dataset (11.4 Gb, 15.9× depth). Making full use of long-, medium- and short-range genome assembly information, the final assembly, with a total length of 650.02 Mb, reached contig and scaffold N50 sizes of 2.15 Mb and 8.13 Mb, respectively, despite high repetitive content. To assess completeness, Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis captured 95.7% (2,474) of core genes. In addition, 206.5 Mb (32.10%) of repetitive sequences were identified, and 20,840 protein-coding genes were annotated, among which 18,281 (87.72%) proteins were assigned possible functions. This is the first de novo genome of the porcupinefish, which will benefit downstream analysis of ontogeny, phylogeny, and evolution, and improve the exploration of its unique defensive mechanism.
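For readers unfamiliar with the N50 statistic quoted in the abstract above, the following is a minimal sketch assuming the common definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the contig lengths are hypothetical illustrations, not data or code from the preprint.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: walk the sequences from longest to
    shortest and return the length at which the running total first
    reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (arbitrary units), for illustration only.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the sorted lengths, the same function applies to contigs and scaffolds alike; the two values reported in an assembly differ because scaffolding joins contigs into longer sequences.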
101 downloads genomics
We report the first complete genome sequence and analysis of an extreme drug resistance (XDR) nosocomial Stenotrophomonas maltophilia strain, SM866, that is resistant to the mainstream drugs, i.e., trimethoprim/sulfamethoxazole (TMP/SXT) and levofloxacin. Taxonogenomic analysis revealed it to be a novel genomospecies of the Stenotrophomonas maltophilia complex (Smc). Comprehensive genomic investigation revealed fourteen dynamic regions (DRs) exclusive to SM866, comprising diverse antibiotic resistance genes, efflux pumps, heavy metal resistance, various transcriptional regulators, etc. Further, resistome analysis of Smc clearly showed SM866 to be an enriched strain with a diversified resistome that includes the sul1 and sul2 genes. Interestingly, SM866 does not have any plasmid, but it harbors two diverse super-integrons of chromosomal origin. Apart from genes for sulfonamide resistance (sul1 and sul2), both of these integrons harbor an array of antibiotic resistance genes linked to ISCR (IS91-like elements common regions) elements. These integrons also harbor genes encoding resistance to commonly used disinfectants such as quaternary ammonium compounds and to heavy metals such as mercury. Hence, the isolation of a novel strain belonging to a novel sequence type (ST) and genomospecies, with a diverse array of resistance, from a tertiary care unit in India indicates the extent and nature of the selection pressure driving XDR in hospital settings. There is an urgent need to employ complete genome-based investigation using emerging technologies to track the emergence of XDR at the global level and to design sanitization strategies and antibiotic regimens.
100 downloads genomics
Pathogenic effectors inhibit plant resistance responses by interfering with intracellular signaling mechanisms. Plant Nucleotide-binding, Leucine-rich repeat Receptors (NLRs) have evolved highly variable effector-recognition sites to detect these effectors. While many NLRs utilize variable Leucine-Rich Repeats (LRRs) to bind to effectors, some have gained Integrated Domains (IDs) necessary for receptor activation or downstream signaling. While a few studies have identified IDs within NLRs, the homology and regulation of these genes have yet to be elucidated. We identified a diverse set of wheat NLR-ID fusion proteins as candidates for NLR functional diversification through ID effector recognition or signal transduction. NLR-ID diversity corresponds directly with the various signaling components essential to defense responses, expanding the potential functions for immune receptors and removing the need for intermediate signaling factors that are often targeted by effectors. ID homologs (>80% similarity) in other grasses indicate that these domains originated as functional, non-NLR-encoding genes and were incorporated into NLR-encoding genes through duplication. Multiple NLR-ID genes encode experimentally verified alternative transcripts that include or exclude IDs. This indicates that plants employ alternative splicing to regulate IDs, possibly using them as baits, decoys, and functional signaling components. Future studies should aim to elucidate differential expression of NLR-ID alternative transcripts.
100 downloads genomics
Due to the public health importance of flagellar genes for typing, it is important to understand mechanisms that could alter their expression or presence. Phenotypic novelty in flagellar genes arises predominantly through accumulation of mutations, but horizontal transfer is known to occur. A linear plasmid termed pBSSB1, previously identified in Salmonella Typhi, was found to encode a flagellar operon that can mediate phase variation, which results in the rare z66 flagella phenotype. The identification and tracking of homologs of pBSSB1 is limited because it falls outside the normal replicon typing schemes for plasmids. Here we report the generation of nine new pBSSB1-family sequences using Illumina and Nanopore sequence data. Homologs of pBSSB1 were identified in 154 genomes representing 25 distinct serotypes from 67,758 Salmonella public genomes. Pangenome analysis of pBSSB1-family contigs was performed using Roary and we identified three core genes amenable to a minimal MLST scheme. Population structure analysis based on the newly developed MLST scheme identified three major lineages representing 35 sequence types, and the distribution of these sequence types was found to span multiple serovars across the globe. This MLST scheme has shown utility in tracking and subtyping pBSSB1-family plasmids and it has been incorporated into the plasmid MLST database under the name “pBSSB1-family”.
99 downloads genomics
Different phenotypes of normal cells might influence genetic profiles, epigenetic profiles, and tumorigenicities of their transformed derivatives. In this study, we investigated whether the whole mitochondrial genome of immortalized cells can be attributed to different phenotypes (stem vs non-stem) of their normal epithelial cell originators. To accurately determine mutations, we employed Duplex Sequencing, which exhibits the lowest error rates among currently available DNA sequencing methods. Our results indicate that the vast majority of observed mutations of the whole mitochondrial DNA occur at low-frequency (rare mutations). The most prevalent rare mutation types are C→T/G→A and A→G/T→C transitions. Frequencies and spectra of homoplasmic point mutations are virtually identical between stem cell-derived immortalized (SV1) cells and non-stem cell-derived immortalized (SV22) cells, verifying that both cell types were derived from the same woman. However, frequencies of rare point mutations are significantly lower in SV1 cells (5.79 × 10⁻⁵) than in SV22 cells (1.16 × 10⁻⁴). Additionally, the predicted pathogenicity for rare mutations in the mitochondrial tRNA genes is significantly lower (by 2.5-fold) in SV1 cells than in SV22 cells. Our findings suggest that the immortalization of normal cells with stem cell features leads to decreased mitochondrial mutagenesis, particularly in noncoding RNA regions. The mutation spectra and mutations specific to stem cell-derived immortalized cells (vs non-stem cell derived) have implications in characterizing heterogeneity of tumors and understanding the role of mitochondrial mutations in immortalization and transformation of human cells.
99 downloads genomics
The functions of glycogen synthase kinase 3 (GSK3) have been well studied in animals, plants and yeast. However, information on its roles in basidiomycetous fungi is still limited. In this study, we used the model mushroom Coprinopsis cinerea to study the characteristics of GSK3 in fruiting body development. Application of the GSK3 inhibitor lithium chloride (LiCl) enhanced mycelial growth and inhibited fruiting body formation in C. cinerea. RNA-Seq of LiCl-treated C. cinerea resulted in a total of 14128 unigenes. There were 1210 differentially expressed genes (DEGs) between the LiCl-treated samples and control samples in the mycelium stage (first time point), whereas 1402 DEGs were detected at the stage when the control samples formed hyphal knots and the treatment samples were still in mycelium (second time point). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs revealed significant associations between the enhanced mycelium growth in LiCl-treated C. cinerea and metabolism pathways such as “biosynthesis of secondary metabolite” and “biosynthesis of antibiotics”. In addition, DEGs involved in cellular process pathways, including “cell cycle-yeast” and “meiosis-yeast”, were identified in C. cinerea fruiting body formation suppressed by LiCl under favorable environmental conditions. Our findings suggest that GSK3 activity is essential for fruiting body formation as it affects the expression of fruiting body induction genes and genes in cellular processes. Further functional studies of GSK3 in basidiomycetous fungi may help understand the relationships between environmental signals and fruiting body development.
98 downloads genomics
Single-cell (scSeq) and single-nucleus sequencing (snSeq) are powerful tools to investigate cancer genomics at single cell resolution. Multiple studies have recently illuminated intratumoral heterogeneity in glioblastoma, however, the majority focused on molecular complexity of tumor cells, without considering unexplored host cell types that contribute to the microenvironment around tumor. To address the glioblastoma microenvironment composition and potential tumor-host interactions, we performed deep coverage sequencing of freshly resected primary GBM patient tissue without implementing any tumor enrichment strategies. The sequencing resulted in 902 cells and 1186 nuclei, respectively, passing quality control and with low mitochondrial gene percentage. We customized reference transcriptome by listing gene transcript loci as exons to take into account immature RNA, which greatly improved the alignment rate for single-nucleus data. We applied Cell Ranger pipelines (Version 3.0.2) and Seurat package (Version 2.3.1) and discovered 10 clusters in both scSeq and snSeq. Pathway analysis of each cluster signature in scSeq data along with known GBM microenvironment cell signatures revealed glioma tumor population along with surrounding microglia/macrophages, astrocytes, pericytes, oligodendrocytes, T cells and endothelial cells. The analysis of snSeq was able to capture the majority of cell types from patient tissues (tumor and microenvironment cells), but interestingly presented different cell type composition in microenvironment cell types such as microglia/macrophages. Integrating single-cell and single-nucleus transcriptomic data using canonical correlation analysis facilitated a comparison of snSeq and scSeq, contrasting depiction for certain cell types (e.g. NKX6-2 gene in Oligodendrocytes). Differential analysis of pathways between tumor and microenvironment cells unveiled potentially rewired pathways such as double strand break repair pathway. 
Our results demonstrate the cellular diversity of brain tumor microenvironment and lay a foundation to further investigate the individual tumor and host cell transcriptomes that are influenced not only by their cell identity but also by their interaction with surrounding microenvironment.
97 downloads genomics
Norelle L. Sherry, Robyn S Lee, Claire L Gorrie, Jason C Kwong, Rhonda L Stuart, Tony Korman, Caroline Marshall, Charlie Higgs, Hui Tat Chan, Maryza Graham, Paul Johnson, Marcel Leroi, Caroline Reed, Michael Richards, Monica A Slavin, Leon J Worth, Benjamin P. Howden, M. Lindsay Grayson, Controlling Superbugs Study Group
Background: Multidrug-resistant organisms (MDROs) disproportionately affect hospitalized patients due to the combination of comorbidities, frequent antimicrobial use, and in-hospital MDRO transmission. Identification of MDRO transmission by hospital microbiology laboratories is difficult due to limitations of existing typing methods. Methods: We conducted a prospective multicenter genomics implementation study (8 hospitals, 2800 beds) from 24th April to 18th June 2017 in Melbourne, Australia. Clinical and screening isolates from hospital inpatients were collected for six MDROs (vanA VRE, MRSA, ESBL E. coli [ESBL-Ec] and Klebsiella pneumoniae [ESBL-Kp], and carbapenem-resistant Pseudomonas aeruginosa [CRPa] and Acinetobacter baumannii [CRAb]), sequenced (Illumina NextSeq) and analyzed using open-source tools. MDRO transmission was assessed by genomics (core SNP phylogeny, grouped by species and ST) and compared to epidemiologic data. Results: 408 isolates were collected from 358 patients; 47.5% were screening isolates. ESBL-Ec was most common (52.5%), then MRSA (21.6%), vanA VRE (15.7%) and ESBL-Kp (7.6%). We define the transmission rate for each MDRO by genomics and epidemiology; 31.6% of all study patients had potential genomic links to other study isolates; 86% of these were confirmed by epidemiologic links (probable or possible transmission). The highest transmission rates occurred with vanA VRE (88.4% of patients). Conclusions: Combining genomics with high-quality epidemiologic data gives substantial insights into the burden and distribution of critical MDROs in hospitals, including in-hospital transmission. By defining transmission rates by genomics, we hope to enable comparisons over time and between sites, and introduce this as a new outcome measure to assess the efficacy of infection control interventions.
97 downloads genomics
Many studies exclude loci exhibiting linkage disequilibrium (LD); however, high LD can signal reduced recombination around genomic features such as chromosome inversions or sex-determining regions. Chromosome inversions and sex-determining regions are often involved in adaptation, allowing for the inheritance of co-adapted gene complexes and for the resolution of sexually antagonistic selection through sex-specific partitioning of genetic variants. Genomic features such as these can escape detection when loci with LD are removed; in addition, failing to account for these features can introduce bias to analyses. We examined patterns of LD using network analysis to identify an overlapping chromosome inversion and sex-determining region in chum salmon. The signal of the inversion was strong enough to show up as false population substructure when the entire dataset was analyzed, while the signal of the sex-determining region was only obvious after restricting genetic analysis to the sex chromosome. Understanding the extent and geographic distribution of inversions is now a critically important part of genetic analyses of natural populations. The results of this study highlight the importance of analyzing and understanding patterns of LD in genomic datasets and the perils of ignoring or excluding loci exhibiting LD.
97 downloads genomics
Human herpesvirus-6A and 6B (HHV-6A, HHV-6B) are human viruses capable of chromosomal integration. Approximately 1% of the human population carry one copy of HHV-6A/B integrated into every cell in their body, referred to as inherited chromosomally integrated HHV-6A/B (iciHHV-6A/B). Whether iciHHV-6A/B is transcriptionally active in vivo and how it shapes the immunological response is still unclear. Here, we screened DNA-Seq and RNA-Seq data for 650 individuals available through the Genotype-Tissue Expression (GTEx) project and identified 2 iciHHV-6A and 4 iciHHV-6B positive individuals. When corresponding tissue-specific gene expression signatures were analyzed, low levels of HHV-6A/B gene expression were found across multiple tissues, with the highest levels of gene expression in the brain (specifically for iciHHV-6A), testis, and esophagus. U90 and U100 were the most highly expressed HHV-6 genes in both iciHHV-6A and iciHHV-6B individuals. To assess whether this gene expression influences the HHV-6A/B immune response, a cohort of 15,498 subjects was screened and 85 iciHHV-6A/B+ subjects were identified. Plasma samples from iciHHV-6A/B+ subjects and age- and sex-matched controls were analyzed for antibodies to control antigens or HHV-6A/B antigens. Our results indicate that iciHHV-6A/B+ subjects have significantly more antibodies against the U90 gene product (IE1) relative to non-iciHHV-6 individuals. Antibody responses against EBV and FLU antigens or HHV-6A/B gene products either not expressed or expressed at low levels, such as U47, U57 or U72, were identical between controls and iciHHV-6A/B+ subjects. These results argue that spontaneous gene expression from integrated HHV-6A/B leads to an increase in antigenic burden that translates into a more robust HHV-6A/B-specific antibody response.
97 downloads genomics
Hfq is a ubiquitous Sm-like RNA-binding protein in bacteria involved in physiological fitness and pathogenesis, but its in vivo binding properties remain elusive. Here we report the first map of Hfq-bound RNAs in Yersinia pestis, the causative agent of plague, obtained using cross-linking immunoprecipitation coupled with deep sequencing (CLIP-Seq). We show that Hfq binds over 80% of the mRNAs of Y. pestis, and also globally binds non-coding sRNAs encoded by the intergenic, antisense, and 3' regions of mRNAs. The Hfq U-rich stretch is highly enriched in sRNAs, while motifs partially complementary to AGAAUAA and GGGGAUUA are enriched in both mRNAs and sRNAs. Hfq binding motifs are enriched at both terminal sites and in the gene body of mRNAs. Surprisingly, a large fraction of the sRNA and mRNA regions bound by Hfq and those downstream are destabilized, likely via a 5'P-activated RNase E degradation pathway and consistent with Hfq-facilitated sRNA-mRNA base-pairing and the coupled degradation in Y. pestis. Together, these results present a high-quality Hfq-RNA interaction map in Y. pestis, which should be important for further deciphering the regulatory role of Hfq-sRNAs in Y. pestis.
97 downloads genomics
Environmental variation in the amount of resources available to populations challenges individuals to optimize the allocation of those resources to key fitness functions. This coordination of resource allocation relative to resource availability is commonly attributed to key nutrient-sensing pathways, mainly the insulin/TOR signaling pathway in laboratory model organisms. However, the genetic basis of diet-induced variation in gene expression is less clear. To describe the natural genetic variation underlying nutrient-dependent differences, we used an outbred panel derived from a multiparental population, the Drosophila Population Resource. We analyzed RNA sequence data from multiple female tissue samples, dissected from flies reared in three nutritional conditions: high sugar (HS), dietary restriction (DR), and control (C). A large proportion of genes in the experiment (19.6% or 2471 genes) were significantly differentially expressed for the effect of diet, and 7.8% (978) for the effect of the interaction between diet and tissue type (LRT, adj. P < 0.05). Interestingly, we observed similar patterns of gene expression in DR and HS treated flies, a response likely due to diet component ratios. Hierarchical clustering identified 21 robust gene modules showing intra-modularly similar patterns of expression across diets, all of which were highly significant for diet or the diet-tissue interaction effects (FDR adj. P < 0.05). Gene set enrichment analysis for different diet-tissue combinations revealed pathways and gene ontology (GO) terms (two-sample t-test, FDR q-value < 0.05). GO analysis on individual coexpressed modules likewise showed a large number of terms encompassing many cellular and nuclear processes (Fisher exact test, Padj. < 0.01). 
Although a handful of genes in the IIS/TOR pathway including Ilp5, Rheb, and Sirt2 showed significant elevation in expression, known key genes such as InR, chico, and other Ilps, and the nutrient-sensing pathways were not observed. Our results suggest that a broader network of pathways and gene networks mediate the diet response in our population. These results have important implications for future studies of diet responses in natural populations.
97 downloads genomics
Background: Differences between an individual's estimated epigenetic gestational age (EGA) and their actual gestational age (GA) are defined as gestational age acceleration (GAA). GAA is associated with increased birthweight and birth length. Whether these associations persist through childhood is yet to be investigated. Methods: We examined the association between GAA and trajectories of height and weight from birth to 10 years (n=785) in a British birth cohort study, the Avon Longitudinal Study of Parents and Children (ALSPAC). EGA of participants was estimated using DNA methylation data from cord blood using a recently-developed prediction model. GA of participants was gathered in ALSPAC from clinical records and was measured from last menstrual period (LMP) for most participants. GAA of participants, measured in weeks, was calculated as the residuals from a regression model of EGA on actual GA. Height and weight were obtained from several sources including birth records, research clinics, routine child health clinics, links to health visitor records and parent-reported measures from questionnaires. Analyses were performed using linear spline multilevel models and adjusted for maternal age, maternal pre-pregnancy BMI, maternal smoking during pregnancy and maternal education. Results: In adjusted analyses, offspring with a one-week greater GAA were born on average 0.14 kg heavier (95% Confidence Interval (CI) 0.09, 0.19) and 0.55 cm taller (95% CI 0.33, 0.78) at birth. These differences in weight persisted up to approximately age 9 months but thereafter began to attenuate and reduce in magnitude. From age 5 years onwards, the association between GAA and weight reversed such that GAA was associated with lower weight and this association strengthened with age (mean difference at age 10 years -0.60 kg (95% CI, -1.19, -0.01)). Differences in height persisted only up to age 9 months (mean difference at 9 months 0.15 cm, (95% CI -0.09, 0.39)). 
From age 9 months to age 10 years, offspring with a one-week greater GAA were of comparable height to those with no GAA (mean difference at age 10 years -0.07 cm, (95% CI -0.64, 0.50)). Conclusions: Gestational age acceleration is associated with increased birth weight and length and these differences persist to age 9 months. From 5 years onwards, the association of GAA and weight reverses such that by age 10 years greater GAA is associated with lower childhood weight. Further work is required to examine whether the weight effects of GAA strengthen further through adolescence and into early adulthood.
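The abstract above defines gestational age acceleration (GAA) as the residuals from a regression of epigenetic gestational age (EGA) on actual gestational age (GA). A minimal sketch of that calculation follows, assuming a simple least-squares line with GA as the only predictor; the function name and the toy values are hypothetical and are not taken from the study, whose actual model specification may differ.

```python
def gaa_residuals(ga_weeks, ega_weeks):
    """Return residuals of a least-squares regression of EGA on GA.

    Assumption: a single-predictor linear model, EGA = a + b * GA + error.
    A positive residual means the epigenetic estimate is older than the
    fitted line predicts for that gestational age.
    """
    if len(ga_weeks) != len(ega_weeks) or len(ga_weeks) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(ga_weeks)
    mean_ga = sum(ga_weeks) / n
    mean_ega = sum(ega_weeks) / n
    sxx = sum((g - mean_ga) ** 2 for g in ga_weeks)
    if sxx == 0:
        raise ValueError("GA values must not all be identical")
    sxy = sum((g - mean_ga) * (e - mean_ega) for g, e in zip(ga_weeks, ega_weeks))
    slope = sxy / sxx
    intercept = mean_ega - slope * mean_ga
    return [e - (intercept + slope * g) for g, e in zip(ga_weeks, ega_weeks)]


# Hypothetical toy values in weeks, for illustration only.
ga = [38.0, 39.0, 40.0, 41.0]
ega = [37.5, 39.5, 39.8, 41.2]
for g, r in zip(ga, gaa_residuals(ga, ega)):
    print(f"GA {g:.1f} weeks: GAA {r:+.2f} weeks")
```

By construction the residuals average to zero across the sample, so GAA defined this way measures acceleration relative to the cohort rather than against an external standard.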
97 downloads genomics
Effective resource management depends on our ability to partition diversity into biologically meaningful units. Recent evolutionary divergence, however, can often lead to ambiguity in morphological and genetic differentiation, complicating the delineation of valid conservation units. Such is the case with the "coregonine problem," where recent post-glacial radiations of coregonines into lacustrine habitats resulted in the evolution of numerous species flocks, often with ambiguous taxonomy. The application of genomics methods is beginning to shed light on this problem and the evolutionary mechanisms underlying divergence in these ecologically and economically important fishes. Here, we used restriction site-associated DNA (RAD) sequencing to examine genetic diversity and differentiation among sympatric species in the Coregonus artedi complex in the Apostle Islands of Lake Superior, the largest lake in the Laurentian Great Lakes. Using 29,068 SNPs, we were not only able to clearly distinguish the three most common forms for the first time, but putative hybrids and potentially mis-identified specimens as well. Assignment rates to form with our RAD data were 93-100% with the only mis-assignments arising from putative F1 hybrids, an improvement from 62-77% using microsatellites. Estimates of pairwise differentiation (FST: 0.045-0.056) were large given the detection of hybrids, suggesting that hybridization among forms may not be successful beyond the F1 state. We also used a newly built C. artedi linkage map to look for islands of adaptive genetic divergence among forms and found widespread differentiation across the genome, a pattern indicative of long-term drift, suggesting that these forms have been reproductively isolated for a substantial amount of time. 
The results of this study provide valuable information that can be applied to develop well-informed management strategies and stress the importance of re-evaluating conservation units with genomic tools to ensure they accurately reflect species diversity.
97 downloads genomics
Glucose-induced insulin secretion, a peculiar property of fully mature β-cells, is only achieved after birth and is preceded by a phase of intense proliferation. These events occurring in the neonatal period are decisive for the establishment of an appropriate functional β-cell mass that provides the required insulin throughout life. However, key regulators of gene expression involved in cellular reprogramming along pancreatic islet maturation remain to be elucidated. The present study addressed this issue by mapping open chromatin regions in newborn versus adult rat islets using the ATAC-seq assay. Accessible regions were then correlated with the expression profiles of mRNAs to unveil the regulatory networks governing functional islet maturation. This led to the identification of Scrt1, a novel transcriptional repressor controlling β-cell proliferation.
96 downloads genomics
Pseudomonas aeruginosa is one of the most common pathogens related to healthcare-associated infections. The Brazilian isolate, named CCBH4851, is a multidrug-resistant clone belonging to the sequence type 277. The antimicrobial resistance mechanisms of the CCBH4851 strain are associated with the presence of the blaSPM-1 gene, encoding a metallo-beta-lactamase, in addition to other exogenously acquired genes. Whole-genome sequencing studies focusing on emerging pathogens are essential to identify key physiological aspects that may lead to the exposure of new targets for therapy. This study was designed to characterize the genome of Pseudomonas aeruginosa CCBH4851 through the detection of genomic features and genome comparison with other Pseudomonas aeruginosa strains. The CCBH4851 closed genome showed features that were consistent with data reported for the species. However, comparative genomics revealed the absence of genes important for pathogenesis. On the other hand, the CCBH4851 genome contained acquired genomic islands that carry additional virulence and antimicrobial resistance-related genes. The presence of single nucleotide polymorphisms in the core genome, mainly those located in resistance-associated genes, suggests that these mutations could influence the multidrug-resistant behavior of CCBH4851. Overall, the characterization of the Pseudomonas aeruginosa CCBH4851 complete genome revealed several features that could directly impact the profile of virulence and antibiotic resistance of this pathogen in infectious outbreaks.
96 downloads genomics
Carnobacterium maltaromaticum is a well-known pathogen of bony fish. More recently, C. maltaromaticum has been isolated from the brain and inner ear of disorientated and stranded common thresher (Alopias vulpinus) and salmon shark (Lamna ditropis). While thresher shark strandings are recent, salmon sharks have been stranding for decades, suggesting a long-term association between C. maltaromaticum and sharks. Interestingly, some strains of C. maltaromaticum are used by the food industry for their probiotic and antimicrobial activity. Here, we sequenced the genomes of 9 C. maltaromaticum strains (SK-isolates) from diseased common thresher and salmon sharks and compared them to other C. maltaromaticum strains in order to identify the genomic signatures that differentiate the disease-associated from the innocuous C. maltaromaticum isolates. SK strains formed a monophyletic clade, with a conserved gene repertoire, and shared a high degree of pseudogenization even though isolates were from different shark species, locations, and years. In addition, these strains displayed few virulence-associated genes and unique genomic regions, some resulting from horizontal gene transfer. The association of diseased sharks and SK strains suggests their role as potential pathogens. Although the high degree of pseudogenization suggests a transition to a host-adapted lifestyle, a set of conserved functional genes highlights the need for essential functions required for a host-independent lifestyle. Globally, this work identifies specific genomic signatures of C. maltaromaticum strains isolated from infected sharks and provides the framework to elucidate the role of SK strains in the development of the disease in sharks and to further investigate the dissemination of SK strains in populations of wild fish.
96 downloads genomics
The origin and evolution of genes that have common base pairs (overlapping genes) are of particular interest because such genes influence each other. Especially intriguing are gene pairs with long overlaps. In prokaryotes, co-directional overlaps longer than 60 bp were shown to be erroneous except for some instances. A few antiparallel prokaryotic genes with long overlaps were described in the literature. We have analyzed putative long antiparallel overlapping genes to determine whether open reading frames (ORFs) located opposite to genes (antiparallel ORFs) can be protein-coding genes. We have confirmed that long antiparallel ORFs (AORFs) are reliably observed more frequently than expected. There are 10 472 000 AORFs with overlap length of more than 180 bp in the 929 analyzed genomes. Stop codons on the strand opposite to the coding strand are avoided in about 2 850 cases with a Benjamini-Hochberg threshold of 0.01. Using Ka/Ks ratio calculations, we have revealed that long AORFs do not affect the type of selection acting on genes in a vast majority of cases. This observation indicates that translations of long AORFs are commonly not under negative selection. An illustrative example is the 282 AORFs longer than 1 800 bp found opposite to extremely conserved dnaK genes. Translations of these AORFs were annotated "glutamate dehydrogenases" and were included in the Pfam database as a third protein family of glutamate dehydrogenases, PF10712. Ka/Ks analysis has demonstrated that if these translations correspond to proteins, they are not subject to negative selection, while dnaK genes are under strong stabilizing selection. Moreover, we have found other arguments against the hypothesis that these AORFs encode essential proteins, that is, proteins indispensable for cellular machinery. However, some AORFs, in particular those related to dnaK, have been found to resist synonymous changes in genes slightly, which indicates that they may be translated. 
We speculate that translations of certain AORFs might have a functional role other than encoding essential proteins. Essential genes are unlikely to be encoded by AORFs in prokaryotic genomes. Nevertheless, some AORFs might have biological significance associated with their translations.
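The abstract above reports results at a Benjamini-Hochberg threshold of 0.01. As a minimal sketch of the standard Benjamini-Hochberg step-up procedure for controlling the false discovery rate (not the authors' own pipeline, which the abstract does not describe), the function name and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard step-up procedure: sort the m p-values, find the largest
    rank k with p_(k) <= k * alpha / m, and reject the k smallest.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_rank = rank
    rejected = [False] * m
    for idx in order[:largest_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
toy_p = [0.001, 0.2, 0.004, 0.9]
print(benjamini_hochberg(toy_p, alpha=0.01))
```

Note that the procedure rejects every hypothesis up to the largest qualifying rank, including any intermediate p-value that misses its own rank-specific threshold.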
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!