Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 84,557 bioRxiv papers from 363,914 authors.

Most downloaded bioRxiv papers, all time

in category bioinformatics

7,962 results found.

6001: Proteome-Scale Relationships Between Local Amino Acid Composition and Protein Fates and Functions
Posted to bioRxiv 04 Jun 2018
268 downloads bioinformatics

Sean M. Cascarina, Eric D. Ross

Proteins with low-complexity domains continue to emerge as key players in both normal and pathological cellular processes. Although low-complexity domains are often grouped into a single class, individual low-complexity domains can differ substantially with respect to amino acid composition. These differences may strongly influence the physical properties, cellular regulation, and molecular functions of low-complexity domains. Therefore, we developed a bioinformatic approach to explore relationships between amino acid composition, protein metabolism, and protein function. We find that local compositional enrichment within protein sequences affects the translation efficiency, abundance, half-life, subcellular localization, and molecular functions of proteins on a proteome-wide scale. However, these effects depend upon the type of amino acid enriched in a given sequence, highlighting the importance of distinguishing between different types of low-complexity domains. Furthermore, many of these effects are discernible at amino acid compositions below those required for classification as low-complexity or statistically-biased by traditional methods and in the absence of homopolymeric amino acid repeats, indicating that thresholds employed by classical methods may not reflect biologically relevant criteria. Application of our analyses to composition-driven processes, such as the formation of membraneless organelles, reveals distinct composition profiles even for closely related organelles. Collectively, these results provide a unique perspective and detailed insights into relationships between amino acid composition, protein metabolism, and protein functions.

6002: GPU accelerated partial order multiple sequence alignment for long reads self-correction
Posted to bioRxiv 15 Feb 2020
268 downloads bioinformatics

Francesco Peverelli, Lorenzo Di Tucci, Marco D. Santambrogio, Nan Ding, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick

As third generation sequencing technologies become more reliable and widely used to solve several genome-related problems, self-correction of long reads is becoming the preferred method to reduce the error rate of Pacific Biosciences and Oxford Nanopore long reads, which is currently around 10-12%. Several of these self-correction methods rely on some form of Multiple Sequence Alignment (MSA) to obtain a consensus sequence for the original reads. In particular, error-correction tools such as RACON and CONSENT use Partial Order (PO) graph alignment to accomplish this task. PO graph alignment, which is computationally more expensive than optimal global pairwise alignment between two sequences, needs to be performed several times for each read during the error correction process. GPUs have proven very effective in accelerating several compute-intensive tasks in different scientific fields. We harnessed the power of these architectures to accelerate the error correction process of existing self-correction tools, to improve the efficiency of this step of genome analysis. In this paper, we introduce a GPU-accelerated version of the PO alignment presented in the POA v2 software library, implemented on an NVIDIA Tesla V100 GPU. We obtain up to 6.5x speedup compared to 64 CPU threads run on two 2.3 GHz 16-core Intel Xeon E5-2698 v3 processors. In our implementation we focused on the alignment of smaller sequences, as the CONSENT segmentation strategy based on k-mer chaining provides an ideal opportunity to exploit the parallel-processing power of GPUs. To demonstrate this, we have integrated our kernel into the CONSENT software. This accelerated version of CONSENT speeds up the whole error correction step by 1.95x to 8.5x, depending on the input reads.

6003: The Resistome: updating a standardized resource for analyzing resistance phenotypes
Posted to bioRxiv 17 Sep 2018
268 downloads bioinformatics

J.D. Winkler

Advances in genome engineering have enabled routine engineering and interrogation of microbial resistance on a scale previously impossible, but developing an integrated understanding of resistance from these data remains challenging. As part of our continued efforts to address this challenge, we present a significant update of our previously released Resistome database of standardized genotype-resistance phenotype relationships, along with a new web interface to enable facile searches of genomic, transcriptomic, and phenotypic data within the database. Revisiting our previous analysis of resistance, we again find distinct mutational biases associated with random selection versus genome-scale libraries, along with pervasive pleiotropy among resistant mutants. Attempts to predict mutant phenotypes using machine learning identified the lack of comprehensive phenotype screening and small size of the Resistome corpus as challenges for effective model training. Overall, the Resistome represents a unique platform for understanding the interconnections between both current and future resistant mutants, and is available for use at https://resistome-web-interface.herokuapp.com.

6004: Classification and monomer-by-monomer annotation of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly
Posted to bioRxiv 07 Sep 2018
268 downloads bioinformatics

L. Uralsky, V.A. Shepelev, A.A. Alexandrov, Y.B. Yurov, E.I. Rogaev, I.A. Alexandrov

In the latest hg38 human genome assembly, centromeric gaps have been filled in by alpha satellite (AS) reference models (RMs), which are statistical representations of homogeneous higher-order repeat (HOR) arrays that make up the bulk of the centromeric regions. We studied these models to compose an atlas of human HORs in which each monomer of a HOR could be characterized and represented by a number of its polymorphic sequence variants. We further used these data and the HMMER sequence analysis platform to annotate AS HORs in the assembly. This led to the discovery and annotation of a new type of low-copy-number, highly divergent HORs that were not represented by RMs. The annotation can be viewed as a UCSC Genome Browser custom track (the HOR-track) and used together with our previous annotation of AS suprachromosomal families (SFs) in the same assembly, where each AS monomer can be viewed in its genomic context together with its classification into one of the 5 major SFs (the SF-track). To catalog the diversity of AS HORs in the human genome, we introduced a new naming system in which each HOR receives a name that shows its SF, chromosomal location and index number. Here we present the first installment of the HOR-track, covering only the 17 HORs that belong to SF1, which forms live functional centromeres in chromosomes 1, 3, 5, 6, 7, 10, 12, 16 and 19, as well as a large number of minor dead HOR domains, both homogeneous (pseudo) and divergent (relic). The 4 newly discovered divergent SF1 HORs have provided the missing links in early SF1 evolution and substantiated its partition into 2 generations, archaic and modern, which we reported earlier. Additionally, we demonstrated that monomer-by-monomer HOR annotation is useful for mapping and quantifying various structural variants of AS HORs, which would be important for studies of inter-individual polymorphism of AS, including centromeric functional epialleles.

6005: Ensembles from ordered and disordered proteins reveal similar structural constraints during evolution
Posted to bioRxiv 13 Nov 2018
268 downloads bioinformatics

Julia Marchetti, Alexander Miguel Monzon, Silvio C.E. Tosatto, Gustavo Parisi, María Silvina Fornasari

Inter-residue contacts determine the structural properties of each conformer in the ensembles describing the native state of proteins. Structural constraints during evolution could then provide biologically relevant information about the conformational ensembles and their relationship with protein function. Here, we studied the proportion of sites evolving under structural constraints in two very different types of ensembles: those coming from ordered proteins and those coming from disordered proteins. Using a structurally constrained model of protein evolution, we found that both types of ensembles show a comparable proportion of positions evolving under structural constraints, near 40%. Among these sites, ~68% are in disordered regions and ~57% of them show long-range inter-residue contacts. We also found that disordered ensembles are redundant with respect to their structurally constrained evolutionary information and could be described on average with ~11 conformers. Despite the different complexity of the studied ensembles and proteins, the similar constraints reveal a comparable level of selective pressure to maintain their biological functions. These results highlight the importance of evolutionary information for recovering meaningful biological information to further characterize conformational ensembles.

6006: Detecting Inversions with PCA in the Presence of Population Structure
Posted to bioRxiv 15 Aug 2019
268 downloads bioinformatics

Ronald J Nowling, Krystal R Manke, Scott J. Emrich

Chromosomal inversions are associated with reproductive isolation and adaptation in insects such as Drosophila melanogaster and the malaria vectors Anopheles gambiae and Anopheles coluzzii. While methods based on read alignment have been useful for detecting inversions in humans, these methods are less successful in insects due to long repeated sequences at the breakpoints. Alternatively, inversions can be detected using principal component analysis (PCA) of single nucleotide polymorphisms (SNPs). We apply PCA-based inversion detection to a simulated data set and to real data from multiple insect species, which vary in complexity from a single inversion in samples drawn from a single population to multiple overlapping inversions occurring in closely related species, with samples collected from multiple geographic locations. We show empirically that proper analysis of these data can be challenging when multiple inversions or populations are present, and that our alternative framework is more robust in these more difficult scenarios.
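
The abstract does not spell out the authors' framework, so the following is only a generic sketch of the standard PCA step that PCA-based inversion detection builds on: SNP genotypes from a candidate inversion region are centered and projected onto the leading principal components, where the three inversion karyotypes (homozygous standard, heterozygous, homozygous inverted) tend to separate into clusters. The toy genotype matrix, the assumption that SNPs inside the inversion are perfectly linked to the arrangement, and the function name `pca_scores` are illustrative assumptions; population structure, which the paper addresses, is not modeled here.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project samples onto their top principal components.

    genotypes: (n_samples, n_snps) array of alternate-allele counts (0, 1, 2).
    Returns an (n_samples, n_components) array of PC scores.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column
    # For the centered matrix X = U S V^T, the PC scores are U * S.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Hypothetical toy data: 30 samples carrying 0, 1 or 2 copies of an inverted
# arrangement. 50 SNPs inside the inversion are assumed fully linked to the
# arrangement; 50 unlinked background SNPs are drawn independently.
rng = np.random.default_rng(0)
karyotype = np.repeat([0, 1, 2], 10)
inside = np.tile(karyotype[:, None], (1, 50))
background = rng.binomial(2, 0.5, size=(30, 50))
pc1 = pca_scores(np.hstack([inside, background]))[:, 0]

# PC signs are arbitrary; orient PC1 so that karyotype 2 has the larger mean.
if pc1[karyotype == 2].mean() < pc1[karyotype == 0].mean():
    pc1 = -pc1
for k in range(3):
    group = pc1[karyotype == k]
    print(k, round(group.min(), 2), round(group.max(), 2))
```

In this toy setup the linked SNPs dominate the first component, so the three karyotype groups occupy separate intervals along PC1. Real data with several populations or overlapping inversions would blur this picture, which is the harder case the abstract describes.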

6007: Patch-DCA: Improved Protein Interface Prediction by utilizing Structural Information and Clustering DCA scores
Posted to bioRxiv 02 Jun 2019
268 downloads bioinformatics

Amir Vajdi, Kourosh Zarringhalam, Nurit Haspel

Over the past decade there have been impressive advances in determining the 3D structures of protein complexes. However, there are still many complexes with unknown structures, even when the structures of the individual proteins are known. The advent of protein sequence information provides an opportunity to leverage evolutionary information to enhance the accuracy of protein-protein interface prediction. To this end, several statistical and machine learning methods have been proposed. In particular, direct coupling analysis (DCA) has recently emerged as a promising approach for identifying protein contact maps from sequence information. However, the ability of these methods to detect protein-protein inter-residue contacts remains relatively limited. In this work, we propose a method to integrate sequence and co-evolution information with structural and functional information to increase the performance of protein-protein interface prediction. Further, we present a post-processing clustering method that improves the average relative F1 score by 70% and 24% and the precision by 80% and 36% in comparison with two state-of-the-art methods, PSICOV and GREMLIN.

6008: RBPSponge: genome-wide identification of lncRNAs that sponge RBPs
Posted to bioRxiv 15 Mar 2019
267 downloads bioinformatics

Saber HafezQorani, Aissa Houdjedj, Mehmet Arici, Abdesselam Said, Hilal Kazan

Long noncoding RNAs (lncRNAs) can act as molecular sponges for an RNA-binding protein (RBP) through their RBP binding sites, thereby modulating the expression of all target genes of the corresponding RBP of interest. Here, we present a web tool named RBPSponge to explore lncRNAs based on their potential to act as a sponge for an RBP of interest. RBPSponge identifies the occurrences of RBP binding sites and CLIP peaks on lncRNAs, and enables users to run statistical analyses to investigate the regulatory network between lncRNAs, RBPs and targets of RBPs.

6009: SimiC: A Single Cell Gene Regulatory Network Inference method with Similarity Constraints
Posted to bioRxiv 04 Apr 2020
267 downloads bioinformatics

Jianhao Peng, Ullas V. Chembazhi, Sushant Bangru, Ian M. Traniello, Auinash Kalsotra, Idoia Ochoa, Mikel Hernaez

With the use of single-cell RNA sequencing (scRNA-Seq) technologies, it is now possible to acquire gene expression data for each individual cell in samples containing up to millions of cells. These cells can be further grouped into different states along an inferred cell differentiation path, which are potentially characterized by similar, but sufficiently distinct, gene regulatory networks (GRNs). Hence, it would be desirable for scRNA-Seq GRN inference methods to capture the GRN dynamics across cell states. However, current GRN inference methods produce a single GRN per input dataset (or independent GRNs per cell state), failing to capture these regulatory dynamics. We propose a novel single-cell GRN inference method, named SimiC, that jointly infers the GRNs corresponding to each state. SimiC models the GRN inference problem as a LASSO optimization problem with an added similarity constraint on the GRNs associated with contiguous cell states, which captures the inter-cell-state homogeneity. We show on mouse hepatocyte single-cell data generated after partial hepatectomy that, contrary to previous GRN methods for scRNA-Seq data, SimiC is able to capture the transcription factor (TF) dynamics across liver regeneration, as well as the cell-level behavior of the regulatory program of each TF across cell states. In addition, on a honey bee scRNA-Seq experiment, SimiC is able to capture the increased heterogeneity of cells in whole-brain tissue relative to tissue used in a regional analysis, and the TFs specifically associated with each sequenced tissue.
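
As a rough illustration of the kind of objective described above, and not SimiC's actual formulation (which the abstract does not give in full), the sketch below fits one regulator-to-target weight matrix per cell state with an L1 (LASSO) penalty plus a quadratic penalty on the difference between the weights of adjacent states, solved by proximal gradient descent (ISTA). The inputs `Xs` (cells x regulators, e.g. TF expression) and `Ys` (cells x target genes), the exact penalty form, the toy data, and the function name `joint_lasso` are all assumptions.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def joint_lasso(Xs, Ys, lam1, lam2, n_iter=2000):
    """Jointly fit one regulator-to-target weight matrix per cell state.

    Minimizes  sum_k 0.5 * ||Y_k - X_k W_k||_F^2 + lam1 * sum_k ||W_k||_1
             + 0.5 * lam2 * sum_k ||W_k - W_{k+1}||_F^2
    with states k ordered along a path (e.g. a differentiation trajectory).
    """
    K = len(Xs)
    p, m = Xs[0].shape[1], Ys[0].shape[1]
    W = [np.zeros((p, m)) for _ in range(K)]
    # Step 1/L, where L bounds the Lipschitz constant of the smooth part's
    # gradient (the Laplacian of a path graph has largest eigenvalue below 4).
    L = max(np.linalg.norm(X, 2) ** 2 for X in Xs) + 4.0 * lam2
    step = 1.0 / L
    for _ in range(n_iter):
        grads = []
        for k in range(K):
            g = Xs[k].T @ (Xs[k] @ W[k] - Ys[k])  # least-squares term
            if k > 0:
                g += lam2 * (W[k] - W[k - 1])  # similarity to previous state
            if k < K - 1:
                g += lam2 * (W[k] - W[k + 1])  # similarity to next state
            grads.append(g)
        W = [soft_threshold(W[k] - step * grads[k], step * lam1) for k in range(K)]
    return W


# Hypothetical toy data: two states whose true weights differ slightly.
rng = np.random.default_rng(1)
W_true = [rng.normal(size=(5, 3))]
W_true.append(W_true[0] + 0.5 * rng.normal(size=(5, 3)))
Xs = [rng.normal(size=(50, 5)) for _ in range(2)]
Ys = [X @ W + 0.1 * rng.normal(size=(50, 3)) for X, W in zip(Xs, W_true)]

independent = joint_lasso(Xs, Ys, lam1=0.1, lam2=0.0)
coupled = joint_lasso(Xs, Ys, lam1=0.1, lam2=100.0)
print(np.abs(independent[0] - independent[1]).sum())
print(np.abs(coupled[0] - coupled[1]).sum())
```

With `lam2=0` the states are fitted independently; raising `lam2` pulls the weight matrices of neighboring states toward each other, which is the trade-off a similarity constraint is meant to control.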

6010: VenomKB v2.0: A knowledge repository for computational toxinology
Posted to bioRxiv 06 Apr 2018
267 downloads bioinformatics

Joseph D. Romano, Victor Nwankwo, Nicholas Tatonetti

Motivation: Venom peptides comprise one of the richest sources of bioactive compounds available for drug discovery. However, venom data and knowledge are fragmentary and poorly structured, and fail to capitalize on the important characteristics of venoms that make them so interesting to the biomedical community. Results: We present VenomKB v2.0, a new open-access resource for knowledge representation and retrieval of venom bioactivities, sequences, structures, and classifications. VenomKB provides a complete infrastructure for computational toxinology, with a focus on drug discovery and effects that venoms have on the human body. VenomKB is accompanied by a suite of tools for programmatic access, and, in this article, we highlight scenarios demonstrating its usefulness and novel contributions to toxinology, pharmacology, and informatics. Availability: VenomKB can be accessed online at http://venomkb.org/, and the code can be found at https://github.com/tatonetti-lab/venomkb/. All code and data are available under open-source and open-access licenses.

6011: Nationwide prediction of type 2 diabetes comorbidities
Posted to bioRxiv 14 Jun 2019
267 downloads bioinformatics

Piotr Dworzynski, Martin Aasbrenn, Klaus Rostgaard, Mads Melbye, Thomas Alexander Gerds, Henrik Hjalgrim, Tune H Pers

Identification of individuals at risk of developing disease comorbidities represents an important task in tackling the growing personal and societal burdens associated with chronic diseases. We employed machine learning techniques to investigate to what extent data from longitudinal, nationwide Danish health registers can be used to predict individuals at high risk of developing type 2 diabetes (T2D) comorbidities. Based on register data spanning hospitalizations, drug prescriptions and contacts with primary health contractors from >200,000 individuals newly diagnosed with T2D, we used logistic regression, random forest and gradient boosting models to predict the five-year risk of heart failure (HF), myocardial infarction (MI), stroke (ST), cardiovascular disease (CVD) and chronic kidney disease (CKD). For HF, MI, CVD, and CKD, register-based models outperformed a reference model leveraging canonical individual characteristics, achieving improvements in the area under the receiver operating characteristic curve of 0.06, 0.03, 0.06, and 0.07, respectively. The top 1,000 patients predicted to be at highest risk exhibited observed incidence ratios exceeding 4.99, 3.52, 2.92 and 4.71, respectively. In summary, prediction of T2D comorbidities using Danish registers led to consistent albeit modest performance improvements over reference models, suggesting that register data could be leveraged to systematically identify individuals at risk of developing disease comorbidities.

6012: Mammogram Segmentation using Multi-atlas Deformable Registration
Posted to bioRxiv 06 Feb 2019
267 downloads bioinformatics

Manish Kumar Sharma, Mainak Jas, Vikrant Karale, Anup Sadhu, Sudipta Mukhopadhyay

Accurate breast region segmentation is an important step in various automated algorithms involving the detection of lesions such as masses and microcalcifications, and in efficient telemammography. Existing segmentation algorithms underperform due to variations in image quality and in the shape of the breast region. In this paper, we propose to segment the breast region by combining data-driven clustering with deformable image registration. In the first phase of the approach, we identify atlas images from a dataset of mammograms using data-driven clustering. We then segment these atlas images and use them in the next phase of the algorithm. The second phase is atlas-based registration. For a candidate image, we find the most similar atlas image from the set of atlases identified in phase one. We deform the selected atlas image to match the given test image using the Demons registration algorithm. The segmentation mask of the deformed atlas is then transferred to the mammogram under consideration. Finally, we refine the segmentation mask with morphological operations to obtain an accurate breast region boundary. We evaluated the performance of our method using ground-truth segmentation masks verified by an expert radiologist. We compared the proposed method with three existing state-of-the-art algorithms for breast region segmentation, and the proposed approach outperformed all three in most cases.

6013: Evaluating the transcriptional fidelity of cancer models
Posted to bioRxiv 29 Mar 2020
267 downloads bioinformatics

Da Peng, Rachel Gleyzer, Wen-Hsin Tai, Pavithra Kumar, Qin Bian, Bradley Issacs, Edroaldo Lummertz da Rocha, Stephanie Cai, Kathleen DiNapoli, Franklin W. Huang, Patrick Cahan

Cancer researchers use cell lines, patient-derived xenografts, and genetically engineered mice as models to investigate tumor biology and to identify therapies. The generalizability and power of a model derive from the fidelity with which it represents the tumor type under investigation; however, the extent to which this is true is often unclear. The preponderance of models and the ability to readily generate new ones have created a demand for tools that can measure the extent and ways in which cancer models resemble or diverge from native tumors. Here, we present a computational tool, CancerCellNet, that measures the similarity of cancer models to 22 naturally occurring tumor types and 36 subtypes, in a platform- and species-agnostic manner. We applied this tool to 657 cancer cell lines, 415 patient-derived xenografts, and 26 distinct genetically engineered mouse models, documenting the most faithful models, identifying cancers underserved by adequate models, and finding models with annotations that do not match their classification. By comparing models across modalities, we find that genetically engineered mice have higher transcriptional fidelity than patient-derived xenografts and cell lines in four out of five tumor types. We have made CancerCellNet available as freely downloadable software and as a web application that can be applied to new cancer models.

Competing Interest Statement: The authors have declared no competing interest.

6014: MS-EmpiRe utilizes peptide-level noise distributions for ultra sensitive detection of differentially abundant proteins
Posted to bioRxiv 08 Jan 2019
267 downloads bioinformatics

Constantin Ammar, Markus Gruber, Gergely Csaba, Ralf Zimmer

Mass spectrometry-based proteomics is the method of choice for quantifying genome-wide differential changes of proteins in a wide range of biological and biomedical applications. Protein changes need to be reliably derived from a large number of measured peptide intensities and their corresponding fold changes. These fold changes vary considerably for a given protein. Numerous instrumental setups aim to reduce this variability, while current computational methods only implicitly account for this problem. We introduce a new method, MS-EmpiRe (github.com/zimmerlab/MS-EmpiRe), which explicitly accounts for the noise underlying peptide fold changes. We derive dataset-specific, intensity-dependent empirical error distributions, which are used for individual weighting of peptide fold changes to detect differentially abundant proteins. The method requires only peptide intensities mapped to proteins and thus can be applied to any common quantitative proteomics setup. On a recently published proteome-wide benchmarking dataset, MS-EmpiRe doubles the number of correctly identified changing proteins at a correctly estimated FDR cutoff in comparison to state-of-the-art tools. We confirm the superior performance of MS-EmpiRe on simulated data. MS-EmpiRe provides rapid processing (< 2 min) and is an easy-to-use, general-purpose tool.

6015: KS-Burden: Assessing distributional differences of rare variants in dichotomous traits
Posted to bioRxiv 13 Jul 2018
267 downloads bioinformatics

Robert Milan Porsch, Timothy Mak, Clara Tang, Pak Chung Sham

A number of rare variant tests have been developed to explore the effect of low-frequency genetic variations on complex phenotypes. However, an often neglected aspect of these tests is the position of genetic variations. Here we propose a way to assess differences in the spatial organization of rare variants by assessing their distributional differences between affected and unaffected subjects. To do so, we have formulated an adaptation of the well-known Kolmogorov-Smirnov (KS) test, called KS-Burden, which combines the KS test with a simple gene burden approach. The performance of our test was evaluated under a comprehensive simulation framework using real data and various scenarios. Our results show that the KS-Burden test is able to outperform the commonly used SKAT-O test, as well as others, in the presence of clusters of causal variants within a genomic region. Furthermore, our test maintains competitive statistical power in scenarios unfavorable to its original assumptions. Hence, the KS-Burden test is a valuable alternative to existing tests and provides better statistical power in the presence of causal clusters within a gene.
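
The abstract does not give the exact KS-Burden statistic or how its burden component is combined with the KS test, so this sketch covers only the standard two-sample KS statistic applied to rare-variant positions in affected versus unaffected subjects. The positions are invented toy values, the function name `ks_statistic` is hypothetical, and the burden component and significance assessment are omitted.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    a, b: rare-variant positions observed in cases and in controls.
    """
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # positions, so the supremum is attained at one of them.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Hypothetical positions (bp within a gene): case variants cluster,
# control variants are spread out.
cases = [100, 105, 110, 115]
controls = [100, 400, 700, 1000]
print(ks_statistic(cases, controls))  # 0.75
```

A large D indicates that variant positions are distributed differently in the two groups, for example because case variants cluster in one region, which is the kind of spatial signal a pure count-based burden test would miss.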

6016: Computational insights into mechanism of AIM4-mediated inhibition of aggregation of TDP-43 protein implicated in ALS and evidence for in vitro inhibition of liquid-liquid phase separation (LLPS) of TDP-43-2C-A315T by AIM4.
Posted to bioRxiv 08 Oct 2019
267 downloads bioinformatics

Amandeep Girdhar, Vidhya Bharathi, Vikas Ramyagya Tiwari, Suman Abhishek, Usha Saraswat Mahawar, Gembali Raju, Sandeep Kumar Singh, Ganesan Prabusankar, Eerappa Rajakumara, Basant K Patel

TDP-43 is an RNA/DNA-binding protein with versatile physiological functions, and it is also implicated in the pathogenesis of amyotrophic lateral sclerosis (ALS), along with several other proteins such as mutant SOD1 and FUS. Cytoplasmic mis-localization, liquid-liquid phase separation (LLPS) due to RNA depletion, and aggregation of TDP-43 are suggested to be important mechanisms of TDP-43 toxicity in ALS. So far, therapeutic options for ALS are extremely limited and ineffective; therefore, multi-faceted approaches such as treating oxidative stress and inhibiting TDP-43 aggregation are being actively pursued. In our recent study, an acridine imidazolium derivative compound, AIM4, was identified as having anti-TDP-43 aggregation propensity; however, its mechanism of inhibition has not been deciphered. In this study, we utilized computational methods to examine the binding site(s) of AIM4 in the TDP-43 structure and compared its binding efficiency with several other relevant compounds. We find that AIM4 has a binding site in the C-terminal amyloidogenic core region (amino acids 288-319), which coincides with one of the key residue motifs that could potentially mediate LLPS of TDP-43. Importantly, similar to the previously reported effects of RNA molecules, we found that AIM4 could also inhibit the in vitro LLPS of a recombinantly purified C-terminal fragment, TDP-43-2C, bearing the A315T familial mutation. The antagonistic effect of AIM4 on LLPS, which is believed to be the precursor process to TDP-43 aggregation, together with the in silico prediction of an AIM4 binding site on TDP-43 in the same region, indicates that AIM4 could be an important molecule for further investigation of TDP-43 anti-aggregation effects relevant to ALS pathogenesis.

6017: A new workflow combining R packages for statistical analysis of metabolites
Posted to bioRxiv 28 Nov 2019
267 downloads bioinformatics

Paola G. Ferrario

In metabolomics, investigating the association between many metabolites and one trait (such as age in humans or cultivar in foods) is a central research question. To address this question, we present a complete statistical analysis that combines selected R packages in a new workflow, which we share in full, in line with modern standards and research reproducibility requirements. We demonstrate the workflow on a large-scale study using public data available in repositories. Hence, the workflow can be directly reused on quite different metabolomics data when searching for an association with one covariate of interest.

6018: TopicNet: a framework for measuring transcriptional regulatory network change
Posted to bioRxiv 02 Dec 2019
267 downloads bioinformatics

Shaoke Lou, Tianxiao Li, Xiangmeng Kong, Jing Zhang, Jason Liu, Donghoon Lee, Mark Gerstein

Next-generation sequencing data highlight comprehensive and dynamic changes in the human gene regulatory network. Moreover, changes in regulatory network connectivity (network “rewiring”) manifest different regulatory programs in multiple cellular states. However, due to the dense and noisy nature of the connectivity in regulatory networks, directly comparing the gains and losses of targets of key transcription factors (TFs) is not very informative. Thus, here we seek an abstracted, lower-dimensional representation to understand the main features of network change. In particular, we propose a method called TopicNet that applies latent Dirichlet allocation (LDA) to extract meaningful functional topics for a collection of genes regulated by a TF. We then define a rewiring score to quantify large-scale changes in the regulatory network in terms of topic change for a TF. Using this framework, we can pinpoint particular TFs whose network connectivity changes greatly between different cellular states. This is particularly relevant in oncogenesis. Also, by incorporating gene-expression data, we define a topic activity score that gives the degree to which a topic is active in a particular cellular state. Furthermore, we show how activity differences can highlight differential survival in certain cancers.

6019: RaptRanker: in silico RNA aptamer selection from HT-SELEX experiment based on local sequence and structure information
Posted to bioRxiv 31 Dec 2019
267 downloads bioinformatics

Ryoga Ishida, Tatsuo Adachi, Aya Yokota, Hidehito Yoshihara, Kazuteru Aoki, Yoshikazu Nakamura, Michiaki Hamada

Aptamers are short single-stranded RNA/DNA molecules that bind to specific target molecules. Aptamers with high binding affinity and target specificity are identified using an in vitro procedure called high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). However, the development of aptamer affinity reagents is time-consuming and costly because HT-SELEX produces a large dataset of candidate sequences, some of which have insufficient binding affinity. Here, we present RNA aptamer Ranker (RaptRanker), a novel in silico method for identifying high-binding-affinity aptamers from HT-SELEX data by scoring and ranking. RaptRanker analyzes HT-SELEX data by evaluating the nucleotide sequence and secondary structure simultaneously, and ranks sequences according to scores reflecting local structure and sequence frequencies. To evaluate the performance of RaptRanker, we performed two new HT-SELEX experiments and evaluated the binding affinities of a subset of sequences that included aptamers with low binding affinity. In both datasets, the performance of RaptRanker was superior to Frequency, Enrichment and MPBind. We also confirmed that considering secondary structures is effective in HT-SELEX data analysis, and that RaptRanker successfully predicted the essential subsequence motifs in each identified sequence.

Competing Interest Statement: The authors have declared no competing interest.

6020: Unearthing Regulatory Axes of Breast Cancer circRNAs Networks to Find Novel Targets and Fathom Pivotal Mechanisms
Posted to bioRxiv 06 Mar 2019
267 downloads bioinformatics

Farzaneh Afzali, Mahdieh Salimi

Circular RNAs (circRNAs), along with other complementary regulatory elements in competing endogenous RNA (ceRNA) networks, possess valuable characteristics for both the diagnosis and treatment of several human cancers, including breast cancer (BC). In this study, we combined several systems biology tools and approaches to identify influential BC circRNAs, RNA-binding proteins (RBPs), miRNAs, and related mRNAs in order to study and decipher the biological processes and pathways that trigger BC. Starting from a total of 25 co-differentially expressed circRNAs (DECs) identified between the triple negative (TN) and luminal A subtypes of BC by microarray analysis, five hub DECs (hsa_circ_0003227, hsa_circ_0001955, hsa_circ_0020080, hsa_circ_0001666, and hsa_circ_0065173) and the top eleven RBPs (AGO1, AGO2, EIF4A3, FMRP, HuR (ELAVL1), IGF2BP1, IGF2BP2, IGF2BP3, EWSR1, FUS, and PTB) were explored to form the upstream regulatory elements. All the hub circRNAs were regarded as super sponges with multiple miRNA response elements (MREs) for numerous miRNAs. Four leading miRNAs (hsa-miR-149, hsa-miR-182, hsa-miR-383, and hsa-miR-873) accountable for BC progression were then identified by merging several ceRNA networks. The predicted 7- and 8-mer MRE matches between hub circRNAs and leading miRNAs ensured their enduring regulatory capability. The mined downstream mRNAs of the circRNA-miRNA network were then submitted to the STRING database to form a protein-protein interaction (PPI) network and to examine the issue from another point of view. The interconnected enriched BC pathways and processes guarantee the merits of the ceRNA network members as targetable therapeutic elements. This study suggests extensive panels of novel therapeutic targets covering every aspect of BC progression; hence, their role cannot be excluded and needs to be examined in deeper empirical laboratory studies.
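
The abstract refers to predicted 7- and 8-mer MRE matches without naming the prediction tool. As a hedged illustration only, the sketch below classifies seed matches using the commonly used canonical site definitions: an 8mer pairs with miRNA positions 2-8 and has an A opposite position 1, a 7mer-m8 pairs with positions 2-8, a 7mer-A1 pairs with positions 2-7 plus the A, and a 6mer pairs with positions 2-7 only. G:U wobble pairing, site context and conservation are ignored; the miRNA and target sequences are hypothetical toy sequences, not the hub circRNAs or miRNAs from the study, and `find_seed_sites` is a hypothetical helper.

```python
# Canonical miRNA seed-match site types (ignoring G:U wobble and site context).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string (5'->3' in, 5'->3' out)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, target):
    """Return (start, site_type) for each seed match in target (5'->3').

    start is the index of the 6-nt core pairing with miRNA positions 2-7.
    """
    core = revcomp(mirna[1:7])  # pairs with miRNA positions 2-7
    pos8 = revcomp(mirna[7])    # target base pairing with miRNA position 8
    sites = []
    start = target.find(core)
    while start != -1:
        m8 = start > 0 and target[start - 1] == pos8
        a1 = target[start + 6:start + 7] == "A"  # A opposite miRNA position 1
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = target.find(core, start + 1)
    return sites


# Hypothetical toy sequences, not taken from the study.
mirna = "UCCGAUGCAAUUCGGCAUAGCA"
target = "AAGCAUCGGAUUUCCAUCGGAUUGCAUCGGCUUCAUCGGU"
print(find_seed_sites(mirna, target))
# [(3, '8mer'), (14, '7mer-A1'), (24, '7mer-m8'), (33, '6mer')]
```

Scanning for the shared 6-nt core and then inspecting its two flanking bases keeps each site from being counted twice (an 8mer would otherwise also match the 7mer-A1 pattern one base downstream). A long circRNA acting as a sponge would simply yield many such sites for the same miRNA.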
