Non-protein-coding RNAs are ubiquitous in cell physiology, with a diverse repertoire of known functions. In fact, the majority of the eukaryotic genome does not code for proteins, and thousands of conserved long non-protein-coding RNAs of currently unkown function have been identified. When available, knowledge of their 3D structure is very helpful in elucidating the function of these RNAs. However, despite some outstanding structure elucidation of RNAs using X-ray crystallography, NMR and cryoEM, learning RNA 3D structures remains low-throughput. RNA structure prediction in silico is a promising alternative approach and works well for double-helical stems, but full 3D structure determination requires tertiary contacts outside of secondary structures that are difficult to infer from sequence information. Here, based only on information from RNA multiple sequence alignments, we use a global statistical sequence probability model of co-variation in a pairs of nucleotide positions to detect 3D contacts, in analogy to recently developed breakthrough methods for computational protein folding. In blinded tests on 22 known RNA structures ranging in size from 65 to 1800 nucleotides, the predicted contacts matched physical nucleotide interactions with 65-95% true positive prediction accuracy. Importantly, we infer many long-range tertiary contacts, including non-Watson-Crick interactions, where secondary structure elements assemble in 3D. When used as restraints in molecular dynamics simulations, the inferred contacts improve RNA 3D structure prediction to a coordinate error as low as 6 to 10 angstrom rmsd deviation in atom positions, with potential for further refinement by molecular dynamics. These contacts include functionally important interactions, such as those that distinguish the active and inactive conformations of four riboswitches. In blind prediction mode, we present evolutionary couplings suitable for folding simulations for 180 RNAs of unknown structure, available at https://marks.hms.harvard.edu/ev_rna/. We anticipate that this approach can help shed light on the structure and function of non-protein-coding RNAs as well as 3D-structured mRNAs.
- Downloaded 460 times
- Download rankings, all-time:
- Site-wide: 40,364 out of 101,121
- In bioinformatics: 4,908 out of 9,284
- Year to date:
- Site-wide: 91,354 out of 101,121
- Since beginning of last month:
- Site-wide: 80,187 out of 101,121
Downloads over time
Distribution of downloads per paper, site-wide
- 20 Oct 2020: Support for sorting preprints using Twitter activity has been removed, at least temporarily, until a new source of social media activity data becomes available.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!