Evaluating single-subject study methods for personal transcriptomic interpretations to advance precision medicine
Background: Gene expression profiling has benefited medicine by providing clinically relevant insights at the molecular candidate and systems levels. However, adopting a precision approach that integrates individual variability, including omics data, into risk assessments, diagnoses, and therapeutic decision making requires methodological advances in whole-transcriptome expression analysis. One need is for users to be able to make individual-level inferences from whole-transcriptome data with confidence. We propose that biological replicates in isogenic conditions can provide a framework for testing differentially expressed genes (DEGs) in a single subject (ss) in the absence of an appropriate external reference standard or replicates.
Methods: Eight ss methods for identifying genes with differential expression (NOISeq, DEGseq, edgeR, mixture model, DESeq, DESeq2, iDEG, and ensemble) were compared in two RNA-Seq datasets: yeast (parental line versus snf2 deletion mutant; n=42/condition) and MCF7 breast-cancer cells (baseline versus estradiol-stimulated; n=7/condition). Replicate analysis with NOISeq, DEGseq, edgeR, DESeq, and DESeq2 was used to build reference standards (RSs). Each dataset was randomly partitioned so that approximately two-thirds of the paired samples were used to construct RSs; the remainder were treated separately as single-subject sample pairs, in which DEGs were identified using the ss methods. Receiver operating characteristic (ROC) and precision-recall plots were generated for all ss methods against each RS in both datasets (525 combinations).
Results: Consistent with prior analyses of these data, ~50% and ~15% DEGs were obtained in the yeast and MCF7 reference standard datasets, respectively, regardless of the analytical method. NOISeq, edgeR, and DESeq were the most concordant and robust methods for creating a reference standard. Single-subject versions of NOISeq, DEGseq, and an ensemble learner achieved the best median area under the ROC curve when comparing two transcriptomes without replicates, regardless of the type of reference standard (>90% in yeast, >0.75 in MCF7).
Conclusion: An ensemble method applied to single-subject studies yields better and more consistent accuracies across different conditions. In addition, which specific single-subject method performs best depends on the proportion of DEGs. Single-subject methods for identifying DEGs from paired samples need improvement, as no method achieves both precision >90% and recall >90%. http://www.lussiergroup.org/publications/EnsembleBiomarker
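The evaluation design described in the abstract (a random two-thirds split of paired samples, then scoring single-subject DEG calls against a reference standard) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the 0.5 score cutoff, and the toy gene identifiers and scores are all hypothetical, and the AUC here uses the rank-based (Mann-Whitney) formulation, which may differ from how the study computed it.

```python
import random


def partition_pairs(pair_ids, reference_fraction=2 / 3, seed=0):
    """Randomly split paired samples into a reference-standard set
    and a held-out set of single-subject sample pairs."""
    rng = random.Random(seed)
    shuffled = list(pair_ids)
    rng.shuffle(shuffled)
    n_ref = round(len(shuffled) * reference_fraction)
    return shuffled[:n_ref], shuffled[n_ref:]


def precision_recall(called, reference):
    """Precision and recall of a set of called DEGs against a
    reference-standard set of DEGs."""
    called, reference = set(called), set(reference)
    true_pos = len(called & reference)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


def roc_auc(scores, reference):
    """Rank-based ROC AUC: the probability that a reference DEG gets a
    higher score than a non-DEG, counting ties as one half."""
    pos = [s for gene, s in scores.items() if gene in reference]
    neg = [s for gene, s in scores.items() if gene not in reference]
    if not pos or not neg:
        raise ValueError("need at least one DEG and one non-DEG")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): higher score = stronger evidence of
# differential expression from a single-subject method.
reference = {"g1", "g2", "g3"}
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.4, "g5": 0.1, "g6": 0.05}
called = {gene for gene, s in scores.items() if s >= 0.5}  # assumed cutoff

ref_pairs, ss_pairs = partition_pairs(range(42))
print(len(ref_pairs), len(ss_pairs))
print(precision_recall(called, reference))
print(roc_auc(scores, reference))
```

With 42 pairs per condition, as in the yeast dataset, this split assigns 28 pairs to reference-standard construction and leaves 14 as single-subject pairs. Precision and recall depend on the chosen cutoff, whereas the AUC summarizes ranking quality over all cutoffs, which is why both views are reported in the study.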
- Downloaded 265 times
- Download rankings, all-time:
  - Site-wide: 77,652 out of 118,149
  - In bioinformatics: 7,171 out of 9,572
- Year to date:
  - Site-wide: 89,589 out of 118,149
- Since beginning of last month:
  - Site-wide: 88,107 out of 118,149