Evaluating single-subject study methods for personal transcriptomic interpretations to advance precision medicine

By Samir Rachid Zaim, Colleen Kenost, Joanne Berghout, Helen Hao Zhang, Yves A. Lussier

Posted 27 Sep 2018
bioRxiv DOI: 10.1101/428581 (published DOI: 10.1186/s12920-019-0513-8)

Background: Gene expression profiling has benefited medicine by providing clinically relevant insights at the molecular candidate and systems levels. However, whole-transcriptome expression analysis requires methodological advances before it can support a precision-medicine approach that integrates individual variability, including omics data, into risk assessment, diagnosis, and therapeutic decision-making. One need is for users to be able to make confident individual-level inferences from whole-transcriptome data. We propose that biological replicates in isogenic conditions can provide a framework for testing differentially expressed genes (DEGs) in a single subject (ss) in the absence of an appropriate external reference standard or replicates.

Methods: Eight ss methods for identifying DEGs (NOISeq, DEGseq, edgeR, mixture model, DESeq, DESeq2, iDEG, and ensemble) were compared in two RNA-Seq datasets: Yeast (parental line versus snf2 deletion mutant; n=42/condition) and MCF7 breast cancer cells (baseline versus estradiol-stimulated; n=7/condition). In both datasets, replicate analyses with NOISeq, DEGseq, edgeR, DESeq, and DESeq2 were used to build reference standards (RSs). Each dataset was randomly partitioned so that approximately two-thirds of the paired samples were used to construct the RSs; the remaining pairs were treated separately as single-subject sample pairs, in which DEGs were assayed using the ss methods. Receiver operating characteristic (ROC) and precision-recall curves were computed for every ss method against each RS in both datasets (525 combinations).

Results: Consistent with prior analyses of these data, approximately 50% and 15% of genes were identified as DEGs in the Yeast and MCF7 reference standard datasets, respectively, regardless of the analytical method. NOISeq, edgeR, and DESeq were the most concordant and robust methods for creating a reference standard. Single-subject versions of NOISeq, DEGseq, and an ensemble learner achieved the best median area under the ROC curve (AUC) for comparing two transcriptomes without replicates, regardless of the type of reference standard (>0.90 in Yeast, >0.75 in MCF7).

Conclusion: An ensemble method applied to single-subject studies yields better and more consistent accuracy across different conditions. In addition, different single-subject methods perform best depending on the proportion of DEGs. Single-subject methods for identifying DEGs from paired samples still need improvement, as no method achieves both precision >90% and recall >90%. http://www.lussiergroup.org/publications/EnsembleBiomarker
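As a rough illustration of the evaluation design described in the Methods, the sketch below splits sample pairs into a reference-standard subset (about two-thirds) and held-out single-subject pairs, then scores one method's per-gene output against a reference DEG set with ROC AUC, precision, and recall. It is not the authors' pipeline: the function names (partition_pairs, roc_auc, precision_recall), the rounding rule for the two-thirds split, the 0.5 score cutoff, and the toy gene scores are all hypothetical. In the study, the scores would come from a single-subject method such as NOISeq or DEGseq, and the reference DEGs from a replicate-based analysis.

```python
import random
from typing import Dict, List, Set, Tuple


def partition_pairs(n_pairs: int, ref_fraction: float = 2 / 3,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split paired-sample indices into a reference-standard subset
    (about ref_fraction of the pairs) and held-out single-subject pairs."""
    idx = list(range(n_pairs))
    random.Random(seed).shuffle(idx)
    n_ref = round(n_pairs * ref_fraction)  # rounding rule is an assumption
    return sorted(idx[:n_ref]), sorted(idx[n_ref:])


def roc_auc(scores: Dict[str, float], true_degs: Set[str]) -> float:
    """Area under the empirical ROC curve, computed as the Mann-Whitney
    probability that a reference DEG outscores a non-DEG (ties count 1/2)."""
    pos = [s for gene, s in scores.items() if gene in true_degs]
    neg = [s for gene, s in scores.items() if gene not in true_degs]
    if not pos or not neg:
        raise ValueError("need at least one DEG and one non-DEG")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(called: Set[str],
                     true_degs: Set[str]) -> Tuple[float, float]:
    """Precision and recall of the genes a single-subject method calls DEG,
    judged against the reference-standard DEG set."""
    tp = len(called & true_degs)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(true_degs) if true_degs else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical split of 7 sample pairs (about 2/3 for the reference standard).
    ref_pairs, ss_pairs = partition_pairs(7, seed=1)
    print("reference pairs:", ref_pairs, "single-subject pairs:", ss_pairs)

    # Toy per-gene scores from one held-out pair, and toy reference DEGs.
    scores = {"g1": 0.9, "g2": 0.8, "g3": 0.4, "g4": 0.5, "g5": 0.2, "g6": 0.1}
    reference_degs = {"g1", "g2", "g3"}
    called = {gene for gene, s in scores.items() if s >= 0.5}  # assumed cutoff

    auc = roc_auc(scores, reference_degs)
    p, r = precision_recall(called, reference_degs)
    print(f"AUC={auc:.3f} precision={p:.3f} recall={r:.3f}")
    # prints AUC=0.889 precision=0.667 recall=0.667
```

Counting pairwise wins costs O(P×N) comparisons, which is fine for a toy example; for whole transcriptomes a rank-based formula or a library routine such as scikit-learn's roc_auc_score would be the practical choice.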

Download data

  • Downloaded 265 times
  • Download rankings, all-time:
    • Site-wide: 77,652 out of 118,149
    • In bioinformatics: 7,171 out of 9,572
  • Year to date:
    • Site-wide: 89,589 out of 118,149
  • Since beginning of last month:
    • Site-wide: 88,107 out of 118,149
