Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 67,594 bioRxiv papers from 298,187 authors.
Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA
Eric A. Ariazi,
Adam M. Drake,
Gabriel L Otte,
Gabriel E Sanderson,
John St. John,
Imran S. Haque
Posted 24 Nov 2018
bioRxiv DOI: 10.1101/478065 (published DOI: 10.1016/S0016-5085(19)38396-9)
Posted 24 Nov 2018
Background: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer. Methods: Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N=546 colorectal cancer and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using IchorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validation to assess generalization performance. Results: In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91-0.93) with a mean sensitivity of 85% (95% CI 83-86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance. Conclusions: A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway.
- Downloaded 2,336 times
- Download rankings, all-time:
- Site-wide: 1,873 out of 67,594
- In cancer biology: 50 out of 2,278
- Year to date:
- Site-wide: 806 out of 67,594
- Since beginning of last month:
- Site-wide: 5,326 out of 67,594
Downloads over time
Distribution of downloads per paper, site-wide
- Top preprints of 2018
- Paper search
- Author leaderboards
- Overall metrics
- The API
- Email newsletter
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!