Molecular characterization of breast and lung tumors by integration of multiple data types with sparse-factor analysis
Effective cancer treatment is crucially dependent on the identification of the biological processes that drive a tumor. However, multiple processes may be active simultaneously in a tumor. Clustering is inherently unsuitable to this task as it assigns a tumor to a single cluster. In addition, the wide availability of multiple data types per tumor provides the opportunity to profile the processes driving a tumor more comprehensively. Here we introduce Functional Sparse-Factor Analysis (funcSFA) to address these challenges. FuncSFA integrates multiple data types to define a lower dimensional space capturing the relevant variation. A tailor-made module associates biological processes with these factors. FuncSFA is inspired by iCluster, which we improve in several key aspects. First, we increase the convergence efficiency significantly, allowing the analysis of multiple molecular datasets that have not been pre-matched to contain only concordant features. Second, FuncSFA does not assign tumors to discrete clusters, but identifies the dominant driver processes active in each tumor. This is achieved by a regression of the factors on the RNA expression data followed by a functional enrichment analysis and manual curation step. We apply FuncSFA to the TCGA breast and lung datasets. We identify EMT and Immune processes common to both cancer types. In the breast cancer dataset we recover the known intrinsic subtypes and identify additional processes. These include immune infiltration and EMT, and processes driven by copy number gains on the 8q chromosome arm. In lung cancer we recover the major types (adenocarcinoma and squamous cell carcinoma) and processes active in both of these types. These include EMT, two immune processes, and the activity of the NFE2L2 transcription factor. In summary, FuncSFA is a robust method to perform discovery of key driver processes in a collection of tumors through unsupervised integration of multiple molecular data types and functional annotation.
- Downloaded 523 times
- Download rankings, all-time:
- Site-wide: 35,652 out of 103,764
- In bioinformatics: 4,496 out of 9,474
- Year to date:
- Site-wide: 74,367 out of 103,764
- Since beginning of last month:
- Site-wide: 75,510 out of 103,764
Downloads over time
Distribution of downloads per paper, site-wide
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!