The neighbors principle, implicit in any machine learning algorithm, holds that samples with similar labels should also lie close to one another in feature space. For example, although tumors are heterogeneous, tumors with similar genomic profiles can be expected to respond similarly to a given therapy. Simple correlation coefficients are an effective way to test whether this principle holds when features and labels are both scalar, but not when either is multivariate. A newer class of generalized correlation coefficients based on inter-point distances, called "distance correlation", addresses this need. To date, only one rank-based distance correlation test is available, and it is asymmetric in the samples: one sample must be singled out as a fixed point of reference. We therefore introduce REVA, a novel nonparametric statistic inspired by the Kendall rank correlation coefficient. We use U-statistic theory to derive the asymptotic distribution of the new correlation coefficient, establishing additional large- and finite-sample properties along the way. To establish the admissibility of the REVA statistic and to explore the utility and limitations of our model, we compare it with the most widely used distance-based correlation coefficient across a range of simulated conditions, demonstrating that REVA does not rely on an assumption of linearity and is robust to high levels of noise, high dimensionality, and outliers. We also present an application to real data, using REVA to determine whether cancer cells with similar genetic profiles also respond similarly to a targeted therapeutic.
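To make the idea of a distance-based correlation concrete, the sketch below computes the standard sample distance correlation of Székely, Rizzo and Bakirov (2007) with NumPy. It is not REVA, whose definition is not given in this abstract; it assumes Euclidean inter-point distances and uses the biased (V-statistic) estimator, and the function names are illustrative only.

```python
import numpy as np


def _double_centered_distances(z):
    """Return the double-centered Euclidean distance matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D array as n samples of a scalar feature
    # Pairwise Euclidean distances between samples (n x n).
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # Subtract row and column means, add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form); x and y may be multivariate."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dcov2_xx = (a * a).mean()  # squared distance variance of x
    dcov2_yy = (b * b).mean()  # squared distance variance of y
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    # Guard against tiny negative values from floating-point round-off.
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))
```

For a symmetric grid `x = np.linspace(-1, 1, 201)`, the Pearson correlation between `x` and `x ** 2` is zero by symmetry, whereas `distance_correlation(x, x ** 2)` is clearly positive. This nonlinear sensitivity, together with support for multivariate `x` and `y`, is what distance-based coefficients add over simple correlation.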