[Source Code] Uniform Manifold Approximation and Projection (UMAP) Algorithm

Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration. See the comments at the top of the file run_umap.m for documentation and many examples of how to use this code.

The UMAP algorithm is the invention of Leland McInnes, John Healy, and James Melville. See their original paper for a long-form description (https://arxiv.org/pdf/1802.03426.pdf). Also see the documentation for the original Python implementation (https://umap-learn.readthedocs.io/en/latest/index.html).

This MATLAB implementation follows a very similar structure to the Python implementation from 2019, and many of the function descriptions are nearly identical.

Here are some additional tools we have added to our implementation:

  1. The ability to detect clusters in the low-dimensional output of UMAP. As the clustering method, we invoke either DBM (described at https://www.hindawi.com/journals/abi/2009/686759/) or DBSCAN (built into MATLAB R2019a and later).

  2. Visual and computational tools for data group comparisons. Data groups can be defined either by running clustering on the data islands resulting from UMAP’s reduction or by external classification labels. We use a change quantification metric (QFMatch) which detects similarity in both mass & distance (described at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5818510/) as well as an F-score for measuring overlap when the groups are different classifications for the same data. For visualizing data groups, we provide a dendrogram (described as QF-tree at https://www.nature.com/articles/s42003-019-0467-6) and sortable tables which show each data group’s similarity, overlap, false positive rate and false negative rate. The documentation in run_umap.m and UMAP_extra_results.m describes these and additional related tools provided.

  3. A PredictionAdjudicator feature that helps determine how well one classification’s subsets predict another’s.

  4. A complementary independent classifier named “exhaustive projection pursuit” (EPP) that generates labels both for supervising UMAP and for classification comparison research. EPP is described at https://onedrive.live.com/?authkey=%21ALyGEpe8AqP2sMQ&cid=FFEEA79AC523CD46&id=FFEEA79AC523CD46%21209192&parId=FFEEA79AC523CD46%21204865&o=OneUp.

  5. The ability to use neural networks either from MATLAB’s “fitcnet” function or the Python package TensorFlow to learn from a training data set and provide a classification on new data to either compare against or merge with UMAP classification.
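
Item 2 above mentions an F-score, a false positive rate, and a false negative rate for measuring overlap when two groupings classify the same data. As a rough illustration only, the Python sketch below computes these three numbers from the row indices of two groups, treating the first group as the reference. It is not the code in run_umap.m: the function name overlap_stats, the sample indices, and the exact rate definitions (one common convention) are assumptions made for this example, and the toolbox may define them differently.

```python
def overlap_stats(reference, predicted, total):
    """Compare two groups given as collections of row indices over `total` rows.

    Hypothetical helper for illustration; not part of the MATLAB toolbox.
    """
    reference, predicted = set(reference), set(predicted)
    tp = len(reference & predicted)   # rows in both groups
    fp = len(predicted - reference)   # rows only in the predicted group
    fn = len(reference - predicted)   # rows only in the reference group
    tn = total - tp - fp - fn         # rows in neither group

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score here is the harmonic mean of precision and recall (F1).
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "f_score": f_score,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical data: 10 rows, a reference label versus a detected cluster.
reference_group = [0, 1, 2, 3]
detected_cluster = [2, 3, 4]
stats = overlap_stats(reference_group, detected_cluster, total=10)
print(stats)
```

With these sample indices the two groups share two rows, so the F-score is 4/7, the false positive rate is 1/6, and the false negative rate is 1/2. A sortable table like the one described above would hold one such row per pair of data groups.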

Without the aid of any compression, this MATLAB UMAP implementation tends to be faster than the current Python implementation (version 0.5.2 of umap-learn). Due to File Exchange requirements, we supply only the C++ source code for the MEX modules we use to accelerate the computations; users must download or build the MEX binary files separately. See the comments on the fast_approximation argument in run_umap.m for further speedups. As examples 13 to 15 show, you can test the speed difference between the implementations on your own computer by setting the ‘python’ argument to true.

The Bioinformatics Toolbox is required to change the ‘qf_tree’ argument, which is optional.

This implementation is a work in progress. It has been looked over by Leland McInnes, who in 2019 described it as “a fairly faithful direct translation of the original Python code”. We hope to continue improving it in the future.

Provided by the Herzenberg Lab at Stanford University.

We appreciate any and all help in finding bugs. Our priority has been determining whether our concepts, namely UMAP supervised templates and exhaustive projection pursuit, are suitable for research publications in flow cytometry.

Download link:

https://url92.ctfile.com/f/1850492-547468355-1786cb

(Access password: 3660)
