Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration. See the comments at the top of the file run_umap.m for documentation and many examples of how to use this code.
The UMAP algorithm is the invention of Leland McInnes, John Healy, and James Melville. See their original paper for a long-form description (https://arxiv.org/pdf/1802.03426.pdf). Also see the documentation for the original Python implementation (https://umap-learn.readthedocs.io/en/latest/index.html).
This MATLAB implementation follows a very similar structure to the Python implementation from 2019, and many of the function descriptions are nearly identical.
Here are some additional tools we have added to our implementation:
- The ability to detect clusters in the low-dimensional output of UMAP. As the clustering method, we invoke either DBM (described at https://www.hindawi.com/journals/abi/2009/686759/) or DBSCAN (built into MATLAB R2019a and later).
- Visual and computational tools for data group comparisons. Data groups can be defined either by running clustering on the data islands resulting from UMAP’s reduction or by external classification labels. We use a change quantification metric (QFMatch), which detects similarity in both mass and distance (described at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5818510/), as well as an F-score for measuring overlap when the groups are different classifications of the same data. For visualizing data groups, we provide a dendrogram (described as QF-tree at https://www.nature.com/articles/s42003-019-0467-6) and sortable tables that show each data group’s similarity, overlap, false positive rate, and false negative rate. The documentation in run_umap.m and UMAP_extra_results.m describes these and additional related tools.
- A PredictionAdjudicator feature that helps determine how well one classification’s subsets predict another’s.
- A complementary independent classifier named “exhaustive projection pursuit” (EPP) that generates labels both for supervising UMAP and for classification comparison research. EPP is described at https://onedrive.live.com/?authkey=%21ALyGEpe8AqP2sMQ&cid=FFEEA79AC523CD46&id=FFEEA79AC523CD46%21209192&parId=FFEEA79AC523CD46%21204865&o=OneUp.
- The ability to use neural networks, either from MATLAB’s “fitcnet” function or from the Python package TensorFlow, to learn from a training data set and classify new data, so that the result can be compared against or merged with the UMAP classification.
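The overlap measures named in the group-comparison item above (F-score, false positive rate, false negative rate) can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from run_umap.m: the function name `overlap_stats` is hypothetical, and the formulas are the standard textbook definitions, which may differ in detail from what the sortable tables report. One group is treated as the reference classification and the other as the prediction, both given as collections of event indices into the same data set.

```python
def overlap_stats(reference, predicted, n_total):
    """Compare two subsets of the same n_total events.

    reference, predicted: iterables of event indices (hypothetical inputs).
    Uses standard definitions; not necessarily those used by run_umap.m.
    """
    ref, pred = set(reference), set(predicted)
    tp = len(ref & pred)           # events in both groups
    fp = len(pred - ref)           # predicted but not in the reference
    fn = len(ref - pred)           # in the reference but not predicted
    tn = n_total - tp - fp - fn    # events in neither group

    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there is no overlap.
    f_score = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {
        "f_score": f_score,
        "false_positive_rate": fpr,
        "false_negative_rate": fnr,
    }


# Two groups of 50 events each, sharing 40 events, out of 100 events total.
stats = overlap_stats(range(0, 50), range(10, 60), 100)
print(stats)
```

An F-score of 1 means the two groups contain exactly the same events, and 0 means they share none. This makes it a natural overlap measure when two classifications label the same data.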
Without the aid of any compression, this MATLAB UMAP implementation tends to be faster than the current Python implementation (version 0.5.2 of umap-learn). Due to File Exchange requirements, we supply only the C++ source code for the MEX modules that accelerate the computations, so users must download or build the MEX binaries separately. For further speedups, see the comments on the fast_approximation argument in run_umap.m. As examples 13 to 15 show, you can test the speed difference between the two implementations on your own computer by setting the ‘python’ argument to true.
The Bioinformatics Toolbox is required only to change the optional ‘qf_tree’ argument.
This implementation is a work in progress. It has been looked over by Leland McInnes, who in 2019 described it as “a fairly faithful direct translation of the original Python code”. We hope to continue improving it in the future.
Provided by the Herzenberg Lab at Stanford University.
We appreciate any and all help in finding bugs. Our priority so far has been to determine whether our concepts, UMAP supervised templates and exhaustive projection pursuit, are suitable for research publications in flow cytometry.
Download link:
https://url92.ctfile.com/f/1850492-547468355-1786cb
(Access password: 3660)