MNIST data set
Persistent homology is a fascinating mathematical tool that continues to be studied, developed, and applied. This article gives a friendly introduction to using persistent homology, one that does not require substantial knowledge of topological methods.
To illustrate the use of persistent homology in machine learning, we apply it to the MNIST data set of handwritten digits. This serves as an example of extracting topological features to distinguish between images of handwritten digits. The diagram in Figure 2 illustrates the main ideas underlying the proposed technique, which are discussed in greater detail in this article.
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/976df4daea8eecdef5de72b3812261e4.png)
The aim of this example is to demonstrate the classification potential of the technique and not to outperform the existing models for the classification of handwritten digits.
For a more interesting example of using this technique on a clinical data set to classify hepatic lesions, see [1]. A very similar approach can be applied to any point cloud data and can be generalized to higher dimensions.
I have made all the scripts I wrote for this tutorial publicly available, including a processed version of the data set. I also use a publicly available package that implements the computation of persistent homology.
Motivation
Topology, applied to real-world data sets through persistent homology, has begun to find applications in machine learning, including deep learning [2]. It is mainly used as a pre-processing step that provides robust topological features for learning.
Our data is often a finite set of noisy samples from some underlying space. The topological techniques developed so far mostly deal with point clouds, i.e. finite sets of data points in space.
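To make the notion of a point cloud concrete, here is a minimal sketch of how a grayscale image could be turned into one by keeping the coordinates of its bright pixels. The function name `image_to_point_cloud`, the threshold value, and the toy image are all assumptions made for illustration; they are not taken from the scripts that accompany this tutorial.

```python
def image_to_point_cloud(image, threshold=0.5):
    """Return the (row, col) coordinates of pixels brighter than `threshold`.

    `image` is a list of rows of pixel intensities. The threshold is an
    illustrative choice, not a value prescribed by the article.
    """
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > threshold
    ]


# A toy 5x5 "image" of a ring (hypothetical data, not an MNIST digit).
toy_image = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]

cloud = image_to_point_cloud(toy_image)
print(len(cloud), "points, starting with", cloud[:3])
```

The resulting list of coordinates is a finite set of points in the plane, which is the kind of input the topological techniques discussed here operate on.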
Point clouds are typically produced by a variety of imaging devices, such as MRI or CT scanners. As such data capture devices become more widely available, this type of data is being generated at an increasing rate. These data sets are often also very noisy and have a lot of missing information, especially biological data sets.
Our ability to analyze this data has clearly not kept pace with the data we generate, both in its amount and in its nature [3]. Topology can make a useful contribution to the analysis of such data sets and can be especially helpful in studying them qualitatively.