Topological Features Applied to the MNIST Data Set

Persistent homology is a fascinating mathematical tool that continues to be studied, developed, and applied. The purpose of this article is to give a friendly introduction to using persistent homology, one that does not require substantial knowledge of topological methods.

To illustrate the use of persistent homology in machine learning, we apply it to the MNIST data set of handwritten digits, extracting topological features that help distinguish between images of different digits. The diagram in Figure 2 illustrates the main ideas behind the proposed technique, which are discussed in greater detail in this article.

Figure 2: An example of the extraction of topological features, discussed in detail below. (Source: author.)
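
The idea of a topological feature can be made concrete on a single image before any persistence is involved. The sketch below is an illustration under our own assumptions, not the pipeline shown in Figure 2: it binarizes a grayscale digit at a fixed threshold (0.5 is an arbitrary choice) and counts the Betti numbers b0 (connected strokes) and b1 (enclosed holes) with flood fill in plain Python. The names `betti_numbers` and `_regions` are hypothetical. A well-formed 0 typically has one hole and an 8 two, while a 1 has none; persistent homology refines this by tracking how such counts appear and disappear over a whole range of thresholds or scales instead of a single one.

```python
from collections import deque
from typing import List, Sequence, Tuple

# Neighbourhoods on the pixel grid.
FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _regions(grid: List[List[bool]], target: bool,
             steps: Sequence[Tuple[int, int]]) -> List[bool]:
    """Flood-fill every region of pixels equal to `target`.

    Returns one flag per region: True if the region touches the image border.
    Assumes a non-empty, rectangular grid (hypothetical helper).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    touches_border = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            on_border = False
            while queue:
                y, x = queue.popleft()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    on_border = True
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            touches_border.append(on_border)
    return touches_border


def betti_numbers(image: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> Tuple[int, int]:
    """(b0, b1) of the ink pixels: connected strokes and enclosed holes."""
    # Pixels brighter than the (assumed) threshold count as ink.
    ink = [[value > threshold for value in row] for row in image]
    # Ink regions use 8-connectivity, background regions 4-connectivity.
    b0 = len(_regions(ink, True, EIGHT))
    # A background region that never reaches the border is enclosed by ink.
    b1 = sum(1 for on_border in _regions(ink, False, FOUR) if not on_border)
    return b0, b1
```

Pairing 8-connectivity for the ink with 4-connectivity for the background is the usual convention in digital topology; with 8-connectivity for both, for example, a diagonal ring of pixels forms one closed stroke yet encloses no hole.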

The aim of this example is to demonstrate the classification potential of the technique, not to outperform existing models for the classification of handwritten digits.

For a more interesting example of using this technique on a clinical data set to classify hepatic lesions, see [1]. A very similar approach can be applied to any point cloud data and can be generalized to higher dimensions.

I have made all the scripts I wrote for this tutorial publicly available, including a processed version of the data set. I also use a publicly available package that implements the computation of persistent homology.

Motivation

Topology, applied to real-world data sets through persistent homology, has begun to find applications in machine learning, including deep learning [2]. It is mainly used as a pre-processing step that supplies robust topological features for learning.
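
As a pre-processing step, persistent homology produces a persistence diagram, a multiset of (birth, death) pairs whose size varies from sample to sample, so most learning algorithms cannot consume it directly. The hypothetical helper `diagram_features` below is a deliberately simple vectorization chosen for illustration: it keeps the k largest finite lifetimes (death minus birth), zero-padded, plus two counts. It is not the feature set used in this article; established alternatives include persistence landscapes and persistence images.

```python
import math
from typing import List, Sequence, Tuple


def diagram_features(diagram: Sequence[Tuple[float, float]],
                     k: int = 3, min_lifetime: float = 0.0) -> List[float]:
    """Fixed-length vector from a persistence diagram (hypothetical scheme).

    Layout: [k largest finite lifetimes, zero-padded,
             number of finite pairs with lifetime > min_lifetime,
             number of pairs that never die].
    """
    # Lifetimes of pairs that die at a finite scale, longest first.
    finite = sorted(
        (death - birth for birth, death in diagram if not math.isinf(death)),
        reverse=True,
    )
    # Truncate or pad so every diagram yields exactly k lifetimes.
    top = [float(life) for life in finite[:k]] + [0.0] * max(0, k - len(finite))
    n_long = sum(1 for life in finite if life > min_lifetime)
    n_inf = sum(1 for _, death in diagram if math.isinf(death))
    return top + [float(n_long), float(n_inf)]
```

For example, `diagram_features([(0.0, 1.0), (0.0, 9.0), (0.0, math.inf)], k=3, min_lifetime=2.0)` returns `[9.0, 1.0, 0.0, 1.0, 1.0]`, a vector of length k + 2 regardless of how many pairs the diagram contains.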

Our data is often a finite set of noisy samples from some underlying space. The topological techniques developed so far mostly deal with point clouds, i.e. finite sets of data points in space.
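
For point clouds, the simplest case of persistent homology (dimension 0, i.e. connected components) fits in a few lines. The sketch below assumes a Vietoris-Rips filtration on Euclidean distances: every point is born at scale 0, an edge enters once the scale reaches its length (some texts use half the distance, which only rescales the deaths), and a union-find structure records the scale at which two components merge. The function name `h0_persistence` is hypothetical and this is not the package used in the article; loops and higher-dimensional features require a full library.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def h0_persistence(points: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """0-dimensional (birth, death) pairs of a Vietoris-Rips filtration.

    Works for points of any (equal) dimension. Every point is born at 0;
    when an edge of length d joins two components, one of them dies at d.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find lookup with path halving to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, in the order they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at scale d: one of them dies here.
            parent[rj] = ri
            pairs.append((0.0, d))
    if n:
        # The last surviving component never dies.
        pairs.append((0.0, math.inf))
    return pairs
```

For the points (0, 0), (1, 0) and (10, 0) it returns `[(0.0, 1.0), (0.0, 9.0), (0.0, inf)]`: the short pair reflects two nearby points merging early, while the long one marks the isolated point. On noisy samples, short-lived pairs are typically treated as noise and long-lived ones as structure.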

Point clouds are typically produced by a variety of imaging devices, such as MRI or CT scanners. As such data capture devices become more widely available, this type of data is being generated at an increasing rate. These data sets are also often very noisy and contain a lot of missing information; this is especially true of biological data sets.

Our ability to analyze this data, in terms of both its amount and its nature, is clearly out of step with the rate at which we generate it [3]. Topology can make a useful contribution to the analysis of such data sets and is especially helpful for studying them qualitatively.

Terminology
