Summer Research Project: Understanding AI and Cognition in terms of Computational Topology

emmm, basically getting carried by the big shots in my group...

I Introduction

Background

Over the past decade or so, artificial intelligence, especially deep learning, has grown by leaps and bounds. Researchers are constantly experimenting with new engineering designs and different neural network structures, and tons of AI applications have emerged, bringing about the dawn of a new era. Disappointingly, however, the theoretical foundations of deep learning remain weak and underdeveloped compared to what we have seen in AI engineering.

Scientists have made many attempts at the interpretability of neural networks. From applying t-SNE maps to datasets to visualizing the attention maps of transformers, these approaches provide intuitive ways to interpret the inner workings of AI. Explaining AI requires a two-pronged approach: what it is learning and how it is learning it. Topology provides a rationale for both, as well as intuitive interpretations.

Manifold Hypothesis (Methods)

Early in the development of neuroscience and manifold learning, when deep learning had barely evolved, scientists proposed a hypothesis to explain the intrinsic nature of datasets, called the Manifold Hypothesis.

The manifold hypothesis posits that many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space. This hypothesis is widely accepted in neuroscience, and many interesting theories have been developed on top of it. Neuroscientists believe that human actions are composed of neural modes, and that the latent variables controlling these neural modes are distributed over a lower-dimensional manifold. The manifold shared between different individuals forms the root of empathy and perception.

To study the topological properties of these manifolds, scientists have proposed a series of rigorous and efficient methods, one of which is persistent homology. Persistent homology is a method for computing topological features of a space at different spatial resolutions. Features that persist over a wide range of spatial scales are deemed more likely to represent true features of the underlying space rather than artifacts of sampling, noise, or a particular choice of parameters. Some topological features are consistently revealed in the high-dimensional embedding space.

II Topological Interpretations of the Dataset

Auto-encoders, among the most classical neural networks, are able to map samples to a lower-dimensional space by a contraction mapping. The theoretical foundations of the auto-encoder generalize to most neural networks.

The manifold hypothesis states that many high-dimensional data sets that occur in the real world and can be perceived by humans actually lie along low-dimensional latent manifolds inside that high-dimensional space. On top of this assumption, researchers have created a number of ingenious ML algorithms, such as t-SNE, Isomap, LLE and so on. These algorithms were invented on the basis that this assumption holds, and their effectiveness in turn justifies the assumption. Based on the Manifold Hypothesis, fields such as representation learning and manifold learning have emerged rapidly.
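As a minimal sketch (my own illustrative setup, not from the original project), the scikit-learn implementations of these algorithms can be applied to a synthetic "swiss roll", a 2-D manifold embedded in R^3, to recover a flat 2-D parameterization:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding, TSNE

# 1500 points sampled (with noise) from a 2-D manifold living in 3-D ambient space.
X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# Each algorithm tries to recover a 2-D parameterization of the manifold.
embeddings = {
    "Isomap": Isomap(n_neighbors=10, n_components=2).fit_transform(X),
    "LLE": LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X),
}

for name, Y in embeddings.items():
    print(name, Y.shape)  # (1500, 2): the data is "flattened" onto the plane
```

That these methods produce faithful low-dimensional layouts of real data is exactly the empirical evidence the hypothesis rests on.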

MNIST Dataset

Without loss of generality, let us focus on the dataset of handwritten digits, MNIST. The full image space, which consists of all possible images and is equivalent to R^n, can be partitioned into different datasets based on which specific concept we are interested in. These partitions can be any set of images selected by humans according to some classification criterion, such as the set of all car images, the set of all face images, and so on. The MNIST dataset is the most popular handwritten digit dataset and consists of a large number of handwritten digit samples. Each sample is a 28×28 grayscale image, which can be regarded as a point embedded in a real space of dimension 784.

But how can we tell that, as the Manifold Hypothesis states, these samples follow a particular distribution, namely that they are discrete samples of a low-dimensional manifold embedded in a high-dimensional space?

Reasoning logically first, we know that not all 28×28 grayscale images can be perceived by humans. If we randomly pick samples from the full image space, most of the images are noise with a random distribution of pixel intensities. Some of them are perceivable but have no meaning; for example, two digits may overlap in such a way that the image clearly represents some kind of number, but there is no way to determine exactly which number it is. Hence, we can tell that the dataset is somehow constrained to a certain low-rank space. To interpret this intuitively, one approach is to "flatten" the samples onto a much lower-dimensional, human-readable space, and the t-SNE map is a good example.
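A quick sketch of this argument (assumed setup; fetch_openml downloads MNIST on first use) is to compare a real MNIST sample with a point drawn uniformly at random from the full 784-dimensional image space:

```python
import numpy as np
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
print(X.shape)  # (70000, 784): each image is a point in R^784

digit = X[0].reshape(28, 28) / 255.0  # a human-readable handwritten digit
noise = np.random.rand(28, 28)        # a "typical" point of the full image space

# Almost every uniformly random image looks like pure pixel noise; meaningful
# digits occupy only a thin, constrained subset of the full image space.
print("digit pixel variance:", digit.var(), "noise pixel variance:", noise.var())
```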

t-SNE maps the dataset to the two-dimensional plane by a function that satisfies the criteria of a homeomorphism: it is bijective, continuous, and its inverse is continuous. Therefore, every point in R² that t-SNE produces corresponds to an image back in the full image space. The figure shows the input MNIST images and the corresponding low-dimensional output.
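A map like the one in the figure can be reproduced with a minimal sketch (assumed setup; the subset size and perplexity are my own choices to keep the run time reasonable):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.manifold import TSNE

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_sub, y_sub = X[:5000] / 255.0, y[:5000]

# Map the 784-dimensional points onto the plane.
Z = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_sub)

plt.scatter(Z[:, 0], Z[:, 1], c=y_sub.astype(int), cmap="tab10", s=4)
plt.colorbar(label="digit class")
plt.title("t-SNE map of MNIST (5000 samples)")
plt.show()
```

The resulting scatter plot shows one cluster per digit class, which is the structure discussed next.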

We can easily tell that the images distributed at the edge of a cluster are only meaningful along the direction of the cluster boundary or toward its interior. If we start from an image on the boundary and add a perturbation such that the corresponding low-dimensional point in the t-SNE map moves away from the cluster, we can see that the image becomes difficult to recognize, for instance blurry, or containing features from multiple digits at the same time. This phenomenon is a good illustration of the limited degrees of freedom of the data distribution. But on the basis of this phenomenon alone, we cannot conclude that there is a low-dimensional representation of the data distribution. Further exploration of the intrinsic nature of the dataset requires the use of an auto-encoder.

III Auto-Encoder in Representation Learning Perspective

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder thus learns an efficient representation (encoding) of a set of data, typically for dimensionality reduction. The following figure shows a typical autoencoder structure.

We can see from the figure that the number of output neurons in the encoder is significantly smaller than the number of input neurons. This shows that the encoder part is a contraction map, mapping the full image space onto a subspace of itself. The vector shown in orange is called the embedding vector; it characterizes the information with maximum entropy. We can constrain what the decoder tries to output by choosing the loss function. When we define the loss function as a reconstruction loss, the decoder will try to reconstruct the original input from the intermediate low-dimensional vector.
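A minimal sketch of such an autoencoder in PyTorch (the architecture, latent dimension, and hyperparameters here are my own illustrative assumptions, not the project's actual model):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        # Encoder: a contraction from the 784-dim image space to the latent space.
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: reconstructs the image from the embedding vector.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)          # embedding vector
        return self.decoder(z), z

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()             # reconstruction loss

# One illustrative training step on a dummy batch of flattened 28x28 images.
batch = torch.rand(64, 784)
recon, z = model(batch)
loss = criterion(recon, batch)
loss.backward()
optimizer.step()
print("reconstruction loss:", loss.item())
```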

After training, the autoencoder is able to capture the internal structure of the data manifold and store it within its weights. The decoder can then be regarded as a local parameterization of the data manifold, and the encoder as the inverse function of the decoder.

IV Priori Estimation of Datasets by Computational Persistent Homology

Persistent homology is a method for computing topological features of a space at different spatial resolutions. More persistent features are detected over a wide range of spatial scales and are deemed more likely to represent true features of the underlying space rather than artifacts of sampling, noise, or particular choice of parameters.

To find the persistent homology of a space, the space must first be represented as a simplicial complex. A distance function on the underlying space corresponds to a filtration of the simplicial complex, that is, a nested sequence of increasing subsets. One common method is to take the sublevel filtration of the distance to a point cloud, or equivalently the offset filtration on the point cloud, and take its nerve in order to get the simplicial filtration known as the Čech filtration. A similar construction uses a nested sequence of Vietoris-Rips complexes, known as the Vietoris-Rips filtration.

Most of these simplicial complexes are built on a given graph, and the way to obtain the graph is simple: for a radius r, connect every pair of points whose distance is less than r. By tracing the birth and death of loops in the graph as r grows, we can analyze a data manifold or a point cloud.

As shown above, the red loop survives longer as the radius increases, while the birth and death of the green loop are close together. Based on this observation, we can conclude that the red loop is a more dominant feature of this point cloud. Such features enable a preliminary analysis of the characteristics of the dataset, such as its genus and complexity, i.e., the number of categories. We can also examine the topological noise of the dataset by looking at loops with short survival times.
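A minimal sketch of this computation (assumed setup: a noisy circle as the point cloud, and the GUDHI library for the Vietoris-Rips filtration):

```python
import numpy as np
import gudhi

# Sample 100 points from a circle with small Gaussian noise.
theta = np.random.uniform(0, 2 * np.pi, 100)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * np.random.randn(100, 2)

# Build the Vietoris-Rips filtration: for each radius, connect points closer
# than that radius and fill in the resulting simplices.
rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
diagram = simplex_tree.persistence()

# Loops (H1 features) that persist over a wide range of radii are dominant;
# the circle should produce exactly one long-lived interval, while the
# short-lived intervals correspond to topological noise.
for birth, death in simplex_tree.persistence_intervals_in_dimension(1):
    print(f"loop born at r={birth:.3f}, dies at r={death:.3f}")
```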
