[Paper Reading] Clustering with Deep Learning: Taxonomy and New Methods

Paper link: "Clustering with Deep Learning: Taxonomy and New Methods"

 

Title: Clustering with Deep Learning: Taxonomy and New Methods

Source: published in 2018 at ICLR (International Conference on Learning Representations), a top AI conference.

Authors: Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui, Maximilian Strobel & Daniel Cremers

Affiliation: Technical University of Munich, Germany


The paper consists of seven parts: Abstract, Introduction, Taxonomy, Related Methods, Case Study: New Method, Experimental Results, and Conclusion.

Part 1: Abstract

Clustering methods based on deep neural networks have proven promising for clustering real-world data because of their high representational power. In this paper we propose a systematic taxonomy of clustering methods that utilize deep neural networks. We base our taxonomy on a comprehensive review of recent work and validate the taxonomy in a case study. In this case study, we show that the taxonomy enables researchers and practitioners to systematically create new clustering methods by selectively recombining and replacing distinct aspects of previous methods with the goal of overcoming their individual limitations. The experimental evaluation confirms this and shows that the method created for the case study achieves state-of-the-art clustering quality and surpasses it in some cases.

Translation: Clustering methods based on deep neural networks have proven promising for clustering real-world data because of their strong representational power. This paper proposes a systematic taxonomy of clustering methods that make use of deep neural networks. The taxonomy is based on a comprehensive review of recent work and is validated in a case study. The case study shows that the taxonomy enables researchers and practitioners to systematically create new clustering methods by selectively recombining and replacing distinct aspects of previous methods, with the goal of overcoming their individual limitations. The experimental evaluation confirms this and shows that the method created for the case study reaches state-of-the-art clustering quality and, in some cases, surpasses it.

Part 2: Introduction

The main objective of clustering is to separate data into groups of similar data points. Having a good separation of data points into clusters is fundamental for many applications in data analysis and data visualization.

Translation: The main objective of clustering is to divide data into groups of similar data points. A good separation of data points into clusters is fundamental to many applications in data analysis and data visualization.

Analysis: Clustering is a tool: depending on what we need, we divide the data in a dataset into several clusters, such that the data points in each cluster share similar characteristics with the other data in that cluster.

The performance of current clustering methods is however highly dependent on the input data. Different datasets usually require different similarity measures and separation techniques. As a result, dimensionality reduction and representation learning have been extensively used alongside clustering in order to map the input data into a feature space where separation is easier. By utilizing deep neural networks (DNNs), it is possible to learn non-linear mappings that allow transforming data into more clustering-friendly representations without manual feature extraction/selection.

Translation: However, the performance of current clustering methods depends heavily on the input data. Different datasets usually require different similarity measures and separation techniques. As a result, dimensionality reduction and representation learning have been widely used alongside clustering in order to map the input data into a feature space in which separation is easier. By using deep neural networks (DNNs), it is possible to learn non-linear mappings that transform the data into more clustering-friendly representations, without manual feature extraction/selection.

Analysis: The problems are: (1) the performance of traditional clustering methods depends heavily on the input data; (2) different datasets usually require different similarity measures and separation techniques.

The common remedy: use dimensionality reduction and representation learning to map the input data into a feature space in which separation is easier.

The authors adopt a third approach: use deep neural networks to learn non-linear mappings that transform the data into representations better suited for clustering (see the sketch below).
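To make the "cluster in a learned feature space" idea concrete, here is a minimal sketch (not code from the paper; the dataset and parameter choices are only illustrative) that compares k-means on raw inputs with k-means on a reduced representation. PCA is used as a simple linear stand-in for the mapping; the paper's point is that a DNN can learn a non-linear mapping instead.

```python
# Minimal sketch: clustering on a reduced/learned representation vs. the raw input.
# Uses scikit-learn's small digits dataset purely for illustration.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score as nmi

X, y = load_digits(return_X_y=True)        # 1797 samples, 64 features, 10 classes

# k-means directly in the 64-dimensional input space
labels_raw = KMeans(n_clusters=10, n_init=10).fit_predict(X)

# k-means in a 10-dimensional feature space produced by a (linear) mapping
Z = PCA(n_components=10).fit_transform(X)
labels_feat = KMeans(n_clusters=10, n_init=10).fit_predict(Z)

print("NMI, raw input:    ", nmi(y, labels_raw))
print("NMI, reduced space:", nmi(y, labels_feat))
```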

The main contribution of this paper is the formulation of a taxonomy for clustering methods that rely on a deep neural network for representation learning. The proposed taxonomy enables researchers to create new methods in a structured and analytical way by selectively recombining or replacing distinct aspects of existing methods to improve their performance or mitigate limitations. The taxonomy is in particular also valuable for practitioners who want to create a method from existing building blocks that suits their task at hand. To illustrate the value of the proposed taxonomy, we conducted a case study in which we fuse a new method based on insights from the taxonomy.

Translation: The main contribution of this paper is the formulation of a taxonomy for clustering methods that rely on a deep neural network for representation learning. The proposed taxonomy enables researchers to create new methods in a structured and analytical way, by selectively recombining or replacing distinct aspects of existing methods in order to improve their performance or mitigate their limitations. The taxonomy is also particularly valuable for practitioners who want to build, from existing building blocks, a method that suits the task at hand. To illustrate the value of the proposed taxonomy, the authors conduct a case study in which they fuse a new method based on insights from the taxonomy.

Analysis: The main contribution of this paper is a taxonomy of clustering methods that rely on deep neural networks for representation learning. The benefit of this taxonomy is that it lets researchers selectively recombine or replace distinct aspects of existing methods in order to improve their performance or mitigate their limitations.

In this case study, we use a fully convolutional autoencoder to learn clustering-friendly representations of the data by optimizing it with a two-phased training procedure. In the first phase, the autoencoder is trained with the standard mean squared error reconstruction loss. In the second phase, the autoencoder is then fine-tuned with a combined loss function consisting of the autoencoder reconstruction loss and a clustering-specific loss.

Translation: In this case study, a fully convolutional autoencoder is used to learn clustering-friendly representations of the data by optimizing it with a two-phase training procedure. In the first phase, the autoencoder is trained with the standard mean squared error reconstruction loss. In the second phase, the autoencoder is fine-tuned with a combined loss function consisting of the autoencoder reconstruction loss and a clustering-specific loss.

Analysis: The paper gives a concrete example: the authors use a fully convolutional autoencoder, optimized with a two-phase training procedure, to learn clustering-friendly representations of the data. The autoencoder is first trained with the reconstruction loss alone, and is then fine-tuned with a combined loss made up of the reconstruction loss and a clustering-specific loss (a minimal sketch of this two-phase scheme follows below).
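The following is a hedged sketch of the two-phase idea, not the authors' implementation: the paper's case study uses a fully convolutional autoencoder and its own clustering-specific loss, whereas this sketch uses a small fully connected autoencoder and a simple k-means-style assignment loss as a stand-in for the clustering term. All shapes, epoch counts, and the weight `alpha` are assumptions for illustration.

```python
# Phase 1: train an autoencoder with the plain MSE reconstruction loss.
# Phase 2: fine-tune with a combined loss = reconstruction + clustering term.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

x = torch.rand(1000, 784)                      # toy data (e.g. flattened images)
model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
alpha = 0.1                                    # weight of the clustering term (assumed)

# ---- phase 1: reconstruction-only training ----
for epoch in range(20):
    opt.zero_grad()
    x_hat, _ = model(x)
    mse(x_hat, x).backward()
    opt.step()

# initialize cluster centers by running k-means on the pre-trained embeddings
with torch.no_grad():
    _, z = model(x)
km = KMeans(n_clusters=10, n_init=10).fit(z.numpy())
centers = torch.tensor(km.cluster_centers_, dtype=torch.float32)

# ---- phase 2: fine-tuning with the combined loss ----
for epoch in range(20):
    opt.zero_grad()
    x_hat, z = model(x)
    dists = torch.cdist(z, centers)                            # (N, K) distances
    assigned = dists.argmin(dim=1)                             # nearest center per point
    clustering_loss = dists[torch.arange(len(z)), assigned].pow(2).mean()
    loss = (1 - alpha) * mse(x_hat, x) + alpha * clustering_loss
    loss.backward()
    opt.step()
    # the cluster centers themselves can also be updated periodically (omitted here)
```

How the two loss terms are weighted and how the cluster centers are updated are exactly the kinds of design choices that the taxonomy in the next section makes explicit.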

Part 3: Taxonomy

The most successful methods for clustering with deep neural networks all work following the same principle: representation learning using DNNs and using these representations as input for a specific clustering method.

Translation: The most successful methods for clustering with deep neural networks all follow the same principle: representation learning with a DNN, and using these representations as input to a specific clustering method.

The structure of this section is as follows (a concrete instantiation of these building blocks is sketched after the outline):

• Neural network training procedure, consisting of:
    – Main neural network branch and its usage
        ∗ Architecture of main neural network branch, described in Section 2.1
        ∗ Set of deep features used for clustering, described in Section 2.2
    – Neural network losses:
        ∗ Non-clustering loss, described in Section 2.3
        ∗ Clustering loss, described in Section 2.4
        ∗ Method to combine the two losses, described in Section 2.5
    – Cluster updates, described in Section 2.6
• (Optional) Re-run clustering after network training, described in Section 2.7
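Read this way, a concrete deep clustering method is simply one choice along each axis of the taxonomy, and new methods arise by swapping individual entries. Below is an illustrative configuration (the entry names are examples chosen here, not an exhaustive list from the paper):

```python
# One hypothetical method expressed as a choice per taxonomy axis.
method = {
    "main_branch_architecture": "convolutional autoencoder",          # Section 2.1
    "features_for_clustering":  "encoder output (bottleneck layer)",  # Section 2.2
    "non_clustering_loss":      "autoencoder reconstruction (MSE)",   # Section 2.3
    "clustering_loss":          "k-means-style assignment loss",      # Section 2.4
    "loss_combination":         "pre-train, then joint fine-tuning",  # Section 2.5
    "cluster_updates":          "jointly with the network weights",   # Section 2.6
    "rerun_clustering_after_training": True,                          # Section 2.7 (optional)
}
```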

Survey papers like this one are mainly about summarizing and categorizing. The remaining sections will be covered in a follow-up post; corrections and suggestions are welcome.
