Paper Introduction
An AAAI 2021 paper from Xi Peng's group.
Source code: 2021-AAAI-CC
Contributions
- For the first time, we reveal that the row and column of the feature matrix intrinsically correspond to the instance and cluster representation, respectively. Hence, deep clustering could be elegantly unified into the framework of representation learning; (i.e., the rows of the feature matrix correspond to instance representations and the columns to cluster representations)
- To the best of our knowledge, this could be the first work of clustering-specified contrastive learning. Different from existing studies in contrastive learning, the proposed method conducts contrastive learning at not only the instance-level but also the cluster-level. Such a dual contrastive learning framework could produce clustering favorite representations as proved in our experiments; (the first contrastive learning framework designed specifically for clustering, performing contrastive learning at the cluster level as well as the instance level)
- The proposed model works in an online and end-to-end fashion, which only needs batch-wise optimization and thus can be applied to large-scale datasets. Moreover, the proposed method could timely predict the cluster assignment for each new coming data point without accessing the whole dataset, which suits streaming data. (the model works in an online, end-to-end fashion with good real-time performance)
Underlying Idea
By treating the rows of the feature matrix as soft labels of the instances (i.e., entry (i, j) gives the probability that sample i belongs to cluster j), the columns can accordingly be interpreted as cluster representations distributed over the dataset.
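This row/column reading can be sketched with a toy feature matrix (the numbers and matrix size here are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical logits for N=4 samples over K=3 clusters.
logits = np.array([[2.0, 0.1, 0.1],
                   [0.2, 1.8, 0.0],
                   [0.1, 0.1, 2.2],
                   [1.9, 0.2, 0.3]])

M = softmax(logits, axis=1)     # row i: soft label of sample i (sums to 1)
assignments = M.argmax(axis=1)  # hard cluster assignment per sample
cluster_repr = M.T              # column j of M: representation of cluster j
print(assignments)              # -> [0 1 2 0]
```

Because each row is a probability distribution over clusters, reading the matrix column-wise gives, for each cluster, its "footprint" over the whole batch, which is exactly what the cluster-level head contrasts.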
Main Framework
Pair Construction Backbone
Uses a ResNet-34 backbone network.
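The pair-construction step can be sketched as follows. Here `augment` is a dependency-free stand-in for the paper's stochastic augmentations (crop, flip, color jitter, blur), and the arrays stand in for images fed to the ResNet-34:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, rng):
    # Stand-in for stochastic image augmentation: additive Gaussian
    # noise, so the sketch stays dependency-free.
    return x + rng.normal(scale=0.1, size=x.shape)

X = rng.normal(size=(4, 8))            # a mini-batch of N=4 "images"
Xa, Xb = augment(X, rng), augment(X, rng)
batch = np.concatenate([Xa, Xb])       # 2N samples for the shared backbone
N = len(X)
# sample i and sample i+N are a positive pair; the other 2N-2 are negatives
positive_pairs = [(i, i + N) for i in range(N)]
```

Both views pass through the same backbone, and the resulting 2N features feed the two contrastive heads below.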
Instance-Level Contrastive Head
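The instance-level head pulls the two augmented views of the same image together and pushes all other samples apart. A minimal NumPy sketch of such an NT-Xent-style objective (an assumed re-implementation for illustration, not the authors' code):

```python
import numpy as np

def instance_contrastive_loss(z, tau=0.5):
    # z has 2N rows: first N from view a, last N from view b;
    # row i and row (i + N) % 2N form a positive pair.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    n2 = len(z)
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    pos = np.array([sim[i, (i + n2 // 2) % n2] for i in range(n2)])
    return float(np.mean(np.log(np.exp(sim).sum(axis=1)) - pos))

rng = np.random.default_rng(1)
za = rng.normal(size=(4, 16))
aligned = np.concatenate([za, za])                     # identical views
unrelated = np.concatenate([za, rng.normal(size=(4, 16))])
```

As a sanity check, a batch whose two views are identical incurs a lower loss than one whose views are unrelated.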
Cluster-Level Contrastive Head
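The cluster-level head applies the same contrastive idea to the columns of the soft-label matrix: column k under view a and column k under view b form a positive pair, and an entropy term over cluster-assignment frequencies guards against the trivial one-cluster solution. A hedged NumPy sketch (the entropy normalization and weighting here are assumptions, not the authors' exact formulation):

```python
import numpy as np

def row_softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cluster_contrastive_loss(Ya, Yb, tau=1.0):
    # Ya, Yb: N x K soft-label matrices from the two views.
    # Their columns give 2K cluster representations; column k of Ya
    # and column k of Yb form a positive pair.
    A = Ya.T / np.linalg.norm(Ya.T, axis=1, keepdims=True)
    B = Yb.T / np.linalg.norm(Yb.T, axis=1, keepdims=True)
    reps = np.concatenate([A, B])
    sim = reps @ reps.T / tau
    k2 = len(reps)
    np.fill_diagonal(sim, -np.inf)
    pos = np.array([sim[i, (i + k2 // 2) % k2] for i in range(k2)])
    contrast = np.mean(np.log(np.exp(sim).sum(axis=1)) - pos)
    # Maximize the entropy of cluster-assignment frequencies
    # (hence subtracted) so that all samples do not collapse
    # into a single cluster.
    entropy = 0.0
    for Y in (Ya, Yb):
        p = Y.sum(axis=0) / Y.sum()
        entropy += -np.sum(p * np.log(p + 1e-12))
    return float(contrast - entropy)

rng = np.random.default_rng(0)
Ya = row_softmax(rng.normal(size=(8, 3)))
Yb = row_softmax(Ya + 0.1 * rng.normal(size=(8, 3)))
loss = cluster_contrastive_loss(Ya, Yb)
```

Optimizing the instance-level and cluster-level losses jointly is what lets the model learn representations and cluster assignments at the same time.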
Overall Loss
Algorithm
Datasets
Experimental Results
Clustering Visualization
Summary
This paper proposes an online clustering method called Contrastive Clustering (CC), which explicitly performs instance-level and cluster-level contrastive learning. Specifically, for a given dataset, positive and negative instance pairs are constructed through data augmentation and projected into a feature space. Instance-level and cluster-level contrastive learning are then carried out in the row space and column space of the feature matrix, respectively, by maximizing the similarity of positive pairs while minimizing that of negative pairs. The key observation is that the rows of the feature matrix can be regarded as soft labels of the instances, and accordingly the columns can be further regarded as cluster representations. By simultaneously optimizing the instance-level and cluster-level contrastive losses, the model jointly learns representations and cluster assignments in an end-to-end manner. Furthermore, the proposed method can compute the cluster assignment of each individual sample on the fly, even when the data arrives as a stream. Extensive experiments show that CC significantly outperforms 17 competing clustering methods on six challenging image benchmark datasets.