Recent papers on image annotation

Tagging like Humans: Diverse and Distinct Image Annotation (CVPR 2018)

Baoyuan Wu, Weidong Chen, Peng Sun, Wei Liu, Bernard Ghanem, and Siwei Lyu (Tencent AI Lab)

This work was led by Tencent AI Lab. The authors propose a new generative model for automatic image annotation, named Diverse and Distinct Image Annotation (D2IA). Inspired by the ensemble nature of human annotations, D2IA produces tags that are semantically relevant, distinct, and diverse.

In this work we propose a new automatic image annotation model, dubbed diverse and distinct image annotation (D2IA). The generative model D2IA is inspired by the ensemble of human annotations, which creates semantically relevant, yet distinct and diverse tags. In D2IA, we generate a relevant and distinct tag subset, in which the tags are relevant to the image contents and semantically distinct from each other, using sequential sampling from a determinantal point process (DPP) model. Multiple such tag subsets that cover diverse semantic aspects or diverse semantic levels of the image contents are generated by randomly perturbing the DPP sampling process. We leverage a generative adversarial network (GAN) model to train D2IA. Extensive experiments, including quantitative and qualitative comparisons as well as human subject studies, on two benchmark datasets demonstrate that the proposed model can produce more diverse and distinct tags than state-of-the-art methods.


In the first step, sequential sampling based on a determinantal point process (DPP) produces a tag subset in which every tag is relevant to the image content and the tags are semantically distinct from one another (i.e., there is no semantic redundancy). In the second step, random perturbations are applied to the DPP model to obtain different probability distributions, so that the sequential sampling of the first step can produce multiple different tag subsets (a sketch of this two-step procedure follows below). The authors train D2IA with a generative adversarial network (GAN) and run extensive experiments on two benchmark datasets, including quantitative and qualitative comparisons as well as human subject studies. The results show that, compared with current state-of-the-art automatic image annotation methods, the proposed approach produces more diverse and distinct tags.
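To make the two-step procedure concrete, below is a minimal NumPy sketch of sequential DPP sampling with feature perturbation. The relevance model `score_fn` and the tag-similarity matrix `S` are hypothetical stand-ins for the learned components described in the paper; this illustrates the general technique, not the authors' implementation.

```python
# Minimal sketch: step 1 samples a distinct tag subset from a DPP,
# step 2 perturbs the image feature and resamples for diversity.
import numpy as np

def dpp_kernel(quality, S):
    """Quality-diversity decomposition: L[i, j] = q_i * S[i, j] * q_j."""
    q = np.clip(quality, 1e-12, None)
    return q[:, None] * S * q[None, :]

def sequential_dpp_sample(L, k, rng):
    """Step 1: draw up to k tags one by one; each step samples a tag with
    probability proportional to its determinantal gain det(L_{Y+i}) / det(L_Y)."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        base = 0.0
        if selected:
            _, base = np.linalg.slogdet(L[np.ix_(selected, selected)])
        gains = np.zeros(n)
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            gains[i] = np.exp(logdet - base) if sign > 0 else 0.0
        if gains.sum() <= 0:
            break
        selected.append(int(rng.choice(n, p=gains / gains.sum())))
    return selected

def diverse_subsets(feature, S, score_fn, k=5, m=3, noise=0.1, seed=0):
    """Step 2: perturb the image feature with Gaussian noise and resample,
    yielding m different but individually distinct tag subsets."""
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(m):
        noisy = feature + noise * rng.standard_normal(feature.shape)
        quality = score_fn(noisy)          # per-tag relevance scores, e.g. in (0, 1)
        subsets.append(sequential_dpp_sample(dpp_kernel(quality, S), k, rng))
    return subsets
```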

6. Conclusion

In this work, we have proposed a new image annotation method, called diverse and distinct image annotation (D2IA), to simulate the diversity and distinctiveness of the tags generated by human annotators. D2IA is formulated as a sequential generative model, in which the image feature is first incorporated into a determinantal point process (DPP) model that also encodes the weighted semantic paths, from which a sequence of distinct tags is generated by sampling. The diversity among the generated multiple tag subsets is ensured by sampling the DPP model with random noise perturbations to the image feature. In addition, we adopt the generative adversarial network (GAN) model to train the generative model D2IA, and employ the policy gradient algorithm to handle the training difficulty due to the discrete DPP sampling in D2IA. Experimental results and human subject studies on benchmark datasets demonstrate that the diverse and distinct tag subsets generated by the proposed method can provide more comprehensive descriptions of the image contents than those generated by the state-of-the-art methods.

D2IA is a sequential generative model: the image feature is first incorporated into a DPP model that also encodes weighted semantic paths, and a sequence of distinct tags is generated from it by sampling. Diversity among the multiple generated tag subsets is ensured by perturbing the image feature with random noise before DPP sampling. A GAN is used to train D2IA, and a policy-gradient algorithm handles the training difficulty caused by the discrete DPP sampling.
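Because the DPP sampling step is discrete and non-differentiable, the generator can be updated with a REINFORCE-style surrogate loss that weights the log-probability of each sampled tag subset by the discriminator's score. The snippet below is a hedged PyTorch illustration with hypothetical `generator`, `discriminator`, and `sample_tag_subset` interfaces, not the paper's actual training code.

```python
import torch

def generator_pg_loss(image_feat, generator, discriminator, sample_tag_subset, n_samples=4):
    """Policy-gradient surrogate loss: weight the log-probability of each
    sampled tag subset by the discriminator's score (the 'reward')."""
    losses = []
    for _ in range(n_samples):
        tag_scores = generator(image_feat)                # per-tag relevance scores
        subset, log_prob = sample_tag_subset(tag_scores)  # discrete DPP sample + its log-probability
        with torch.no_grad():
            reward = discriminator(image_feat, subset)    # how "human-like" the subset looks
        # A baseline could be subtracted from `reward` here to reduce variance.
        losses.append(-reward * log_prob)
    return torch.stack(losses).mean()
```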

Diverse Image Annotation (CVPR 2017), Tencent AI Lab

Baoyuan Wu†,‡ Fan Jia† Wei Liu‡ Bernard Ghanem† 

Abstract
In this work, we study a new image annotation task called diverse image annotation (DIA). Its goal is to describe an image using a limited number of tags, whereby the retrieved tags need to cover as much useful information about the image as possible. As compared to the conventional image annotation task, DIA requires the tags to be not only representative of the image but also diverse from each other, so as to reduce redundancy. To this end, we treat DIA as a subset selection problem, based on the conditional determinantal point process (DPP) model, which encodes representation and diversity jointly. We further explore semantic hierarchy and synonyms among candidate tags to define weighted semantic paths. It is encouraged that two tags with the same semantic path are not retrieved simultaneously for the same image. This restriction is embedded into the algorithm used to sample from the learned conditional DPP model. Interestingly, we find that conventional metrics for image annotation (e.g., precision, recall, and F1 score) only consider an overall representative capacity of all the retrieved tags, while ignoring their diversity. Thus, we propose new semantic metrics based on our proposed weighted semantic paths. An extensive subject study verifies that the proposed metrics are much more consistent with human evaluation than conventional annotation metrics. Experiments on two benchmark datasets show that the proposed method produces more representative and diverse tags, compared with existing methods.
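One way to picture how the semantic-path restriction can be embedded into DPP sampling is the greedy sketch below: a candidate tag is rejected whenever it shares a semantic path with a tag already selected for the image. The `tag_paths` structure and the greedy MAP-style loop are my own simplification for intuition, not the modified sampling algorithm from the paper.

```python
# Greedy, path-restricted selection from a conditional DPP kernel L.
# tag_paths[i] is assumed to hold the ids of the weighted semantic paths
# that tag i lies on (a hypothetical data structure).
import numpy as np

def path_restricted_greedy(L, tag_paths, k):
    n = L.shape[0]
    selected, used_paths = [], set()
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected or tag_paths[i] & used_paths:
                continue  # shares a semantic path with a selected tag -> redundant
            idx = selected + [i]
            _, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if logdet > best_gain:
                best, best_gain = i, logdet
        if best is None:
            break
        selected.append(best)
        used_paths |= tag_paths[best]
    return selected
```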

This paper proposes a new objective for automatic image annotation: describe as much of the image's information as possible with a small number of diverse tags. The objective makes full use of the semantic relations among tags, bringing the automatic annotation results closer to human annotation.

Conclusions

This work studied a new task called diverse image annotation (DIA), where an image is annotated using a limited number of tags that attempt to cover as much semantic image information as possible. This task inherently requires that the few retrieved tags are not only representative of the image but also diverse. To this end, we treated the new task as a subset selection problem and modeled it using a conditional DPP model, which naturally incorporates representation and diversity jointly. Further, we proposed a modified DPP sampling algorithm, which incorporates semantic paths. We also proposed new metrics based on these semantic paths to evaluate the quality of the diverse tag list. The experiments on two benchmarks demonstrate that our proposed method is superior to state-of-the-art image annotation approaches. An extensive subject study validates the claim that our proposed semantic metrics are much more consistent with human annotation than traditional metrics.

However, many interesting issues about the new diverse image annotation (DIA) task deserve to be studied in the future. Firstly, the similarity matrix S in the DPP model is assumed to be pre-computed in this work, which is why the contribution of S is not very significant compared with the contribution of semantic paths in sampling. In future work, we plan to learn S and W jointly. Secondly, there is still a sizeable gap between the semantic metrics and human evaluation. To bridge this gap, we will focus on updating the way the semantic paths are constructed and weighted, based on a more detailed analysis of the path structure and tag weights. We will make the new semantic metrics available to the community as an online toolkit. Consequently, the evaluation of DIA can be standardized for fair comparison amongst future annotation methods.
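For intuition only, a path-based metric could be computed roughly as follows: score a predicted tag list by the total weight of ground-truth semantic paths it touches, so that synonyms or ancestors on the same path count once rather than as redundancy. This is an assumed simplification, not the metric actually defined in the paper.

```python
# Assumed, illustrative definition only (not the paper's metric).
def semantic_recall(pred_tags, gt_paths):
    """gt_paths: list of (weight, set_of_tags_on_path) pairs for the ground truth."""
    covered = sum(w for w, path in gt_paths if path & set(pred_tags))
    total = sum(w for w, _ in gt_paths)
    return covered / total if total > 0 else 0.0
```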


Semantic Regularisation for Recurrent Image Annotation (CVPR 2017)

Abstract: The "CNN-RNN" design pattern is increasingly widely applied in a variety of image annotation tasks, including multi-label classification and captioning. Existing models use the weakly semantic CNN hidden layer or its transform as the image embedding that provides the interface between the CNN and RNN. This leaves the RNN overstretched with two jobs: predicting the visual concepts and modelling their correlations in order to generate structured annotation output. Importantly, this makes end-to-end training of the CNN and RNN slow and ineffective, due to the difficulty of back-propagating gradients through the RNN to train the CNN. We propose a simple modification to the design pattern that makes learning more effective and efficient. Specifically, we propose to use a semantically regularised embedding layer as the interface between the CNN and RNN. Regularising the interface can partially or completely decouple the learning problems, allowing each to be trained more effectively and making joint training much more efficient. Extensive experiments show that state-of-the-art performance is achieved on multi-label classification as well as image captioning.
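A toy PyTorch sketch of the proposed interface change is shown below: the CNN feature is mapped to label probabilities by a "semantic" head that is supervised directly with the ground-truth labels, and only this semantic vector is handed to the RNN, which then focuses on label correlations and structured output. Layer names, sizes, and the simplified decoding loop are illustrative assumptions, not the authors' released architecture.

```python
import torch
import torch.nn as nn

class SemanticCNNRNN(nn.Module):
    def __init__(self, feat_dim=2048, num_labels=1000, hidden=512):
        super().__init__()
        self.concept_head = nn.Linear(feat_dim, num_labels)  # semantically regularised embedding
        self.embed = nn.Linear(num_labels, hidden)           # interface into the RNN
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_labels)

    def forward(self, cnn_feat, steps=5):
        concept_logits = self.concept_head(cnn_feat)          # deeply supervised (see loss below)
        h = self.embed(torch.sigmoid(concept_logits)).unsqueeze(1)
        rnn_out, _ = self.rnn(h.repeat(1, steps, 1))          # simplified decoding loop
        return concept_logits, self.out(rnn_out)

def training_losses(model, cnn_feat, label_targets, step_targets):
    """Deep supervision on the semantic layer plus the usual per-step RNN loss."""
    concept_logits, step_logits = model(cnn_feat, steps=step_targets.size(1))
    concept_loss = nn.BCEWithLogitsLoss()(concept_logits, label_targets.float())
    rnn_loss = nn.CrossEntropyLoss()(step_logits.flatten(0, 1), step_targets.flatten())
    return concept_loss + rnn_loss
```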

6. Conclusion
We proposed a semantically regularised CNN-RNN model for image annotation. The semantic regularisation makes the CNN-RNN interface semantically meaningful and distributes the label prediction and correlation tasks between the CNN and RNN models; importantly, the deep supervision makes training the full model more stable and efficient. Extensive evaluations on NUS-WIDE and MSCOCO demonstrate the efficacy of the proposed model on both multi-label classification and image captioning.


Deep Determinantal Point Process for Large-Scale Multi-Label Classification (CVPR 2017)

Pengtao Xie*†, Ruslan Salakhutdinov*, Luntian Mou§ and Eric P. Xing†
*Machine Learning Department, Carnegie Mellon University, USA
†Petuum Inc.

Abstract

We study large-scale multi-label classification (MLC) on two recently released datasets, YouTube-8M and Open Images, which contain millions of data instances and thousands of classes. The unprecedented problem scale poses great challenges for MLC. First, finding the correct label subset out of exponentially many choices incurs substantial ambiguity and uncertainty. Second, the large data size and class size entail considerable computational cost. To address the first challenge, we investigate two strategies: capturing label correlations from the training data and incorporating label co-occurrence relations obtained from external knowledge, which effectively eliminate semantically inconsistent labels and provide contextual clues to differentiate visually ambiguous labels. Specifically, we propose a Deep Determinantal Point Process (DDPP) model which seamlessly integrates a DPP with deep neural networks (DNNs) and supports end-to-end multi-label learning and deep representation learning. The DPP is able to capture label correlations of any order with a polynomial computational cost, while the DNNs learn hierarchical features of images/videos and capture the dependency between input data and labels. To incorporate external knowledge about label co-occurrence relations, we impose relational regularization over the kernel matrix in DDPP. To address the second challenge, we study an efficient low-rank kernel learning algorithm based on inducing point methods. Experiments on the two datasets demonstrate the efficacy and efficiency of the proposed methods.
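The core computation can be sketched as a DPP log-likelihood over a low-rank kernel produced by the DNN: with L = B Bᵀ, the normaliser det(L + I) equals det(BᵀB + I_r), so only an r x r determinant is needed even with thousands of labels. The snippet below is a hedged illustration of this likelihood; the factor `B` and the function name are my assumptions, not the paper's code.

```python
import torch

def dpp_log_likelihood(B, label_subset):
    """B: (num_labels, r) low-rank factor from the DNN;
    label_subset: 1-D LongTensor of observed label indices."""
    B_Y = B[label_subset]
    L_Y = B_Y @ B_Y.T + 1e-6 * torch.eye(B_Y.size(0))   # kernel restricted to observed labels (+ jitter)
    log_num = torch.logdet(L_Y)
    log_den = torch.logdet(B.T @ B + torch.eye(B.size(1)))  # det(L + I) = det(B^T B + I_r)
    return log_num - log_den                             # log P(Y) = log det(L_Y) - log det(L + I)
```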


5. Conclusions and Future Work
We study large-scale multi-label classification on two recently released datasets, YouTube-8M and Open Images. To capture high-order correlations among labels while retaining computational efficiency, we propose the Deep Determinantal Point Process (DDPP), which seamlessly integrates a DPP and deep neural networks (DNNs) and supports end-to-end learning. The DPP is able to capture label correlations of arbitrary order within polynomial computational time, while the DNNs play the role of representation learning for images and videos. To incorporate prior knowledge regarding label co-occurrence relations, we impose relational regularization over DDPP's kernel matrix. A low-rank kernel learning algorithm is investigated to scale DDPP to millions of instances and thousands of labels. Experiments on the two datasets demonstrate the efficacy and efficiency of our methods. For future work, we plan to investigate the noisy and missing label problem [54] present in Open Images and to leverage the label hierarchy to improve MLC performance.


Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval (CVPR 2018)

Reportedly the current state of the art in cross-modal hashing.


This work was done in collaboration with Xidian University and the University of Sydney. Thanks to the success of deep learning, cross-modal retrieval has recently seen significant improvements. However, a key bottleneck remains: how to close the gap between modalities and further improve retrieval accuracy. This paper proposes a Self-Supervised Adversarial Hashing (SSAH) method, one of the early attempts to incorporate adversarial learning into cross-modal hashing in a self-supervised manner. The main contribution is the use of several adversarial networks to maximize the semantic correlation and representation consistency across modalities. In addition, a self-supervised semantic network mines high-level semantic information in the form of multi-label annotations, guiding the feature learning process so that the relations between modalities are preserved in both the common semantic space and the Hamming space. Extensive experiments on three benchmark datasets show that the proposed SSAH outperforms state-of-the-art methods.
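As a rough illustration of the adversarial ingredient (hypothetical modules, not SSAH's actual architecture), the sketch below pairs per-modality hash encoders with a small discriminator that tries to tell which modality a code came from; training the encoders to fool it pushes the image and text code distributions together.

```python
import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    def __init__(self, in_dim, code_len=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, code_len))

    def forward(self, x):
        h = torch.tanh(self.net(x))            # relaxed codes in (-1, 1)
        return h, torch.sign(h).detach()       # continuous codes + binary codes for retrieval

class ModalityDiscriminator(nn.Module):
    def __init__(self, code_len=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_len, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, h):
        return self.net(h)                     # logit: "this code came from the image modality"

def encoder_alignment_loss(disc, h_img, h_txt):
    """Encoder-side loss with flipped labels: if the discriminator is trained
    with image=1 / text=0, the encoders are rewarded for the opposite."""
    bce = nn.BCEWithLogitsLoss()
    img_logits, txt_logits = disc(h_img), disc(h_txt)
    return bce(img_logits, torch.zeros_like(img_logits)) + bce(txt_logits, torch.ones_like(txt_logits))
```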


