Automatic Annotation in Computer Vision: A Survey of Deep Learning-Based Automatic Image Annotation Methods

[1] LI F F, IYER A, KOCH C, et al. What do we perceive in a glance of a real-world scene[J]. Journal of Vision, 2007, 7(1): 10.

[2] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3128-3137.

[3] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1026-1034.

[4] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2015: 91-99.

[5] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2014: 3104-3112.

[6] HOSSAIN M, SOHEL F, SHIRATUDDIN M F, et al. A comprehensive study of deep learning for image captioning[J]. arXiv preprint, 2018.

[7] FU K, LI J, JIN J, et al. Image-text surgery: efficient concept learning in image captioning by generating pseudopairs[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018(99): 1-12.

[8] GAO L, FAN K, SONG J, et al. Deliberate attention networks for image captioning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI, 2019: 8320-8327.

[9] CHEN F, JI R, SUN X, et al. GroupCap: group-based image captioning with structured relevance and diversity constraints[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 1345-1353.

[10] 彭宇新, 綦金玮, 黄鑫. 多媒体内容理解的研究现状与展望[J]. 计算机研究与发展, 2019, 56(1): 183-208.
PENG Y X, QI J W, HUANG X. Current research status and prospects on multimedia content understanding[J]. Journal of Computer Research and Development, 2019, 56(1): 183-208.

[11] FARHADI A, HEJRATI M, SADEGHI M A, et al. Every picture tells a story: generating sentences from images[C]//European Conference on Computer Vision. Berlin, Germany: Springer, 2010: 15-29.

[12] ORDONEZ V, KULKARNI G, BERG T L. Im2Text: describing images using 1 million captioned photographs[C]//Advances in Neural Information Processing Systems. Granada, Spain: Curran Associates Inc, 2011: 1143-1151.

[13] YANG Y, TEO C L, DAUMÉ H, et al. Corpus-guided sentence generation of natural images[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Edinburgh, United Kingdom: ACL, 2011: 444-454.

[14] LI S, KULKARNI G, BERG T L, et al. Composing simple image descriptions using web-scale n-grams[C]//Proceedings of the Fifteenth Conference on Computational Natural Language Learning. Portland, USA: ACL, 2011: 220-228.

[15] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. doi: 10.1038/nature14539

[16] KIROS R, SALAKHUTDINOV R, ZEMEL R. Multimodal neural language models[C]//International Conference on Machine Learning. Beijing, China: IMLS, 2014: 595-603.

[17] KARPATHY A, JOULIN A, LI F F. Deep fragment embeddings for bidirectional image sentence mapping[C]//Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2014: 1889-1897.

[18] MAO J, XU W, YANG Y, et al. Deep captioning with multimodal recurrent neural networks (m-RNN)[J]. arXiv preprint, 2014.

[19] JOHNSON J, KARPATHY A, LI F F. DenseCap: fully convolutional localization networks for dense captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4565-4574.

[20] YANG L, TANG K, YANG J, et al. Dense captioning with joint inference and visual context[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 2193-2202.

[21] KRISHNA R, ZHU Y, GROTH O, et al. Visual Genome: connecting language and vision using crowdsourced dense image annotations[J]. International Journal of Computer Vision, 2017, 123(1): 32-73.

[22] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint, 2014.

[23] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3156-3164.

[24] JIA X, GAVVES E, FERNANDO B, et al. Guiding the long-short term memory model for image caption generation[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 2407-2415.

[25] MAO J, HUANG J, TOSHEV A, et al. Generation and comprehension of unambiguous object descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 11-20.

[26] WANG C, YANG H, BARTZ C, et al. Image captioning with deep bidirectional LSTMs[C]//Proceedings of the 2016 ACM on Multimedia Conference. Amsterdam, The Netherlands: ACM, 2016: 988-997.

[27] XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]//International Conference on Machine Learning. Lille, France: IMLS, 2015: 2048-2057.

[28] LU J, XIONG C, PARIKH D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 375-383.

[29] PEDERSOLI M, LUCAS T, SCHMID C, et al. Areas of attention for image captioning[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1242-1250.

[30] TAVAKOLI H R, SHETTY R, BORJI A, et al. Paying attention to descriptions generated by image captioning models[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2487-2496.

[31] ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 6077-6086.

[32] PARK C C, KIM B, KIM G. Attend to you: personalized image captioning with context sequence memory networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 895-903.

[33] YOU Q, JIN H, WANG Z, et al. Image captioning with semantic attention[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4651-4659.

[34] YAO T, PAN Y, LI Y, et al. Boosting image captioning with attributes[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 4894-4902.

[35] REN Z, WANG X, ZHANG N, et al. Deep reinforcement learning-based image captioning with embedding reward[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 1151-1159.

[36] RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 7008-7024.

[37] ZHANG L, SUNG F, LIU F, et al. Actor-critic sequence training for image captioning[J]. arXiv preprint, 2017.

[38] KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[C]//Advances in Neural Information Processing Systems. Denver, USA: NIPS, 2000: 1008-1014.

[39] DAI B, FIDLER S, URTASUN R, et al. Towards diverse and natural image descriptions via a conditional GAN[J]. arXiv preprint, 2017.

[40] SHETTY R, ROHRBACH M, HENDRICKS L A, et al. Speaking the same language: matching machine to human captions by adversarial training[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 4155-4164.

[41] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 740-755.

[42] HODOSH M, YOUNG P, HOCKENMAIER J. Framing image description as a ranking task: data, models and evaluation metrics[J]. Journal of Artificial Intelligence Research, 2013, 47: 853-899. doi: 10.1613/jair.3994

[43] PLUMMER B A, WANG L, CERVANTES C M, et al. Flickr30k Entities: collecting region-to-phrase correspondences for richer image-to-sentence models[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 2641-2649.

[44] GRUBINGER M, CLOUGH P, MÜLLER H, et al. The IAPR TC-12 benchmark: a new evaluation resource for visual information systems[C]//International Workshop OntoImage. Genoa, Italy: OntoImage, 2006: 13-55.

[45] BYCHKOVSKY V, PARIS S, CHAN E, et al. Learning photographic global tonal adjustment with a database of input/output image pairs[C]//CVPR 2011. Piscataway, USA: IEEE, 2011: 97-104.

[46] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, USA: ACL, 2002: 311-318.

[47] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Text Summarization Branches Out. Barcelona, Spain: ACL, 2004: 74-81.

[48] BANERJEE S, LAVIE A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor, USA: ACL, 2005: 65-72.

[49] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 4566-4575.

[50] ANDERSON P, FERNANDO B, JOHNSON M, et al. SPICE: semantic propositional image caption evaluation[C]//European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 382-398.

[51] GAN Z, GAN C, HE X, et al. Semantic compositional networks for visual captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 5630-5639.

[52] GU J, WANG G, CAI J, et al. An empirical study of language CNN for image captioning[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1222-1231.

