Automatic Annotation in Computer Vision: A Survey of Deep Learning-Based Automatic Image Annotation Methods

[1] LI F F, IYER A, KOCH C, et al. What do we perceive in a glance of a real-world scene[J]. Journal of Vision, 2007, 7(1): 10.

[2] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3128-3137.

[3] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1026-1034.

[4] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2015: 91-99.

[5] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2014: 3104-3112.

[6] HOSSAIN M, SOHEL F, SHIRATUDDIN M F, et al. A comprehensive study of deep learning for image captioning[J]. arXiv preprint, 2018.

[7] FU K, LI J, JIN J, et al. Image-text surgery: efficient concept learning in image captioning by generating pseudopairs[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018(99): 1-12.

[8] GAO L, FAN K, SONG J, et al. Deliberate attention networks for image captioning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI, 2019: 8320-8327.

[9] CHEN F, JI R, SUN X, et al. GroupCap: group-based image captioning with structured relevance and diversity constraints[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 1345-1353.

[10] 彭宇新, 綦金玮, 黄鑫. 多媒体内容理解的研究现状与展望[J]. 计算机研究与发展, 2019, 56(1): 183-208.
PENG Y X, QI J W, HUANG X. Current research status and prospects on multimedia content understanding[J]. Journal of Computer Research and Development, 2019, 56(1): 183-208.

[11] FARHADI A, HEJRATI M, SADEGHI M A, et al. Every picture tells a story: generating sentences from images[C]//European Conference on Computer Vision. Berlin, Germany: Springer, 2010: 15-29.

[12] ORDONEZ V, KULKARNI G, BERG T L. Im2Text: describing images using 1 million captioned photographs[C]//Advances in Neural Information Processing Systems. Granada, Spain: Curran Associates Inc, 2011: 1143-1151.

[13] YANG Y, TEO C L, DAUMÉ H, et al. Corpus-guided sentence generation of natural images[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Edinburgh, United Kingdom: ACL, 2011: 444-454.

[14] LI S, KULKARNI G, BERG T L, et al. Composing simple image descriptions using web-scale n-grams[C]//Proceedings of the Fifteenth Conference on Computational Natural Language Learning. Portland, USA: ACL, 2011: 220-228.

[15] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. doi: 10.1038/nature14539

[16] KIROS R, SALAKHUTDINOV R, ZEMEL R. Multimodal neural language models[C]//International Conference on Machine Learning. Beijing, China: IMLS, 2014: 595-603.

[17] KARPATHY A, JOULIN A, LI F F. Deep fragment embeddings for bidirectional image sentence mapping[C]//Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2014: 1889-1897.

[18] MAO J, XU W, YANG Y, et al. Deep captioning with multimodal recurrent neural networks (m-RNN)[J]. arXiv preprint, 2014.

[19] JOHNSON J, KARPATHY A, LI F F. DenseCap: fully convolutional localization networks for dense captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4565-4574.

[20] YANG L, TANG K, YANG J, et al. Dense captioning with joint inference and visual context[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 2193-2202.

[21] KRISHNA R, ZHU Y, GROTH O, et al. Visual Genome: connecting language and vision using crowdsourced dense image annotations[J]. International Journal of Computer Vision, 2017, 123(1): 32-73.

[22] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint, 2014.

[23] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3156-3164.

[24] JIA X, GAVVES E, FERNANDO B, et al. Guiding the long-short term memory model for image caption generation[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 2407-2415.

[25] MAO J, HUANG J, TOSHEV A, et al. Generation and comprehension of unambiguous object descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 11-20.

[26] WANG C, YANG H, BARTZ C, et al. Image captioning with deep bidirectional LSTMs[C]//Proceedings of the 2016 ACM on Multimedia Conference. Amsterdam, The Netherlands: ACM, 2016: 988-997.

[27] XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]//International Conference on Machine Learning. Lille, France: IMLS, 2015: 2048-2057.

[28] LU J, XIONG C, PARIKH D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 375-383.

[29] PEDERSOLI M, LUCAS T, SCHMID C, et al. Areas of attention for image captioning[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1242-1250.

[30] TAVAKOLI H R, SHETTY R, BORJI A, et al. Paying attention to descriptions generated by image captioning models[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2487-2496.

[31] ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 6077-6086.

[32] PARK C C, KIM B, KIM G. Attend to you: personalized image captioning with context sequence memory networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 895-903.

[33] YOU Q, JIN H, WANG Z, et al. Image captioning with semantic attention[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4651-4659.

[34] YAO T, PAN Y, LI Y, et al. Boosting image captioning with attributes[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 4894-4902.

[35] REN Z, WANG X, ZHANG N, et al. Deep reinforcement learning-based image captioning with embedding reward[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 1151-1159.

[36] RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 7008-7024.

[37] ZHANG L, SUNG F, LIU F, et al. Actor-critic sequence training for image captioning[J]. arXiv preprint, 2017.

[38] KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[C]//Advances in Neural Information Processing Systems. Denver, USA: NIPS, 2000: 1008-1014.

[39] DAI B, FIDLER S, URTASUN R, et al. Towards diverse and natural image descriptions via a conditional GAN[J]. arXiv preprint, 2017.

[40] SHETTY R, ROHRBACH M, HENDRICKS L A, et al. Speaking the same language: matching machine to human captions by adversarial training[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 4155-4164.

[41] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014: 740-755.

[42] HODOSH M, YOUNG P, HOCKENMAIER J. Framing image description as a ranking task: data, models and evaluation metrics[J]. Journal of Artificial Intelligence Research, 2013, 47: 853-899. doi: 10.1613/jair.3994

[43] PLUMMER B A, WANG L, CERVANTES C M, et al. Flickr30k Entities: collecting region-to-phrase correspondences for richer image-to-sentence models[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 2641-2649.

[44] GRUBINGER M, CLOUGH P, MÜLLER H, et al. The IAPR TC-12 benchmark: a new evaluation resource for visual information systems[C]//International Workshop OntoImage. Genoa, Italy: OntoImage, 2006: 13-55.

[45] BYCHKOVSKY V, PARIS S, CHAN E, et al. Learning photographic global tonal adjustment with a database of input/output image pairs[C]//CVPR 2011. Piscataway, USA: IEEE, 2011: 97-104.

[46] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, USA: ACL, 2002: 311-318.

[47] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Text Summarization Branches Out. Barcelona, Spain: ACL, 2004: 74-81.

[48] BANERJEE S, LAVIE A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor, USA: ACL, 2005: 65-72.

[49] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 4566-4575.

[50] ANDERSON P, FERNANDO B, JOHNSON M, et al. SPICE: semantic propositional image caption evaluation[C]//European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016: 382-398.

[51] GAN Z, GAN C, HE X, et al. Semantic compositional networks for visual captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 5630-5639.

[52] GU J, WANG G, CAI J, et al. An empirical study of language CNN for image captioning[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1222-1231.

