摘要
对文本是单个词语的一般采用词袋特征
图片可以采用多种灵活的处理方式
对文本是句子甚至是段落的则需要采用较为复杂的处理方式,参考文献[1-2]中对Wiki和Pascal Sentence数据集的处理方式
Wiki
参考文献
[1] Wang D, Gao X, Wang X, et al. Multimodal discriminative binary embedding for large-scale cross-modal retrieval[J]. IEEE Transactions on Image Processing, 2016, 25(10): 4540-4554.
参考文献
[2] Wei Y, Zhao Y, Lu C, et al. Cross-modal retrieval with CNN visual features: A new baseline[J]. IEEE transactions on cybernetics, 2016, 47(2): 449-460.
MIRFlickr
参考文献
[1] Wang D, Gao X, Wang X, et al. Multimodal discriminative binary embedding for large-scale cross-modal retrieval[J]. IEEE Transactions on Image Processing, 2016, 25(10): 4540-4554.
注:DCMH中对文本数据的描述有误,以此处描述为准
Pascal Sentence
参考文献
[2] Wei Y, Zhao Y, Lu C, et al. Cross-modal retrieval with CNN visual features: A new baseline[J]. IEEE transactions on cybernetics, 2016, 47(2): 449-460.