MM-Rec: Multimodal News Recommendation

Multiset

Last modified 2023-05-29 14:30:22

Category: Multimodal Recommendation. Tags: deep learning, computer vision, artificial intelligence

First published 2023-01-11 16:23:43

Article link: https://blog.csdn.net/weixin_43221749/article/details/128646838

1.Abstract
- Limitations of prior work
“Accurate news representation is critical for news recommendation. Most of existing news representation methods learn news representations only from news texts while ignoring the visual information in news like images.”
- Contribution of this paper
“In this paper, we propose a multimodal news recommendation method, which can incorporate both textual and visual information of news to learn multimodal news representations.”

“we propose a crossmodal candidate-aware attention network to select relevant historical clicked news for accurate user modeling by measuring the crossmodal relatedness between clicked news and candidate news.”
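To make the idea of candidate-aware attention concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: it assumes each news item already has a text vector and an image vector of the same dimension, and it scores relatedness with unweighted dot products over the four modality pairs (text-text, text-image, image-text, image-image), whereas the actual model presumably uses learned parameters. The function name `candidate_aware_user_vector` is hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_aware_user_vector(clicked, candidate):
    """Weight clicked news by their crossmodal relatedness to the candidate.

    clicked:   list of (text_vec, image_vec) pairs for historical clicks
    candidate: (text_vec, image_vec) pair for the candidate news
    Assumption: relatedness is a plain sum of dot products over the four
    modality pairs; the real model may use learned projections instead.
    """
    cand_text, cand_image = candidate
    scores = []
    for text, image in clicked:
        scores.append(
            dot(text, cand_text) + dot(text, cand_image)
            + dot(image, cand_text) + dot(image, cand_image)
        )
    weights = softmax(scores)

    # User vector: attention-weighted sum of (text + image) representations.
    dim = len(cand_text)
    user = [0.0] * dim
    for w, (text, image) in zip(weights, clicked):
        for j in range(dim):
            user[j] += w * (text[j] + image[j])
    return weights, user


clicked = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
candidate = ([1.0, 0.0], [1.0, 0.0])
weights, user = candidate_aware_user_vector(clicked, candidate)
```

In this toy input the first clicked item points in the same direction as the candidate, so it receives the larger attention weight and dominates the user vector.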

2.Introduction

“In this paper, we present a multimodal news recommendation method named MM-Rec,”

Algorithms mentioned

“autoencoders”

“personalized attention network”

“multi-head self-attention networks”

“MM-Rec”

“multimodal news encoder”

“crossmodal candidate-aware attention”

3.Related Work

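Image branch: Mask R-CNN extracts the regions of interest (ROIs) from each news image, and a ResNet-50 model then extracts the features of each ROI.

The pre-trained visiolinguistic model ViLBERT [14] is applied to capture the inherent relatedness between news titles and images while learning their representations.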

4.Experiments

“Since there is no high-quality dataset that contains multimodal information of news, we constructed a dataset based on the logs collected from the Microsoft News website during three weeks (from Feb. 25, 2020 to Mar. 16, 2020)”

“Logs in the first week were used to construct user histories and the rest sessions were used to form click and non-click samples”

“We sorted these sessions by time, and the first 1,000,000 sessions were used for training, the next 100,000 sessions for validation and the rest for test.”
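The time-ordered split described in the quote above can be sketched as follows. This is only an illustration: the session record layout and the `"time"` field are assumptions, since the notes do not describe the log format, and `chronological_split` is a hypothetical helper name.

```python
def chronological_split(sessions, n_train, n_valid, time_key=lambda s: s["time"]):
    """Sort sessions by time, then cut them into train / validation / test.

    sessions: list of session records (assumed here to be dicts with a
              "time" field; the real log format is not given in the notes)
    n_train:  number of earliest sessions used for training
    n_valid:  number of following sessions used for validation
    All remaining sessions form the test set.
    """
    ordered = sorted(sessions, key=time_key)
    train = ordered[:n_train]
    valid = ordered[n_train:n_train + n_valid]
    test = ordered[n_train + n_valid:]
    return train, valid, test


# Toy example with six sessions instead of the paper's 1,000,000 / 100,000 split.
toy_sessions = [{"time": t} for t in [5, 1, 4, 2, 3, 0]]
train, valid, test = chronological_split(toy_sessions, n_train=3, n_valid=2)
```

Splitting by time rather than at random keeps every validation and test session later than the training sessions, which avoids evaluating on clicks that precede the training data.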

“In our experiments, we finetuned the last three layers of ViLBERT.”

“We also study the effectiveness of the co-attentional Transformers in the ViLBERT model and the crossmodal candidate-aware attention network for user interest modeling. We compare MM-Rec and its variants without co-attentional Transformers or replacing the crossmodal candidate-aware attention with the vanilla attention mechanism used in [23].” (Wu et al., 2022, p. 4)

Algorithms mentioned

ViLBERT

“Adam”

“negative sampling ratio”

“LibFM”

“DSSM”

“Wide&Deep”

“DeepFM”

“EBNR”

“DKN”

“DAN”

“NAML”

“NRMS”
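The “negative sampling ratio” in the list above refers to how many non-clicked news items are paired with each clicked one during training. A minimal sketch of that pairing is shown below. The ratio used in the paper is not recorded in these notes, so `k` is left as a parameter, and `build_training_samples` is a hypothetical helper rather than the authors' code.

```python
import random


def build_training_samples(clicked_ids, non_clicked_ids, k, rng=None):
    """Pair every clicked news ID with k sampled non-clicked news IDs.

    k is the negative sampling ratio (assumed to be chosen by the caller).
    Sampling is without replacement when enough negatives exist and with
    replacement otherwise. Impressions without negatives yield no samples.
    """
    rng = rng or random.Random()
    samples = []
    if not non_clicked_ids:
        return samples
    for pos in clicked_ids:
        if len(non_clicked_ids) >= k:
            negs = rng.sample(non_clicked_ids, k)
        else:
            negs = [rng.choice(non_clicked_ids) for _ in range(k)]
        samples.append((pos, negs))
    return samples


# One impression with two clicks and five non-clicked items, ratio k = 4.
samples = build_training_samples(
    ["n1", "n2"], ["a", "b", "c", "d", "e"], k=4, rng=random.Random(0)
)
```

Each resulting tuple holds one positive and `k` negatives, which is the shape typically fed to a softmax-style click prediction loss.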

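Evaluation metrics

“AUC”: area under the ROC curve

“MRR”: mean reciprocal rank

“nDCG@5”: normalized discounted cumulative gain over the top 5 results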
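For readers who want to see how these three metrics are computed for a single impression, here is a small self-contained sketch. It assumes binary click labels (1 = clicked, 0 = not clicked) and one predicted score per candidate. The MRR variant below averages reciprocal ranks over all clicked items in the impression; other definitions use only the first clicked item, and the notes do not say which one the paper uses.

```python
import math


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ranked_labels(labels, scores):
    """Labels reordered by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr(labels, scores):
    """Mean of 1/rank over all clicked items (one common variant)."""
    ranked = ranked_labels(labels, scores)
    rr = [1.0 / rank for rank, l in enumerate(ranked, start=1) if l == 1]
    return sum(rr) / len(rr) if rr else 0.0


def dcg_at_k(ranked, k):
    """Discounted cumulative gain of the first k labels."""
    return sum(
        (2 ** l - 1) / math.log2(rank + 1)
        for rank, l in enumerate(ranked[:k], start=1)
    )


def ndcg_at_k(labels, scores, k=5):
    """DCG@k divided by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked_labels(labels, scores), k) / ideal if ideal > 0 else 0.0


labels = [1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.1, 0.7, 0.2]
print(auc(labels, scores), mrr(labels, scores), ndcg_at_k(labels, scores, k=5))
```

In practice such per-impression values are usually averaged over all test impressions to produce the reported numbers.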

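Ablation study

The model is evaluated with titles only or with images only. The results show that both titles and images contribute substantially to recommendation accuracy.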

5.Conclusion

“We propose to use a visiolinguistic model to encode both news texts and images and capture their inherent crossmodal relatedness.”

“In addition, we propose a crossmodal candidate-aware attention network to select relevant clicked news based on their crossmodal relevance to candidate news, which can better model users’ specific interest in candidate news.”