Step-by-Step Guide to the Microsoft News Recommendation Competition

This post is by Fangzhao Wu, Jingwei Yi, Yu Lei, Ying Qiao, Le Zhang, and Miguel González-Fierro, all at Microsoft.

Introduction

News recommendation has become a key machine learning technology for many news services, and an important experience for millions of people when they consume news. To facilitate open research on news recommendation, several teams at Microsoft recently released the Microsoft News Dataset (MIND) and launched the Microsoft News Recommendation Competition. This blog post walks through developing an algorithm for the news recommendation problem in the competition and then submitting it for assessment. The code described in this post can be found in the Microsoft Recommenders GitHub repository.

Baselines for the Competition

To help participants in the Microsoft News Recommendation Competition get started, we have made five baselines available: Deep Knowledge-Aware Network (DKN), Long- and Short-term User Representation (LSTUR), Attentive Multi-View Learning (NAML), Personal Attention (NPA) and Multi-Head Self-Attention (NRMS). The performance of these models on MIND is evaluated in this ACL paper. We use NRMS in this blog post as an example to illustrate the submission process; code for all five baselines is available in the Microsoft Recommenders repository.

NRMS

NRMS is a content-based neural news recommendation algorithm. It uses multi-head self-attention to capture the relatedness between words to learn news representations and capture the interactions between previously clicked news articles to learn user representations. It also uses additive attention to learn informative news and user representations by selecting important words and news, as shown in the figure below.

Details about the algorithm can be found in this paper and the core NRMS algorithm is available here.
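The two attention steps described above can be sketched in plain Python. This is only an illustrative toy, not the implementation in the Microsoft Recommenders repository: it uses a single attention head with identity projections instead of several heads with learned projections, and the names `self_attention`, `additive_attention`, `W`, `b` and `q` as well as all numeric values are made up for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(vectors):
    """Single-head scaled dot-product self-attention (identity projections).

    Each output vector is a weighted average of all input vectors, so every
    word representation reflects its relatedness to the other words.
    """
    scale = math.sqrt(len(vectors[0]))
    contextual = []
    for query in vectors:
        weights = softmax([dot(query, key) / scale for key in vectors])
        contextual.append([
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(len(query))
        ])
    return contextual


def additive_attention(vectors, W, b, q):
    """Pool vectors into one: score_i = q . tanh(W h_i + b), then softmax."""
    scores = []
    for h in vectors:
        hidden = [math.tanh(dot(row, h) + bias) for row, bias in zip(W, b)]
        scores.append(dot(q, hidden))
    weights = softmax(scores)
    pooled = [
        sum(w * h[d] for w, h in zip(weights, vectors))
        for d in range(len(vectors[0]))
    ]
    return pooled, weights


# Toy "word embeddings" for a three-word title (hypothetical values).
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [0.3, 0.8]]  # hypothetical attention parameters
b = [0.0, 0.1]
q = [1.0, -1.0]

contextual = self_attention(word_vectors)
news_vector, word_weights = additive_attention(contextual, W, b, q)
print(news_vector, word_weights)
```

Applying the same two steps to the vectors of previously clicked news, rather than to word vectors, gives the user representation in the same spirit.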

Code Example

A Jupyter notebook is provided to help competition participants get started with the NRMS algorithm. In the notebook, the MIND dataset is downloaded first. To train an NRMS model, the original dataset should be copied from the competition platform; a utility function in the code example makes this step convenient. Note that the dataset used for the competition is the “MINDlarge” set. We recommend getting familiar with the “MINDdemo” or “MINDsample” data first.

More details about the training and evaluation process can be found in the notebook. To make sure the results comply with the submission requirements, the prediction scores are saved into zipped folders for uploading.
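As a rough illustration of that last step, the sketch below turns per-impression scores into ranks and packs them into a zip archive. The file layout is an assumption made for this example (one line per impression with the impression ID followed by a bracketed rank list, stored as `prediction.txt` inside `prediction.zip`); the notebook and the competition page remain the authoritative description of the required format. The helper names `scores_to_ranks` and `write_submission` are hypothetical.

```python
import os
import tempfile
import zipfile


def scores_to_ranks(scores):
    """Return the rank of each candidate; rank 1 goes to the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def write_submission(predictions, out_dir):
    """Write ranks to a text file and zip it.

    predictions maps an impression ID to the list of scores predicted for
    that impression's candidate news. The line format is an assumption.
    """
    txt_path = os.path.join(out_dir, "prediction.txt")
    zip_path = os.path.join(out_dir, "prediction.zip")
    with open(txt_path, "w", newline="\n") as f:
        for impression_id in sorted(predictions):
            ranks = scores_to_ranks(predictions[impression_id])
            rank_text = ",".join(str(r) for r in ranks)
            f.write(f"{impression_id} [{rank_text}]\n")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(txt_path, arcname="prediction.txt")
    return zip_path


# Hypothetical scores for two impressions.
demo_predictions = {1: [0.2, 0.9, 0.5], 2: [0.7, 0.1]}
out_dir = tempfile.mkdtemp()
zip_path = write_submission(demo_predictions, out_dir)
print(zip_path)
```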

Submitting to the MIND Competition

Register before submitting. Details about registration can be found here. Send an email titled “MIND Competition Registration” to mind[at]microsoft.com with your information (CodaLab account nickname, real name, contact email and affiliation) and your agreement to the Microsoft MIND News Recommendation Contest Official Rules (please write “I agree to the Microsoft MIND News Recommendation Contest Official Rules” in your email). Registrations should be approved within one or two days if the requested information is provided in full, and a confirmation email will be sent to the participant.

Results can be submitted once the participant's registration has been approved. The competition has two phases: a dev phase and a test phase. In the dev phase, you can submit your results on the dev set to the CodaLab system to obtain an official score. In the test phase, we will release the test set, and you can submit your predicted results on it to CodaLab before the deadline.

Making a submission on CodaLab takes several steps:

Navigate to ‘Participate’.
Write a brief description of your model (optional).
Click the ‘Submit’ button.
Upload your zipped submission. Here we use the zipped folder produced in the previous steps by the notebook that trains the NRMS model.
Wait until the evaluation status changes to ‘Finished’ or ‘Failed’. The following figure shows a successful submission. Together with the submission status, the system also returns the scores generated from the evaluation of the model.

If the submission status is ‘Failed’, you can click ‘View scoring output log’ and then ‘View scoring error log’ to see the debug logs. When the evaluation is finished, you can decide whether to show your scores on the leaderboard.

During the development phase, participants can upload their predictions on the validation set and tune their models according to the results. Although this submission is not mandatory, we highly encourage it, so that any trouble obtaining normal evaluation results shows up early. It is also useful practice for participants new to CodaLab.
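Between uploads, it can help to sanity-check predictions locally on data where the click labels are known. The sketch below computes two common ranking measures for a single impression, reciprocal rank and nDCG@k. Which metrics the leaderboard actually reports is not stated in this post, so these are illustrative choices, and the function names and toy values are hypothetical.

```python
import math


def ranked_labels(labels, scores):
    """Reorder click labels by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(labels, scores):
    """1 / position of the highest-ranked clicked item, or 0 if none."""
    for position, label in enumerate(ranked_labels(labels, scores), start=1):
        if label:
            return 1.0 / position
    return 0.0


def dcg_at_k(labels_in_rank_order, k):
    """Discounted cumulative gain over the top k positions."""
    return sum(
        label / math.log2(position + 1)
        for position, label in enumerate(labels_in_rank_order[:k], start=1)
    )


def ndcg_at_k(labels, scores, k):
    """DCG normalised by the DCG of a perfect ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_labels(labels, scores), k) / ideal


# One impression with three candidates; only the second one was clicked.
labels = [0, 1, 0]
scores = [0.9, 0.5, 0.1]
print(reciprocal_rank(labels, scores))  # clicked item is ranked second
print(ndcg_at_k(labels, scores, k=5))
```

Averaging these values over all impressions in the validation set gives a single number per metric that can be tracked while tuning.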

Next Steps

NRMS outperforms other baselines on MIND in our study, but there are still possible improvements:

Currently, we do not consider the positional information of words and news, but it may be useful for learning more accurate news and user representations.
Users usually have both long-term preferences and short-term interests. However, our method only learns short-term interests, i.e., it learns user representations from the news clicked before the current impression. By learning long-term user representations, we can incorporate information from multiple impressions, potentially yielding a better user representation.
Recently, Graph Neural Networks (GNNs) have been shown to be powerful for learning on graph data. A carefully constructed graph based on user behaviours may help.
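As one possible starting point for the first idea, positional information can be injected by adding a fixed sinusoidal encoding to each word or news vector before attention is applied, as in the original Transformer architecture. The sketch below is an assumption about how one might experiment, not something the NRMS baseline does; the helper names and toy embeddings are hypothetical.

```python
import math


def positional_encoding(num_positions, dim):
    """Sinusoidal table: even dimensions use sin, odd dimensions use cos.

    Dimensions 2i and 2i+1 share the frequency 1 / 10000 ** (2i / dim).
    """
    table = []
    for pos in range(num_positions):
        row = []
        for d in range(dim):
            angle = pos / (10000 ** ((d - d % 2) / dim))
            row.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(vectors):
    """Add the positional encoding to a sequence of equal-length vectors."""
    table = positional_encoding(len(vectors), len(vectors[0]))
    return [
        [value + offset for value, offset in zip(vector, row)]
        for vector, row in zip(vectors, table)
    ]


# Three toy word embeddings of dimension 4 (hypothetical values).
embeddings = [[0.1, 0.2, 0.3, 0.4]] * 3
with_positions = add_positions(embeddings)
print(with_positions[0])
print(with_positions[1])
```

Identical embeddings at different positions now map to different vectors, which gives the attention layers a way to tell word order and click order apart.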

Please register for the competition and happy hacking!
