Building a Product Catalog: eBay's 2nd Annual University Machine Learning Competition

weixin_26720761

Original article: https://medium.com/ebaytech/building-a-product-catalog-ebays-2nd-annual-university-machine-learning-competition-2b163e730867

After last year’s success, eBay is once again hosting a machine learning competition on an ecommerce dataset of eBay listings. This challenge is open to college and university students, and the winning team* will be offered a 2021 summer internship with eBay.

We invite students to start using our dataset to solve a real-world ecommerce challenge. Many public datasets exist, but their primary focus has been recommender systems, price estimation, computer vision, Natural Language Processing (NLP), and similar tasks. None of them addresses, at scale, the problem of mapping unstructured listings to well-cataloged products. As with last year, we sincerely hope that making this real-world dataset available will encourage students to explore the ecommerce domain further and to devise novel approaches to complex problems that can positively impact our platform and services.

The Challenge

Problem

The question we invite students to address is how to identify two or more listings as being for the same product by putting them into the same group. We call this Product Level Equivalency (PLE). That is, if a buyer purchased two items from two different listings in the same group, then, assuming the items were in the same condition, the buyer would judge that they had obtained two instances of the same product. PLE is defined over manufacturer specifications, so offer-specific details such as condition or item location are ignored. For example, a broken phone and a new phone with exactly the same specifications (make, model, color, memory size, etc.) are Product Level Equivalent, while a gold phone and a gray phone of otherwise identical make and model are not.
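
The definition can be made concrete with a minimal Python sketch that treats PLE as equality of a key built only from manufacturer specifications. It assumes listings already carry clean, structured spec fields, which real listings often do not; closing that gap is the point of the challenge. The field names (`make`, `model`, `color`, `memory`, `condition`) and the functions `ple_key` and `group_by_ple` are hypothetical. They are not the dataset's actual schema, which is described in the Annexure document.

```python
from collections import defaultdict

# Hypothetical spec fields; the real columns are described in the Annexure document.
SPEC_FIELDS = ("make", "model", "color", "memory")
# Offer-specific fields such as "condition" or "location" are deliberately
# left out of the key, because PLE ignores them.


def ple_key(listing: dict) -> tuple:
    """Key built only from manufacturer specs, with light normalization."""
    return tuple(str(listing.get(f, "")).strip().lower() for f in SPEC_FIELDS)


def group_by_ple(listings: list[dict]) -> list[list[str]]:
    """Put listings with identical spec keys into the same group."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for listing in listings:
        groups[ple_key(listing)].append(listing["id"])
    return list(groups.values())


listings = [
    {"id": "a", "make": "Acme", "model": "P5", "color": "Gray",
     "memory": "64GB", "condition": "New"},
    {"id": "b", "make": "acme", "model": "P5 ", "color": "Gray",
     "memory": "64GB", "condition": "For parts"},
    {"id": "c", "make": "Acme", "model": "P5", "color": "Gold",
     "memory": "64GB", "condition": "New"},
]
print(group_by_ple(listings))  # [['a', 'b'], ['c']]
```

The sample data mirrors the phone example above. Listings `a` and `b` differ only in condition (plus casing and whitespace), so they share a group. The gold listing `c` is kept apart. Exact key matching is only a baseline: in practice the specifications must first be extracted from unstructured titles and descriptions.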

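The objective is thus to produce a clustering of the listings according to PLE. More formally, let $L$ be the set of all listings. A clustering $C$ is a partition of $L$ into disjoint subsets:

$$C = \{C_1, C_2, \ldots, C_n\}, \qquad \bigcup_{i=1}^{n} C_i = L, \qquad C_i \cap C_j = \emptyset \ \text{for}\ i \neq j$$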

Ideally, all listings within each cluster $C_i$ are Product Level Equivalent to one another, and no two listings from different clusters are Product Level Equivalent.
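
If a clustering is represented as one label per listing, the two partition conditions (pairwise-disjoint clusters whose union is $L$) can be checked mechanically. The following is a minimal sketch with hypothetical helper names (`clusters_from_labels`, `is_partition`) and made-up listing identifiers. It is not part of the competition's tooling.

```python
from collections import defaultdict


def clusters_from_labels(labels: dict[str, int]) -> list[set[str]]:
    """Turn a listing -> cluster-label mapping into a list of clusters C_i."""
    groups: dict[int, set[str]] = defaultdict(set)
    for listing_id, label in labels.items():
        groups[label].add(listing_id)
    return list(groups.values())


def is_partition(clusters: list[set[str]], listings: set[str]) -> bool:
    """True if the clusters are pairwise disjoint and their union is exactly L."""
    union = set().union(*clusters)
    # Overlapping clusters make the union smaller than the sum of cluster sizes.
    disjoint = sum(len(c) for c in clusters) == len(union)
    return disjoint and union == listings


L = {"a", "b", "c", "d"}
print(is_partition(clusters_from_labels({"a": 0, "b": 0, "c": 1, "d": 2}), L))  # True
print(is_partition([{"a", "b"}, {"b", "c"}, {"d"}], L))  # False: "b" is in two clusters
print(is_partition([{"a", "b"}, {"c"}], L))              # False: "d" is not covered
```

A label-per-listing representation makes the clusters disjoint by construction, because each listing gets exactly one label. Coverage still requires that every listing in $L$ receives a label. The explicit check matters mostly when clusters are assembled some other way, for example by merging candidate groups.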

The measurable objective, evaluation, submission format, and other details are available on EvalAI.

Data

The data set consists of approximately 1 million selected unlabeled public listings. We also provide an Annexure document that describes the columns and parsing logic.

Approximately 25,000 of those listings will be clustered by eBay using human judgment (“true clustering”). These clustered listings will be split into three groups: a) Validation set (approximately 12,500 listings), b) Quiz set (approximately 6,250 listings), c) Test set (approximately 6,250 listings).

The validation set is intended for participants to evaluate their approach. Anonymized identifiers and cluster labels will be provided to the participants. We will release the validation set along with the main dataset.
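
Because the validation set comes with cluster labels, a participant can score a predicted clustering locally before submitting. One common way to compare two clusterings is pairwise precision and recall. This approach counts the pairs of listings that the prediction and the ground truth both place in the same cluster. It is an illustrative choice, not necessarily the competition's metric, since the official evaluation is defined on EvalAI. The `{listing_id: cluster_label}` input format and the function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def same_cluster_pairs(labels: dict[str, int]) -> set[frozenset[str]]:
    """All unordered pairs of listings that share a cluster label."""
    members: dict[int, list[str]] = defaultdict(list)
    for listing_id, label in labels.items():
        members[label].append(listing_id)
    return {
        frozenset(pair)
        for group in members.values()
        for pair in combinations(group, 2)
    }


def pairwise_scores(predicted: dict[str, int],
                    truth: dict[str, int]) -> tuple[float, float, float]:
    """Pairwise precision, recall and F1 of `predicted` against `truth`."""
    # Only listings with a ground-truth label can be scored.
    predicted = {k: v for k, v in predicted.items() if k in truth}
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    hits = len(pred_pairs & true_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 1.0
    recall = hits / len(true_pairs) if true_pairs else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


truth = {"a": 0, "b": 0, "c": 0, "d": 1}
predicted = {"a": 7, "b": 7, "c": 8, "d": 8, "e": 9}  # "e" has no label and is skipped
print(pairwise_scores(predicted, truth))  # approximately (0.5, 0.333, 0.4)
```

Cluster labels are compared only through co-membership, so the predicted label values (7, 8, 9) need not match the ground-truth values. The number of pairs grows quadratically with cluster size. That should be manageable for a validation set of this size unless individual clusters are very large. A contingency-table formulation avoids materializing the pairs altogether.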

The quiz data is used for leaderboard scoring. The test set is used as a factor to determine the winner. For the quiz and the test datasets, neither the listing identifiers nor the cluster labels will be provided to the participants.

Hosting

The challenge will be hosted on the open-source platform EvalAI. College and university students will submit their entries through EvalAI, where they will be evaluated for leaderboard scoring. Please check out the EvalAI challenge page for more details.

Timeline

Dates are subject to change, but expected deadlines will be:

August 24th, 2020 — Challenge begins. Access to the dataset is granted. We start accepting submissions through EvalAI and begin the evaluations.

February 1st, 2021 — Challenge ends.

February 22nd, 2021 — We announce winners.

Participation Criteria and Prize

Teams (no more than 5 members per team) must include only students who are interested in an internship.

Assuming eligibility criteria are met, members of the winning team will be offered a Summer 2021 internship at eBay Inc. eBay's internship program combines real work experience with a robust program that gives interns exposure to various business verticals, executives, and networking opportunities. The internship will also be an excellent opportunity for students to put their ML models to real use.

Further details on the participant eligibility criteria, internship prize eligibility criteria, official contest agreement, and rules for the competition, as well as other details, are available as part of the official contest rule package. See eBay Contact details below to receive the official contest rule package.

eBay Contact

To find out more about how to participate in the challenge and receive the official contest rule package, please reach out to MLChallenge@ebay.com.

*Teams may have no more than five members.

Originally published at https://tech.ebayinc.com on August 25, 2020.
