A Commentary on the Abstraction and Reasoning Challenge Kaggle Competition

This competition was hosted by François Chollet.

This report has been prepared by Somayeh Gholami and Mehran Kazeminia.

Currently, machine learning techniques can only use patterns they have already seen: a model is first given certain patterns and is then exposed to the relevant data so that it can learn new skills. But could machines in the future, like humans, answer reasoning questions they have never seen before? Could machines learn complex and abstract tasks from just a few examples? This was exactly the theme of the Abstraction and Reasoning Challenge, which ended recently and is one of Kaggle’s most controversial challenges. Participants were given three months to develop an artificial intelligence that can solve reasoning questions it has not seen before. Introducing the contest, Kaggle wrote:

“It provides a glimpse of a future where AI could quickly learn to solve new problems on its own. The Kaggle Abstraction and Reasoning Challenge invites you to try your hand at bringing this future into the present!”

The reasoning questions in this challenge resembled human intelligence tests and included easy, medium, and sometimes rather difficult questions. An ordinary person could answer all of them within a reasonable time, and none of the questions were extremely complex. The challenge was how to teach machines reasoning concepts such as changing color, resizing, and reordering, so that they could pass a human intelligence test they had never seen before.

The total prize for this competition was twenty thousand dollars, divided among the top three teams. As expected, however, even the results at the top of the leaderboard were not promising. The challenge attracted nearly a thousand participants, half of whom did not answer any question correctly. Lower scores were better: if a team’s algorithm did not work at all, it scored one, and if it answered a few questions correctly, it scored, for example, 0.98. Only twelve teams scored below 0.90. The final scores of the top thirty teams are shown below.

[Image: final leaderboard scores of the top thirty teams]
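
As a rough illustration of the scoring described above, the sketch below computes an error rate over test outputs: 1.0 when nothing is solved, and somewhat lower when a few outputs are reproduced exactly. The function name `error_rate` and the limit of three attempts per output are assumptions made for this example; the authoritative metric definition is the one on the competition page.

```python
from typing import List

Grid = List[List[int]]  # a grid is a matrix of colour codes


def error_rate(attempts: List[List[Grid]], answers: List[Grid],
               max_attempts: int = 3) -> float:
    """Fraction of test outputs for which no allowed attempt matches exactly.

    max_attempts=3 is an assumption for illustration.
    """
    wrong = 0
    for tries, truth in zip(attempts, answers):
        if truth not in tries[:max_attempts]:
            wrong += 1
    return wrong / len(answers)


answers = [[[1, 0], [0, 1]], [[2, 2]], [[3]], [[4, 4], [4, 4]]]
attempts = [
    [[[1, 0], [0, 1]]],    # solved at the first attempt
    [[[2, 0]], [[0, 2]]],  # both attempts wrong
    [],                    # no prediction at all
    [[[4, 4], [4, 0]]],    # one cell off still counts as wrong
]
print(error_rate(attempts, answers))  # 0.75
```

Because a prediction must match the whole matrix, a grid that is wrong in a single cell earns nothing, which is part of why scores stayed so close to one.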

This competition was not a classification challenge: every answer had to be produced as a picture (a matrix) rather than selected from several visual options, which made the competition more complicated. Perhaps for this reason, those who thought they could train machines only in the conventional, classical way, or could make progress by guesswork, were utterly disappointed. Some participants did work out the instances that had simpler solutions, but these were exceptions. Even in the best case they solved only a small number of instances and did not have much success.

The ingenuity and effort of the winners and participants in this contest are admirable. Still, a glance at the scoreboard suggests that we are far from a final answer, and there is no guarantee that the participants chose the best approach. The winners have generously described their creative methods at the following links, and some of them have shared their complete code.

List of gold medal solutions shared:

1st place solution by icecuber

2nd place solution by Alejandro de Miquel

3rd place solution by Vlad Golubev

3rd place solution by Ilia

5th place solution by alijs

6th place solution by Zoltan

8th place solution by Andy Penrose

8th place solution by Maciej Sypetkowski

8th place solution by Jan Bre

9th place solution by Hieu Phung

10th place solution by Alexander Fritzler

If you are interested in this topic, you can find a lot of information about this challenge on the Kaggle website and on François Chollet’s GitHub. If you want to take the initiative and try your own approaches, we have some tips for you. To get started, first study page 64 of François Chollet’s article on measuring intelligence:

On the Measure of Intelligence | François Chollet

You can also refer to the Discussion and Notebooks sections of this challenge on the Kaggle website and read the recommendations of the host, the winners, and the other participants directly. Finally, here are some key tips from François Chollet:

How to get started?

fchollet — Competition Host:

If you don’t know how to get started, I would suggest the following template:

Take a bunch of tasks from the training or evaluation set, around 10.

For each task, write by hand a simple program that solves it. It doesn’t matter what programming language you use; pick what you’re comfortable with.

Now, look at your programs, and ponder the following:

1) Could they be expressed more naturally in a different medium (what we call a DSL, a domain-specific language)?
2) What would a search process that outputs such programs look like (regardless of conditioning the search on the task data)?
3) How could you simplify this search by conditioning it on the task data?
4) Once you have a set of generated candidates for a solution program, how do you pick the one most likely to generalize?
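
To make the template above more concrete, here is a minimal sketch of questions 1), 2) and 4): a toy DSL of grid transformations, a brute-force search over short compositions of them, and a "shortest consistent program first" rule for picking a candidate. The primitive names, the `run` and `search` helpers, and the preference for short programs are all assumptions chosen for this illustration; this is not the host’s method or any winner’s solution, and it does not attempt question 3).

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]

# A toy DSL (hypothetical): each primitive maps a grid to a grid.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": lambda g: [row[::-1] for row in g],           # mirror left-right
    "flip_v": lambda g: g[::-1],                            # mirror top-bottom
    "transpose": lambda g: [list(col) for col in zip(*g)],  # swap rows/columns
    "identity": lambda g: g,
}


def run(program: Tuple[str, ...], grid: Grid) -> Grid:
    """Apply the named primitives from left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def search(pairs: List[Tuple[Grid, Grid]],
           max_len: int = 3) -> Optional[Tuple[str, ...]]:
    """Return the shortest program consistent with every demonstration pair."""
    for length in range(1, max_len + 1):  # try shorter programs first
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in pairs):
                return program
    return None  # nothing in this small DSL explains the demonstrations


# Demonstration pairs for a hidden rule: rotate the grid 90 degrees clockwise.
demos = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[1, 2, 3]], [[1], [2], [3]]),
]
program = search(demos)
print(program)
print(run(program, [[5, 6], [7, 8]]))  # [[7, 5], [8, 6]]
```

Even this tiny example shows the trade-off behind the four questions: a richer DSL can express more tasks, but the number of candidate programs grows exponentially with program length, so the search needs guidance from the task data to stay tractable.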

You will not find tutorials online on how to do any of this. The best you can do is read past literature on program synthesis, which will help with step 3). But even that may not be that useful :)

This challenge is something new. You are expected to think on your own and come up with novel, creative ideas. It’s what’s fun about it!

Does hard-coding rules disqualify?

fchollet — Competition Host:

You can hard-code rules & knowledge, and you can use external data

Can we “probe” the leaderboard to get information about the test set?

fchollet — Competition Host:

Using your LB score as feedback to guess the exact contents of the test set is against the spirit of the competition. In fact, it is against the spirit of every Kaggle competition. The goal of the competition is to create an algo that will turn the demonstration pairs of a task into a program that solves the task — not to reverse-engineer the private test set.

Further, this is a waste of your time. It is extremely unlikely that you would be able to guess an exact output or an exact task. This is why we decided not to have a separate public and private leaderboard: probing is simply not going to work.

That is because:

1) test tasks have no exact overlap with training and eval tasks (although they look “similar” in the sense that they’re the same kind of puzzle, built on top of Core Knowledge systems)
2) the space of all possible ARC tasks is very large, and very diverse.

So you’re not going to get a hit by either trying everything found in the train and eval set, or by just randomly guessing new tasks. You would have better luck trying to guess the exact melodies of the top 100 pop songs of 2021.
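
A quick back-of-the-envelope count gives a feel for how large that space is. The sketch below counts the distinct output grids of a fixed size, assuming 10 possible cell colours and a maximum grid size of 30 by 30; both figures are assumptions taken from the public description of the ARC dataset, not from the text above, and `count_grids` is a hypothetical helper.

```python
def count_grids(height: int, width: int, colours: int = 10) -> int:
    """Number of distinct grids of one fixed size (each cell picks a colour)."""
    return colours ** (height * width)


print(count_grids(3, 3))               # 1000000000
print(len(str(count_grids(30, 30))))   # 901 digits, i.e. 10**900 grids
```

Even a 3 by 3 output already has a billion possibilities, so guessing an exact output without understanding the task is hopeless.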

Is the level of difficulty similar in the evaluation set and the test set?

fchollet — Competition Host:

The difficulty level of the evaluation set and test set are about the same. Both are more difficult than the training set. That is because the training set deliberately contains elementary tasks meant to serve as Core Knowledge concept demonstration.

Can we use data from both the training and evaluation sets in our solutions?

fchollet — Competition Host:

I would recommend only using data from the training set to develop your algorithm. Using data from both the training set and evaluation set isn’t at all against the rules, so you could do it, but it would be bad practice, since it would prevent you from accurately evaluating your algorithms.

The goal of this competition is to develop an algorithm that can make sense of tasks it has never seen before. You’ll want to be able to check how well your algorithm performs before submitting it. For this purpose, you need a set of tasks that your algorithm has never seen, and further, that you have never seen. That’s the evaluation set. So don’t leak too much information from the evaluation set into your algorithm, or you won’t be able to evaluate it.

Note that the “test” set is a placeholder (copied from the evaluation set) for you to check that your submission is working as intended. The real test set used for the leaderboard is fully private.
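
The advice above amounts to the usual held-out evaluation discipline: develop on the training tasks and only measure on the evaluation tasks. A minimal sketch of such a measurement follows. The task layout (a dict with `"train"` and `"test"` lists of `"input"`/`"output"` grids) is an assumption modelled on the public ARC JSON files, and `solved_fraction` and `mirror_solver` are hypothetical names used only here.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]
Pair = Dict[str, Grid]          # {"input": grid, "output": grid}
Task = Dict[str, List[Pair]]    # {"train": [...], "test": [...]}
Solver = Callable[[List[Pair], Grid], Grid]


def solved_fraction(solver: Solver, tasks: List[Task]) -> float:
    """Share of tasks whose every test output the solver reproduces exactly."""
    solved = 0
    for task in tasks:
        if all(solver(task["train"], t["input"]) == t["output"]
               for t in task["test"]):
            solved += 1
    return solved / len(tasks)


def mirror_solver(demos: List[Pair], grid: Grid) -> Grid:
    """Placeholder solver: ignores the demonstrations, mirrors left-right."""
    return [row[::-1] for row in grid]


# Stand-in for tasks loaded from the evaluation set (never used for tuning).
held_out = [
    {"train": [{"input": [[1, 2]], "output": [[2, 1]]}],
     "test": [{"input": [[3, 4]], "output": [[4, 3]]}]},
    {"train": [{"input": [[1, 2]], "output": [[1, 2]]}],
     "test": [{"input": [[5, 6]], "output": [[5, 6]]}]},
]
print(solved_fraction(mirror_solver, held_out))  # 0.5
```

The number is only meaningful if neither the solver nor its author has been tuned against the held-out tasks, which is exactly the point of the recommendation.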

[Image: 8th place solution by Maciej Sypetkowski]

So everything is ready. Have a coffee and get started.

Good luck.

Somayyeh Gholami & Mehran Kazeminia

Translated from: https://medium.com/swlh/a-commentary-on-the-abstraction-and-reasoning-challenge-kaggle-competition-16ba30fac0ec
