AI plays "Mad Libs" to learn grammar the way kids do

Advanced AI systems can figure out linguistic principles on their own, without first practicing on sentences that humans have labeled for them, according to new research.

“In a sense, it’s nothing short of miraculous…”

It’s much closer to how human children learn languages long before adults teach them grammar or syntax, the researchers report.

Even more surprising, however, they found that the AI model appears to infer “universal” grammatical relationships that apply to many different languages.

AI, Mad Libs, and learning language

Imagine you’re training a computer with a solid vocabulary and a basic knowledge about parts of speech. How would it understand this sentence: “The chef who ran to the store was out of food”?

Did the chef run out of food? Did the store? Did the chef run the store that ran out of food?

Most human English speakers will instantly come up with the right answer, but even advanced artificial intelligence systems can get confused. After all, part of the sentence literally says that “the store was out of food.”

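To make the ambiguity concrete, here's a minimal sketch of how a dependency parser untangles the sentence. It uses spaCy's small English model purely as an illustration; the research described below probes BERT, not spaCy.

```python
# A dependency-parse illustration (not the study's method). Assumes:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The chef who ran to the store was out of food.")

for token in doc:
    # Each word, its grammatical role, and the word it attaches to.
    print(f"{token.text:>6}  {token.dep_:<6}  head={token.head.text}")

# In a correct parse, "chef" is the subject (nsubj) of the main verb "was",
# while "store" is only the object of "to" inside the relative clause
# "who ran to the store" -- so it's the chef, not the store, that was
# out of food.
```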

Advanced new machine learning models have made enormous progress on these problems, mainly by training on huge datasets or “treebanks” of sentences that humans have hand-labeled to teach grammar, syntax, and other linguistic principles.

The problem is that treebanks are expensive and labor intensive, and computers still struggle with many ambiguities. The same collection of words can have widely different meanings, depending on the sentence structure and context.

“All we’re doing is having these very large neural networks run these Mad Libs tasks, but that’s sufficient to cause them to start learning grammatical structures.”

The new research has big implications for natural language processing, which is increasingly central to AI systems that answer questions, translate languages, help customers, and even review resumes. It could also facilitate systems that learn languages spoken by very small numbers of people.

The key to success? It appears that machines learn a lot about language just by playing billions of fill-in-the-blank games that are reminiscent of “Mad Libs.” In order to get better at predicting the missing words, the systems gradually create their own models about how words relate to each other.

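In code, one round of that game looks something like the sketch below, which runs the public bert-base-uncased checkpoint through the Hugging Face transformers library; the checkpoint is an assumed stand-in, not necessarily the exact model from the study.

```python
# A minimal fill-in-the-blank sketch. Assumes: pip install transformers torch
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Blank out one word and ask the model to fill it back in.
for pred in fill("The chef who ran to the store was out of [MASK]."):
    print(f"{pred['token_str']:>10}  p={pred['score']:.3f}")
```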

“As these models get bigger and more flexible, it turns out that they actually self-organize to discover and learn the structure of human language,” says Christopher Manning, a professor of machine learning, of linguistics, and of computer science at Stanford University, as well as associate director of the Institute for Human-Centered Artificial Intelligence.

“It’s similar to what a human child does,” he says.

BERT learns words

In the first study, the researchers began by using a state-of-the-art language processing model developed by Google that’s nicknamed BERT (short for “Bidirectional Encoder Representations from Transformers”). BERT uses a Mad Libs approach to train itself, but researchers had assumed that the model was simply making associations between nearby words. A sentence that mentions “hoops” and “jump shot,” for example, would prompt the model to search for words tied to basketball.

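A simplified sketch of how those self-training examples get made is below. BERT's real recipe masks roughly 15% of tokens and sometimes keeps or randomly swaps a chosen token instead of masking it; only the basic masking step is shown here, with the Hugging Face tokenizer as an assumed stand-in.

```python
# Simplified masked-example construction (BERT's full 80/10/10 masking
# scheme is omitted). Assumes: pip install transformers
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def make_mad_libs_example(sentence: str, mask_prob: float = 0.15):
    tokens = tokenizer.tokenize(sentence)
    labels = list(tokens)  # the words the model must learn to recover
    for i in range(len(tokens)):
        if random.random() < mask_prob:
            tokens[i] = tokenizer.mask_token  # "[MASK]"
    return tokens, labels

tokens, labels = make_mad_libs_example("He sank a jump shot through the hoop.")
print(tokens)  # e.g. ['he', 'sank', 'a', '[MASK]', 'shot', ...]
```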

However, the researchers found that the system was doing something more profound: It was learning sentence structure in order to identify nouns and verbs as well as subjects, objects and predicates. That in turn improved its ability to untangle the true meaning of sentences that might otherwise be confusing.

“If it knows that ‘she’ refers to Lady Gaga, for example, it will have more of an idea of what ‘she’ is likely doing.”

“If it can work out the subject or object of a blanked-out verb, that will help it to predict the verb better than simply knowing the words that appear nearby,” Manning says. “If it knows that ‘she’ refers to Lady Gaga, for example, it will have more of an idea of what ‘she’ is likely doing.”

That’s very useful. Take this sentence about promotional literature for mutual funds: “It goes on to plug a few diversified Fidelity funds by name.”

The system recognized that “plug” was a verb, even though that word is usually a noun, and that “funds” was a noun and the object of the verb — even though “funds” might look like a verb. Not only that, the system didn’t get distracted by the string of descriptive words — ”a few diversified Fidelity” — between “plug” and “funds.”

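One way to look for that verb-object link is to inspect the model's self-attention directly, in the spirit of the study's analysis. This sketch searches every layer and head of the public bert-base-uncased checkpoint for the strongest attention from "funds" back to "plug"; the checkpoint and the brute-force search are illustrative assumptions, not the paper's exact procedure.

```python
# Peek at self-attention between "plug" and its object "funds".
# Assumes: pip install transformers torch
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

sentence = "It goes on to plug a few diversified Fidelity funds by name."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

with torch.no_grad():
    # One (1, heads, seq, seq) attention tensor per layer.
    attentions = model(**inputs).attentions

# Both words happen to be single tokens in this vocabulary.
plug, funds = tokens.index("plug"), tokens.index("funds")

# Find the layer/head where "funds" attends most strongly to "plug".
best = max(
    ((layer, head, att[0, head, funds, plug].item())
     for layer, att in enumerate(attentions)
     for head in range(att.shape[1])),
    key=lambda t: t[2],
)
print("strongest funds->plug attention: layer %d, head %d, weight %.3f" % best)
```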

The system also became good at identifying words that referred to each other. In a passage about meetings between Israelis and Palestinians, the system recognized that the “talks” mentioned in one sentence were the same as “negotiations” in the next sentence. Here, too, the system didn’t mistakenly decide that “talks” was a verb.

“In a sense, it’s nothing short of miraculous,” Manning says. “All we’re doing is having these very large neural networks run these Mad Libs tasks, but that’s sufficient to cause them to start learning grammatical structures.”

Discovering universal language principles

In a separate paper, researchers found evidence that BERT teaches itself universal principles that apply in languages as different as English, French, and Chinese. At the same time, the system learned differences: In English, an adjective usually goes in front of the noun it’s modifying, but in French and many other languages it goes after the noun.

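A quick way to watch one model straddle both word orders is to give multilingual BERT the same fill-in-the-blank task in English and in French. This sketch uses the public bert-base-multilingual-cased checkpoint as an assumed stand-in for the models in the study.

```python
# One multilingual model, two adjective positions.
# Assumes: pip install transformers torch
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-multilingual-cased")

# English: the blanked adjective sits before the noun...
print([p["token_str"] for p in fill("She drives a [MASK] car.")])
# ...while in French it typically follows the noun.
print([p["token_str"] for p in fill("Elle conduit une voiture [MASK].")])
```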

The bottom line is that identifying cross-language patterns should make it easier for a system that learns one language to learn more of them — even if they seem to have little in common.

“This common grammatical representation across languages suggests that multilingual models trained on 10 languages should be able to learn an eleventh or a twelfth language much more easily,” Manning says. “Indeed, this is exactly what we are starting to find.”

Additional researchers from Stanford and Facebook contributed to the first study.

Source: Stanford University

Original Study DOI: 10.1073/pnas.1907367117

Find more research news at Futurity.org

Translated from: https://medium.com/@Futurity/ai-plays-mad-libs-to-learn-grammar-the-way-kids-do-e8d921f3035
