An introduction to part-of-speech tagging and the Hidden Markov Model

by Sachin Malhotra and Divya Godayal

Let’s go back into the times when we had no language to communicate. The only way we had was sign language. That’s how we usually communicate with our dog at home, right? When we tell him, “We love you, Jimmy,” he responds by wagging his tail. This doesn’t mean he knows what we are actually saying. Instead, his response is simply because he understands the language of emotions and gestures more than words.

We as humans understand far more of the nuances of natural language than any animal on this planet. That is why “I LOVE you, honey” and “Let’s make LOVE, honey” mean different things to us. Since we understand the basic difference between the two phrases, our responses are very different. It is these very intricacies of natural language understanding that we want to teach to a machine.

What this could mean is that when your future robot dog hears “I love you, Jimmy”, he would know LOVE is a verb. He would also realize that it’s an emotion we are expressing, to which he would respond in a certain way. And maybe when you are telling your partner “Let’s make LOVE”, the dog would just stay out of your business.

This is just an example of how teaching a robot to communicate in a language known to us can make things easier.

The primary use case being highlighted in this example is how important it is to understand the difference in the usage of the word LOVE, in different contexts.

Part-of-Speech Tagging

From a very young age, we are taught to identify parts of speech: for example, to read a sentence and identify which words act as nouns, pronouns, verbs, adverbs, and so on. These labels are referred to as part-of-speech tags.

Let’s look at the Wikipedia definition for them:

In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context — i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Identifying part of speech tags is much more complicated than simply mapping words to their part of speech tags. This is because POS tagging is not something that is generic. It is quite possible for a single word to have a different part of speech tag in different sentences based on different contexts. That is why it is impossible to have a generic mapping for POS tags.
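
To make the point concrete, here is a minimal sketch of why a fixed word-to-tag mapping falls short. The two sentences and their tags are hand-labelled toy data assumed for illustration (they are not taken from a real corpus), and the helper names `build_lookup` and `lookup_errors` are hypothetical.

```python
# Toy, hand-labelled sentences (an assumption for illustration only).
tagged_sentences = [
    [("I", "PRON"), ("love", "VERB"), ("you", "PRON")],
    [("we", "PRON"), ("make", "VERB"), ("love", "NOUN")],
]


def build_lookup(sentences):
    """Build a context-free mapping: each word keeps the first tag seen for it."""
    lookup = {}
    for sentence in sentences:
        for word, tag in sentence:
            lookup.setdefault(word.lower(), tag)
    return lookup


def lookup_errors(sentences, lookup):
    """Return (word, true_tag, predicted_tag) wherever the fixed mapping is wrong."""
    errors = []
    for sentence in sentences:
        for word, tag in sentence:
            predicted = lookup[word.lower()]
            if predicted != tag:
                errors.append((word, tag, predicted))
    return errors


lookup = build_lookup(tagged_sentences)
errors = lookup_errors(tagged_sentences, lookup)
print(errors)  # [('love', 'NOUN', 'VERB')]
```

Because the mapping can store only one tag per word, "love" is tagged as a verb everywhere, and the noun usage in the second sentence is mislabelled. Resolving this requires looking at the surrounding words, which is what a context-aware tagger does.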

As you can see, manually assigning part-of-speech tags across an entire corpus is not practical. New contexts and new words keep appearing in the dictionaries of various languages, so manual POS tagging does not scale. That is why we rely on machine-based POS tagging.

Before proceeding further and looking at how part-of-speech tagging is done, we should look at why POS tagging is necessary and where it can be used.

Why Part-of-Speech tagging?

Part-of-speech tagging in itself may not be the solution to any particular NLP problem. It is, however, done as a prerequisite that simplifies a lot of different problems. Let us consider a few applications of POS tagging in various NLP tasks.
