Week 6-7: Word Sense Disambiguation

zypandora

Category: NLP(Michigan)

Article link: https://blog.csdn.net/zypandora/article/details/50364786

Introduction

Task
- given a word
- and its context
- determine which sense of the word is intended
Use for machine translation
- e.g., translate ‘play’ into Spanish
- play the violin = tocar el violín
- play tennis = jugar al tenis
Other uses
- accent restoration
- text to speech generation
- spell correction
- capitalization restoration

Look at only 2 senses at a time
Ordered rules: each collocation indicates a sense
Formula (the score by which the rules are ordered)
$\log\left(\frac{p(sense_a \mid collocation_i)}{p(sense_b \mid collocation_i)}\right)$
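
The ordering by log-ratio can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_decision_list` and `classify`, the sense labels `"a"` and `"b"`, and the smoothing constant `alpha` are all hypothetical and not from the lecture. Rules are ranked by the absolute value of the log-ratio, so that strong evidence for either sense ranks high; that choice is also an assumption here.

```python
import math
from collections import Counter, defaultdict

def build_decision_list(examples, alpha=0.1):
    """examples: iterable of (collocation, sense) pairs, sense in {"a", "b"}.
    Returns a list of (collocation, predicted_sense, score), strongest first."""
    counts = defaultdict(Counter)
    for collocation, sense in examples:
        counts[collocation][sense] += 1

    rules = []
    for collocation, c in counts.items():
        total = c["a"] + c["b"] + 2 * alpha
        # Additive smoothing (alpha is an assumed value) avoids log(0)
        # and division by zero when a collocation occurs with one sense only.
        p_a = (c["a"] + alpha) / total
        p_b = (c["b"] + alpha) / total
        llr = math.log(p_a / p_b)
        rules.append((collocation, "a" if llr > 0 else "b", abs(llr)))

    # Ordered rules: the most indicative collocation comes first.
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules

def classify(rules, context_collocations, default="a"):
    """Apply the first (strongest) rule whose collocation occurs in the context."""
    for collocation, sense, _score in rules:
        if collocation in context_collocations:
            return sense
    return default
```

With training pairs where `fish` always occurs with sense a and `play` occurs with both senses, the `fish` rule outranks the `play` rule, so a context containing both is labelled by `fish`.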

Collocations are taken from a fixed-size window of words around the word to be disambiguated (e.g., bass).
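
A minimal sketch of window-based collocation extraction, assuming the text is already tokenized. The function name `window_collocations`, the default window size of 2, and the (word, offset) representation are assumptions made for illustration.

```python
def window_collocations(tokens, target_index, window=2):
    """Return (word, relative_offset) pairs for the words within
    `window` positions of the target word, excluding the target itself."""
    left = max(0, target_index - window)
    right = min(len(tokens), target_index + window + 1)
    return [
        (tokens[i], i - target_index)
        for i in range(left, right)
        if i != target_index
    ]

tokens = "he plays the bass guitar very well".split()
print(window_collocations(tokens, tokens.index("bass")))
# [('plays', -2), ('the', -1), ('guitar', 1), ('very', 2)]
```

Keeping the offset makes it possible to distinguish "word immediately to the right" from "word anywhere in the window"; dropping it gives a plain bag of context words.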

1. For an ambiguous word, start with 2 senses and choose one strong, indicative seed collocation for each sense.
- e.g., plant1: leaf, plant2: factory
2. Label the occurrences of the ambiguous word in the training set that appear near a seed: if the word is near leaf, label it plant1. This labels only a small portion of the data, say 1% as plant1 and 1% as plant2, leaving 98% unlabelled.
3. Among the labelled examples, look for additional collocations that indicate a sense, and use them to label more of the unlabelled data.
- e.g., plant1: living
4. Go to 2.
5. Continue this process until all of the training set is labelled.

2 principles:
- one sense per collocation
- one sense per discourse
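
The bootstrapping loop described above can be sketched roughly as follows. This is a simplified illustration, not the lecture's implementation: the function name `bootstrap`, the thresholds `min_count` and `min_purity`, and the representation of each occurrence as a set of context words are all assumptions. The sketch applies only the per-collocation principle; it does not propagate labels across a discourse and never revises a label once assigned.

```python
from collections import Counter, defaultdict

def bootstrap(contexts, seeds, min_count=2, min_purity=0.9, max_iters=10):
    """contexts: list of sets of context words, one per occurrence of the
    ambiguous word. seeds: dict mapping a seed collocation to a sense,
    e.g. {"leaf": "plant1", "factory": "plant2"}.
    Returns one label per occurrence (None if it stayed unlabelled)."""
    rules = dict(seeds)
    labels = [None] * len(contexts)

    for _ in range(max_iters):
        # Label occurrences whose known collocations all agree on one sense.
        changed = False
        for i, ctx in enumerate(contexts):
            if labels[i] is None:
                senses = {rules[w] for w in ctx if w in rules}
                if len(senses) == 1:
                    labels[i] = senses.pop()
                    changed = True
        if not changed:
            break  # nothing new was labelled, so no new rules can be learned

        # Find additional collocations among the labelled occurrences.
        counts = defaultdict(Counter)
        for ctx, sense in zip(contexts, labels):
            if sense is not None:
                for w in ctx:
                    counts[w][sense] += 1
        for w, c in counts.items():
            sense, n = c.most_common(1)[0]
            # Assumed thresholds: frequent enough and almost always one sense.
            if w not in rules and n >= min_count and n / sum(c.values()) >= min_purity:
                rules[w] = sense

    return labels
```

Starting from the seeds leaf and factory, an occurrence whose context contains living but not leaf is labelled plant1 in a later round, once living has been seen often enough alongside leaf.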

Metric
- A = number of assigned senses
- C = number of words assigned correct senses
- T = total number of test words
- Precision = C/A, Recall = C/T
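
As a small worked sketch of these definitions: the function name `precision_recall` and the use of `None` for "no sense assigned" are assumptions for illustration.

```python
def precision_recall(predicted, gold):
    """predicted: sense labels, with None where the system assigned no sense.
    gold: the correct sense for every test word."""
    assigned = sum(p is not None for p in predicted)                # A
    correct = sum(
        p is not None and p == g for p, g in zip(predicted, gold)  # C
    )
    total = len(gold)                                               # T
    precision = correct / assigned if assigned else 0.0
    recall = correct / total if total else 0.0
    return precision, recall

# 3 senses assigned, 2 of them correct, 4 test words:
# precision = 2/3, recall = 2/4
p, r = precision_recall(["a", "b", None, "a"], ["a", "a", "b", "a"])
```

Precision and recall differ only when the system abstains on some words; if every test word is assigned a sense, A = T and the two values are equal.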
Result
- Best: 77P/77R
- human lexicographer: 97P/96R
