Week4-4 Earley Parser

zypandora

Category: NLP(Michigan)

Article link: https://blog.csdn.net/zypandora/article/details/50073387

Background

Developed by Jay Earley in 1970
No need to convert the grammar to CNF (Chomsky normal form)
Processes the input left to right

Complexity

Worst case $O(n^3)$, but faster than $O(n^3)$ in many cases (for example, $O(n^2)$ for unambiguous grammars)

Earley Parser

Looks for both full and partial constituents
When reading word k, it has already identified all hypotheses that are consistent with words 1 to k-1

Data structure

It uses a dynamic programming table (a chart), just like CKY
Example entry in column 1:
- [0:1] VP -> VP . PP
- created when processing word 1
- spans positions 0 to 1: the part to the left of the dot (here VP) has already been found; if a PP is found later, the whole non-terminal VP will be complete
- the dot (.) separates the completed (known) part from the incomplete (possibly unattainable) part
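
A chart entry like the one above can be sketched as a small record. This is only an illustration: the class name `State` and its field names are hypothetical and are not taken from the lecture.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    """One chart entry (hypothetical layout, for illustration only)."""
    lhs: str               # left-hand side of the rule, e.g. "VP"
    rhs: Tuple[str, ...]   # right-hand side symbols, e.g. ("VP", "PP")
    dot: int               # how many rhs symbols have been found so far
    start: int             # position where the constituent begins
    end: int               # position reached so far (the column index)

    def next_symbol(self) -> Optional[str]:
        """Symbol right after the dot, or None if the entry is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")
        return f"[{self.start}:{self.end}] {self.lhs} -> {' '.join(symbols)}"


entry = State("VP", ("VP", "PP"), dot=1, start=0, end=1)
print(entry)                 # [0:1] VP -> VP . PP
print(entry.next_symbol())   # PP
```

The entry is still waiting for a PP, so `is_complete()` is false until the dot reaches the end of the right-hand side.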

3 types of entries

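'scan' - for words: the symbol after the dot matches the next input word, so the dot moves over it
'predict' - for non-terminals: the symbol after the dot is a non-terminal, so its rules are added to the current column
'complete' - otherwise: the dot is at the end of the rule, so every entry that was waiting for this constituent is advanced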
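
The three operations can be sketched as a small recognizer. This is a minimal sketch, not the lecture's implementation: the toy grammar, the lexicon, and the names `earley_chart` and `spanning` are assumptions made for illustration, and the code assumes a grammar without empty (epsilon) rules.

```python
from collections import namedtuple

# One entry: rule lhs -> rhs, dot position, and where the constituent starts.
State = namedtuple("State", "lhs rhs dot start")

# Toy grammar and lexicon (assumptions for illustration, no epsilon rules).
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "NP": [("Det", "N")],
    "PP": [("P", "NP")],
}
LEXICON = {"take": {"V"}, "this": {"Det"}, "book": {"N"}}


def earley_chart(words, grammar=GRAMMAR, lexicon=LEXICON, root="S"):
    """Build the chart; chart[k] holds the entries that end at position k."""
    chart = [[] for _ in range(len(words) + 1)]

    def add(k, state):
        if state not in chart[k]:
            chart[k].append(state)

    for rhs in grammar[root]:
        add(0, State(root, rhs, 0, 0))

    for k in range(len(words) + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while it is processed
            st = chart[k][i]
            i += 1
            if st.dot == len(st.rhs):
                # complete: advance entries that were waiting for st.lhs
                for prev in list(chart[st.start]):
                    if prev.dot < len(prev.rhs) and prev.rhs[prev.dot] == st.lhs:
                        add(k, prev._replace(dot=prev.dot + 1))
            elif st.rhs[st.dot] in grammar:
                # predict: expand the non-terminal after the dot
                for rhs in grammar[st.rhs[st.dot]]:
                    add(k, State(st.rhs[st.dot], rhs, 0, k))
            elif k < len(words) and st.rhs[st.dot] in lexicon.get(words[k], ()):
                # scan: the next word has the expected part of speech
                add(k + 1, st._replace(dot=st.dot + 1))
    return chart


def spanning(chart):
    """Labels of complete entries that cover the whole input."""
    last = chart[-1]
    return sorted({s.lhs for s in last if s.dot == len(s.rhs) and s.start == 0})


chart = earley_chart("take this book".split())
print(spanning(chart))   # ['S', 'VP']
```

With this toy grammar, the last column contains a complete VP and a complete S that both span the whole input. A real parser would also store back-pointers in each entry so that the trees can be recovered; this sketch only recognizes.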

Example

Take this book.

At the end, we find that the input can be parsed either as a verb phrase or as a sentence.

Problems with CFGs

Agreement

Number
- Chen is/ People are
Person
- I am/ Chen is
Tense
- was / is / will be
Case
Gender

Combinatorial explosion

Many combinations of rules are needed to express agreement
- S -> NP VP
- S -> 1sgNP 1sgVP
- S -> 2sgNP 2sgVP
- …
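
To make the explosion concrete, here is a small sketch that copies one rule for every person/number combination. The feature inventory (`PERSONS`, `NUMBERS`) and the function name `agreement_rules` are assumptions chosen for illustration; only the `1sgNP`-style naming comes from the rules listed above.

```python
from itertools import product

# Assumed feature values; a real grammar might also need tense, case, gender.
PERSONS = ["1", "2", "3"]
NUMBERS = ["sg", "pl"]


def agreement_rules(lhs="S", rhs=("NP", "VP")):
    """Copy one rule for every person/number combination."""
    rules = []
    for person, number in product(PERSONS, NUMBERS):
        tag = person + number
        rules.append(f"{lhs} -> " + " ".join(tag + sym for sym in rhs))
    return rules


rules = agreement_rules()
print(len(rules))   # 6
print(rules[0])     # S -> 1sgNP 1sgVP
```

A single rule already becomes six. Each additional agreement feature multiplies the count by its number of values, and the same copying is needed for every rule that involves agreement.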

Subcategorization frames

Different words (verbs in particular) take different kinds of arguments, so the grammar needs different rules for each type of word.

direct object
prepositional phrase
predicative adjective
bare infinitive
to-infinitive
participial phrase
that-clause
question-form clause

CFG independence assumption

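A CFG assumes that each non-terminal is expanded independently of its context. In practice, the probabilities of different non-terminal expansions are not independent of the rules and words around them.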

Remark: One solution is the lexicalized CFG (lexicalized PCFG).

Conclusion

Because of the number of possible combinations, the number of parses of a sentence can be exponential in its length, so finding all the parses can take exponential time.
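
As a rough illustration of this growth, the sketch below counts the unlabeled binary bracketings of a sentence, which is given by the Catalan numbers. This is the count for a maximally ambiguous binary grammar, not the number of parses under any particular grammar; the function names are hypothetical.

```python
from math import comb


def catalan(n):
    """n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


def binary_bracketings(num_words):
    """Number of binary trees over num_words leaves (num_words >= 1)."""
    return catalan(num_words - 1)


for n in (3, 5, 10, 20):
    print(n, binary_bracketings(n))
```

Three words allow 2 bracketings and ten words allow 4862, so listing every parse quickly becomes impractical even though the chart itself stays polynomial in size.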