Natural Language Processing: Dependency Parsing

What structures does natural language have, and how do we model syntactic structure? Broadly, syntactic structure comes in two kinds:

  • phrase structure (context-free grammars), which organizes words into nested constituents;
  • dependency structure, which shows which words depend on (modify or are arguments of) which other words;

Phrase Structure Grammar

A sentence is built from progressively nested units: adjacent words/units combine into larger units called phrases, and those phrases in turn combine into still larger units:
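This nesting can be made concrete by representing a parse as nested tuples. A minimal sketch; the sentence and its bracketing are illustrative assumptions, not taken from the text above:

```python
# A phrase-structure parse as nested constituents: each tuple is
# (category, child, child, ...); strings are the words themselves.
# The bracketing of "the cat sat on the mat" here is an assumed example.
sentence = ("S",
            ("NP", ("Det", "the"), ("N", "cat")),
            ("VP", ("V", "sat"),
                   ("PP", ("P", "on"),
                          ("NP", ("Det", "the"), ("N", "mat")))))

def leaves(tree):
    """Recover the word sequence by walking the nested constituents."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:          # tree[0] is the category label
        words.extend(leaves(child))
    return words

print(" ".join(leaves(sentence)))   # the cat sat on the mat
```

Reading off the leaves left to right recovers the original sentence, which is exactly what it means for the constituents to nest.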


Why do we need dependency structure?

Dependency syntax explains how the different units of a sentence relate to one another. The same sentence can have different dependency structures, and different structures can carry substantially different meanings. Dependency parsing therefore helps us understand sentences better and improves accuracy on tasks such as machine translation.


Prepositional phrase attachment ambiguity

San Jose cops kill man with knife

This sentence has two readings:

  • the cops killed the man using a knife;
  • the man they killed was carrying a knife;
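The two readings correspond to two possible heads for the preposition "with". A sketch using 1-based head indices (0 would be ROOT); the analyses are illustrative assumptions in old Stanford style, where the preposition attaches to the verb or the noun:

```python
# PP attachment ambiguity as two competing head assignments for "with".
words = ["San", "Jose", "cops", "kill", "man", "with", "knife"]

# Reading 1: "with knife" modifies the verb -> the cops used a knife.
heads_instrument = {6: 4}   # "with" (word 6) attaches to "kill" (word 4)
# Reading 2: "with knife" modifies the noun -> the man carried a knife.
heads_possessive = {6: 5}   # "with" (word 6) attaches to "man" (word 5)

for heads, gloss in [(heads_instrument, "the cops used the knife"),
                     (heads_possessive, "the man carried the knife")]:
    print(f"with -> {words[heads[6] - 1]}  ({gloss})")
```

Everything below the attachment point is unchanged; the single choice of head for "with" flips the meaning of the whole sentence.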

Scientists count whales from space

This sentence also has two readings:

  • scientists count the whales from space, using something like a satellite;
  • the whales come from space;

Coordination scope ambiguity

Shuttle veteran and longtime NASA executive Fred Gregory appointed to board

This sentence has two readings:

  • one person, Fred Gregory, who is both a shuttle veteran and a longtime NASA executive, was appointed;
  • two people, a shuttle veteran and longtime NASA executive Fred Gregory, were both appointed to the board;

Dependency paths identify semantic relations

The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB

From the dependency analysis we can extract protein-protein interactions, such as KaiC interacting with the three other proteins.

The nominal subject KaiC attaches to the verb "interacts"; SasA sits beneath it as a modifier, and its conjuncts KaiA and KaiB are reached from there, so all three are identified as the things KaiC interacts with.
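The idea of following dependency paths between entities can be sketched as a shortest-path search over the parse, treated as an undirected graph. The toy edge set below is an assumed simplification of the parse of the example sentence:

```python
# Extracting a dependency path between two entities, the idea behind
# using parses for relation extraction. Edges are assumed/simplified.
from collections import deque

edges = {
    ("interacts", "KaiC"), ("interacts", "SasA"),
    ("SasA", "KaiA"), ("SasA", "KaiB"), ("interacts", "rhythmically"),
}

def neighbors(node):
    for a, b in edges:
        if a == node:
            yield b
        if b == node:
            yield a

def path(src, dst):
    """Breadth-first search for the shortest dependency path."""
    queue, seen = deque([[src]]), {src}
    while queue:
        p = queue.popleft()
        if p[-1] == dst:
            return p
        for n in neighbors(p[-1]):
            if n not in seen:
                seen.add(n)
                queue.append(p + [n])
    return None

print(path("KaiC", "KaiB"))   # ['KaiC', 'interacts', 'SasA', 'KaiB']
```

The path from KaiC to KaiB runs through "interacts", which is what signals the interaction relation.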


Dependency Structure

A dependency structure can be represented in two ways: as a linear (arrow) diagram or as a tree, as in the left and right figures below:
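Both views can be generated from the same underlying head array. A sketch; the toy sentence and its heads are assumptions for illustration:

```python
# One dependency parse, two renderings: linear arcs and an indented tree.
words = ["ROOT", "She", "saw", "the", "lecture"]
heads = [None, 2, 0, 4, 2]      # heads[i] = index of word i's head (0 = ROOT)

# Linear view: one arrow per arc.
for i in range(1, len(words)):
    print(f"{words[heads[i]]} -> {words[i]}")

# Tree view: depth-first from ROOT, indenting by depth.
def print_tree(node, depth=0):
    print("  " * depth + words[node])
    for i in range(1, len(words)):
        if heads[i] == node:
            print_tree(i, depth + 1)

print_tree(0)
```

The linear form lists the arcs; the tree form makes the nesting explicit. Since each word has exactly one head and all arcs lead back to ROOT, the structure is guaranteed to be a tree.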

The Rise of Annotated Data: Universal Dependencies Treebanks

Universal Dependencies is an open, community-built collection of annotated dependency treebanks covering many languages.


Methods of dependency parsing:

  • dynamic programming, with complexity O(n³);
  • graph algorithms;
  • constraint satisfaction;
  • “transition-based parsing” or “deterministic dependency parsing”;

Transition-based dependency parsers

Arc-standard transition-based parser

Analysis of "Happy children like to play with their friends."
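The transitions for this sentence can be simulated with a minimal arc-standard implementation. This is a sketch: the gold action sequence and the dependency labels below (roughly UD style) are illustrative assumptions, not taken from the text:

```python
# A minimal arc-standard transition system: SHIFT, LEFT-ARC, RIGHT-ARC.
def parse(words, actions):
    """Apply (action, label) pairs; words are 1-indexed, 0 is ROOT."""
    stack = [0]                               # start with ROOT on the stack
    buffer = list(range(1, len(words) + 1))   # word indices left to read
    arcs = []                                 # (head, dependent, label)
    for act, label in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":               # top of stack heads the second
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep, label))
        elif act == "RIGHT-ARC":              # second on stack heads the top
            dep = stack.pop()
            arcs.append((stack[-1], dep, label))
    return arcs, stack, buffer

words = "Happy children like to play with their friends".split()
actions = [                                   # an assumed gold sequence
    ("SHIFT", None), ("SHIFT", None), ("LEFT-ARC", "amod"),
    ("SHIFT", None), ("LEFT-ARC", "nsubj"),
    ("SHIFT", None), ("SHIFT", None), ("LEFT-ARC", "mark"),
    ("SHIFT", None), ("SHIFT", None), ("SHIFT", None),
    ("LEFT-ARC", "poss"), ("LEFT-ARC", "case"),
    ("RIGHT-ARC", "obl"), ("RIGHT-ARC", "xcomp"), ("RIGHT-ARC", "root"),
]
arcs, stack, buffer = parse(words, actions)
assert stack == [0] and buffer == []          # finished: only ROOT remains
for head, dep, label in arcs:
    h = "ROOT" if head == 0 else words[head - 1]
    print(f"{h} -{label}-> {words[dep - 1]}")
```

The parse terminates when the buffer is empty and only ROOT remains on the stack, having produced exactly one arc per word.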

In general the parser faces choices of when to shift and when to reduce, and naively exploring every choice means searching an exponential number of possible parses, which cannot be done efficiently.

In the 1960s, researchers devised clever dynamic programming algorithms that explore the space of all possible parses relatively efficiently.

In the 2000s (MaltParser), the next action at each parser configuration is instead predicted by a discriminative classifier (e.g. a softmax classifier) over the legal moves:

  • at most 3 choices/actions when untyped; at most |R| × 2 + 1 when typed;
  • features: word and POS tag at the top of the stack; word and POS tag of the first word in the buffer; etc.

There is NO search (in the simplest form), but one can profitably do beam search if desired (slower but better), keeping k good parse prefixes at each time step.
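The feature templates above can be sketched as indicator strings extracted from a configuration. The encoding (indices into word/POS lists, `s1`/`b1` naming) is an assumption for illustration:

```python
# MaltParser-style feature extraction from a parser configuration:
# word and POS of the stack top (s1) and of the first buffer word (b1).
def features(stack, buffer, words, pos):
    """Return indicator-feature strings for the current configuration."""
    feats = []
    if stack:
        s1 = stack[-1]
        feats += [f"s1.w={words[s1]}", f"s1.t={pos[s1]}"]
    if buffer:
        b1 = buffer[0]
        feats += [f"b1.w={words[b1]}", f"b1.t={pos[b1]}"]
    return feats

words = ["children", "like", "play"]
pos = ["NNS", "VBP", "VB"]
print(features([0], [1, 2], words, pos))
# ['s1.w=children', 's1.t=NNS', 'b1.w=like', 'b1.t=VBP']
```

Each such string becomes one dimension of a huge sparse feature vector that the discriminative classifier scores against every legal move.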


Evaluation of dependency parsing

  • unlabeled attachment score (UAS): the fraction of words whose head is assigned correctly;
  • labeled attachment score (LAS): the fraction of words whose head and dependency label are both correct;
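Both metrics can be computed by comparing predicted (head, label) pairs against gold, word by word. A sketch; the toy sentence, heads, and labels below are assumed for illustration:

```python
# UAS counts correct heads; LAS additionally requires the correct label.
def attachment_scores(gold, pred):
    """gold/pred: one (head, label) pair per word. Returns (UAS, LAS)."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

# "She saw the video lecture": heads are 1-based word indices, 0 = ROOT.
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "compound"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "compound"), (2, "dobj")]
uas, las = attachment_scores(gold, pred)
print(uas, las)   # 0.8 0.6
```

Here one head is wrong (4/5 heads correct gives UAS 0.8), and one further word has the right head but the wrong label, so LAS drops to 0.6. LAS is always less than or equal to UAS.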

Why train a neural dependency parser?

  • the hand-engineered indicator features are very sparse and tend to be incomplete;
  • computing the millions of features is expensive;
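The sparsity problem is what dense embeddings address: instead of one indicator dimension per (template, value) pair, each word maps to a short learned vector. A toy sketch; the vocabulary, dimensions, and vector values are made-up assumptions:

```python
# Sparse indicator features vs. dense embeddings, side by side.
vocab = ["children", "like", "play", "friends"]

def indicator(word):
    """One-hot over the vocabulary: huge in practice, almost all zeros."""
    return [1 if w == word else 0 for w in vocab]

# Dense: a short vector per word (values here are arbitrary toy numbers;
# in a real parser they would be learned).
embedding = {
    "children": [0.2, -0.1, 0.4],
    "like":     [0.7,  0.3, -0.2],
    "play":     [0.6,  0.2, -0.1],
    "friends":  [0.1, -0.3, 0.5],
}

sparse = indicator("like")     # length == |vocab|, exactly one 1
dense = embedding["like"]      # fixed small length, all dimensions used
print(len(sparse), sum(sparse), len(dense))
```

A neural parser concatenates the dense vectors of a few configuration elements (stack/buffer words, their POS tags) and feeds them through a small network, replacing millions of sparse features with a few hundred dense dimensions.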