On the Stanford Parser: A Reply to a Reader

Reposted from: http://blog.sina.com.cn/s/blog_72d083c701017r9t.html

                                   Feng Zhiwei (冯志伟)

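A reader asked me: what type of parsers are the Stanford parser and the Berkeley parser?

In my view, the Stanford parser is basically a lexicalized probabilistic context-free grammar (PCFG) parser that also performs dependency analysis. It can output different analyses according to different grammatical viewpoints, so it can be regarded as a parser that uses a hybrid approach.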

The Berkeley Parser is primarily a probabilistic context-free grammar parser.
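
To make the term "probabilistic context-free grammar" concrete: a PCFG attaches a probability to each rewrite rule and scores a tree as the product of the probabilities of the rules used in it. Here is a minimal Python sketch. The toy grammar, the probabilities, and the names `RULE_PROB`, `tree_rules` and `tree_probability` are made up for illustration and are not taken from either parser.

```python
from math import prod

# Toy rule probabilities (hypothetical values, for illustration only).
# For each left-hand side, the probabilities of its rules sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("NNS",)): 0.4,
    ("NP", ("NN",)): 0.6,
    ("VP", ("VBD", "NP")): 0.7,
    ("VP", ("VBD",)): 0.3,
}

def tree_rules(tree):
    """Yield (lhs, rhs) for every phrasal node; a tree is (label, child, ...)."""
    label, *children = tree
    if all(isinstance(c, tuple) for c in children):
        yield label, tuple(c[0] for c in children)
        for c in children:
            yield from tree_rules(c)
    # Otherwise the node is a preterminal such as ("NN", "today");
    # lexical (tag -> word) probabilities are ignored in this sketch.

def tree_probability(tree):
    """Score a tree as the product of the probabilities of its rules."""
    return prod(RULE_PROB[rule] for rule in tree_rules(tree))

tree = ("S",
        ("NP", ("NNS", "officials")),
        ("VP", ("VBD", "said"), ("NP", ("NN", "today"))))
print(tree_probability(tree))
```

A lexicalized PCFG additionally conditions rule probabilities on the head words of the phrases, which this sketch leaves out.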

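Below, we take the Stanford Parser as an example and look at it in detail.

Let us analyze the following sentence, for which the Stanford parser can produce several different results: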

The strongest rain ever recorded in India shut down the financial hub of Mumbai, snapped communication lines, closed airports and forced thousands of people to sleep in their offices or walk home during the night, officials said today.

1. Here is the part-of-speech tagged text for this sentence:

The/DT strongest/JJS rain/NN ever/RB recorded/VBN in/IN India/NNP
shut/VBD down/RP the/DT financial/JJ hub/NN of/IN Mumbai/NNP ,/,
snapped/VBD communication/NN lines/NNS ,/, closed/VBD airports/NNS
and/CC forced/VBD thousands/NNS of/IN people/NNS to/TO sleep/VB in/IN
their/PRP$ offices/NNS or/CC walk/VB home/NN during/IN the/DT night/NN
,/, officials/NNS said/VBD today/NN ./.
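
The tagged text above is a sequence of `word/TAG` tokens separated by whitespace. As a small sketch of how such output could be read back into a program (the function name `parse_tagged` is hypothetical, and only the first line of the output is used as sample input):

```python
from collections import Counter

def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the LAST slash so that tokens such as ',/,' or a word
        # that itself contains '/' are handled correctly.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

sample = "The/DT strongest/JJS rain/NN ever/RB recorded/VBN in/IN India/NNP"
pairs = parse_tagged(sample)
tag_counts = Counter(tag for _, tag in pairs)

print(pairs)
print(tag_counts)
```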
 
2. Here is the tree in a context-free phrase structure grammar representation:
(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))
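
The bracketed notation above encodes the tree directly: each `(LABEL ...)` is a node, and anything that is not a bracket is either a label or a word. The following is a minimal sketch of a reader for this notation, written from scratch for illustration (the names `parse_tree` and `leaves` are my own, not part of the Stanford parser), applied to the subject noun phrase of the sentence:

```python
import re

def parse_tree(text):
    """Parse a bracketed tree into nested lists of the form [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        result = [tokens[pos]]           # the node label, e.g. NP or DT
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                result.append(node())    # a child subtree
            else:
                result.append(tokens[pos])  # a word under a tag
                pos += 1
        pos += 1                         # consume the closing ')'
        return result

    return node()

def leaves(tree):
    """Return the words of the tree from left to right."""
    words = []
    for child in tree[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words

subject = """
(NP
  (NP (DT The) (JJS strongest) (NN rain))
  (VP
    (ADVP (RB ever))
    (VBN recorded)
    (PP (IN in)
      (NP (NNP India)))))
"""
tree = parse_tree(subject)
print(tree[0])
print(leaves(tree))
```

Reading the leaves from left to right recovers the original word sequence, which is a quick way to check that a tree was copied without damage.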
 
3. Here is a typed dependency representation.
We first number the words of the sentence:

1. The
2. strongest
3. rain
4. ever
5. recorded
6. in
7. India
8. shut
9. down
10. the
11. financial
12. hub
13. of
14. Mumbai
15. ,
16. snapped
17. communication
18. lines
19. ,
20. closed
21. airports
22. and
23. forced
24. thousands
25. of
26. people
27. to
28. sleep
29. in
30. their
31. offices
32. or
33. walk
34. home
35. during
36. the
37. night
38. ,
39. officials
40. said
41. today

 

Here are the resulting dependency relations. In each relation, the first argument is the governor and the second is the dependent.
 
det(rain-3, The-1)
amod(rain-3, strongest-2)
nsubj(shut-8, rain-3)
nsubj(snapped-16, rain-3)
nsubj(closed-20, rain-3)
nsubj(forced-23, rain-3)
advmod(recorded-5, ever-4)
partmod(rain-3, recorded-5)
prep_in(recorded-5, India-7)
ccomp(said-40, shut-8)
prt(shut-8, down-9)
det(hub-12, the-10)
amod(hub-12, financial-11)
dobj(shut-8, hub-12)
prep_of(hub-12, Mumbai-14)
conj_and(shut-8, snapped-16)
ccomp(said-40, snapped-16)
nn(lines-18, communication-17)
dobj(snapped-16, lines-18)
conj_and(shut-8, closed-20)
ccomp(said-40, closed-20)
dobj(closed-20, airports-21)
conj_and(shut-8, forced-23)
ccomp(said-40, forced-23)
dobj(forced-23, thousands-24)
prep_of(thousands-24, people-26)
aux(sleep-28, to-27)
xcomp(forced-23, sleep-28)
poss(offices-31, their-30)
prep_in(sleep-28, offices-31)
xcomp(forced-23, walk-33)
conj_or(sleep-28, walk-33)
dobj(walk-33, home-34)
det(night-37, the-36)
prep_during(walk-33, night-37)
nsubj(said-40, officials-39)
tmod(said-40, today-41)
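
Each line above has the shape `relation(governor-index, dependent-index)`. Note that in this output a word can have more than one governor: rain-3, for example, is the nsubj of all four conjoined verbs, so the result is a graph rather than a strict tree. A minimal sketch that reads a few of the lines above and lists the words with several governors (the regular expression and the names `parse_dep`, `heads` and `multi_headed` are my own assumptions about a convenient way to read this format):

```python
import re
from collections import defaultdict

DEP_RE = re.compile(r"(\w+)\((.+)-(\d+), (.+)-(\d+)\)")

def parse_dep(line):
    """Turn 'rel(gov-i, dep-j)' into (rel, (gov, i), (dep, j))."""
    rel, gov, gov_idx, dep, dep_idx = DEP_RE.fullmatch(line.strip()).groups()
    return rel, (gov, int(gov_idx)), (dep, int(dep_idx))

# A subset of the relations listed above.
lines = """
nsubj(shut-8, rain-3)
nsubj(snapped-16, rain-3)
nsubj(closed-20, rain-3)
nsubj(forced-23, rain-3)
ccomp(said-40, shut-8)
conj_and(shut-8, snapped-16)
ccomp(said-40, snapped-16)
nsubj(said-40, officials-39)
""".strip().splitlines()

# Map every dependent to the list of (relation, governor) pairs pointing at it.
heads = defaultdict(list)
for line in lines:
    rel, gov, dep = parse_dep(line)
    heads[dep].append((rel, gov))

multi_headed = {dep: h for dep, h in heads.items() if len(h) > 1}
for dep, h in multi_headed.items():
    print(dep, h)
```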

All of these are different outputs produced according to different grammatical viewpoints.

 

 

 

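The first four sentences of the English eulogy for the renowned corpus linguist Sinclair, written by Professor Wei Naixing (卫乃兴) of the School of Foreign Languages at Beihang University (北京航空航天大学), are: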
We are shocked to hear that Professor John Sinclair has left us.

Undoubtedly, the 13th of March 2007 is a saddest day to the world linguistics, Corpus Linguistics in particular.

The gap left by the departure of this innovative thinker and distinguished linguist will be felt in the hearts of the researchers working along the lines he has set.

In deepest sorrow, we, linguists at Shanghai Jiao Tong University, China, found that we cannot express with words our gratitude and respect to John.

The Stanford parser produces the following results:
Parsed 94 words in 4 sentences (13.73 wds/sec; 0.58 sents/sec).

The tree structure for each sentence is as follows:

1. We are shocked to hear that Professor John Sinclair has left us.
Probabilistic phrase structure grammar result:

(ROOT
  (S
    (NP (PRP We))
    (VP (VBP are)
      (ADJP (JJ shocked)
        (S
          (VP (TO to)
            (VP (VB hear)
              (SBAR (IN that)
                (S
                  (NP (NNP Professor) (NNP John) (NNP Sinclair))
                  (VP (VBZ has)
                    (VP (VBN left)
                      (NP (PRP us)))))))))))
    (. .)))
Dependency grammar result:
nsubj(shocked-3, We-1)
cop(shocked-3, are-2)
aux(hear-5, to-4)
xcomp(shocked-3, hear-5)
complm(left-11, that-6)
nn(Sinclair-9, Professor-7)
nn(Sinclair-9, John-8)
nsubj(left-11, Sinclair-9)
aux(left-11, has-10)
ccomp(hear-5, left-11)
dobj(left-11, us-12)
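
In this sentence every word has exactly one governor, so the relations form a tree whose root is the one word that is nobody's dependent. As a rough sketch, the relations above can be followed upward from any word to that root (the names `head` and `path_to_root` are hypothetical helpers introduced here for illustration):

```python
import re

deps = """
nsubj(shocked-3, We-1)
cop(shocked-3, are-2)
aux(hear-5, to-4)
xcomp(shocked-3, hear-5)
complm(left-11, that-6)
nn(Sinclair-9, Professor-7)
nn(Sinclair-9, John-8)
nsubj(left-11, Sinclair-9)
aux(left-11, has-10)
ccomp(hear-5, left-11)
dobj(left-11, us-12)
"""

# Map each dependent to its (governor, relation).
head = {}
for line in deps.strip().splitlines():
    rel, gov, dep = re.fullmatch(r"(\w+)\((\S+), (\S+)\)", line).groups()
    head[dep] = (gov, rel)

def path_to_root(token):
    """Follow governor links upward until a word without a governor is reached."""
    path = [token]
    while path[-1] in head:
        path.append(head[path[-1]][0])
    return path

print(path_to_root("us-12"))
print(path_to_root("Professor-7"))
```

Both paths end at shocked-3, the word that governs the whole sentence in this analysis.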



2. Undoubtedly, the 13th of March 2007 is a saddest day to the world linguistics, Corpus Linguistics in particular.
Probabilistic phrase structure grammar result:

(ROOT
  (S
    (ADVP (RB Undoubtedly))
    (, ,)
    (NP
      (NP (DT the) (NN 13th))
      (PP (IN of)
        (NP (NNP March) (CD 2007))))
    (VP (VBZ is)
      (NP
        (NP (DT a) (JJ saddest) (NN day))
        (PP (TO to)
          (NP
            (NP (DT the) (NN world) (NNS linguistics))
            (, ,)
            (NP
              (NP (NNP Corpus) (NNP Linguistics))
              (PP (IN in)
                (NP (NN particular))))))))
    (. .)))
Dependency grammar result:
advmod(day-11, Undoubtedly-1)
det(13th-4, the-3)
nsubj(day-11, 13th-4)
prep_of(13th-4, March-6)
num(March-6, 2007-7)
cop(day-11, is-8)
det(day-11, a-9)
amod(day-11, saddest-10)
det(linguistics-15, the-13)
nn(linguistics-15, world-14)
prep_to(day-11, linguistics-15)
nn(Linguistics-18, Corpus-17)
appos(linguistics-15, Linguistics-18)
prep_in(Linguistics-18, particular-20)


3. The gap left by the departure of this innovative thinker and distinguished linguist will be felt in the hearts of the researchers working along the lines he has set.
Probabilistic phrase structure grammar result:

(ROOT
  (S
    (NP
      (NP
        (NP (DT The) (NN gap))
        (VP (VBN left)
          (PP (IN by)
            (NP
              (NP (DT the) (NN departure))
              (PP (IN of)
                (NP (DT this) (JJ innovative) (NN thinker)))))))
      (CC and)
      (NP (VBN distinguished) (NN linguist)))
    (VP (MD will)
      (VP (VB be)
        (VP (VBN felt)
          (PP (IN in)
            (NP
              (NP (DT the) (NNS hearts))
              (PP (IN of)
                (NP (DT the) (NNS researchers)))))
          (S
            (VP (VBG working)
              (PRT (RP along))
              (NP
                (NP (DT the) (NNS lines))
                (SBAR
                  (S
                    (NP (PRP he))
                    (VP (VBZ has)
                      (VP (VBN set)))))))))))
    (. .)))
Dependency grammar result:
det(gap-2, The-1)
nsubjpass(felt-16, gap-2)
partmod(gap-2, left-3)
det(departure-6, the-5)
prep_by(left-3, departure-6)
det(thinker-10, this-8)
amod(thinker-10, innovative-9)
prep_of(departure-6, thinker-10)
amod(linguist-13, distinguished-12)
conj_and(gap-2, linguist-13)
aux(felt-16, will-14)
auxpass(felt-16, be-15)
det(hearts-19, the-18)
prep_in(felt-16, hearts-19)
det(researchers-22, the-21)
prep_of(hearts-19, researchers-22)
partmod(felt-16, working-23)
prt(working-23, along-24)
det(lines-26, the-25)
dobj(working-23, lines-26)
nsubj(set-29, he-27)
aux(set-29, has-28)
rcmod(lines-26, set-29)


4. In deepest sorrow, we, linguists at Shanghai Jiao Tong University, China, found that we cannot express with words our gratitude and respect to John.
Probabilistic phrase structure grammar result:

(ROOT
  (S
    (PP (IN In)
      (NP (JJS deepest) (NN sorrow)))
    (, ,)
    (NP
      (NP (PRP we))
      (, ,)
      (NP
        (NP (NNS linguists))
        (PP (IN at)
          (NP
            (NP (NNP Shanghai) (NNP Jiao) (NNP Tong) (NNP University))
            (, ,)
            (NP (NNP China)))))
      (, ,))
    (VP (VBD found)
      (SBAR (IN that)
        (S
          (NP (PRP we))
          (VP (MD can) (RB not)
            (VP (VB express)
              (PP (IN with)
                (NP (NNS words)))
              (NP
                (NP (PRP$ our) (NN gratitude)
                  (CC and)
                  (NN respect))
                (PP (IN to)
                  (NP (NNP John)))))))))
    (. .)))
Dependency grammar result:
amod(sorrow-3, deepest-2)
prep_in(found-16, sorrow-3)
nsubj(found-16, we-5)
appos(we-5, linguists-7)
nn(University-12, Shanghai-9)
nn(University-12, Jiao-10)
nn(University-12, Tong-11)
prep_at(linguists-7, University-12)
appos(University-12, China-14)
complm(express-21, that-17)
nsubj(express-21, we-18)
aux(express-21, can-19)
neg(express-21, not-20)
ccomp(found-16, express-21)
prep_with(express-21, words-23)
poss(gratitude-25, our-24)
dobj(express-21, gratitude-25)
conj_and(gratitude-25, respect-27)
prep_to(gratitude-25, John-29)

 

The Berkeley Parser is primarily a probabilistic context-free grammar parser, so I will not go into detail about it here.
