StanfordParser: Sentence-Splitting Errors in Syntactic Parsing

While using StanfordParser (SD) for syntactic parsing recently, I ran into an annoying problem. Parsing the following sentence, for example, goes wrong:

Analysis of the Anticancer Phytochemicals in Andrographis paniculata Nees. under Salinity Stress.

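Careful readers will notice that this sentence contains two periods ('.'): one after "Nees" in the middle and one at the very end. Suppose you hand it to StanfordParser directly by entering the following at the cmd command prompt: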

java -mx150m -cp "*;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz file.txt>result.txt

Open result.txt and you will see the following:

(ROOT
(NP
(NP (NNP Analysis))
(PP (IN of)
(NP (DT the) (NNP Anticancer) (NNPS Phytochemicals)))
(PP (IN in)
(NP (NNP Andrographis) (NNP paniculata) (NNP Nees)))
(. .)))

root(ROOT-0, Analysis-1)
det(Phytochemicals-5, the-3)
nn(Phytochemicals-5, Anticancer-4)
prep_of(Analysis-1, Phytochemicals-5)
nn(Nees-9, Andrographis-7)
nn(Nees-9, paniculata-8)
prep_in(Analysis-1, Nees-9)

(ROOT
(PP (IN under)
(NP (NNP Salinity) (NNP Stress))))

root(ROOT-0, under-1)
nn(Stress-3, Salinity-2)
pobj(under-1, Stress-3)

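Not what you expected, right? StanfordParser has clearly treated this single sentence as two sentences, splitting it at the period after "Nees".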
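To see why the period after "Nees" is a problem, here is a minimal Python sketch of a naive punctuation-based splitter. It is only an illustration of the general failure mode and is not the tokenizer or splitting logic that StanfordParser actually uses; the function name `naive_split` is made up for this example.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' whenever whitespace follows.

    Hypothetical helper for illustration only; StanfordParser's real
    sentence splitting works on its own tokenizer output.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


title = ("Analysis of the Anticancer Phytochemicals in "
         "Andrographis paniculata Nees. under Salinity Stress.")

# Any splitter that trusts the period alone cuts the title in two.
for sentence in naive_split(title):
    print(sentence)
```

The sketch prints two fragments, the second one starting at "under", which mirrors the two ROOT trees in the output above.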

The StanfordParser FAQ has the following entry:

How do I force the parser to use my sentence delimitations? I want to give the parser a list of sentences, one per line, to parse.

Use the -sentences option. If you want to give the parser one sentence per line, include the option -sentences newline in your invocation of LexicalizedParser.

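In other words, adding -sentences newline makes the parser skip its own sentence splitting and parse according to your segmentation, treating each line of the input file as one sentence.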

Following that advice, enter this at the cmd command prompt:

java -mx150m -cp "*;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -sentences newline -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz file.txt>result.txt
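If you need to run this over many files, the invocation with -sentences newline can also be assembled from a script. The following is a rough sketch, assuming java is on the PATH and the parser jars sit in the current directory as in the command above. The helper names `build_command` and `run_parser` are my own and not part of StanfordParser.

```python
import subprocess

PARSER_CLASS = "edu.stanford.nlp.parser.lexparser.LexicalizedParser"
MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"


def build_command(input_path, one_sentence_per_line=True, classpath="*;"):
    """Return the java argument list used in this post (hypothetical helper).

    classpath="*;" is the Windows form from the post; on Linux or macOS
    the separator would be ':' instead of ';'.
    """
    cmd = ["java", "-mx150m", "-cp", classpath, PARSER_CLASS]
    if one_sentence_per_line:
        # Tell the parser that every input line is exactly one sentence.
        cmd += ["-sentences", "newline"]
    cmd += ["-outputFormat", "penn,typedDependencies", MODEL, input_path]
    return cmd


def run_parser(input_path, output_path, **kwargs):
    """Run the parser and write its standard output to output_path."""
    with open(output_path, "wb") as out:
        subprocess.run(build_command(input_path, **kwargs),
                       stdout=out, check=True)
```

Calling `run_parser("file.txt", "result.txt")` would then play the role of the command line above, with the redirection handled by `stdout=out`.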

Open result.txt:

(ROOT
(NP
(NP (NNP Analysis))
(PP (IN of)
(NP (DT the) (NNP Anticancer) (NNPS Phytochemicals)))
(PP (IN in)
(NP
(NP (NNP Andrographis) (NNP paniculata) (NNP Nees) (. .))
(PP (IN under)
(NP (NNP Salinity) (NNP Stress)))))))

root(ROOT-0, Analysis-1)
det(Phytochemicals-5, the-3)
nn(Phytochemicals-5, Anticancer-4)
prep_of(Analysis-1, Phytochemicals-5)
nn(Nees-9, Andrographis-7)
nn(Nees-9, paniculata-8)
prep_in(Analysis-1, Nees-9)
nn(Stress-13, Salinity-12)
prep_under(Nees-9, Stress-13)

As you can see, the text is now parsed as a single sentence.
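With more than a handful of input lines, checking the output by eye gets tedious. One possible sanity check, sketched below, is to count the parse trees in result.txt and compare that with the number of input lines. It assumes the penn output format shown above, where each tree starts with a line beginning with "(ROOT"; `count_parses` is a hypothetical helper name.

```python
def count_parses(result_text):
    """Count parse trees, assuming each one starts with a '(ROOT' line."""
    return sum(1 for line in result_text.splitlines()
               if line.strip().startswith("(ROOT"))


# Shortened stand-ins for the two outputs shown in this post.
split_output = "(ROOT\n(NP (NNP Analysis))\n\n(ROOT\n(PP (IN under)\n"
joined_output = "(ROOT\n(NP (NNP Analysis))\n(PP (IN under)\n"

print(count_parses(split_output))   # two trees: the sentence was split
print(count_parses(joined_output))  # one tree: parsed as a single sentence
```

If the count is larger than the number of non-empty lines in file.txt, some line was still split into several sentences.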
