#Step-by-step# StanfordCoreNLP -- Constituency Parsing + Visualization

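Constituency parsing is an important and fairly fundamental parsing task in NLP.

Its goal is to recover the phrase structure of a sentence and the hierarchical syntactic relations between those phrases.

With Stanford's CoreNLP, constituency parsing is easy to implement. Take the following sentence as an example: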

which province is jack and john electorated in?

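First, we can parse this sentence on the online demo provided by Stanford, http://corenlp.run/, as follows:

After clicking Submit, you can see the constituency parse result:

So how do we do the same thing in code? The Java code for obtaining the constituency parse tree with StanfordCoreNLP is as follows: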

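import java.util.Properties;

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
// Assumption: StringUtils.isBlank comes from Apache Commons Lang 3.
import org.apache.commons.lang3.StringUtils;

class NLPUtils{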

    private static Properties properties = new Properties();
    private static StanfordCoreNLP pipeline;

    static{
        init();
    }

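    /**
     * Constituency (syntactic) analysis: prints the parse tree of the first sentence.
     * @param singleNL a single natural-language sentence
     */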
    public static void constituencyAnalysis(String singleNL){
        if (StringUtils.isBlank(singleNL)){
            return;
        }
        CoreDocument document = getCoreDocument(singleNL);
        CoreSentence sentence = document.sentences().get(0);
        Tree constituencyParse = sentence.constituencyParse();
        System.out.println(constituencyParse);
    }


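    /**
     * Builds the CoreDocument object for the given natural-language text.
     *
     * @param NL the natural-language text to annotate
     * @return the annotated CoreDocument
     */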
    public static CoreDocument getCoreDocument(String NL) {
        return pipeline.processToCoreDocument(NL);
    }

    /*
     * Initialization: configure the annotators and build the pipeline.
     * */
    private static void init() {
        properties.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,parse");
        properties.setProperty("ner.additional.regexner.ignorecase", "true");
        properties.setProperty("ner.applyNumericClassifiers", "true");
        properties.setProperty("ner.buildEntityMentions", "true");
        properties.setProperty("ner.useNERSpecificTokenization", "true");
        pipeline = new StanfordCoreNLP(properties);
    }

    public static void main(String[] args) {
        String singleNL = "which province is jack and john electorated in ?";
        constituencyAnalysis(singleNL);
    }
}

 Running the code above produces the following output:

(ROOT (SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) (NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?)))

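Interested readers can write code that converts the result above into a multi-way (n-ary) tree structure.

[Hint]

After obtaining the parse tree via CoreSentence.constituencyParse(), get its iterator: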
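One possible way to do that conversion, without touching CoreNLP at all, is to parse the bracketed string directly. The sketch below is only an illustration: `BracketTreeParser`, `Node`, `parse`, and `render` are hypothetical names introduced here, not part of CoreNLP, and the code assumes the input is well-formed bracket notation like the output shown above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of CoreNLP): parses a bracketed parse string into a multi-way tree. */
class BracketTreeParser {

    /** A node of the multi-way tree: a label plus any number of children. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    private final String text;
    private int pos = 0; // current read position in text

    private BracketTreeParser(String text) {
        this.text = text;
    }

    /** Parses a string such as "(ROOT (NP (NN dog)))" and returns its root node. */
    static Node parse(String bracketed) {
        return new BracketTreeParser(bracketed.trim()).parseNode();
    }

    private Node parseNode() {
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) == '(') {
            pos++; // consume '('
            Node node = new Node(readToken()); // category label, e.g. NP
            skipSpaces();
            while (pos < text.length() && text.charAt(pos) != ')') {
                node.children.add(parseNode());
                skipSpaces();
            }
            pos++; // consume ')'
            return node;
        }
        return new Node(readToken()); // a bare token is a leaf (the word itself)
    }

    private String readToken() {
        skipSpaces();
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))
                && text.charAt(pos) != '(' && text.charAt(pos) != ')') {
            pos++;
        }
        return text.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    /** Renders the tree with one node per line, indented two spaces per level. */
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.label).append('\n');
        for (Node child : node.children) {
            sb.append(render(child, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = parse("(ROOT (SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) "
                + "(NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?)))");
        System.out.print(render(root, 0));
    }
}
```

Each `Node` keeps its children in a list, so a node can have any number of branches, and `render` gives a rough text visualization of the hierarchy.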

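    // Tree is a Collection<Tree>; iterating over it visits every node of the tree.
    Iterator<Tree> iterator = constituencyParseTree.stream().iterator();
    while (iterator.hasNext()) {
        Tree next = iterator.next();
        System.out.println(next);               // the subtree rooted at this node
        System.out.println(next.numChildren()); // number of direct children
    }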

The output is shown below. Comparing it with the figure at the beginning of the article should give you some ideas:

(ROOT (SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) (NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?)))
1
(SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) (NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?))
3
(WHNP (WDT which) (NN province))
2
(WDT which)
1
which
0
(NN province)
1
province
0
(SQ (VBZ is) (NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in))))
2
(VBZ is)
1
is
0
(NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))
2
(NP (NNP jack) (CC and) (NNP john))
3
(NNP jack)
1
jack
0
(CC and)
1
and
0
(NNP john)
1
john
0
(PP (NP (NNP electorated)) (IN in))
2
(NP (NNP electorated))
1
(NNP electorated)
1
electorated
0
(IN in)
1
in
0
(. ?)
1
?
0
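
One idea the listing suggests: the nodes appear in pre-order, each followed by its child count, and that pair is enough to rebuild the multi-way tree recursively. The sketch below illustrates this on a fragment of the output. `PreorderTreeBuilder`, `Node`, `build`, and `toBracketString` are hypothetical names introduced here, and the code assumes each entry has already been reduced to the node's own label (for example WHNP) rather than the full printed subtree.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: rebuilds a multi-way tree from a pre-order listing of (label, child count). */
class PreorderTreeBuilder {

    /** A node of the multi-way tree. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    private final String[] labels;
    private final int[] childCounts;
    private int next = 0; // index of the next unread pre-order entry

    private PreorderTreeBuilder(String[] labels, int[] childCounts) {
        this.labels = labels;
        this.childCounts = childCounts;
    }

    /** labels[i] and childCounts[i] describe the i-th node in pre-order. */
    static Node build(String[] labels, int[] childCounts) {
        if (labels.length != childCounts.length) {
            throw new IllegalArgumentException("labels and childCounts must have the same length");
        }
        return new PreorderTreeBuilder(labels, childCounts).readNode();
    }

    /** Reads one node, then recursively reads exactly as many children as its count says. */
    private Node readNode() {
        int index = next++;
        Node node = new Node(labels[index]);
        for (int i = 0; i < childCounts[index]; i++) {
            node.children.add(readNode());
        }
        return node;
    }

    /** Serializes the tree back to bracket notation, to check the reconstruction. */
    static String toBracketString(Node node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder("(").append(node.label);
        for (Node child : node.children) {
            sb.append(' ').append(toBracketString(child));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // The WHNP fragment from the listing above: label, then child count.
        String[] labels = {"WHNP", "WDT", "which", "NN", "province"};
        int[] counts = {2, 1, 0, 1, 0};
        System.out.println(toBracketString(build(labels, counts)));
        // prints (WHNP (WDT which) (NN province))
    }
}
```

Because every node announces how many children follow it, no closing markers are needed: the recursion knows when a subtree is complete.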
