Getting Started with Stanford CoreNLP and Java (for Python Programmers)

Hello there! I’m back, and I want this to be the first in a series of posts on Stanford’s CoreNLP library. In this article I will focus on the installation of the library and give an introduction to its basic features for Java newbies like myself. I will first go through the installation steps and a couple of tests from the command line. Later, I will walk you through two very simple Java scripts that you can easily incorporate into your Python NLP pipeline. You can find the complete code on GitHub!

CoreNLP is a toolkit with which you can generate a quite complete NLP pipeline with only a few lines of code. The library includes pre-built methods for all the main NLP procedures, such as Part-of-Speech (POS) tagging, Named Entity Recognition (NER), dependency parsing, and sentiment analysis. It also supports languages other than English: specifically Arabic, Chinese, German, French, and Spanish.

I am a big fan of the library, mainly because of HOW COOL its Sentiment Analysis model is ❤ (I will talk more about it in the next post). However, I can see why most people would rather use other libraries like NLTK or SpaCy, as CoreNLP can be a bit of overkill. The reality is that coreNLP can be much more computationally expensive than other libraries, and for shallow NLP tasks the results are not even significantly better. Plus, it’s written in Java, and getting started with it is a bit of a pain for Python users (however, it is doable, as you will see below, and it also has a Python API if you can’t be bothered).

  • CoreNLP Pipeline and Basic Annotators

The basic building block of coreNLP is the coreNLP pipeline. The pipeline takes an input text, processes it, and outputs the results of this processing in the form of a coreDocument object. A coreNLP pipeline can be customised and adapted to the needs of your NLP project. The properties object allows you to do this customisation by adding, removing, or editing annotators.

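As a minimal sketch of this idea (assuming the CoreNLP jars and English models are on your classpath; the class name here is just an illustration), building a customised pipeline looks like this:

```java
import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class PipelineSetup {
    public static void main(String[] args) {
        // The properties object controls which annotators the pipeline runs,
        // in order. Adding or removing names here customises the pipeline.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");

        // Building the pipeline loads the models for every listed annotator,
        // which is the expensive step — do it once and reuse the pipeline.
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

Each annotator name in the comma-separated list corresponds to one processing step; many annotators depend on earlier ones (for example, `pos` needs `tokenize` and `ssplit` to have run first).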
That was a lot of jargon, so let’s break it down with an example. All the information and figures were extracted from the official coreNLP page.

[Figure: the default coreNLP pipeline, extracted from the coreNLP site]

In the figure above we have a basic coreNLP pipeline, the one that is run by default when you first run the coreNLP pipeline class without changing anything. At the very left we have the input text entering the pipeline; this will usually be a plain .txt file. The pipeline itself is composed of six annotators. Each of these annotators processes the input text sequentially, with the intermediate outputs of one annotator sometimes used as inputs by another. If we wanted to change this pipeline by adding or removing annotators, we would use the properties object. The final output is a set of annotations in the form of a coreDocument object.

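To make this concrete, here is a sketch of running a pipeline over a short text and reading back the annotations from the resulting coreDocument (again assuming CoreNLP and its English models are on the classpath; the class name and input sentence are just for illustration):

```java
import java.util.Properties;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class AnnotateExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Wrap the input text in a CoreDocument and run the pipeline on it.
        CoreDocument doc = new CoreDocument("Jane lives in Barcelona.");
        pipeline.annotate(doc);

        // After annotation, each token carries the labels produced upstream:
        // the word itself, its POS tag, and its NER tag.
        for (CoreLabel tok : doc.tokens()) {
            System.out.println(tok.word() + "\t" + tok.tag() + "\t" + tok.ner());
        }
    }
}
```

Running this would print one line per token with its POS and NER tags; the same coreDocument also exposes sentence-level views via `doc.sentences()`.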