Introduction
After looking at a lot of Java/JVM-based NLP libraries listed on Awesome AI/ML/DL, I decided to pick the Apache OpenNLP library. One reason is that another developer, who had looked at it previously, recommended it. Besides, it's an Apache project, and Apache has been a great supporter of F/OSS Java projects for the last two decades or so (see Wikipedia). It also goes without saying that Apache OpenNLP is released under the Apache 2.0 license.
In addition, this tweet from an NLP researcher added to my confidence in the choice:
Linda (Xia) Liu (@drliubigdata)
In my current research project, I experimented with @ApacheOpennlp and was delighed to see it's a valuable NLP toolkit with a user-friendly API. Too early to draw a comparison between @ApacheOpennlp and @stanfordnlp. I will see how they perform in named entity recognition first.
17:03 PM - 02 Mar 2019
My personal experience with Apache OpenNLP has been similar so far, and I can echo the comments about its simplicity and its user-friendly API and design. You will see that this is the case as we explore it further.
Exploring NLP using Apache OpenNLP
Java bindings
We won’t be covering the Java API of the Apache OpenNLP tool in this post, but you can find a number of examples in their docs. A bit later you will also need some of the resources listed in the Resources section at the bottom of this post in order to progress further.
Command-line Interface
I was drawn to the simplicity of the available CLI: it can be used directly wherever a model is needed and one is provided. It works without any additional configuration.
To make it easier to use, and to avoid having to remember all the CLI parameters it supports, I have put together some shell scripts. Have a look at the README to get more insight into what they are and how to use them.
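To give a feel for what "works directly once a model is provided" means, here is a minimal sketch built around the `SentenceDetector` tool described in the Apache OpenNLP manual. It assumes the `opennlp` launcher is already on your PATH. The function name `detect_sentences` and the file names `en-sent.bin` and `input.txt` are placeholders chosen for illustration and are not part of the scripts mentioned above.

```shell
#!/bin/sh
# Sketch only: a thin wrapper around the `opennlp SentenceDetector` CLI tool.
# Assumptions: `opennlp` is on the PATH; model and input paths are hypothetical.

detect_sentences() {
    model="$1"   # path to a sentence-detector model, e.g. en-sent.bin
    input="$2"   # path to a plain-text file

    if [ ! -f "$model" ]; then
        echo "Model file not found: $model" >&2
        return 1
    fi
    if ! command -v opennlp >/dev/null 2>&1; then
        echo "opennlp not found on PATH" >&2
        return 1
    fi

    # The tool reads text from stdin and writes the detected sentences to stdout.
    opennlp SentenceDetector "$model" < "$input"
}

# Usage (uncomment once a model has been downloaded):
# detect_sentences en-sent.bin input.txt
```

The wrapper only adds two sanity checks; the actual work is the single `opennlp SentenceDetector` call, which is the kind of command the shell scripts are meant to save you from retyping.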
Getting started
From here on, you will need the following:
- Git client 2.x or higher (plus a GitHub account if you want to fork the repo)
- Java 8 or higher (we suggest installing GraalVM CE 19.x or higher)
- Docker CE 19.x or higher (check that it is running before going further)
- Ability to run shell scripts from the CLI
- Familiarity with reading and writing shell scripts (optional)
Note: At the time of writing, version 1.9.1 of Apache OpenNLP was available.
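If you would like to confirm the prerequisites before continuing, a quick check along these lines may help. This is a rough sketch, not one of the repo's scripts: the helper name `have` is made up for illustration, and the versions are printed for you to compare by eye rather than being parsed.

```shell
#!/bin/sh
# Minimal prerequisite check (a sketch; version numbers are not parsed).

have() {
    # Succeeds if the named command is on the PATH.
    command -v "$1" >/dev/null 2>&1
}

for tool in git java docker; do
    if have "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Print versions so they can be compared with the list above.
if have git; then
    git --version
fi
if have java; then
    # `java -version` writes to stderr, so redirect it to stdout first.
    java -version 2>&1 | head -n 1
fi

# `docker info` only succeeds when the Docker daemon is reachable.
if have docker; then
    if docker info >/dev/null 2>&1; then
        echo "Docker daemon is running"
    else
        echo "Docker is installed but the daemon is not reachable"
    fi
fi
```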
We have put together scripts to make these steps easy for everyone:
$ git clone git@github.com:valohai/nlp-java-jvm-example.git
or
$ git clone https://github.com/valohai/nlp-java-jvm-example.git
$ cd nlp-java-jvm-example
This takes us into a folder containing the following files:
LICENSE.txt
README.md
docker-runner.sh <=== only this one concerns us a