Using the Stanford NER Service

lxg0807

Article link: https://blog.csdn.net/lxg0807/article/details/51657851

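When using the Stanford NER tool, the model has to be loaded every time the program runs, which is very time-consuming. Ideally the model would be loaded once and then used many times.
This is possible, and the official distribution provides an API for it: NERServer loads a classifier once and then serves tagging requests over a socket.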

Using it with English

# Start the server
java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.conll.4class.distsim.crf.ser.gz -port 2314 -outputFormat inlineXML
# In my experience the English 4-class model currently gives the best results. -port sets the port; you can also run the server in the background with nohup.
# Start the client
java -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -port 2314 -client
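To make clear what the client does on the wire, here is a minimal JDK-only sketch. It assumes, from my reading of NERServer's bundled client rather than from documentation, that the server handles one line of text per connection: the client sends a line, the server replies with the tagged text and then closes the socket. The class NerLineClient and its method tagLine are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;

/** Hypothetical helper: sends one line of text to a running NERServer and returns its reply. */
public class NerLineClient {

    public static String tagLine(String host, int port, String charset, String line)
            throws IOException {
        // Assumption: one request per connection; the server replies and then closes the socket.
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), charset), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), charset))) {
            out.println(line); // autoflush pushes the line to the server immediately
            StringBuilder reply = new StringBuilder();
            for (String r; (r = in.readLine()) != null; ) { // read until the server closes
                if (reply.length() > 0) {
                    reply.append('\n');
                }
                reply.append(r);
            }
            return reply.toString();
        }
    }
}
```

With the English server above running, `NerLineClient.tagLine("localhost", 2314, "UTF-8", "Barack Obama was born in Hawaii.")` should return the sentence with inline XML entity tags.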

// API usage
import java.io.*;
import edu.stanford.nlp.ie.NERServer.NERClient; // NERClient is a nested class of NERServer

NERClient.communicateWithNERServer(String host, int port, String charset, BufferedReader input, BufferedWriter output, boolean closeOnBlank)
// host: server IP address or host name; port: server port
// charset: character encoding; closeOnBlank: stop reading input at the first blank line

// Tagging an in-memory string
String text = "Barack Obama was born in Hawaii.";
BufferedReader br = new BufferedReader(new StringReader(text));
StringWriter sw = new StringWriter();
BufferedWriter bw = new BufferedWriter(sw);
NERClient.communicateWithNERServer("localhost", 2314, "UTF-8", br, bw, true);
bw.close(); // flushes the buffered output into sw
br.close();
String result = sw.toString();
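Because the server was started with -outputFormat inlineXML, result contains the entities as inline tags. A small sketch for pulling (type, text) pairs out of that string with a regular expression; it assumes the tags are flat (not nested) and use upper-case names such as PERSON or LOCATION. InlineXmlEntities and extract are hypothetical names, and the sample sentence in main is made up, not real server output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineXmlEntities {

    /** One entity found in the tagged output. */
    public static final class Entity {
        public final String type;
        public final String text;

        Entity(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + ": " + text;
        }
    }

    // <TAG>...</TAG> with the same name on both ends; the body is matched non-greedily.
    private static final Pattern TAG = Pattern.compile("<([A-Z_]+)>(.*?)</\\1>");

    public static List<Entity> extract(String inlineXml) {
        List<Entity> entities = new ArrayList<>();
        Matcher m = TAG.matcher(inlineXml);
        while (m.find()) {
            entities.add(new Entity(m.group(1), m.group(2)));
        }
        return entities;
    }

    public static void main(String[] args) {
        String sample = "<PERSON>Barack Obama</PERSON> was born in <LOCATION>Hawaii</LOCATION> .";
        for (Entity e : extract(sample)) {
            System.out.println(e);
        }
    }
}
```

Running main prints `PERSON: Barack Obama` and then `LOCATION: Hawaii`. The sketch does not unescape XML entities in the tagged text.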

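Other languages

Starting the server the same way with models for other languages produced errors; my preliminary guess is that the cause lies in the model configuration.
So I wrote my own launcher. Chinese is used as the example: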

Using it with Chinese

// Adapted from the official NERServer.main

package com.li.cnServer;

import java.io.IOException;
import java.util.Properties;

import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.NERServer;
import edu.stanford.nlp.ie.crf.CRFClassifier;

/**
 * Starts an NERServer with the Chinese CRF model, so the model is loaded only once.
 */
public class App 
{
    public static void main( String[] args ) throws ClassCastException, ClassNotFoundException, IOException
    {
        Properties props = new Properties(); 
        props.setProperty("loadClassifier", "edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz");
        props.setProperty("port", "2310");

        String loadFile = props.getProperty("loadClassifier");
        String loadJarFile = props.getProperty("loadJarClassifier");

        String client = props.getProperty("client");
        String portStr = props.getProperty("port", "4465");
        props.remove("port"); // so later code doesn't complain
        if (portStr == null || portStr.equals("")) {
          System.err.println("No port specified");
          return;
        }
        String charset = "utf-8";
        String encoding = props.getProperty("encoding");
        if (encoding != null && ! "".equals(encoding)) {
          charset = encoding;
        }
        int port;
        try {
          port = Integer.parseInt(portStr);
        } catch (NumberFormatException e) {
          System.err.println("Non-numerical port: " + portStr);
          return;
        }
        // default output format for if no output format is specified
        if (props.getProperty("outputFormat") == null) {
          props.setProperty("outputFormat", "inlineXML");
        }

        if (client != null && ! client.equals("")) {
          // run a test client for illustration/testing
          String host = props.getProperty("host");
          NERServer.NERClient.communicateWithNERServer(host, port, charset);
        } else {
          AbstractSequenceClassifier asc;
          if (loadFile != null && ! loadFile.equals("")) {
            asc = CRFClassifier.getClassifier(loadFile, props);
          } else if (loadJarFile != null && ! loadJarFile.equals("")) {
            asc = CRFClassifier.getJarClassifier(loadJarFile, props);
          } else {
            asc = CRFClassifier.getDefaultClassifier(props);
          }

          new NERServer(port, asc, charset).run();
        }
    }
}

Packaging it as a jar with Maven

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.li</groupId>
  <artifactId>cnServer</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>cnServer</name>
  <url>http://maven.apache.org</url>

  <build>
    <plugins>
       <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.6</version>
        <configuration>
            <archive>
                <manifest>
                    <!-- main class: change it to suit your project -->
                    <mainClass>com.li.cnServer.App</mainClass>
                    <addClasspath>true</addClasspath>
                    <classpathPrefix>lib/</classpathPrefix>
                </manifest>
            </archive>
        </configuration>
       </plugin> 
       <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>2.10</version>         
        <executions>
            <execution>
                <id>copy-dependencies</id>
                <phase>package</phase>
                <goals>
                    <goal>copy-dependencies</goal>
                </goals>
                <configuration>
                    <outputDirectory>${project.build.directory}/lib</outputDirectory>
                </configuration>                
            </execution>
        </executions> 
       </plugin>   
     </plugins>  
  </build>


  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>


  <dependencies>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.6.0</version>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.6.0</version>
        <classifier>models-chinese</classifier>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

# Start the server (from target/, where maven-dependency-plugin copies the dependencies into lib/)
java -jar cnServer-0.0.1-SNAPSHOT.jar
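Because App builds the classifier before it creates the NERServer, the port should only start accepting connections once the model has finished loading, so a startup script can poll the port before sending text. A JDK-only sketch; WaitForPort and the 500 ms connect timeout / 200 ms retry interval are my own choices. Each probe opens and closes a connection without sending any text, which I assume the server tolerates as an empty request.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class WaitForPort {

    /** Returns true once host:port accepts a TCP connection, false if timeoutMillis elapses first. */
    public static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 500);
                return true; // connected: something is listening on the port
            } catch (IOException notYet) {
                Thread.sleep(200); // not listening yet (e.g. model still loading); retry
            }
        }
        return false;
    }
}
```

For example, a client could call `WaitForPort.waitForPort("localhost", 2310, 120_000)` and only start sending text once it returns true.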

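After this, the client is used exactly as in the English case, except that it must connect to port 2310, the port set in App.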

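Note: Chinese input text must be word-segmented (words separated by spaces) before it is sent to the server; I recommend ansj_seg for segmentation.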
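The last step before sending can be sketched as follows, assuming your segmenter (ansj_seg or another) has already produced a list of words; no ansj_seg API is called here. The words are joined with single spaces and any line breaks are removed, because communicateWithNERServer reads its input line by line: a newline inside the text would split it into separate requests, and with closeOnBlank set to true an empty line ends the input. SegmentedInput and toServerLine are hypothetical names.

```java
import java.util.List;
import java.util.StringJoiner;

public class SegmentedInput {

    /**
     * Joins already-segmented words into one space-separated line.
     * ASCII whitespace inside a word (including newlines) is removed and empty words are skipped,
     * so the result contains no line breaks.
     */
    public static String toServerLine(List<String> words) {
        StringJoiner line = new StringJoiner(" ");
        for (String w : words) {
            String token = w.replaceAll("\\s+", "");
            if (!token.isEmpty()) {
                line.add(token);
            }
        }
        return line.toString();
    }
}
```

For example, `SegmentedInput.toServerLine(Arrays.asList("斯坦福", "大学", "位于", "加州"))` returns `斯坦福 大学 位于 加州`, which can then be sent as the input text.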
