Calling LTP word segmentation from Java: how to use the LTP Language Cloud, a natural language processing tool


自动抬杠机


Article link: https://blog.csdn.net/weixin_30674575/article/details/114545648


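This article explains how to call the LTP Language Cloud platform, which can be called from Java and many other languages, to segment Chinese text. First, register on the platform's website to obtain an API key. A Python example then shows that you only need to change the API key and the text in the code to run segmentation. For batch processing, a second Python example reads a file line by line, calls the API for each line, and writes the results to an output file.

(Summary generated automatically by CSDN.)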

Preface

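The LTP Language Cloud platform:

Does not support offline use;

Supports word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labeling;

Does not support custom dictionaries, but you can first use another tool that supports custom segmentation (for example …

Can be called from C#, Go, Java, JavaScript, Node.js, PHP, Python, R, Ruby, and other languages;

Also documents error responses, rate limits, and some important notes (I have never needed any of these so far).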
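The platform documents error responses and rate limits. If a long batch job does start hitting them, one generic remedy is to retry failed requests with exponential backoff. The sketch below is a minimal illustration built on Python 3's standard urllib, not an LTP-specific client: call_with_retry is a hypothetical helper name, and treating 429 and 5xx as retryable is an assumption based on general HTTP conventions, not on the LTP documentation.

```python
import time
import urllib.error
from typing import Callable, Collection, TypeVar

T = TypeVar("T")

# Assumption: generic HTTP "try again later" codes, not codes confirmed by LTP's docs.
RETRYABLE_CODES = frozenset({429, 500, 502, 503, 504})


def call_with_retry(fetch: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                    retryable: Collection[int] = RETRYABLE_CODES) -> T:
    """Call fetch(); on a retryable HTTPError, wait and try again (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as e:
            # Re-raise immediately for non-retryable codes or after the last attempt.
            if e.code not in retryable or attempt == attempts - 1:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, the request from Step 2 could be wrapped as content = call_with_retry(lambda: urllib.request.urlopen(url).read()). An error outside the retryable set, such as a 400 caused by a malformed request, is re-raised at once instead of being retried.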

Main text

Step 1: Register

Apply for an API key at this website; you will need it later.
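Rather than pasting the key into the scripts below, you can keep it out of your source files by reading it from an environment variable. This is a minimal sketch; the variable name LTP_API_KEY and the helper get_api_key are hypothetical choices, not something the LTP platform defines.

```python
import os

# Hypothetical variable name; any name works as long as you use it consistently.
API_KEY_ENV_VAR = "LTP_API_KEY"


def get_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Read the LTP API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Please set the {env_var} environment variable to your LTP API key")
    return key
```

Set the variable once in your shell (for example export LTP_API_KEY=... on Linux/macOS, or set LTP_API_KEY=... in the Windows command prompt), then replace api_key = "YourApiKey" with api_key = get_api_key().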

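Step 2: A simple example (Python version)

(1) Copy the code: copy a code sample from GitHub (pick one based on the language you use and the features you need).

(2) Modify the code:

<1> Replace "YourApiKey" in api_key = "YourApiKey" with the API key you obtained in Step 1;

<2> Replace the text in text = "我爱北京天安门" with the text you want to process;

<3> Set the parameters as needed (in fact, only four parameters are required: api_key, text, pattern, and format; pay particular attention to pattern):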

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# This example shows how to use Python to access the LTP API to perform full-stack
# Chinese text analysis, including word segmentation, POS tagging, dependency parsing,
# named entity recognition and semantic role labeling, and get the result in the
# specified format.

import sys
import urllib.error
import urllib.parse
import urllib.request

if __name__ == '__main__':
    if len(sys.argv) < 2 or sys.argv[1] not in ["xml", "json", "conll"]:
        print("usage: %s [xml/json/conll]" % sys.argv[0], file=sys.stderr)
        sys.exit(1)

    uri_base = "http://ltpapi.voicecloud.cn/analysis/?"
    api_key = "YourApiKey"
    text = "我爱北京天安门"

    # Note that if your text contains special characters such as a line feed or '&',
    # you need to URL-encode it
    text = urllib.parse.quote(text)

    format = sys.argv[1]
    pattern = "all"
    url = (uri_base
           + "api_key=" + api_key + "&"
           + "text=" + text + "&"
           + "format=" + format + "&"
           + "pattern=" + pattern)

    try:
        response = urllib.request.urlopen(url)
        # The body arrives as bytes; decode it (assumed to be UTF-8) before printing
        content = response.read().decode("utf-8").strip()
        print(content)
    except urllib.error.HTTPError as e:
        print(e.reason, file=sys.stderr)
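The comment in the example warns that text containing a line feed or '&' must be encoded. A sketch of an alternative to concatenating the query string by hand is to let urllib.parse.urlencode encode all four parameters at once, so none of them can break the URL. The endpoint is the one used above; build_ltp_url and analyze are hypothetical helper names, and decoding the response as UTF-8 is an assumption.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Endpoint taken from the example above.
URI_BASE = "http://ltpapi.voicecloud.cn/analysis/?"


def build_ltp_url(api_key: str, text: str, pattern: str = "all", fmt: str = "json") -> str:
    """Return the request URL with every parameter value percent-encoded as UTF-8."""
    query = urlencode(
        {"api_key": api_key, "text": text, "format": fmt, "pattern": pattern},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return URI_BASE + query


def analyze(api_key: str, text: str, pattern: str = "all", fmt: str = "json",
            timeout: float = 10.0) -> str:
    """Send one request and return the response body (assumed to be UTF-8)."""
    with urlopen(build_ltp_url(api_key, text, pattern, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    print(build_ltp_url("YourApiKey", "我爱北京天安门", pattern="pos", fmt="plain"))
```

Because '&' inside the text becomes %26 and a line feed becomes %0A, a sentence such as "A&B" is sent as a single text value instead of being split into a bogus extra parameter.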

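Step 3: Run

If you want to batch-process txt or xml files, you need to write the batch-processing code yourself. Below is the code I used in an earlier project to process a txt file in a given directory line by line (it simply adds a loop and writes the results to an output file):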

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# This example shows how to use Python to access the LTP API to perform full-stack
# Chinese text analysis, including word segmentation, POS tagging, dependency parsing,
# named entity recognition and semantic role labeling, and get the result in the
# specified format.

import sys
import urllib.error
import urllib.parse
import urllib.request

if __name__ == '__main__':
    uri_base = "http://ltpapi.voicecloud.cn/analysis/?"
    api_key = "YourApiKey"  # replace with the API key from Step 1
    format = "plain"
    pattern = "pos"

    # Input: one sentence per line. Output: each sentence followed by its result.
    # Both files are assumed to be UTF-8 encoded.
    with open("E:\\PyProj\\Others\\rite_sentence.txt", encoding="utf-8") as f, \
            open("E:\\PyProj\\Others\\rite_pos.txt", "w", encoding="utf-8") as fw:
        for line in f:
            sentence = line.rstrip("\n")
            # Note that if your text contains special characters such as a line feed or '&',
            # you need to URL-encode it
            text = urllib.parse.quote(sentence)
            url = (uri_base
                   + "api_key=" + api_key + "&"
                   + "text=" + text + "&"
                   + "format=" + format + "&"
                   + "pattern=" + pattern)
            try:
                response = urllib.request.urlopen(url)
                content = response.read().decode("utf-8").strip()
                print(content)
                fw.write(sentence + "\n" + content + "\n")
            except urllib.error.HTTPError as e:
                print(e.reason, file=sys.stderr)
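The batch script mixes file handling, HTTP calls, and output formatting in one loop. A sketch of one way to separate them is to pass the API call in as a function, which also lets you test the loop without network access. The names annotate_lines and analyze are hypothetical; unlike the script above, this version strips surrounding whitespace and skips blank lines rather than sending empty requests.

```python
import io
from typing import Callable, Iterable, TextIO


def annotate_lines(lines: Iterable[str], analyze: Callable[[str], str], out: TextIO) -> int:
    """Write each non-blank input line followed by its analysis result.

    `analyze` maps one sentence to the API's response text, for example a wrapper
    around the HTTP request from Step 2. Returns the number of sentences processed.
    """
    count = 0
    for line in lines:
        sentence = line.strip()
        if not sentence:  # skip blank lines rather than sending empty requests
            continue
        out.write(sentence + "\n" + analyze(sentence) + "\n")
        count += 1
    return count


def fake_analyze(sentence: str) -> str:
    """Offline stand-in for the real API call, used only for the demo below."""
    return "[" + sentence + "]"


if __name__ == "__main__":
    buf = io.StringIO()
    annotate_lines(io.StringIO("我爱北京天安门\n\n你好\n"), fake_analyze, buf)
    print(buf.getvalue(), end="")
```

With real files, open the input and output with encoding="utf-8" inside a with statement and pass the two file objects plus your own function that calls the API with pattern "pos" and format "plain".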

This article was reposted from ZH奶酪's cnblogs blog. Original link: http://www.cnblogs.com/CheeseZH/p/4585176.html. If you want to repost it, please contact the original author.