Entity Linking Open-Source Tool: Dexter2

救世腹肌2298

Article link: https://blog.csdn.net/qq_37043191/article/details/81906433

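Entity linking is the task of determining the identity of entities mentioned in text. Dexter2 is an open-source framework that uses Wikipedia for entity linking. This article covers downloading and using Dexter2, with examples of the annotate, spot, get-id, and get-desc APIs, and shows how to batch-process queries from Python.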

Entity Linking
Dexter2
- Download
- Usage

Entity Linking

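In natural language processing, entity linking, also called named entity linking (NEL), named entity disambiguation (NED), named entity recognition and disambiguation (NERD), or named entity normalization (NEN), is the task of determining the identity of an entity mentioned in text. For example, given the sentence "Paris is the capital of France", entity linking should determine that "Paris" refers to the city of Paris rather than Paris Hilton or any other entity that can be called "Paris". Likewise, for the sentence "James Bond is cool", we expect to obtain the complete linked name "James_Bond".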
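The disambiguation idea can be illustrated with a toy sketch. It is not how Dexter works internally: the `CANDIDATES` table, its keyword sets, and the `link` function are all hypothetical, and the scoring is a plain keyword overlap chosen only to make the "Paris" example concrete.

```python
# Hypothetical candidate table: mention -> {entity name: context keywords}.
CANDIDATES = {
    "Paris": {
        "Paris": {"capital", "france", "city"},
        "Paris_Hilton": {"hilton", "actress", "celebrity"},
    }
}

def link(mention, sentence):
    """Pick the candidate entity whose keywords overlap most with the sentence."""
    words = {w.strip(".,").lower() for w in sentence.split()}
    candidates = CANDIDATES.get(mention, {})
    if not candidates:
        return None  # unknown mention: nothing to link
    return max(candidates, key=lambda entity: len(candidates[entity] & words))

print(link("Paris", "Paris is the capital of France"))
```

With the table above, the context words "capital" and "France" favour the city over the other candidate.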

Dexter2

Dexter is an open-source framework for entity linking. It links entities using the entries of the English Wikipedia.

Download

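dexter on GitHub
Both precompiled binaries and the source code are available there; this article uses the precompiled binary directly.
On Windows, run the following in the directory where the archive was extracted: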

java -Xmx4000m -jar dexter-2.1.0.jar

Or, on Linux:

wget http://hpc.isti.cnr.it/~ceccarelli/dexter2.tar.gz
tar -xvzf dexter2.tar.gz
cd dexter2
java -Xmx4000m -jar dexter-2.1.0.jar

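This starts a server on local port 8080. On Windows, or on a Linux machine with a desktop environment, open http://localhost:8080/dexter-webapp/dev/ in a browser to browse the API. If Dexter runs on a remote server, fetch the results by URL from Python with requests instead (see later in this article).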
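When Dexter runs on a remote machine, a quick reachability check can save some debugging. The following is a minimal sketch: the helper names `dexter_base_url` and `is_dexter_up` are made up for illustration, and the standard library's `urllib` is used here in place of requests so the snippet has no third-party dependency. It assumes the developer page answers with HTTP 200 once the server is ready.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen

def dexter_base_url(host="localhost", port=8080):
    """Return the root URL of the Dexter web app on the given host."""
    return f"http://{host}:{port}/dexter-webapp/"

def is_dexter_up(host="localhost", port=8080, timeout=5):
    """Return True if the developer page responds with HTTP 200 (assumed behaviour)."""
    url = urljoin(dexter_base_url(host, port), "dev/")
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_dexter_up("my-server")` (with a hypothetical host name) before sending queries lets a script fail early with a clear message.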

Usage

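All of the APIs are documented on the local page or on the official website, and each comes with a runnable example. This article walks through a few of them.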

1. annotate, spot

annotate
Performs the entity linking on a given text, annotating maximum n entities.
spot
It only performs the first step of the entity linking process, i.e., find all the mentions that could refer to an entity

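Both operate on the words of a single query. The difference is that annotate performs the full entity linking and returns at most the n most relevant links, while spot only finds the candidate mentions. Use whichever fits your need.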
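The two calls differ mainly in the endpoint name and in whether a limit n is passed. The sketch below builds both URLs; `rest_url` is a hypothetical helper, and the path `/api/rest/spot` is an assumption made by analogy with the annotate URL used in this article, so check it against the local API page.

```python
from urllib.parse import quote, urlencode

API_ROOT = "http://localhost:8080/dexter-webapp/api/rest"

def rest_url(endpoint, text, extra=None):
    """Build a REST URL for the given endpoint, encoding spaces as %20."""
    params = {"text": text}
    params.update(extra or {})
    return f"{API_ROOT}/{endpoint}?{urlencode(params, quote_via=quote)}"

query = "James Bond is cool"
annotate_url = rest_url("annotate", query, {"n": 5})  # full linking, at most 5 entities
spot_url = rest_url("spot", query)                    # mention detection only (assumed path)

print(annotate_url)
print(spot_url)
```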

For example, find the linked entities in the following sentence:

Bob Dylan and Johnny Cash had formed a mutual admiration society even before they met in the early 1960s

You can also view a demo simply by entering the URL in a browser. Here the minimum linking confidence is set to 0.5:

http://localhost:8080/dexter-webapp/api/rest/annotate?text=Bob%20Dylan%20and%20Johnny%20Cash%20had%20formed%20a%20mutual%20admiration%20society%20even%20before%20they%20met%20in%20the%20early%201960s&n=50&wn=false&debug=false&format=text&min-conf=0.
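The same request can be issued from Python. This is a minimal sketch under a few assumptions: the function names are hypothetical, the parameter names are copied from the URL above, the `min-conf` value of 0.5 follows the text, the standard library's `urllib` stands in for requests, and the response body is assumed to be JSON. No particular fields of the response are relied upon.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

ANNOTATE_ENDPOINT = "http://localhost:8080/dexter-webapp/api/rest/annotate"

def annotate_url(text, n=50, min_conf=0.5):
    """Build the annotate URL with the same parameters as the browser demo."""
    params = {
        "text": text,
        "n": n,
        "wn": "false",
        "debug": "false",
        "format": "text",
        "min-conf": min_conf,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return ANNOTATE_ENDPOINT + "?" + urlencode(params, quote_via=quote)

def annotate(text, n=50, min_conf=0.5, timeout=30):
    """Send the request to a running Dexter server and parse the reply (assumed JSON)."""
    with urlopen(annotate_url(text, n, min_conf), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `annotate("Bob Dylan and Johnny Cash had formed a mutual admiration society even before they met in the early 1960s")` returns the parsed response for inspection.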