[Reading Notes] Entity Linking

Latest recommended article published on 2022-04-09 17:09:57

SrdLaplaceGua

Categories: Practical Tips, Machine Learning, Reading Notes. Tags: entity linking, knowledge graph, yahoo, entity recognition, git

Article link: https://blog.csdn.net/SrdLaplace/article/details/84563480

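This blog post covers two pieces of Yahoo work on Entity Linking: Fast Entity Linking and the FEL system. Fast Entity Linking uses a probabilistic model and user-generated information to link queries to entities in a knowledge base, with an emphasis on speed and space efficiency. FEL is a lightweight, efficient entity extraction and linking system that uses entity embeddings and search click-log data to retrieve candidate entities, with an average response time under 2 milliseconds.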

I have been working on some Entity Linking tasks recently, so I read these two papers from Yahoo and looked at the open-source [FEL](https://github.com/yahoo/FEL).

Fast and Space-Efficient Entity Linking in Queries

ABSTRACT

Entity linking usually has to finish before downstream retrieval can start, typically within milliseconds.
In this paper we propose a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base.
The algorithm's speed and efficiency come mainly from the following three points:

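Ignoring the dependencies between different entities
Using hashing and compression techniques to reduce the memory footprint
To give the algorithm contextual knowledge without sacrificing speed, the model takes into account the distance between the distributional semantics of the query words and of the entities.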
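
To make the third point concrete, here is a minimal sketch of one way to measure that distance: cosine similarity between the centroid of the query word vectors and an entity vector. The toy two-dimensional vectors, the centroid aggregation, and the names `WORD_VECTORS`, `ENTITY_VECTORS`, and `context_score` are all assumptions made for illustration, not taken from the paper or from FEL.

```python
import math

# Hypothetical toy embeddings; real systems would load trained vectors.
WORD_VECTORS = {"apple": [1.0, 0.0], "pie": [0.0, 1.0], "iphone": [1.0, 0.2]}
ENTITY_VECTORS = {"Apple_Inc.": [1.0, 0.1], "Apple_pie": [0.2, 1.0]}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def context_score(query_terms, entity_vector, word_vectors):
    """Similarity between the query's word centroid and one entity vector."""
    known = [word_vectors[t] for t in query_terms if t in word_vectors]
    if not known:
        return 0.0  # no known query word, so no contextual evidence
    return cosine(centroid(known), entity_vector)


query = ["apple", "iphone"]
for entity, vector in ENTITY_VECTORS.items():
    print(entity, round(context_score(query, vector, WORD_VECTORS), 3))
```

With these toy vectors the query "apple iphone" sits closer to `Apple_Inc.` than to `Apple_pie`, which is the kind of signal a context-aware linker can add on top of alias statistics.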

INTRODUCTION

Linking free text to entities typically comprises three steps:

Identify candidate mentions, i.e., which part(s) of the text to link
Identify candidate entities for each mention
Disambiguate the candidate entities based on some notion of context and coherence
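
The first two steps can be sketched roughly as follows, assuming that candidate mentions are simply all contiguous term spans up to a fixed length and that candidate entities come from an alias lookup table. The table `ALIAS_TO_ENTITIES` and both function names are hypothetical; the third step (disambiguation) is deliberately left out.

```python
# Hypothetical alias table: surface form -> entities it may refer to.
ALIAS_TO_ENTITIES = {
    "new york": ["New_York_City", "New_York_(state)"],
    "york": ["York"],
    "times": ["The_Times"],
    "new york times": ["The_New_York_Times"],
}


def candidate_mentions(terms, max_len=3):
    """Step 1: yield (start, end, text) for every contiguous span of up to max_len terms."""
    for start in range(len(terms)):
        last_end = min(start + max_len, len(terms))
        for end in range(start + 1, last_end + 1):
            yield start, end, " ".join(terms[start:end])


def candidate_entities(terms, alias_table, max_len=3):
    """Step 2: keep only the spans that are known aliases, with their candidate entities."""
    return {
        text: alias_table[text]
        for _, _, text in candidate_mentions(terms, max_len)
        if text in alias_table
    }


for mention, entities in candidate_entities("new york times".split(), ALIAS_TO_ENTITIES).items():
    print(mention, "->", entities)
```

Note that the mentions overlap ("new york" and "new york times"), which is exactly why a segmentation and disambiguation step is still needed afterwards.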

MODELING ENTITY LINKING

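Our model connects entities to their aliases through anchor text or user click behavior (the paper only considers anchor text within Wikipedia and clicks from web search results on Wikipedia results).
Problems addressed: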

Segmenting the query automatically
Selecting the right entity for each segment

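Our Fast Entity Linker (FEL) solves this problem by computing a probabilistic score for each segment-entity pair and then optimizing the score of the whole query. It uses no supervision and lets the model and the data operate in a parameterless fashion; an additional layer that makes use of annotated training data can also be added to improve the model's performance.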
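
Because each segment is scored independently, the query-level optimization can be done with dynamic programming over segment boundaries. The sketch below is only an illustration under stated assumptions: the query score is the sum of per-segment log scores, a segment with no candidate entity is allowed only when it is a single term, and such a term gets a fixed penalty. The score table `SEGMENT_SCORES`, the constant `UNLINKED_LOG_SCORE`, and the function names are hypothetical and not taken from the paper or the FEL code.

```python
import math

# Hypothetical per-segment scores: segment -> {entity: probability-like score}.
SEGMENT_SCORES = {
    "new york": {"New_York_City": 0.7, "New_York_(state)": 0.3},
    "new york times": {"The_New_York_Times": 0.9},
    "times": {"The_Times": 0.4},
}
UNLINKED_LOG_SCORE = math.log(0.05)  # assumed penalty for a term left unlinked


def segment_score(segment_terms, scores):
    """Return (best entity or None, log score) for one segment."""
    candidates = scores.get(" ".join(segment_terms))
    if candidates:
        entity, prob = max(candidates.items(), key=lambda kv: kv[1])
        return entity, math.log(prob)
    if len(segment_terms) == 1:
        return None, UNLINKED_LOG_SCORE  # single unlinked term
    return None, -math.inf  # multi-term segments must link to something


def link_query(terms, scores, max_len=3):
    """Pick the segmentation and entities that maximize the summed log score."""
    n = len(terms)
    best = [0.0] + [-math.inf] * n  # best[i]: best score of terms[:i]
    back = [None] * (n + 1)  # back[i]: (start, segment, entity) of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            entity, log_score = segment_score(terms[start:end], scores)
            total = best[start] + log_score
            if total > best[end]:
                best[end] = total
                back[end] = (start, " ".join(terms[start:end]), entity)
    result, pos = [], n
    while pos > 0:  # walk the back-pointers from the end of the query
        start, segment, entity = back[pos]
        result.append((segment, entity))
        pos = start
    result.reverse()
    return result, best[n]


print(link_query("cheap new york hotels".split(), SEGMENT_SCORES)[0])
```

The table has `max_len * n` cells to fill for a query of `n` terms, which is part of why ignoring dependencies between entities keeps linking fast.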

Fast Entity Linker

Notation:

$S\times E$: an event space where $S$ is the set of all sequences and $E$ the set of all entities known to the system.
$s$: a sequence of terms
$\hat{s}$: a sequence of $s$, i.e., the segmentation of the query
$\hat{e}$: a sequence of $e$, i.e., the set of entities
$a_s$: indicates if $s$ is an alias
$a_{s,e}$: indicates if $s$ is an alias pointing (linking/clicked) to $e$
$c$: indicates which collection acts as the source of information, the query log or Wikipedia ($c_q$ or $c_w$)
$n(s, c)$: the count of $s$ in $c$
$n(e, c)$: the count of $e$ in $c$
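
To get a feel for these counts, here is a small sketch that computes $n(s, c)$ and $n(e, c)$ for the two collections $c_w$ and $c_q$, plus a joint count of $s$ pointing to $e$. The observation format (a sequence paired with the entity it links or clicks to, or `None`), the toy data, and all function names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical toy collections: each observation is (sequence s, linked/clicked entity or None).
COLLECTIONS = {
    "c_w": [  # Wikipedia anchor text
        ("big apple", "New_York_City"),
        ("apple", "Apple_Inc."),
        ("apple", None),  # "apple" occurring as plain text, not as an alias
        ("apple", "Apple"),
    ],
    "c_q": [  # query log with clicks
        ("apple", "Apple_Inc."),
        ("apple", "Apple_Inc."),
        ("big apple", "New_York_City"),
    ],
}


def build_counts(observations):
    """Count sequences, entities, and sequence-entity pairs in one collection."""
    seq_counts, ent_counts, pair_counts = Counter(), Counter(), Counter()
    for s, e in observations:
        seq_counts[s] += 1
        if e is not None:
            ent_counts[e] += 1
            pair_counts[(s, e)] += 1
    return seq_counts, ent_counts, pair_counts


COUNTS = {c: build_counts(obs) for c, obs in COLLECTIONS.items()}


def n_seq(s, c):
    """n(s, c): how often sequence s occurs in collection c."""
    return COUNTS[c][0][s]


def n_ent(e, c):
    """n(e, c): how often entity e is linked or clicked in collection c."""
    return COUNTS[c][1][e]


def n_pair(s, e, c):
    """How often s points to e in collection c (the cases where a_{s,e} holds)."""
    return COUNTS[c][2][(s, e)]


print(n_seq("apple", "c_w"), n_ent("Apple_Inc.", "c_q"), n_pair("apple", "Apple_Inc.", "c_w"))
```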