Entity Search

tomdyq625

Article link: https://blog.csdn.net/python_learn/article/details/53975076

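This year I want to write about entity search, specifically entity search related to time and location, so I am first collecting the relevant literature. Searching Semantic Scholar for "Entity Search" turned up the following papers: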

Time-Aware Entity Search in DBpedia
Improving Context and Category Matching for Entity Search
Query modeling for entity search based on terms, categories, and examples

Time-Aware Entity Search in DBpedia

@inproceedings{zhang2015time, 
title={Time-Aware Entity Search in DBpedia}, 
author={Zhang, Lei and Chen, Wentao and Tran, Thanh and Rettinger, Achim}, 
booktitle={European Semantic Web Conference}, 
pages={175--179}, 
year={2015}, 
organization={Springer} }

Abstract: Searching for entities is a common user activity on the Web. There is an increasing effort in developing entity search techniques in the research community. Existing approaches are usually based on static measures that do not reflect the time-awareness, which is a factor that should be taken into account in entity search. In this paper, we propose a novel approach to time-aware entity search in DBpedia, which takes into account both popularity and temporality of entities. The experimental results show that our approach can significantly improve the performance of entity search with temporal focus compared with the baselines.

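Summary: Searching for entities on the Web is a common user activity, and entity search techniques are in growing demand. Existing approaches usually rely on static measures and ignore time-awareness. The paper proposes a time-aware entity search method for DBpedia that considers both the popularity and the temporality of entities. Experiments show that the method significantly improves entity search with a temporal focus.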

1 Introduction

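The time-aware entity search task is modeled as follows: given an entity collection $E=\{e_1, e_2, \dots, e_N\}$, the input is a user query $q=\langle s,t\rangle$ consisting of an entity name $s$ and a time range of consecutive days $t=\{d_1, d_2, \dots, d_M\}$, where each $d_i$ denotes a specific day. The output is the entities that match $s$ within the time range $t$.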

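In practice, users often do not specify a time range explicitly. In that case the system can simply use the day the query was submitted plus a preceding period (for example, one week) as the time interval. Suppose a user searches for the entity name Irving on 2014-02-21 and hopes to find Kyrie Irving, who won the NBA All-Star Game MVP award on 2014-02-17. The system can then set the time interval to the week from 2014-02-15 to 2014-02-21.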
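The query model and the default one-week window can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Query`, `default_time_range`, and `make_query` are hypothetical and do not come from the paper, and the window is assumed to be inclusive of the submission day.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Query:
    """A time-aware query q = <s, t> (hypothetical container, not from the paper)."""
    name: str                # s: the entity name
    days: Tuple[date, ...]   # t = {d_1, ..., d_M}: consecutive days


def default_time_range(submitted: date, window_days: int = 7) -> Tuple[date, ...]:
    """Return `window_days` consecutive days ending on the submission day (inclusive)."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    start = submitted - timedelta(days=window_days - 1)
    return tuple(start + timedelta(days=i) for i in range(window_days))


def make_query(name: str, submitted: date,
               days: Optional[Iterable[date]] = None) -> Query:
    """Build a query, falling back to the default window when no range is given."""
    chosen = tuple(days) if days is not None else default_time_range(submitted)
    return Query(name=name, days=chosen)


# The example from the text: a search for "Irving" submitted on 2014-02-21.
query = make_query("Irving", date(2014, 2, 21))
print(query.days[0], query.days[-1], len(query.days))
# prints: 2014-02-15 2014-02-21 7
```

With a seven-day window the range starts six days before the submission date, which reproduces the 2014-02-15 to 2014-02-21 interval given above.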

2 Approach

Each DBpedia entity corresponds to one Wikipedia article. To rank entities for a query $q=\langle s,t\rangle$, a score $Score(e,s,t)$ is computed for each entity $e$ from several components.
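These notes do not say which components enter $Score(e,s,t)$ or how they are combined. Purely as an illustration, the sketch below assumes two components, popularity and temporality (the two factors named in the abstract), each already normalized to $[0,1]$, and combines them by linear interpolation with a hypothetical weight `alpha`. The combination rule, the function names, and the numbers are all assumptions, not the paper's method.

```python
from typing import Dict, List, Tuple


def score(popularity: float, temporality: float, alpha: float = 0.5) -> float:
    """Assumed combination: alpha weights popularity, (1 - alpha) weights temporality."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * popularity + (1.0 - alpha) * temporality


def rank(components: Dict[str, Tuple[float, float]], alpha: float = 0.5) -> List[str]:
    """Sort entities by descending score; `components` maps entity -> (popularity, temporality)."""
    return sorted(
        components,
        key=lambda entity: score(components[entity][0], components[entity][1], alpha),
        reverse=True,
    )


# Made-up component values for two candidates of the name "Irving".
toy_components = {
    "Washington_Irving": (0.6, 0.1),  # popular overall, little activity in the window
    "Kyrie_Irving": (0.4, 0.9),       # less popular overall, very active in the window
}

print(rank(toy_components, alpha=0.5))  # temporality taken into account
print(rank(toy_components, alpha=1.0))  # popularity only, i.e. a static ranking
```

Setting `alpha=1.0` reduces the sketch to a static, popularity-only ranking, which is the kind of baseline the abstract contrasts with the time-aware approach.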

Candidate Entity Generation

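Given a query entity name $s$, the first step is to generate the set of candidate entities that match $s$, denoted $E_s$. This requires extracting the surface forms of each entity, that is, the words or phrases used to refer to that entity. Wikipedia provides several structures that associate entities with their surface forms. We use the following Wikipedia structures: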

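(1) Article titles. The title of each Wikipedia article is generally the most common name of the entity.
(2) Redirect pages. A redirect page indicates that an entity has an alias.
(3) Disambiguation pages. When several entities share the same name, Wikipedia creates a disambiguation page for them.
(4) Hyperlink anchor text. Wikipedia articles contain hyperlinks with anchor text, and these links point to the entity being referred to.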
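The four sources above can be merged into a single surface-form index that maps each name to the entities it may refer to. The following is a minimal sketch, assuming the Wikipedia structures have already been extracted into plain Python collections. The function names, the lowercase normalization, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalize(surface_form: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(surface_form.lower().split())


def build_surface_form_index(
    titles: Iterable[str],
    redirects: Dict[str, str],
    disambiguations: Dict[str, List[str]],
    anchors: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Map every normalized surface form to the set of entities it can denote."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity in titles:                          # (1) article titles
        index[normalize(entity)].add(entity)
    for alias, entity in redirects.items():        # (2) redirect pages
        index[normalize(alias)].add(entity)
    for name, entities in disambiguations.items():  # (3) disambiguation pages
        index[normalize(name)].update(entities)
    for text, entity in anchors:                   # (4) hyperlink anchor text
        index[normalize(text)].add(entity)
    return index


def candidate_entities(index: Dict[str, Set[str]], s: str) -> Set[str]:
    """Return E_s, the candidate entities whose surface forms match the name s."""
    return set(index.get(normalize(s), set()))


# Toy data, made up for illustration only.
index = build_surface_form_index(
    titles=["Kyrie Irving", "Washington Irving", "Irving, Texas"],
    redirects={"Kyrie Andrew Irving": "Kyrie Irving"},
    disambiguations={"Irving": ["Kyrie Irving", "Washington Irving", "Irving, Texas"]},
    anchors=[("Irving", "Kyrie Irving"), ("Kyrie", "Kyrie Irving")],
)

print(sorted(candidate_entities(index, "Irving")))
```

Exact matching on normalized strings keeps the sketch short. A real system would likely need fuzzier matching, which the source does not describe.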

Improving Context and Category Matching for Entity Search

@inproceedings{Chen2014ImprovingCA,
  title={Improving Context and Category Matching for Entity Search},
  author={Yueguo Chen and Lexi Gao and Shuming Shi and Xiaoyong Du and Ji-Rong Wen},
  booktitle={AAAI},
  year={2014}
}

Abstract: Entity search is to retrieve a ranked list of named entities of target types to a given query. In this paper, we propose an approach of entity search by formalizing both context matching and category matching. In addition, we propose a result re-ranking strategy that can be easily adapted to achieve a hybrid of two context matching strategies. Experiments on the INEX 2009 entity ranking task show that the proposed approach achieves a significant improvement of the entity search performance (xinfAP from 0.27 to 0.39) over the existing solutions.

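Summary: Entity search retrieves, for a given query, a ranked list of named entities of the target types. The paper performs entity search by formalizing both context matching and category matching, and proposes a re-ranking strategy that adapts easily to a hybrid of two context matching strategies. Experiments on the INEX 2009 entity ranking task show that the proposed approach achieves a significant improvement.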

Query modeling for entity search based on terms, categories, and examples

@article{Balog2011QueryMF,
  title={Query modeling for entity search based on terms, categories, and examples},
  author={Krisztian Balog and Marc Bron and Maarten de Rijke},
  journal={ACM Trans. Inf. Syst.},
  year={2011},
  volume={29},
  pages={22}
}

Abstract: Users often search for entities instead of documents, and in this setting, are willing to provide extra input, in addition to a series of query terms, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate and provide insights in the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation, and also demonstrate the effectiveness of category-based expansion using example entities. Our best performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.
