After reading the paper, I learned a great deal about top-k algorithms. There is a large body of work on top-k query processing.
1. Why?
Top-k queries are broadly used in many application areas, ranging from queries that combine text, metadata, and multimedia to queries that combine network statistics, and many more. Consequently, the top-k problem has been widely studied, and research on it is significant.
2. Motivation
The most famous algorithm for efficiently computing top-k results is the Threshold Algorithm (TA) proposed by Fagin et al.
Finding the exact top-k result can sometimes be quite resource-intensive and time-consuming. The main factor in measuring top-k performance is the cost of accessing the lists held by the different sources.
It is usually assumed that the objects in each source are sorted by some local score, and that each source can be accessed either sequentially (sorted access) or by random access.
To reduce these costs, approximate results may be returned instead of the exact top-k results.
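The Threshold Algorithm and the sorted/random access model above can be illustrated with a minimal Python sketch. Assumptions not stated in these notes: each source is a list of `(object_id, score)` pairs sorted by descending score, scores are non-negative, the aggregate is the sum of local scores, and an object missing from a source scores 0 there. The function name `threshold_algorithm` is hypothetical.

```python
import heapq


def threshold_algorithm(lists, k):
    """Sketch of Fagin et al.'s Threshold Algorithm (TA) with sum aggregation.

    lists: m sources, each a list of (obj_id, score) sorted by descending score
           (assumed non-negative; a missing object scores 0 in that source).
    Returns (top_k, sorted_accesses, random_accesses), where top_k is a list of
    (obj_id, aggregate_score) pairs in descending score order.
    """
    m = len(lists)
    index = [dict(lst) for lst in lists]  # per-source lookup used for random access
    seen = {}                             # obj_id -> full aggregate score
    sorted_acc = random_acc = 0
    max_len = max(len(lst) for lst in lists)

    for depth in range(max_len):
        last = []  # score reached by sorted access in each list at this depth
        for i in range(m):
            if depth >= len(lists[i]):
                last.append(0.0)  # exhausted list: unseen objects score 0 here
                continue
            obj, s = lists[i][depth]
            sorted_acc += 1
            last.append(s)
            if obj not in seen:
                # Random access to every other source to get the full score.
                total = s
                for j in range(m):
                    if j != i:
                        random_acc += 1
                        total += index[j].get(obj, 0.0)
                seen[obj] = total
        # Threshold: the best aggregate any still-unseen object could reach.
        threshold = sum(last)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, sorted_acc, random_acc

    # All lists exhausted: every object has been seen and scored exactly.
    top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
    return top, sorted_acc, random_acc
```

TA stops as soon as the k-th best fully scored object reaches the threshold, so it usually reads only a prefix of each list. The two counters expose the access cost that the notes identify as the main performance factor.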
Existing work on top-k processing has focused on obtaining exact results, or at least approximate results of a certain quality, at minimal cost. In contrast, this paper focuses on the dual problem:
instead of trying to minimize the processing cost needed to reach a target result quality, it provides best-effort top-k query processing under a fixed limit on the execution cost.
Such a budget can easily be defined in terms of time, number of disk accesses, or number of network messages, reflecting an essential limitation faced by contemporary applications.
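To make the budgeted (dual) setting concrete, here is a naive, hypothetical Python sketch: it follows a TA-style access order under the sorted/random access model, charges one unit per access (the budget expressed as a number of accesses), and stops when the budget is spent, returning the best k objects found so far. This is not the paper's method. The name `budgeted_top_k`, the unit-cost model, and ranking partially resolved objects by the sum of their known scores (a lower bound) are all assumptions, as are descending-sorted `(object_id, score)` lists, non-negative scores, and sum aggregation.

```python
import heapq


def budgeted_top_k(lists, k, budget):
    """Hypothetical best-effort top-k under a fixed access budget.

    Each sorted or random access costs 1 unit. When the budget runs out, the
    best k objects found so far are returned, ranked by the sum of their known
    local scores (a lower bound if an object is not fully resolved).
    Returns (top_k, exact) where exact is True only if TA's threshold test
    (or exhausting all lists) certified the answer.
    """
    m = len(lists)
    index = [dict(lst) for lst in lists]  # per-source lookup for random access
    known = {}                            # obj_id -> {source: local score}
    spent = 0
    max_len = max(len(lst) for lst in lists)

    def lower_bound(obj):
        return sum(known[obj].values())

    def current_top():
        return [(o, lower_bound(o)) for o in heapq.nlargest(k, known, key=lower_bound)]

    for depth in range(max_len):
        last = []
        for i in range(m):
            if depth >= len(lists[i]):
                last.append(0.0)  # exhausted list: unseen objects score 0 here
                continue
            if spent >= budget:
                return current_top(), False  # budget spent before this sorted access
            obj, s = lists[i][depth]
            spent += 1  # sorted access
            last.append(s)
            known.setdefault(obj, {})[i] = s
            for j in range(m):
                if j in known[obj]:
                    continue  # score for this source already known
                if spent >= budget:
                    return current_top(), False  # budget spent mid-resolution
                spent += 1  # random access
                known[obj][j] = index[j].get(obj, 0.0)
        top = current_top()
        if len(top) == k and top[-1][1] >= sum(last):
            return top, True  # threshold test passed within the budget

    return current_top(), True  # all lists exhausted: answer is exact
```

With a generous budget this behaves like plain TA; with a tight one it may return a different answer and report `exact=False`. The interesting design question in the budgeted setting is how to spend a fixed number of accesses, whereas this sketch simply follows TA's order and truncates.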