

Chapter 3 Web Search

Note: A marker such as [1], [2], [4] or [5] after a word points to a gloss of that word; a marker such as [z1], [z2], [z3] or [z4] after a term points to a background-knowledge note on that term; a marker such as [i], [ii], [iii] or [iv] after an underlined sentence points to a translation of that sentence.

Background Knowledge

This chapter briefly introduces background knowledge on web information retrieval, including its history, the characteristics of the Web, advertising models, user experience, and intelligent search-engine techniques (such as web information collection and extraction, detection and clustering of similar content, dynamic crawling, related-concept feedback, and tolerant retrieval). The material is excerpted from the following references:

Manning, Christopher D., Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008, pp. 421-442.
Gao Kai. Web Information Processing and Extracting. Proceedings of the Complex Systems Optimization and Application in IEEE International Conference on Machine Learning and Cybernetics, China, 2010.
Gao Kai. Effective Page Refresh Policy. International Journal on Computer Applications in Engineering Education, Vol. 15, Issue 3, 2007, pp. 240-247.
Gao Kai. Presenting Implicit Relevance Feedback in Educational Search Engine. International Journal on Computer Applications in Engineering Education, 2010 (DOI: 10.1002/cae.20311).
Gao Kai, et al. Tolerant Retrieval and Query Processing in Search Engine. Proceedings of the 2008 IEEE International Conference on CSSE, 2008.
Gao Kai. The Strategy on Replicate and Similar Web Collections’ Detecting and Clustering. International Journal on Computer Applications in Engineering Education, 2010 (DOI: 10.1002/cae.20388).

A search engine is generally a system that collects the required information from sources such as the Internet according to a certain strategy, processes it (for example, removing duplicate web pages, extracting information, indexing, assigning subject keywords, generating automatic summaries, classifying information and clustering similar pages), ranks the content of interest to the user according to certain rules, and presents it to the user, typically as hyperlinks. Accordingly, a search engine consists mainly of four parts: information collection, information processing, information retrieval and result presentation. The information collection module gathers relevant information from sources such as the Internet according to a certain strategy. Information processing mainly covers assigning index terms to web resources, building the index, compiling summaries and classifying information. The information retrieval module matches the terms of the user's query against the index terms to obtain the corresponding result set; for the user's convenience, some systems also provide advanced search or accept natural-language queries. Result presentation returns the results, after the necessary relevance analysis, typically as hyperlinks.

Text

As recently as the 1990s, studies showed that most people preferred getting information from other people rather than from information retrieval systems. Of course, in that period most people also used human travel agents to book their travel. However, during the last decade, relentless[1] optimization of information retrieval effectiveness has driven web search engines to new quality levels where most people are satisfied most of the time, and web search has become a standard and often preferred[2] source of information finding. For example, the 2004 Pew Internet Survey [Fallows 2004] found that “92% of Internet users say the Internet is a good place to go for getting everyday information.” To the surprise of many, the field of information retrieval has moved from being a primarily academic discipline to being the basis underlying most people’s preferred means of information access. This chapter presents the underpinnings of Web search.

[1] relentless: adj. unrelenting, merciless
[2] preferred: adj. first-choice

Session 1. Background and History

The Web is unprecedented[1] in many ways: unprecedented in scale, unprecedented in the almost-complete lack of coordination in its creation, and unprecedented in the diversity[2] of backgrounds and motives of its participants[3]. Each of these contributes to making web search d...

[Part omitted.]

...the term “soap” when he (or she) types the query “soup” by accident. Generally, users want the system to tolerate some misspellings and offer relevant suggestions, so tolerant retrieval and query processing are necessary. As for related work, [Manning, 2008] presented two approaches to correcting misspellings: the first is based on edit distance and the second on k-gram overlap. Building on web content mining, [Gao, 2005] discussed a strategy for extracting key terms. Probabilistic models (namely, noisy channel models) for spelling correction were pioneered by [Kernighan, 1990] and further developed by [Brill, 2000]. In these models, the misspelled query is viewed as a probabilistic corruption of a correct query. They share a mathematical basis with language-model methods and also provide ways of incorporating phonetic similarity and data on users' actual spelling mistakes. [Cucerzan, 2004] showed how this work could be extended to learn spelling correction models from query reformulations in search engine logs. In [Gao, 2004], the authors introduced the linkage of a query as a hidden variable, which expresses the term dependencies within the query as an acyclic[1], undirected graph. They also presented a method for estimating the model parameters and an approach to learning the linkage of a sentence in an unsupervised manner. In [Sadakane, 2001], the authors proposed two algorithms, one based on plane sweep and the other on divide and conquer, for finding documents in which all given keywords appear close to one another; however, these algorithms are not suitable for some situations.

[1] acyclic: adj. non-cyclic, containing no cycles

Thinking and Exercising

1. What is a URL? Please give an example of a URL and explain each of its parts.

2. Early web search fell into two broad categories. Please describe full-text index search engines and explain the drawbacks of the taxonomy approach.
3. Please translate the following passages from English into Chinese:
(1) Passage 1: The first generation of web search engines transported classical search techniques to the web domain, focusing on the challenge of scale. The earliest web search engines had to contend with indexes containing tens of millions of documents, which was a few orders of magnitude larger than any prior information retrieval system in the public domain. Indexing, query serving and ranking at this scale required tens of machines to create highly available systems.
(2) Passage 2: The boom in Internet use triggers the need for efficient tools to use and retrieve information from the Web expediently and efficiently. Although search engines can help users, their intelligence needs to be improved, as retrieving relevant information from the Web expediently and efficiently is not easy. Therefore, it is necessary to research and implement intelligent techniques in search engines.

The End of Chapter 3
Any questions?

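The chapter lists removing duplicate pages and clustering similar pages among a search engine's processing steps. One common way to detect near-duplicates, described in Manning et al. and not necessarily the method of the cited Gao paper, is to compare documents' sets of word shingles with the Jaccard coefficient. The sketch below assumes word 3-shingles and an arbitrary 0.5 threshold; the names `shingles`, `jaccard` and `near_duplicates` are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles (runs of k consecutive words) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], k: int = 3,
                    threshold: float = 0.5) -> list[tuple[str, str]]:
    """Pairs of document names whose shingle sets have Jaccard >= threshold."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    names = sorted(sets)
    # Compare every pair: fine for a toy collection, quadratic in general.
    return [(x, y)
            for i, x in enumerate(names)
            for y in names[i + 1:]
            if jaccard(sets[x], sets[y]) >= threshold]


pages = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy cat",
    "c": "search engines build an inverted index",
}
print(near_duplicates(pages))  # [('a', 'b')]
```

At web scale, the all-pairs comparison is usually replaced by estimating the Jaccard coefficient from min-hash sketches.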
  • 0
  • 0
    觉得还不错? 一键收藏
  • 0




当前余额3.43前往充值 >
领取后你会自动成为博主和红包主的粉丝 规则
钱包余额 0

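The related-work discussion attributes to [Manning, 2008] a spelling-correction approach based on edit distance. A minimal sketch follows, assuming unit costs for insertion, deletion and substitution (the standard Levenshtein definition) and a hypothetical `suggest` helper that scans a small vocabulary. A real engine would first narrow the candidate terms instead of scanning the whole vocabulary.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def suggest(query_term: str, vocabulary: list[str], max_dist: int = 2) -> list[str]:
    """Vocabulary terms within max_dist edits of the query term, closest first."""
    scored = sorted((edit_distance(query_term, w), w) for w in vocabulary)
    return [w for d, w in scored if d <= max_dist]


vocab = ["soap", "soup", "sour", "search", "snap"]
print(edit_distance("soup", "soap"))  # 1
print(suggest("soep", vocab))         # ['soap', 'soup', 'snap', 'sour']
```

Keeping only the previous row of the dynamic-programming table gives the same result as the full matrix while using memory proportional to the length of `t`.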

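The second approach attributed to [Manning, 2008] uses k-gram overlap: vocabulary terms that share many character k-grams with the misspelled term become candidate corrections. The sketch below assumes character bigrams (k = 2), `$` padding to mark word boundaries, and the Jaccard coefficient with an arbitrary 0.3 cutoff; the names `kgrams`, `build_kgram_index` and `kgram_candidates` are hypothetical.

```python
from collections import defaultdict


def kgrams(term: str, k: int = 2) -> set[str]:
    """Character k-grams of the term, padded with '$' to mark its start and end."""
    padded = f"${term}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def build_kgram_index(vocabulary: list[str], k: int = 2) -> dict[str, set[str]]:
    """Map each k-gram to the vocabulary terms that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for term in vocabulary:
        for gram in kgrams(term, k):
            index[gram].add(term)
    return dict(index)


def kgram_candidates(query_term: str, index: dict[str, set[str]],
                     k: int = 2, min_jaccard: float = 0.3) -> list[str]:
    """Terms whose k-gram sets overlap the query term's with Jaccard >= min_jaccard,
    best match first."""
    q = kgrams(query_term, k)
    # Only terms sharing at least one k-gram with the query need to be scored.
    pool = set().union(*(index.get(g, set()) for g in q))
    scored = []
    for term in pool:
        t = kgrams(term, k)
        score = len(q & t) / len(q | t)
        if score >= min_jaccard:
            scored.append((-score, term))
    return [term for _, term in sorted(scored)]


vocab = ["board", "border", "lord", "boardroom", "soap"]
idx = build_kgram_index(vocab)
print(kgram_candidates("bord", idx))  # ['board', 'border', 'lord']
```

Here "board" scores 4/7, "border" 4/8 and "lord" 3/7, while "boardroom" (3/12) falls below the cutoff. In practice the surviving candidates can then be re-ranked by edit distance.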

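[Sadakane, 2001] is cited for finding documents in which all given keywords appear close to one another. The sketch below is not the authors' plane-sweep or divide-and-conquer algorithm. It is a common heap-based left-to-right sweep that finds the smallest token window containing every keyword in one document. It assumes each keyword's positions are already available as sorted token offsets, for example from a positional index. The name `smallest_window` is hypothetical.

```python
import heapq
from typing import Optional


def smallest_window(positions: dict[str, list[int]]) -> Optional[tuple[int, int]]:
    """Smallest (start, end) token span holding at least one occurrence of every keyword.

    `positions` maps each keyword to the sorted token offsets where it occurs in
    one document. Returns None if any keyword is missing from the document.
    """
    if not positions or any(not p for p in positions.values()):
        return None
    # Sweep left to right, keeping one current occurrence per keyword in a min-heap.
    heap = [(plist[0], term, 0) for term, plist in positions.items()]
    heapq.heapify(heap)
    right = max(pos for pos, _, _ in heap)  # rightmost current occurrence
    best = (heap[0][0], right)
    while True:
        left, term, i = heapq.heappop(heap)
        if right - left < best[1] - best[0]:
            best = (left, right)
        plist = positions[term]
        if i + 1 == len(plist):
            # The leftmost keyword has no later occurrence, so every window
            # starting further right would miss it: stop.
            return best
        nxt = plist[i + 1]
        right = max(right, nxt)
        heapq.heappush(heap, (nxt, term, i + 1))


positions = {"web": [0, 4, 11], "search": [1, 7], "typos": [9]}
print(smallest_window(positions))  # (7, 11)
```

Each occurrence enters the heap once, so with n total occurrences of m keywords the sweep takes O(n log m) time. A ranking function could then favor documents whose window span is small.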