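Preface
- Understand, through hands-on experiments, how an inverted index is built;
- Learn to use an inverted index to build a simple search engine that supports keyword retrieval.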
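I. Introduction
- Experiment 1: use an inverted index to build a simple search engine in which the documents are simulated by an in-memory collection;
- Experiment 2: rewrite Experiment 1 so that the inverted index is built directly from the documents in a single pass.
II. Experiment 1
The code is as follows (example):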
docu_set = {
    'd1': 'i love shanghai',
    'd2': 'i am from shanghai now i study in tongji university',
    'd3': 'i am from lanzhou now i study in lanzhou university of science and technology',
}

# Collect every word that occurs in any document
all_words = []
for i in docu_set.values():
    cut = i.split()
    all_words.extend(cut)
set_all_words = set(all_words)
print(set_all_words)

# Build the inverted index: for each word, list the documents that contain it
invert_index = dict()
for b in set_all_words:
    temp = []
    for j in docu_set.keys():
        field = docu_set[j]
        split_field = field.split()
        if b in split_field:
            temp.append(j)
    invert_index[b] = temp
print(invert_index)
print('Full-text search for university:', invert_index['university'])
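A single-word lookup is only the simplest kind of query. As a minimal sketch (not part of the original experiment), the example below assumes the posting lists are stored as sets, so that a multi-word query can be answered by intersecting them. The helper name `search_all` is hypothetical and introduced only for illustration.

```python
docs = {
    'd1': 'i love shanghai',
    'd2': 'i am from shanghai now i study in tongji university',
    'd3': 'i am from lanzhou now i study in lanzhou university of science and technology',
}

# Build the index with sets as posting lists so they can be intersected.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)


def search_all(index, query):
    """Return the sorted ids of documents that contain every word in the query."""
    words = query.split()
    if not words:
        return []
    # A word missing from the index contributes an empty set,
    # which makes the whole intersection empty.
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


print(search_all(index, 'shanghai university'))  # ['d2']
print(search_all(index, 'study in'))             # ['d2', 'd3']
print(search_all(index, 'beijing'))              # []
```

Using `index.get(word, set())` instead of `index[word]` also avoids the `KeyError` that a direct dictionary lookup raises for a word that appears in no document.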
III. Experiment 2
The code is as follows (example):
doc_set = {
    'd1': 'i love shanghai',
    'd2': 'i am from shanghai now i study in tongji university',
    'd3': 'i am from lanzhou now i study in lanzhou university of science and technology'
}

# Build the inverted index in a single pass over the documents
invert_index = {}
for doc_id, doc_content in doc_set.items():
    words = doc_content.split()
    for word in words:
        if word not in invert_index:
            invert_index[word] = []
        # Record each document only once, even if the word repeats in it
        if doc_id not in invert_index[word]:
            invert_index[word].append(doc_id)

# Search
query = 'university'
if query in invert_index:
    result = invert_index[query]
    print('Documents containing the query term:', result)
else:
    print('No documents contain the query term.')
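The experiment above still keeps the documents in a dictionary. As a hedged sketch of how the same single-pass idea might extend to documents stored on disk, the example below assumes a hypothetical directory containing one plain-text `*.txt` file per document. The function names `build_index_from_dir` and `search` are illustrative and do not come from the original code.

```python
from collections import defaultdict
from pathlib import Path


def build_index_from_dir(doc_dir):
    """Map each word to the sorted list of file names (without suffix) containing it."""
    index = defaultdict(set)
    # Assumed layout: one plain-text document per *.txt file in doc_dir.
    for path in sorted(Path(doc_dir).glob('*.txt')):
        text = path.read_text(encoding='utf-8')
        for word in text.lower().split():
            index[word].add(path.stem)  # a set ignores repeated words
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return the documents containing the query word, or an empty list."""
    return index.get(query.lower(), [])
```

Calling `build_index_from_dir('docs')` on a folder that holds `d1.txt`, `d2.txt`, and `d3.txt` with the three sentences used above should give the same posting lists as the dictionary-based version, for example `['d2', 'd3']` for `university`.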
Summary