Intelligent Information Retrieval: A Search Algorithm over Two Postings Lists for Proximity Search


For a collection of Python implementations of selected experiments from "Introduction to Information Retrieval", see this blog.

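1. Objective

Understand proximity search in search systems and implement the search algorithm that intersects two postings lists for proximity search.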

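2. Tasks and Requirements

Fully understand the search algorithm over two postings lists used in proximity search and implement it in Python. When the user enters the query terms at the prompt, the program runs the algorithm on the postings lists of those terms.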

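3. Experiment Specification

⑴ Functional description

The system reads a preset document and returns all queryable terms. The user enters query terms when prompted. The system computes, for every term, the documents it occurs in and its postings list, then runs the proximity search algorithm over the two postings lists and outputs the merged result.

⑵ High-level design

The system is split into two functional modules: a prompt-and-input module and a module for the proximity search algorithm over two postings lists.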

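⑶ Detailed design

  1. Overall flowchart

Figure 1: Overall flowchart

  2. Flowcharts of the functional modules
  • Prompt-and-input module
    Figure 2: Prompt-and-input module

  • Module for the proximity search algorithm over two postings lists

Figure 3: Module for the proximity search algorithm over two postings lists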

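⑷ Code implementation

  • Building the document dictionary

createdict is a helper function that builds the document dictionary. It uses Python's re library to process the preset document, in which each line is treated as one document. It returns a dictionary whose keys are all the terms, which are shown to the user as the available choices, and whose values are the postings lists of those terms (document ID mapped to the positions of the term in that document).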

import re

def createdict(f0):
    # Build a positional index: {term: {doc_id: [positions]}}.
    # Each line of f0 is one document; doc IDs and positions are 1-based.
    dict0 = {}
    for i, line in enumerate(f0.split('\n')):
        # Split on spaces and punctuation, dropping the empty strings that
        # adjacent delimiters (e.g. ", ") would otherwise produce.
        d0 = [w for w in re.split(r'[ \n?!,.;]', line) if w]
        for j, word in enumerate(d0):
            dict0.setdefault(word, {}).setdefault(i + 1, []).append(j + 1)
    return dict0
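
To make the shape of the dictionary concrete, here is a minimal sketch on a two-line sample. The helper name build_positional_index and the sample text are assumptions made for illustration, and the sketch drops empty tokens, which is also an assumption rather than something the experiment prescribes.

```python
import re

def build_positional_index(text):
    """Hypothetical helper: map term -> {doc_id: [positions]}, all 1-based."""
    index = {}
    for doc_id, line in enumerate(text.split("\n"), start=1):
        # Assumption: empty tokens from adjacent delimiters are dropped.
        tokens = [t for t in re.split(r"[ \n?!,.;]", line) if t]
        for pos, term in enumerate(tokens, start=1):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index

# Two "documents", one per line (made-up sample text).
sample = "to be or not to be\nto do is to be"
index = build_positional_index(sample)
print(index["to"])  # {1: [1, 5], 2: [1, 4]}
print(index["be"])  # {1: [2, 6], 2: [5]}
```

Each value is itself a dictionary keyed by document ID, which is the input format that the proximity search function expects for p1 and p2.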
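  • Module for the proximity search algorithm over two postings lists

PositionalIntersect implements the proximity search algorithm over two postings lists. It first collects the document IDs of both input postings lists and walks through them in step. For each document ID that appears in both lists, it fetches the two position lists and iterates over their elements, checking whether the distance between two positions is at most the preset distance k. Each qualifying pair is stored as the document ID, the position of term 1, and the position of term 2; otherwise the loop simply continues. The function finally returns the list of results.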

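def PositionalIntersect(p1, p2, k):
    # p1, p2: {doc_id: [positions in ascending order]}; k: maximum distance.
    # Returns a list of [doc_id, position in p1, position in p2] triples.
    r = []
    # The merge below needs the document IDs in ascending order.
    k1, k2 = sorted(p1), sorted(p2)
    i, j = 0, 0
    while i < len(k1) and j < len(k2):
        if k1[i] == k2[j]:
            window = []  # positions from pp2 within k of the current pp1 position
            pp1, pp2 = p1[k1[i]], p2[k2[j]]
            j1 = 0
            for pos1 in pp1:
                # Advance through pp2, collecting positions within k of pos1.
                while j1 < len(pp2):
                    if abs(pos1 - pp2[j1]) <= k:
                        window.append(pp2[j1])
                    elif pp2[j1] > pos1:
                        break
                    j1 += 1
                # Drop positions that have fallen more than k behind pos1.
                while window and abs(window[0] - pos1) > k:
                    del window[0]
                for pos2 in window:
                    r.append([k1[i], pos1, pos2])
            i += 1
            j += 1
        elif k1[i] > k2[j]:
            j += 1
        else:
            i += 1
    return r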
  • Completing the code

The p1 and p2 defined first below are postings lists used for debugging.

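import re

# Postings lists for debugging: {doc_id: [positions]}.
p1 = {1: [7, 18, 33, 72, 86, 231], 2: [1, 17, 74, 222, 255], 4: [8, 16, 190, 429, 433], 5: [363, 367], 7: [13, 23, 191]}
p2 = {1: [17, 25], 4: [17, 191, 291, 430, 434], 5: [14, 19, 101]}

# Assumes document.txt is saved as UTF-8.
with open("document.txt", "r", encoding="utf-8") as f:
    f0 = f.read()
dict0 = createdict(f0)
print("Available query terms:", list(dict0), "\n")
# The lookups below replace the debug lists above; a term that is not in
# dict0 raises KeyError.
p1 = dict0[input("Enter the first term for the proximity search: ")]
p2 = dict0[input("Enter the second term for the proximity search: ")]
print("Proximity search results:\n", PositionalIntersect(p1, p2, 1))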

The sample document.txt is shown below. Any English text should work; each line is treated as one document.

There are moments in life when you miss someone so much that you just want to pick them from your dreams and hug them for real! Dream what you want to dream;go where you want to go;be what you want to be,because you have only one life and one chance to do all the things you want to do.
May you have enough happiness to make you sweet,enough trials to make you strong,enough sorrow to keep you human,enough hope to make you happy? Always put yourself in others’shoes.If you feel that it hurts you,it probably hurts the other person, too.
The happiest of people don’t necessarily have the best of everything;they just make the most of everything that comes along their way.Happiness lies for those who cry,those who hurt, those who have searched,and those who have tried,for only they can appreciate the importance of people
Who have touched their lives.Love begins with a smile,grows with a kiss and ends with a tear.The brightest future will always be based on a forgotten past, you can’t go on well in life until you let go of your past failures and heartaches.
When you were born,you were crying and everyone around you was smiling.Live your life so that when you die,you’re the one who is smiling and everyone around you is crying.
Please send this message to those people who mean something to you,to those who have touched your life in one way or another,to those who make you smile when you really need it,to those that make you see the brighter side of things when you are really down,to those who you want to let them know that you appreciate their friendship.And if you don’t, don’t worry,nothing bad will happen to you,you will just miss out on the opportunity to brighten someone’s day with this message.

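4. Results

Enter the two terms when prompted. With the proximity distance, i.e. the k argument of PositionalIntersect(p1, p2, k), set to 1, the result is shown in the figure below.

Figure 4: Result with proximity distance 1

The three numbers in a search result mean, in order: document 6, "send" at position 2 in that document, and "this" at position 3 in that document. Setting the distance to 5 and querying again gives the result in the figure below.

Figure 5: Result with proximity distance 5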
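
To make the triple format concrete, the sketch below maps a result such as [6, 2, 3] back to the words at those positions. The helpers tokens_of and describe_match are hypothetical names introduced for illustration, and the docs list is a stand-in that uses placeholders for the first five lines plus the opening words of line 6 of the sample document.

```python
import re

def tokens_of(line):
    """Hypothetical helper: split a line into non-empty tokens."""
    return [t for t in re.split(r"[ \n?!,.;]", line) if t]

def describe_match(docs, match):
    """Turn a [doc_id, pos1, pos2] triple (all 1-based) into a readable string."""
    doc_id, pos1, pos2 = match
    tokens = tokens_of(docs[doc_id - 1])
    return (f"doc {doc_id}: {tokens[pos1 - 1]!r}@{pos1}, "
            f"{tokens[pos2 - 1]!r}@{pos2}, distance {abs(pos1 - pos2)}")

# Stand-in documents: five placeholders, then the start of line 6.
docs = ["-"] * 5 + ["Please send this message to those people"]
print(describe_match(docs, [6, 2, 3]))
# doc 6: 'send'@2, 'this'@3, distance 1
```

A distance of 1 means the two terms are adjacent, so this match is found with k = 1 and remains in the result for any larger k.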

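5. Debugging

During debugging, the postings lists were preset to p1 = {1: [7, 18, 33, 72, 86, 231], 2: [1, 17, 74, 222, 255], 4: [8, 16, 190, 429, 433], 5: [363, 367], 7: [13, 23, 191]} and p2 = {1: [17, 25], 4: [17, 191, 291, 430, 434], 5: [14, 19, 101]}. Running the program gives the result shown in the figure below.
Figure 6: Debugging run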
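
One way to cross-check a debugging run is a brute-force reference that compares every pair of positions in each shared document. The sketch below is not the algorithm from this experiment; proximity_pairs_bruteforce is a hypothetical name, and k = 1 is an assumption, since the k used in the debugging run is not stated.

```python
def proximity_pairs_bruteforce(p1, p2, k):
    """Reference check: test every position pair in each shared document."""
    results = []
    for doc_id in sorted(p1.keys() & p2.keys()):
        for pos1 in p1[doc_id]:
            for pos2 in p2[doc_id]:
                if abs(pos1 - pos2) <= k:
                    results.append([doc_id, pos1, pos2])
    return results

# The debug postings lists from the text above.
p1 = {1: [7, 18, 33, 72, 86, 231], 2: [1, 17, 74, 222, 255],
      4: [8, 16, 190, 429, 433], 5: [363, 367], 7: [13, 23, 191]}
p2 = {1: [17, 25], 4: [17, 191, 291, 430, 434], 5: [14, 19, 101]}

# Assumption: k = 1.
print(proximity_pairs_bruteforce(p1, p2, 1))
# [[1, 18, 17], [4, 16, 17], [4, 190, 191], [4, 429, 430], [4, 433, 434]]
```

The brute-force version costs time proportional to the product of the two position list lengths per document, whereas the merge-based function advances through each list once, so the reference is only suitable for small test inputs.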
