Coursera: linear algebra [week0 inverse_index_lab task4]

weixin_30670925

Original link: http://www.cnblogs.com/georgeai/p/3194756.html

An assignment from the linear algebra course on Coursera asks for the following task to be completed in Python:

Task 4: Write a procedure makeInverseIndex(strlist) that, given a list of strings (documents), returns
a dictionary that maps each word to the set consisting of the document numbers of documents in which that
word appears. This dictionary is called an inverse index. (Hint: use enumerate.)

Example:

input: s=['this is the first sentence.','and this is the second sentence','at last this is the third sentence']

output: {'this': {0, 1, 2}, 'is': {0, 1, 2}, 'the': {0, 1, 2}, 'first': {0}, 'sentence.': {0}, 'and': {1}, 'second': {1}, 'sentence': {1, 2}, 'at': {2}, 'last': {2}, 'third': {2}}
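
The hint in the task refers to Python's built-in enumerate, which yields (index, item) pairs, so each document arrives together with its number. A small illustration, where docs is a made-up list used only for this sketch:

```python
# docs is a hypothetical list of documents, used only for illustration.
docs = ['a b', 'c']

# enumerate pairs each document with its position in the list.
pairs = list(enumerate(docs))

for number, doc in pairs:
    print(number, doc)
```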

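The goal is to take a document as input, split it into sentences that each carry their own number, and report, for every word in the text, the numbers of the sentences in which that word appears.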

The program is as follows:

# make inverse index

def makeInverseIndex(strlist):
    # Map each word to the set of document numbers in which it appears.
    result = {}
    for i, doc in enumerate(strlist):
        # split() breaks on whitespace only, so punctuation stays attached to a word.
        for word in doc.split():
            result.setdefault(word, set()).add(i)
    return result

s=['this is the first sentence.','and this is the second sentence','at last this is the third sentence']
print(makeInverseIndex(s))
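
Because split() only breaks on whitespace, the word 'sentence.' in the first document is indexed separately from 'sentence' in the other two. If that is not wanted, one option is to strip punctuation and fold case before indexing. The following is a minimal sketch under that assumption; the normalization step and the name make_inverse_index_normalized are not part of the assignment and are introduced here only for illustration.

```python
import string

def make_inverse_index_normalized(strlist):
    # Hypothetical variant: same inverse index, but each word is
    # lowercased and has leading/trailing punctuation removed first.
    index = {}
    for doc_number, doc in enumerate(strlist):
        for raw in doc.split():
            word = raw.strip(string.punctuation).lower()
            if word:  # skip tokens that were punctuation only
                index.setdefault(word, set()).add(doc_number)
    return index

s = ['this is the first sentence.',
     'and this is the second sentence',
     'at last this is the third sentence']
index = make_inverse_index_normalized(s)
print(index['sentence'])
```

With this variant, 'sentence' maps to all three documents and there is no separate 'sentence.' key.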