An assignment from a linear algebra course on Coursera asks for the following task to be completed in Python:
Task 4: Write a procedure makeInverseIndex(strlist) that, given a list of strings (documents), returns
a dictionary that maps each word to the set consisting of the document numbers of documents in which that
word appears. This dictionary is called an inverse index. (Hint: use enumerate.)
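e.g.
input: s=['this is the first sentence.','and this is the second sentence','at last this is the third sentence']
output: {'third': {2}, 'sentence': {0, 1, 2}, 'this': {0, 1, 2}, 'second': {1}, 'is': {0, 1, 2}, 'at': {2}, 'last': {2}, 'and': {1}, 'first': {0}, 'the': {0, 1, 2}}
Note: the first input string ends in 'sentence.' and str.split() does not remove punctuation, so this output assumes the trailing period is ignored.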
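The goal is to take a document as input, split it into sentences, give each sentence its own number, and report, for every word in the text, the numbers of the sentences in which that word appears.
The program is as follows: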
# make inverse index
def makeInverseIndex(strlist):
    result = {}
    # a[i] is the list of words in document i
    a = []
    for i in range(len(strlist)):
        a.append(strlist[i].split())
    for j in range(len(strlist)):
        for k in a[j]:
            # start the set for word k with the current document number
            result[k] = {j}
            # then add every document whose word list contains k
            for l in range(len(strlist)):
                if k in a[l]:
                    result[k].add(l)
    return result

s = ['this is the first sentence.', 'and this is the second sentence', 'at last this is the third sentence']
print(makeInverseIndex(s))
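The sample output lists 'sentence' under all three documents even though the first string ends in 'sentence.', and split() on its own keeps that period attached to the word. One possible way to reproduce the listed output is to strip punctuation from each word before indexing. This is only a sketch: the helper name normalize is hypothetical, and whether the assignment expects punctuation to be dropped is an assumption.

```python
import string

def normalize(word):
    # Hypothetical helper (not part of the assignment): remove leading and
    # trailing punctuation, so 'sentence.' becomes 'sentence'.
    return word.strip(string.punctuation)

s = ['this is the first sentence.',
     'and this is the second sentence',
     'at last this is the third sentence']

# Word lists per document, with punctuation stripped from each word.
words = [[normalize(w) for w in doc.split()] for doc in s]
print(words[0])  # ['this', 'is', 'the', 'first', 'sentence']
```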
Note that this version does not use the enumerate function suggested in the hint; that is something to study further later.
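For reference, here is a minimal sketch of how the hint might be followed. enumerate yields (document number, document) pairs, so the index can be built in one pass. The function name makeInverseIndexEnum is made up here to keep it apart from the version above, and the sample strings omit the trailing period so that plain split() is enough.

```python
def makeInverseIndexEnum(strlist):
    """Map each word to the set of numbers of the documents containing it."""
    index = {}
    for doc_id, doc in enumerate(strlist):
        for word in doc.split():
            # setdefault returns the set already stored for word, or stores
            # and returns a new empty set the first time the word is seen.
            index.setdefault(word, set()).add(doc_id)
    return index

docs = ['this is the first sentence',
        'and this is the second sentence',
        'at last this is the third sentence']
index = makeInverseIndexEnum(docs)
print(index['sentence'])  # {0, 1, 2}
print(index['third'])     # {2}
```

Each word is visited once per occurrence, so there is no need to rescan every document for every word.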
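I was stuck on this problem for a long time, and the cause turned out to be the last statement: result[k].add(l)
What I had written before was: first define a set t=set()
and then result[k]=t.add(l)
The result of that is result[k]=None, because set.add() modifies the set in place and returns None.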
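The correct form is result[k].add(l). It works because the earlier statement result[k]={j} has already stored a value under key k of the dict result, and that value is the set {j} (a set literal; an empty {} would be a dict instead). So result[k] evaluates to that set, and its .add() method can be called on it directly.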
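The difference between the two forms can be checked with a short experiment. This is just an illustrative sketch; the key 'word' and the numbers are made up.

```python
# Wrong form: add() changes t in place and returns None,
# so assigning its return value stores None.
t = set()
returned = t.add(1)
print(returned)  # None
print(t)         # {1}

# Correct form: store a set under the key first, then call add() on it.
result = {}
result['word'] = {0}
result['word'].add(2)
print(result['word'])  # {0, 2}

# {} is an empty dict, while {0} is a set.
print(type({}).__name__, type({0}).__name__)  # dict set
```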