wordnet.synset expects a string with a 3-part name: word.pos.nn.
You did not specify the pos.nn part for each word in list1 and list2.
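To make the three-part naming convention concrete, here is a minimal sketch that splits a name such as 'apple.n.01' into its lemma, part-of-speech tag, and sense number. The helper split_synset_name is hypothetical (it is not an NLTK function) and assumes the name is well formed.

```python
def split_synset_name(name):
    """Split a WordNet-style synset name into (lemma, pos, sense_number).

    Hypothetical helper for illustration; not part of NLTK.
    """
    # Split from the right so a lemma that itself contains dots stays intact.
    parts = name.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError("expected 'word.pos.nn', got {!r}".format(name))
    lemma, pos, sense = parts
    return lemma, pos, int(sense)


print(split_synset_name("apple.n.01"))  # ('apple', 'n', 1)
```

A bare word such as 'drinks' has no pos.nn part, so the helper rejects it with a ValueError.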
It seems reasonable to assume that all the words are nouns, so we could try appending the string '.n.01' to each string in list1 and list2:

    for word1, word2 in IT.product(list1, list2):
        wordFromList1 = wordnet.synset(word1 + '.n.01')
        wordFromList2 = wordnet.synset(word2 + '.n.01')

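That does not work, however: wordnet.synset('drinks.n.01') raises a WordNetError.
On the other hand, the same doc page shows that you can look up similar words with the synsets method. For example, wordnet.synsets('drinks') returns a list of Synset objects covering the noun and verb senses of "drink".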
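So at this point, you need to think about what you want the program to do. If you are okay with picking the first item in that list as a proxy for drinks, then you could use

    for word1, word2 in IT.product(list1, list2):
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
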
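One caveat with indexing [0]: synsets returns an empty list for a word WordNet does not know, so [0] would raise an IndexError. Below is a minimal sketch of a guard. The helper first_synset is hypothetical, and it takes the lookup function as a parameter (in practice wordnet.synsets) so that the sketch runs without the WordNet corpus installed.

```python
def first_synset(word, lookup):
    """Return the first item that lookup(word) yields, or None if it is empty.

    `lookup` is assumed to behave like nltk's wordnet.synsets: it takes a
    word and returns a (possibly empty) list.
    Hypothetical helper for illustration; not part of NLTK.
    """
    candidates = lookup(word)
    return candidates[0] if candidates else None


# Stand-in lookup table so the sketch runs without the WordNet corpus;
# the entry is a placeholder string, not a real Synset object.
fake_index = {"drinks": ["drink.n.01"]}
fake_lookup = lambda w: fake_index.get(w, [])

print(first_synset("drinks", fake_lookup))      # drink.n.01
print(first_synset("qwertyuiop", fake_lookup))  # None
```

With NLTK, this would be called as first_synset(word1, wordnet.synsets), skipping any pair where either result is None. To keep only noun senses, the lookup could be lambda w: wordnet.synsets(w, pos=wordnet.NOUN).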
This results in a program that looks like this:

    import itertools as IT
    import nltk.corpus as corpus

    wordnet = corpus.wordnet
    list1 = ["apple", "honey", "drinks", "flowers", "paper"]
    list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]
    for word1, word2 in IT.product(list1, list2):
        # print(word1, word2)
        # Use the first synset of each word as its proxy
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        # name is a method in NLTK 3 (it was an attribute in older releases)
        print('{w1}, {w2}: {s}'.format(
            w1=wordFromList1.name(),
            w2=wordFromList2.name(),
            s=wordFromList1.lch_similarity(wordFromList2)))

It yields

    apple.n.01, pear.n.01: 2.53897387106
    apple.n.01, shell.n.01: 1.07263680226
    apple.n.01, movie.n.01: 1.15267950994
    apple.n.01, fire.n.01: 1.07263680226
    ...
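If taking the first synset feels too arbitrary, one alternative is to compare every sense of one word with every sense of the other and keep the highest score. This is only a sketch of that idea, not something the approach above requires: best_similarity is a hypothetical helper, and the similarity function is passed in so the example can run on toy data without WordNet.

```python
import itertools


def best_similarity(senses1, senses2, similarity):
    """Return (score, sense1, sense2) for the highest-scoring pair, or None.

    Hypothetical helper for illustration; not part of NLTK.
    """
    best = None
    for s1, s2 in itertools.product(senses1, senses2):
        score = similarity(s1, s2)
        if score is None:  # skip pairs the measure cannot score
            continue
        if best is None or score > best[0]:
            best = (score, s1, s2)
    return best


# Toy data: numbers stand in for synsets, closeness stands in for similarity.
print(best_similarity([1, 5], [4, 10], lambda a, b: -abs(a - b)))  # (-1, 5, 4)
```

With NLTK, the two lists would come from wordnet.synsets(word, pos=wordnet.NOUN) and the similarity function could be lambda a, b: a.lch_similarity(b).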