When building a dictionary with gensim, my original code looked like this:
import json
from gensim import corpora
# load the corpus from a JSON file
with open('filename', 'r') as f:
    a = json.load(f)
# each value of the JSON object is one entry of the corpus
corpus = list(a.values())
# build the dictionary
dictionary = corpora.Dictionary(corpus)
# build the bag-of-words corpus
mycorpus = [dictionary.doc2bow(text) for text in corpus]
print(mycorpus)
But it raised this error:
TypeError: doc2bow expects an array of unicode tokens on input, not a single string
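The message indicates that at least one entry handed to gensim is a plain string rather than a list of tokens. A minimal sketch for checking this before calling gensim; the helper name find_untokenized and the sample data are hypothetical stand-ins for the values loaded from the JSON file:

```python
def find_untokenized(corpus):
    """Return the indices of documents that are plain strings, not token lists."""
    return [i for i, doc in enumerate(corpus) if isinstance(doc, str)]


# Hypothetical data standing in for the values loaded from the JSON file.
sample_corpus = ["human machine interface", ["graph", "trees"], "user survey"]

# Entries 0 and 2 are single strings, so they would trigger the TypeError.
bad_indices = find_untokenized(sample_corpus)
print(bad_indices)
```

An empty result means every document is already a list, so the error would have to come from somewhere else.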
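I tried a fix I found online, which was to change
dictionary = corpora.Dictionary(corpus)
to:
dictionary = corpora.Dictionary([corpus])
But the same error still appeared, because doc2bow in the next statement was still being called on each string in corpus: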
TypeError: doc2bow expects an array of unicode tokens on input, not a single string
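⭐ Here is the fix that finally worked!
Replace these two statements:
# build the dictionary
dictionary = corpora.Dictionary(corpus)
# build the bag-of-words corpus
mycorpus = [dictionary.doc2bow(text) for text in corpus]
with:
# build the dictionary
dictionary = corpora.Dictionary([corpus])
# build the bag-of-words corpus
mycorpus = [dictionary.doc2bow(text) for text in [corpus]]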
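In other words, wrap corpus in [ ] in both places, and the code runs without the error. Note that gensim then treats the whole list as a single document, with each original string counted as one token.
I hope this post helps you ✌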
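If each JSON value should instead be its own document with its own tokens, the strings can be split into token lists before the dictionary is built. A rough sketch under stated assumptions: the helper to_token_lists and the sample values are hypothetical, and splitting on whitespace is only an assumption that suits space-separated text (Chinese text would need a word segmenter such as jieba instead):

```python
def to_token_lists(values):
    """Turn each document into a list of tokens, splitting strings on whitespace."""
    token_lists = []
    for doc in values:
        if isinstance(doc, str):
            # A single string becomes a list of whitespace-separated tokens.
            token_lists.append(doc.split())
        else:
            # Anything already iterable as tokens is copied into a list.
            token_lists.append(list(doc))
    return token_lists


# Hypothetical values standing in for a.values() from the JSON file.
raw_values = ["human machine interface", "user survey", ["graph", "trees"]]
corpus = to_token_lists(raw_values)
print(corpus)
```

A corpus in this shape can be passed to corpora.Dictionary(corpus) and to dictionary.doc2bow(text) for each text in corpus, without the extra brackets.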