Gensim raises "the array at index 0 has size 1024 and the array at index 1 has size 1023" (unresolved)

蛐蛐蛐

Category: 论文点评 (paper reviews)

Article link: https://blog.csdn.net/qysh123/article/details/102752234

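This post just records the problem. I recently used Gensim with LSI for some classification work again, and found that setting the number of LSI topics too high is very likely to trigger an error. Some illustrative code: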

from gensim import corpora, models

# FILE_STRING is the path to a text file with one document per line;
# NUM_TOPICS is the requested number of LSI topics.
with open(FILE_STRING) as f:
    texts = [line.lower().split() for line in f]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
tfidf_model = models.TfidfModel(corpus)
corpus_tfidf = tfidf_model[corpus]
lsi_model = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=NUM_TOPICS)
corpus_lsi = lsi_model[corpus_tfidf]

# For example, build the LSI vector for a single line, this_line.
this_corpus = dictionary.doc2bow(this_line.lower().split())
this_corpus_tfidf = tfidf_model[this_corpus]
this_lsi = lsi_model[this_corpus_tfidf]
print(this_lsi)
vec_list = []
for each_topic in this_lsi:
    vec_list.append(each_topic[1])  # keep only the value of each (topic_id, value) pair
# If NUM_TOPICS is set too high, the vector here comes out one dimension short, which is really strange.
print(len(vec_list))
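
One thing that may be worth checking here, offered as a guess rather than a confirmed diagnosis: Gensim returns the transformed document in sparse form, as a list of (topic_id, value) pairs, and as far as I know coordinates that are numerically zero are left out of that list. If that is what happens, collecting only each_topic[1] silently loses a position. The sketch below uses a hypothetical helper, missing_topic_ids, and a hand-written sample vector (not real model output) to show how the absent ids could be listed.

```python
def missing_topic_ids(sparse_vec, num_topics):
    """Return the topic ids in range(num_topics) that do not appear in sparse_vec."""
    present = {topic_id for topic_id, _ in sparse_vec}
    return [i for i in range(num_topics) if i not in present]

# Hand-written stand-in for lsi_model[this_corpus_tfidf]; topic 2 is absent.
sample_lsi = [(0, 0.41), (1, -0.07), (3, 0.12)]
values_only = [value for _, value in sample_lsi]
print(len(values_only))                  # 3, although 4 topics were requested
print(missing_topic_ids(sample_lsi, 4))  # [2]
```

Running the same check on this_lsi with NUM_TOPICS would show whether a single topic id is missing for the documents that come out at 1023.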

For example, with NUM_TOPICS set to 1024, the following error shows up easily:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1024 and the array at index 1 has size 1023

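So the lesson is that the LSI dimensionality should not be set too high, otherwise a lot of the training data becomes unusable. I really don't know what the cause is; it just seems very strange!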
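
If the shorter vectors do come from omitted near-zero coordinates (an assumption; this post leaves the cause open), the affected documents may not need to be discarded. Each sparse vector can be expanded to a fixed length by placing every value at its topic id and filling the remaining positions with 0.0. A minimal pure-Python sketch follows; to_dense is a hypothetical helper name, and gensim.matutils.sparse2full(vec, length) is, to my knowledge, the library's own function for the same conversion.

```python
def to_dense(sparse_vec, num_topics):
    """Expand [(topic_id, value), ...] into a list of exactly num_topics floats."""
    dense = [0.0] * num_topics
    for topic_id, value in sparse_vec:
        dense[topic_id] = float(value)
    return dense

# Hand-written sample vectors, not real model output.
doc_a = [(0, 0.5), (1, 0.25), (2, -0.125)]
doc_b = [(0, 0.75), (2, 0.5)]  # coordinate 1 was omitted
rows = [to_dense(doc, 3) for doc in (doc_a, doc_b)]
print(rows)  # [[0.5, 0.25, -0.125], [0.75, 0.0, 0.5]]
```

All rows now have the same length, so stacking them into one array should no longer fail on mismatched sizes. This does not explain why the coordinate goes missing in the first place, and it would not be the right fix if the model itself ended up with fewer than NUM_TOPICS topics.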
