Using gensim to implement LDA and compute perplexity (gensim Perplexity Estimates in LDA Model)
Neither. The values coming out of bound()
depend on the number of topics (as well as number of words), so they’re not comparable across different num_topics (or different test corpora).
No, the opposite: a smaller bound value implies deterioration. For example, a bound of -6000 is "better" than -7000 (bigger is better).
====================================================
You can use the log_perplexity method to evaluate your LdaModel.

Small code example:
from gensim.models import LdaModel
from gensim.corpora import Dictionary

docs = [["a", "a", "b"],
        ["a", "c", "g"],
        ["c"],
        ["a", "c", "g"]]
dct = Dictionary(docs)
corpus = [dct.doc2bow(doc) for doc in docs]
c_train, c_test = corpus[:2], corpus[2:]

ldamodel = LdaModel(corpus=c_train, num_topics=2, id2word=dct)

# Per-word likelihood bound on the held-out documents.
per_word_bound = ldamodel.log_perplexity(c_test)
print(per_word_bound)
I am attempting to estimate an LDA topic model for a corpus of ~59,000 documents and ~500,000 unique tokens. I would prefer to estimate the final model in R to utilize its visualization tools for interpreting my results; however, firs