说明:
安装
$ pip install lda --user
示例
from __future__ import division, print_function
import numpy as np
import lda
import lda.datasets
# document-term matrix
X = lda.datasets.load_reuters()
print("type(X): {}".format(type(X)))
print("shape: {}\n".format(X.shape))
print(X[:5, :5])
'''输出:
type(X):
shape: (395L, 4258L)
[[ 1 0 1 0 0]
[ 7 0 2 0 0]
[ 0 0 0 1 10]
[ 6 0 1 0 0]
[ 0 0 0 2 14]]
'''
X为395*4298的矩阵,意味着395个文本,共4258个单词。值代表出现次数。
看一下是哪些单词:
# the vocab
vocab = lda.datasets.load_reuters_vocab()
print("type(vocab): {}".format(type(vocab)))
print("len(vocab): {}\n".format(len(vocab)))
print(vocab[:6])
&#