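I have a directory full of .txt files (documents). First, I load the documents, strip some parentheses, and remove some quotes, so the documents look like this, for example:
document1: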
is a scientific discipline that explores the construction and study of algorithms that can learn from data Such algorithms operate by building a model
document2:
Machine learning can be considered a subfield of computer science and statistics It has strong ties to artificial intelligence and optimization which deliver methods
So I load the files from the directory as follows:
^{pr2}$
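The loading code is not reproduced above. As an illustration only (this is not necessarily the code used in the question), here is a minimal sketch of one way to read every .txt file in a directory into a list of strings. The helper name `load_documents` and the UTF-8 encoding are assumptions.

```python
from pathlib import Path


def load_documents(directory):
    """Hypothetical helper: read every .txt file in `directory` into a list of strings."""
    documents = []
    # Sort the paths so the document order is reproducible between runs.
    for path in sorted(Path(directory).glob("*.txt")):
        # Read the file's text (not just its name), assuming UTF-8 encoding.
        documents.append(path.read_text(encoding="utf-8"))
    return documents
```

Printing `len(documents)` and `documents[0][:80]` after loading is a quick way to check that the list really contains text rather than file names or empty strings.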
Then I try to vectorize these documents to create a training matrix as follows:
from sklearn.feature_extraction.text import HashingVectorizer
vectorizer = HashingVectorizer(analyzer='word')
X = HashingVectorizer.fit_transform(documents)
X.toarray()
This is the output:
raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
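How can I create a vector representation? I thought the loaded files were held in documents, but it seems the vectorizer cannot be fit on these documents.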
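For comparison, here is a minimal sketch of the vectorization step with `fit_transform` called on the vectorizer instance rather than on the `HashingVectorizer` class. It assumes `documents` is a list of non-empty strings. The two sample strings are shortened from the documents above, and `n_features=2**10` is an arbitrary small value chosen so the dense array stays small.

```python
from sklearn.feature_extraction.text import HashingVectorizer

# Sample input: a list of raw text strings, one per document.
documents = [
    "is a scientific discipline that explores the construction and study of algorithms",
    "Machine learning can be considered a subfield of computer science and statistics",
]

# Fail early if loading produced nothing or only blank strings.
if not documents or not all(doc.strip() for doc in documents):
    raise ValueError("documents must be a non-empty list of non-empty strings")

# n_features is an assumed, deliberately small hash space for this illustration.
vectorizer = HashingVectorizer(analyzer="word", n_features=2**10)

# Call fit_transform on the instance; the result is a SciPy sparse matrix.
X = vectorizer.fit_transform(documents)
print(X.shape)  # (2, 1024)
```

If the shape has the expected number of rows and `X.nnz` is greater than zero, the documents were read as text and tokenized.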