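Cause: TfidfVectorizer produces a sparse matrix, but the model requires a dense matrix, so the two are incompatible.
Solution: change the model. For example, I hit this bug with Gaussian naive Bayes (GaussianNB); switching to multinomial naive Bayes (MultinomialNB), which accepts sparse input, gave fairly good results.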
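from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# This version fails: GaussianNB rejects the sparse TF-IDF output.
model = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', GaussianNB()),
])
model.fit(X_train, y_train)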
Traceback (most recent call last):
  File "<ipython-input-55-8ed8bbc4bd13>", line 6, in <module>
    model.fit(X_train,y_train)
  File "D:\Anaconda\python\lib\site-packages\sklearn\pipeline.py", line 335, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "D:\Anaconda\python\lib\site-packages\sklearn\naive_bayes.py", line 210, in fit
    X, y = self._validate_data(X, y)
  File "D:\Anaconda\python\lib\site-packages\sklearn\base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "D:\Anaconda\python\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "D:\Anaconda\python\lib\site-packages\sklearn\utils\validation.py", line 795, in check_X_y
    X = check_array(X, accept_sparse=accept_sparse,
  File "D:\Anaconda\python\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "D:\Anaconda\python\lib\site-packages\sklearn\utils\validation.py", line 575, in check_array
    array = _ensure_sparse_format(array, accept_sparse=accept_sparse,
  File "D:\Anaconda\python\lib\site-packages\sklearn\utils\validation.py", line 353, in _ensure_sparse_format
    raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
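To see where the sparse matrix in that error comes from, here is a small sketch that inspects the vectorizer output directly. The `docs` list is a made-up toy corpus, used only for illustration; `issparse` comes from SciPy.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, for illustration only.
docs = ["the cat sat", "the dog barked", "cats and dogs"]

# fit_transform returns a SciPy sparse matrix, not a NumPy array.
X = TfidfVectorizer().fit_transform(docs)
is_sparse = issparse(X)

# toarray() is the conversion the error message suggests.
X_dense = X.toarray()

print(is_sparse, issparse(X_dense), X_dense.shape)
```

GaussianNB only accepts the dense form, which is why the pipeline above stops inside its input validation.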
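from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# MultinomialNB accepts sparse input, so the same pipeline now fits.
model = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', MultinomialNB(alpha=1.0)),
])
model.fit(X_train, y_train)
predict = model.predict(X_test)
accuracy = accuracy_score(y_test, predict)
print(accuracy)
# Output: 0.8888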
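Solution 2: convert the sparse matrix to a dense one instead, as the error message suggests with X.toarray(). I have not managed to get this approach working yet; if anyone knows how, please share.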
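One possible way to do the dense conversion inside the pipeline, as a minimal sketch rather than a tested fix for the original data: add a `FunctionTransformer` step between the vectorizer and GaussianNB that calls `toarray()`. The helper name `to_dense`, the step name `'dense'`, and the toy training data are all assumptions made up for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: turn a SciPy sparse matrix into a NumPy array."""
    return X.toarray()


model = Pipeline([
    ('vect', TfidfVectorizer()),
    # Densify the TF-IDF output so GaussianNB will accept it.
    ('dense', FunctionTransformer(to_dense, accept_sparse=True)),
    ('clf', GaussianNB()),
])

# Made-up toy data, for illustration only.
X_train = ["good great film", "great acting", "bad boring film", "boring plot"]
y_train = [1, 1, 0, 0]

model.fit(X_train, y_train)
pred = model.predict(["great film"])
print(pred)
```

Keep in mind that a dense TF-IDF matrix stores every zero explicitly, so with a large vocabulary this can use far more memory than the sparse version.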