My experiment uses 20newsgroups, so I decided to read through the official sklearn documentation for the dataset carefully.
Documentation URL: http://scikit-learn.org/stable/datasets/twenty_newsgroups.html#usage
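The 20newsgroups dataset is split into a train subset and a test subset.
This is the copy I downloaded.
Next, check how many articles each subset contains: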
from sklearn.datasets import fetch_20newsgroups
newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test &
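As a minimal sketch of the counting step, assuming the object returned by fetch_20newsgroups exposes data, target and target_names as described in the sklearn documentation, the article counts could be collected like this. The helper names summarize and main are made up for illustration and are not part of sklearn.

```python
from collections import Counter


def summarize(split):
    """Return (total documents, documents per category name).

    `split` is assumed to be Bunch-like: `data` holds the texts,
    `target` the integer labels, `target_names` the label names.
    """
    per_category = Counter(split.target_names[label] for label in split.target)
    return len(split.data), dict(per_category)


def main():
    # Imported here so that summarize() can be reused without scikit-learn.
    from sklearn.datasets import fetch_20newsgroups

    # The first call downloads the archive and caches it locally.
    for subset in ("train", "test"):
        split = fetch_20newsgroups(subset=subset)
        total, per_category = summarize(split)
        print(subset, total, "documents in", len(per_category), "categories")


if __name__ == "__main__":
    main()
```

Passing subset='all' instead should return both subsets combined, which is handy for checking that the two counts add up.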