20 News Groups Dataset
Data summary:
This is a well-known data set for text classification, used mainly for training classifiers on a combination of labeled and unlabeled data (see references below). The data set is a collection of 20,000 messages taken from UseNet postings over a period of several months in 1993. The messages are divided almost evenly among 20 different UseNet discussion groups. Many of the categories cover overlapping topics: for example, 5 of them are computer-related discussion groups and 3 of them discuss religion. The remaining groups cover politics, sports, science, and miscellaneous topics.
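To make the topic overlap concrete, the following minimal Python sketch groups the newsgroup names into broad topics. The 20 group names are those of the commonly distributed version of the data set and are not listed in this entry. The topic labels, the matching rules, and the helper names `topic_of` and `group_by_topic` are illustrative assumptions, not part of the data set.

```python
from collections import defaultdict

# Group names as found in the commonly distributed 20 Newsgroups data;
# this catalog entry itself does not list them.
NEWSGROUPS = [
    "alt.atheism",
    "comp.graphics",
    "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware",
    "comp.sys.mac.hardware",
    "comp.windows.x",
    "misc.forsale",
    "rec.autos",
    "rec.motorcycles",
    "rec.sport.baseball",
    "rec.sport.hockey",
    "sci.crypt",
    "sci.electronics",
    "sci.med",
    "sci.space",
    "soc.religion.christian",
    "talk.politics.guns",
    "talk.politics.mideast",
    "talk.politics.misc",
    "talk.religion.misc",
]


def topic_of(group):
    """Map a newsgroup name to a broad topic (hypothetical labels)."""
    if group.startswith("comp."):
        return "computers"
    if "religion" in group or group == "alt.atheism":
        return "religion"
    if group.startswith("talk.politics."):
        return "politics"
    if group.startswith("rec."):
        return "recreation and sports"
    if group.startswith("sci."):
        return "science"
    return "miscellaneous"


def group_by_topic(groups):
    """Collect newsgroup names under their broad topic."""
    topics = defaultdict(list)
    for group in groups:
        topics[topic_of(group)].append(group)
    return dict(topics)


if __name__ == "__main__":
    for topic, members in group_by_topic(NEWSGROUPS).items():
        print(f"{topic}: {len(members)} groups")
```

Under these assumed rules, the five `comp.*` groups and the three religion-related groups each collapse into a single topic, which is why classifiers tend to confuse categories within a topic more often than across topics.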
Chinese keywords:
数据挖掘, 新闻, 文本分类, 交叉主题
English keywords:
Data mining, News, Text classification, Overlapping topics
Data format:
TEXT