I collected information on the text classification datasets I downloaded and summarized it in the table below (as of 2020-07-01):
Dataset | Classes | Type | Samples | Best Method | Performance |
---|---|---|---|---|---|
AG News | 4 | Topic | Train: 120,000 Test: 7,600 | XLNet | Error: 4.45 |
DBpedia | 14 | Topic | Train: 560,000 Test: 70,000 | XLNet | Error: 0.6 |
TREC-6 | 6 | Question | Train: 5,452 Test: 500 | USE_T+CNN | Error: 1.93 |
TREC-50 | 50 | Question | Train: 5,452 Test: 500 | Rules | Error: 2.8 |
20NEWS | 20 | Topic | Total: 20,000 | SGC | Acc: 88.5 |
IMDb | 2 | Sentiment | Train: 25,000 Test: 25,000 | XLNet | Acc: 96.8 |
Yahoo! Answers | 10 | Question | Train: 1,400,000 Test: 60,000 | BERT-ITPT-FiT | |
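The table mixes two metrics: most rows report test error, while 20NEWS and IMDb report accuracy. Assuming both are percentages, they can be compared on one scale with accuracy = 100 - error. The sketch below keeps the rows as structured data and applies that conversion; the `DatasetRow` class and its field names are hypothetical, chosen only for illustration, and the Yahoo! Answers score is left as `None` because the table does not record it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRow:
    """One row of the summary table (hypothetical structure for illustration)."""
    name: str
    classes: int
    train: Optional[int]    # None when the table gives only a total size
    test: Optional[int]
    metric: Optional[str]   # "error" or "acc"; None when the cell is missing
    value: Optional[float]  # reported score, assumed to be in percent

    def accuracy(self) -> Optional[float]:
        """Return accuracy in percent, converting error via 100 - error."""
        if self.value is None or self.metric is None:
            return None
        if self.metric == "error":
            return 100.0 - self.value
        return self.value


ROWS = [
    DatasetRow("AG News", 4, 120_000, 7_600, "error", 4.45),
    DatasetRow("DBpedia", 14, 560_000, 70_000, "error", 0.6),
    DatasetRow("TREC-6", 6, 5_452, 500, "error", 1.93),
    DatasetRow("TREC-50", 50, 5_452, 500, "error", 2.8),
    DatasetRow("20NEWS", 20, None, None, "acc", 88.5),
    DatasetRow("IMDb", 2, 25_000, 25_000, "acc", 96.8),
    DatasetRow("Yahoo! Answers", 10, 1_400_000, 60_000, None, None),
]

if __name__ == "__main__":
    # Print every dataset on a common accuracy scale; missing scores show as n/a.
    for row in ROWS:
        acc = row.accuracy()
        shown = f"{acc:.2f}" if acc is not None else "n/a"
        print(f"{row.name:<15} classes={row.classes:<3} acc={shown}")
```

Keeping the original metric alongside the value (rather than storing converted numbers) makes it easy to check each row against the table above.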