https://github.com/CLUEbenchmark/CLUENER2020
As Table 3 shows, MSRANER [7] and PeopleDailyNER have only the three classic categories (person name, location, and organization), while WeiboNER [8, 9] adds a geo-political category. BOSONNER [10] adds three more categories (time, product name, and company name), but it has only 2k samples. Resume NER [11] has 8 categories, two of which, Educational Institution and Ethnicity Background, appear in no other dataset. Its category distribution is also highly unbalanced: the largest category has 134 times as much data as the smallest. In CLUENER2020, by contrast, we control the amount of data per category so that all categories are on the same order of magnitude (see Figure 2 for details). Beyond the three classic categories, CLUENER2020 has 7 new categories that MSRANER and PeopleDailyNER lack, and it has more samples than BOSONNER. Besides being more diverse, our dataset is also more challenging than the others: state-of-the-art models on existing Chinese NER tasks reach F1 scores of about 95 or higher, while the best model on CLUENER2020 reaches only about 80.
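The "134 times" figure for Resume NER is a ratio of the largest per-category count to the smallest. Below is a minimal sketch of that calculation. It assumes the JSON-lines label layout used in the CLUENER2020 repository (`label` → category → entity string → list of `[start, end]` spans). The function names `count_entities` and `imbalance_ratio` are hypothetical. The text does not say whether the paper counted entity mentions or samples, so counting mentions here is also an assumption.

```python
import json
from collections import Counter
from typing import Iterable


def count_entities(lines: Iterable[str]) -> Counter:
    """Count entity mentions per category in CLUENER2020-style JSON lines.

    Assumed record layout (one JSON object per line):
    {"text": "...", "label": {"name": {"some entity": [[start, end], ...]}}}
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for category, entities in record.get("label", {}).items():
            # Every [start, end] span is one mention of that entity.
            counts[category] += sum(len(spans) for spans in entities.values())
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Largest category count divided by the smallest (always >= 1.0)."""
    if not counts:
        raise ValueError("no categories to compare")
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Hypothetical toy records, not real CLUENER2020 data.
    toy = [
        '{"text": "abcd", "label": {"name": {"a": [[0, 0]], "b": [[1, 1]]}}}',
        '{"text": "efgh", "label": {"name": {"e": [[0, 0]]}, "address": {"g": [[2, 2]]}}}',
    ]
    c = count_entities(toy)
    print(dict(c), imbalance_ratio(c))
```

On the toy input, `name` has 3 mentions and `address` has 1, so the ratio is 3.0. A ratio close to 1 corresponds to the balanced, same-order-of-magnitude design described for CLUENER2020.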
Reference:
CLUENER2020: Fine-Grained Named Entity Recognition Dataset and Benchmark for Chinese