I. Steps:
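Annotate the data in doccano, then export the annotated data. The exported file has the .jsonl extension (one JSON object per line); a sample .jsonl file looks like this: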
{"id":62,"text":"我爱北京。","label":[[2,3,"Location"]],"Comments":[]}
{"id":63,"text":"我爱南京。","label":[[2,3,"Location"]],"Comments":[]}
{"id":64,"text":"我爱天津。","label":[[2,3,"Location"]],"Comments":[]}
Split the jsonl data into training, validation, and test sets at a 6:2:2 ratio;
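The 6:2:2 split can be sketched as follows. This is only a minimal illustration, not the original author's script: the function names (`read_jsonl`, `write_jsonl`, `split_records`), the fixed random seed, and the file names in the usage comment are all assumptions made for the example.

```python
import json
import random


def read_jsonl(path):
    """Read a .jsonl file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(records, path):
    """Write records back out, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps the Chinese text human-readable.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def split_records(records, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle a copy of the records and cut it into train/dev/test parts.

    The seed is an assumption added so the split is reproducible.
    Any remainder left over by rounding down goes to the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_dev = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Usage (hypothetical file names):
#   records = read_jsonl("all.jsonl")
#   train, dev, test = split_records(records)
#   write_jsonl(train, "train.jsonl")
#   write_jsonl(dev, "dev.jsonl")
#   write_jsonl(test, "test.jsonl")
```

Shuffling before cutting avoids a split that simply follows the export order, which matters if similar sentences were annotated back to back.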
Convert each of the split .jsonl files into standard JSON (a single array of objects); a sample json file looks like this:
[{"id":62,"text":"我爱北京。","label":[[2,3,"Location"]],"Comments":[]},
{"id":63,"text":"我爱南京。","label":[[2,3,"Location"]],"Comments":[]},