1. Method 1
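When a B tag is encountered, any entity accumulated before it is stored.
A run of consecutive I tags with no leading B may also be treated as a single entity. This covers the error case where a B was misrecognized as I.
When the category labels within such a run disagree, the entity takes the most frequent label (the mode).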
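The rules above can be sketched as a small self-contained function. This is only an illustration under stated assumptions: the helper name `decode_bio`, the `{"word", "type"}` shape of each entity, and the choice to break ties in favour of the label seen first are all hypothetical and not taken from the original code.

```python
from collections import Counter


def decode_bio(text, tags):
    """Group per-character BIO tags into entities (hypothetical helper)."""
    entities = []
    chars = []   # characters of the entity currently being built
    labels = []  # category of every tag in the current run

    def flush():
        # Store the pending entity, labelled with the most frequent category.
        if chars:
            label = Counter(labels).most_common(1)[0][0]
            # Assumed output shape; the original format is not shown in full.
            entities.append({"word": "".join(chars), "type": label})
            chars.clear()
            labels.clear()

    for char, tag in zip(text, tags):
        prefix, _, label = tag.lower().partition("-")
        if prefix == "b":
            # A B tag closes whatever came before and opens a new entity.
            flush()
            chars.append(char)
            labels.append(label)
        elif prefix == "i":
            # An I tag extends the run; with no open entity it starts one,
            # which tolerates a B that was misrecognized as I.
            chars.append(char)
            labels.append(label)
        else:
            # An O tag ends the current run.
            flush()
    flush()  # do not lose an entity that ends with the sentence
    return {"string": text, "entities": entities}
```

On the sample sentence used in this section, the sketch groups 李明 as `per`, 中国 as `loc`, and 呼和浩特 as `loc`: in the last run the three `i-loc` tags outvote the leading `b-per`.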
# Convert BIO-format tags into entities
string = "我是李明,我爱中国,我来自呼和浩特"
predict = ["o","o","i-per","i-per","o","o","o","b-loc","i-loc","o","o","o","o","b-per","i-loc","i-loc","i-loc"]
# Return format
item = {"string": string, "entities": []}
entity_name = ""
flag = []
visit = False
for char, tag in zip(string, predict):
    if tag[0] == "b":
        if entity_name != "":
            # Count each value in flag and keep the most frequent one(s)
            x = dict((a, flag.count(a)) for a in flag)
            y = [k for k, v in x.items() if max(x.values()) == v]
            item["entities"