Chapter 6 in practice: chat corpus processing. Sections 6.9 and 6.10: building question-answer pairs from the training corpus

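    # This fragment continues the body of main(); groups, good_line and
    # make_split are expected to come from the earlier part of the script.
    x_data = []
    y_data = []
    for group in tqdm(groups):
        for i, line in enumerate(group):
            # Previous line, kept only if it passes good_line
            last_line = None
            if i > 0:
                last_line = group[i - 1]
                if not good_line(last_line):
                    last_line = None
            # Next line (the answer), kept only if it passes good_line
            next_line = None
            if i < len(group) - 1:
                next_line = group[i + 1]
                if not good_line(next_line):
                    next_line = None
            # Line after next, used to build a two-line answer
            next_next_line = None
            if i < len(group) - 2:
                next_next_line = group[i + 2]
                if not good_line(next_next_line):
                    next_next_line = None

            # Pair 1: current line -> next line
            if next_line:
                x_data.append(line)
                y_data.append(next_line)
            # Pair 2: previous line + current line -> next line
            if last_line and next_line:
                x_data.append(last_line + make_split(last_line) + line)
                y_data.append(next_line)
            # Pair 3: current line -> next line + line after next
            if next_line and next_next_line:
                x_data.append(line)
                y_data.append(next_line + make_split(next_line) + next_next_line)

    print(len(x_data), len(y_data))

    # Print the first 20 question-answer pairs as a sanity check
    for ask, answer in zip(x_data[:20], y_data[:20]):
        print(''.join(ask))
        print(''.join(answer))
        print('-' * 20)

if __name__ == '__main__':
    main()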
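
The pairing logic above can be tried on a toy dialogue without the rest of the script. The sketch below is only an illustration under stated assumptions: `good_line` and `make_split` are hypothetical stand-ins (the real helpers are not shown in this section), `build_pairs` is a name introduced here, and each line is assumed to be a list of characters, as the `''.join(...)` calls suggest.

```python
# Hypothetical stand-ins for helpers that are not shown in this section.
PUNCT = set("，。…？！～.,!?")


def make_split(line):
    """Assumed behavior: return a separator only if `line` lacks end punctuation."""
    return [] if line and line[-1] in PUNCT else ["，"]


def good_line(line):
    """Assumed behavior: reject empty lines and lines with over 2 ASCII letters/digits."""
    return bool(line) and sum(ch.isascii() and ch.isalnum() for ch in line) <= 2


def build_pairs(groups):
    """Build (question, answer) pairs from groups of consecutive dialogue lines."""
    x_data, y_data = [], []
    for group in groups:
        for i, line in enumerate(group):
            # Neighbors are used only if they exist and pass good_line.
            last_line = group[i - 1] if i > 0 and good_line(group[i - 1]) else None
            next_line = (group[i + 1]
                         if i < len(group) - 1 and good_line(group[i + 1]) else None)
            next_next_line = (group[i + 2]
                              if i < len(group) - 2 and good_line(group[i + 2]) else None)

            if next_line:  # current line -> next line
                x_data.append(line)
                y_data.append(next_line)
            if last_line and next_line:  # previous + current -> next line
                x_data.append(last_line + make_split(last_line) + line)
                y_data.append(next_line)
            if next_line and next_next_line:  # current -> next + line after next
                x_data.append(line)
                y_data.append(next_line + make_split(next_line) + next_next_line)
    return x_data, y_data


# One toy group; the last line is rejected by the stand-in good_line.
group = [list("你好"), list("你好啊。"), list("吃了吗"), list("abc123")]
x_data, y_data = build_pairs([group])

for ask, answer in zip(x_data, y_data):
    print("".join(ask), "->", "".join(answer))
```

With these stand-ins the four-line group yields four pairs, and the rejected last line never appears as an answer. Note that, as in the fragment above, the current line itself is not checked with `good_line`; only its neighbors are.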
