Why does a trained sentiment-analysis model give the same prediction for every sentence?

  • Keywords: data dictionary, character encoding

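  • Problem description: A recurrent neural network was trained on the IMDB dataset. When the resulting model is used to predict sentences, the predicted probabilities are nearly identical whether the sentence is positive or negative.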

  • Error output:

[[5146, 5146, 5146, 5146, 5146, 5146], [5146, 5146, 5146, 5146, 5146], [5146, 5146, 5146, 5146]]
Predict probability of  0.54538333  to be positive and  0.45461673  to be negative for review ' read the book forget the movie '
Predict probability of  0.54523355  to be positive and  0.45476642  to be negative for review ' this is a great movie '
Predict probability of  0.54504114  to be positive and  0.45495886  to be negative for review ' this is very bad '
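  • Reproduction: At prediction time, a predictor is created with the Inferencer interface. Each sentence is split into a list of words, each word is converted to its id with word_dict.get(words, UNK) using the dataset's dictionary, and the resulting ids are fed to the model. Every prediction comes out wrong. The faulty code is: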
inferencer = Inferencer(
    infer_func=partial(inference_program, word_dict),
    param_path=params_dirname,
    place=place)
reviews_str = ['read the book forget the movie', 'this is a great movie', 'this is very bad']
reviews = [c.split() for c in reviews_str]
UNK = word_dict['<unk>']
lod = []
for c in reviews:
    lod.append([word_dict.get(words, UNK) for words in c])
print(lod)
base_shape = [[len(c) for c in lod]]
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
results = inferencer.infer({'words': tensor_words})
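The printed id list in the error output contains only the value 5146, which suggests that every word was mapped to the same fallback id. A minimal sketch of a sanity check for this symptom follows. The helper name unk_ratio is hypothetical and is not part of PaddlePaddle; it only counts how many ids equal the UNK id before the data is sent to the model.

```python
def unk_ratio(id_sequences, unk_id):
    """Return the fraction of token ids equal to unk_id across all sequences."""
    total = sum(len(seq) for seq in id_sequences)
    if total == 0:
        return 0.0
    unknown = sum(1 for seq in id_sequences for tok in seq if tok == unk_id)
    return unknown / total


# The ids printed in the error output above: every token is 5146.
lod = [[5146] * 6, [5146] * 5, [5146] * 4]
ratio = unk_ratio(lod, 5146)
print(ratio)
if ratio > 0.9:
    print('Almost every token is unknown; check the dictionary lookup.')
```

A ratio close to 1.0 for ordinary English sentences points at the lookup step rather than at the model.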
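  • Solution: The cause is a character-encoding mismatch. The words are passed to word_dict.get(words, UNK) as plain strings, which do not match the dictionary's UTF-8 encoded keys, so every lookup falls back to UNK and every sentence becomes a sequence of the <unk> id. Each word has to be encoded as UTF-8 before the lookup, for example word_dict.get(words.encode('utf-8'), UNK). The corrected code is: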
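# Build the predictor from the saved model parameters.
inferencer = Inferencer(
    infer_func=partial(inference_program, word_dict),
    param_path=params_dirname,
    place=place)
reviews_str = ['read the book forget the movie', 'this is a great movie', 'this is very bad']
# Split each review into words on whitespace.
reviews = [c.split() for c in reviews_str]
UNK = word_dict['<unk>']
lod = []
for c in reviews:
    # Encode each word as UTF-8 bytes so that it matches the dictionary keys;
    # words that are still missing fall back to UNK.
    lod.append([word_dict.get(words.encode('utf-8'), UNK) for words in c])
print(lod)
# The length of each id sequence, passed as the sequence-length information.
base_shape = [[len(c) for c in lod]]
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
results = inferencer.infer({'words': tensor_words})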
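The mismatch can be reproduced without PaddlePaddle, because in Python 3 a str and a bytes object never compare equal, even when they hold the same characters. The sketch below is an illustration only: toy_dict and its ids are made up, and lookup_id is a hypothetical helper, not a PaddlePaddle function. It assumes, as the code above suggests, that the word keys are bytes while '<unk>' is a str key.

```python
def lookup_id(token, word_dict, unk_id):
    """Look up a str token, trying its UTF-8 bytes form first, then the str."""
    encoded = token.encode('utf-8')
    if encoded in word_dict:
        return word_dict[encoded]
    return word_dict.get(token, unk_id)


# Toy dictionary (hypothetical ids): word keys are bytes, '<unk>' is a str.
toy_dict = {b'this': 0, b'is': 1, b'great': 2, '<unk>': 3}
unk = toy_dict['<unk>']
tokens = 'this is great movie'.split()

# Plain str lookups never match the bytes keys, so everything becomes unk.
naive_ids = [toy_dict.get(t, unk) for t in tokens]
# Encoding first finds the known words; only 'movie' is really unknown here.
fixed_ids = [lookup_id(t, toy_dict, unk) for t in tokens]

print(naive_ids)
print(fixed_ids)
```

Trying the encoded form first and then the original string keeps the helper usable with dictionaries that mix both key types, at the cost of one extra membership test per word.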