Movie Review Classification with Keras: A Binary Classification Problem

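Binary classification is probably the most widely applied kind of machine learning problem. In this example, you will learn to classify movie reviews as positive or negative based on the text of the review.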

The IMDB data

  • About IMDB:

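This post uses the IMDB dataset, a set of 50,000 highly polarized reviews from the Internet Movie Database (IMDB). The data is split into 25,000 reviews for training and 25,000 reviews for testing, and each set contains 50% positive and 50% negative reviews.

Why keep the training set and the test set separate? Because you should never test a machine learning model on the same data you used to train it. A model that performs well on its training data will not necessarily perform well on data it has never seen, and what you actually care about is performance on new data (you already know the labels of the training data, so you obviously do not need the model to predict them). For example, your model could end up merely memorizing the mapping between training samples and their targets, which is useless for predicting on unseen data. The next chapter discusses this point in more detail.

Like the MNIST dataset, the IMDB dataset ships with the Keras library. It has already been preprocessed: the reviews (sequences of words) have been turned into sequences of integers, where each integer stands for a specific word in a dictionary. The loading code further below fetches the IMDB dataset (about 80 MB of data is downloaded on the first run).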
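The memorization risk described above can be made concrete with a toy sketch. Nothing here comes from the IMDB data or from Keras: the fake reviews, the labeling rule in `true_label`, and the lookup-table "model" are all made up for illustration. The point is only that a model which stores its training samples scores perfectly on them and roughly at chance on anything new.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def make_review(length=20, vocab_size=10000):
    """Build a fake review as a tuple of random word indices."""
    return tuple(rng.randrange(vocab_size) for _ in range(length))


def true_label(review):
    """Made-up labeling rule, used only to generate toy targets."""
    return sum(review) % 2


train = [make_review() for _ in range(1000)]
test = [make_review() for _ in range(1000)]

# "Training" here is nothing more than storing each sample with its label.
memory = {review: true_label(review) for review in train}


def predict(review):
    # Reviews that were never stored fall back to a constant guess of 0.
    return memory.get(review, 0)


def accuracy(reviews):
    return sum(predict(r) == true_label(r) for r in reviews) / len(reviews)


train_acc = accuracy(train)
test_acc = accuracy(test)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

The training accuracy is perfect by construction, while the test accuracy stays near one half, which is why the score on held-out data is the one worth reporting.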

  • IMDB download:

Link: https://pan.baidu.com/s/1maS2Xn5SLeZfsh1u2qu4HA
       Extraction code: 8tyo

  • Loading the IMDB dataset in Python
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

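The argument num_words=10000 means that only the 10,000 most frequently occurring words in the training data are kept. Rare words are discarded, which keeps the resulting vector data at a manageable size.

The variables train_data and test_data are lists of reviews, and each review is a list of word indices (encoding a sequence of words). train_labels and test_labels are lists of 0s and 1s, where 0 stands for negative and 1 stands for positive.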
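To make the data layout and the effect of num_words easier to picture, here is a small sketch on made-up data. The reviews, the labels, and the helper `cap_vocabulary` are hypothetical and are not part of Keras. As far as I understand it, `imdb.load_data` does not delete rare words from a review but replaces them with an out-of-vocabulary index (its `oov_char` argument, 2 by default); the sketch assumes that behavior.

```python
# Toy stand-in for the structure described above (not real IMDB data):
# each review is a list of word indices, each label is 0 or 1.
toy_reviews = [
    [1, 14, 22, 16, 43, 530],
    [1, 194, 1153, 9999, 12000],
    [1, 25000, 4, 87],
]
toy_labels = [1, 0, 1]

OOV_INDEX = 2  # assumption: index that stands in for out-of-vocabulary words


def cap_vocabulary(reviews, num_words, oov_index=OOV_INDEX):
    """Replace every word index >= num_words with oov_index."""
    return [
        [w if w < num_words else oov_index for w in review]
        for review in reviews
    ]


capped = cap_vocabulary(toy_reviews, num_words=10000)
max_index = max(max(review) for review in capped)

print(capped)      # indices 12000 and 25000 have become 2
print(max_index)   # 9999: no index reaches num_words
```

On the real dataset the same check, `max(max(sequence) for sequence in train_data)`, should likewise never reach 10000.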

  • Decoding the IMDB dataset
from keras.datasets import imdb


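def main():
    (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
    # word_index maps each word to its integer index
    word_index = imdb.get_word_index()
    # Invert the mapping so that integer indices map back to words
    reverse_word_index = {value: key for (key, value) in word_index.items()}
    # Indices are offset by 3 because 0, 1 and 2 are reserved for
    # "padding", "start of sequence" and "unknown"
    decoded_review = ' '.join(
        reverse_word_index.get(i - 3, '?') for i in train_data[0]
    )
    print(decoded_review)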

if __name__ == '__main__':
    main()

The code above converts the first review in the dataset from its integer index encoding back into text.

Result:

 

? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all
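The "?" marks in the decoded review come from the index offset: the indices 0, 1 and 2 are reserved, so subtracting 3 from them yields keys that are not in the reverse dictionary. The sketch below replays this on a tiny made-up vocabulary. `toy_word_index` and the `encode` helper are hypothetical and only mimic the encoding convention, assuming 1 marks the start of a sequence and 2 marks an unknown word.

```python
# Hypothetical toy vocabulary: word -> frequency rank (1 = most frequent).
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

INDEX_OFFSET = 3   # indices 0, 1 and 2 are reserved
START_INDEX = 1    # assumed start-of-sequence marker
UNKNOWN_INDEX = 2  # assumed marker for out-of-vocabulary words


def encode(words, word_index):
    """Turn words into offset indices, starting with the start marker."""
    encoded = [START_INDEX]
    for word in words:
        rank = word_index.get(word)
        encoded.append(rank + INDEX_OFFSET if rank is not None else UNKNOWN_INDEX)
    return encoded


def decode(indices, word_index):
    """Undo the offset; reserved or unknown indices come out as '?'."""
    reverse = {rank: word for word, rank in word_index.items()}
    return " ".join(reverse.get(i - INDEX_OFFSET, "?") for i in indices)


encoded = encode(["the", "film", "was", "superb"], toy_word_index)
decoded = decode(encoded, toy_word_index)

print(encoded)  # [1, 4, 5, 6, 2]
print(decoded)  # ? the film was ?
```

The leading "?" is the start marker and the trailing "?" is the word missing from the vocabulary, which matches the pattern in the decoded IMDB review above.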

 Still studying and taking notes...

 

 
