Hands-on: Classifying IMDB Movie Review Text (Part 1)

1. Load the data

IMDB dataset: download

import tensorflow as tf
from tensorflow import keras
import numpy as np
import json

# Load the dataset from the local .npz archive.
data = np.load('imdb/imdb.npz', allow_pickle=True)
train_data, train_labels = data['x_train'], data['y_train']
test_data, test_labels = data['x_test'], data['y_test']

# Print the number of training examples and labels
print("Training entries: {}, labels: {}".format(len(train_data), len(train_labels)))

Output:

Training entries: 25000, labels: 25000
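
The load call above passes `allow_pickle=True`. A likely reason is that each review is a list of integers of a different length, which NumPy can only store as an object array, and object arrays are pickled inside the archive. The sketch below illustrates this with a small synthetic archive; the file name `toy_imdb.npz`, the helper `ragged`, and the index values are made up for the example and are not part of the original post.

```python
import os
import tempfile

import numpy as np


def ragged(rows):
    """Build a 1-D object array whose elements are Python lists."""
    arr = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        arr[i] = list(row)
    return arr


# Synthetic stand-in for imdb.npz (made-up values, same key names).
path = os.path.join(tempfile.mkdtemp(), "toy_imdb.npz")
np.savez(
    path,
    x_train=ragged([[1, 14, 22], [1, 194, 1153, 194, 8255]]),
    y_train=np.array([1, 0]),
)

# With the default allow_pickle=False, reading the object array is refused.
try:
    with np.load(path) as data:
        data["x_train"]
    needs_pickle = False
except ValueError:
    needs_pickle = True

# With allow_pickle=True the variable-length reviews load as Python lists.
with np.load(path, allow_pickle=True) as data:
    x_train, y_train = data["x_train"], data["y_train"]

review_lengths = [len(review) for review in x_train]
print(needs_pickle, review_lengths, y_train.tolist())
```

Because unpickling can execute arbitrary code, `allow_pickle=True` is best reserved for archives from a trusted source.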


2. Convert the integers back to words

imdb_word_index.json dictionary: download

# Dictionary mapping each word to an integer index
with open('imdb/imdb_word_index.json', encoding='utf-8') as f:
    word_index = json.load(f)

# The first four indices are reserved for special tokens
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3
reverse_word_index = {value: key for key, value in word_index.items()}

def decode_review(text):
    # Map each index back to its word; indices missing from the dictionary become '?'
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

print(decode_review(train_data[1]))

Output:

dooley's from halaqah this appears schoolers scooby about than all england that now for being <UNUSED> award and father back movie <START> needed it their ending trip name by love reason they director and face old from haunting that <START> fight because do any the <START> jake this like <UNUSED> become leaving these shrek her down why this thousands <START> wife movie performs tyson's need and crisis awards <START> dhol jewison from shrek it's pulling his women and does more movie <START> m of of for if it's film their either <UNUSED> technology and others movie <START> m that <UNUSED> protagonists action <START> dismissive film ending really they <UNUSED> death <START> local similar <UNUSED> burned frank movie <START> system <UNUSED> fiction <UNK> reason film ending appreciated and only if or who and his jake it a queenie charlotte' blind of of draws condition by unfolds by couple this grammar watched <UNUSED> success your by about reason to <START> quite hollywood anxious and its <UNUSED> technology was <UNUSED> deranged stylish stuttering lubitsch's and only it's are see others to <START> m that hang past action <START> dismissive it's grammar breathtaking are see bad if are close was <UNUSED> fast starring the music there prize <START> icg movie best grammar a details movie <START> needed was <UNUSED> borlenghi movie not ease and blanks not same sequences best are got impact did <START> sophie since either <START> inhuman fluently one <UNUSED> alvarez could or playing movie not cellar best grammar leaves also movies alone <UNUSED> sure one <START> often the satisfy prudish finale builds all steve letting by young claimed actors priest so death <UNK> so subtitled revolving hearing seeks <UNK> zombiez torch lonely by on 5 completely and <START> m days harvey grammar must since but completely and everyone's outright ashleigh who are ending love what character success best or highly from leslie town from his anyway of of these <START> acting shop thought satisfy 
<UNK> grammar true teenage and many br woman seen chased and his have the draws unknowns harbors characters best donald and character <UNUSED> without is actor <UNUSED> endure behind slightly and not ludicrous old why this manipulation trainer world require from scruffy that <START> fight and over if or who each scenes arrived actors priest is <START> women tv from movie <START> also turns music <UNUSED> tell technology who he success do bad what at how where if and bad was story different put at such both is and <START> jake effects the storyline is who welcomes different of of from put in with been wtf film and father human
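
The decoded text above does not read as English, and `<START>` appears many times in the middle of the review, roughly where the most frequent word would be expected. One plausible explanation, stated here as an assumption about the file rather than something the post confirms, is that `imdb/imdb.npz` stores raw word ranks. By default `keras.datasets.imdb.load_data` prepends a start marker and adds 3 to every index (`start_char=1`, `index_from=3`), so a review read directly with `np.load` would need the same shift before being looked up in the shifted dictionary. The sketch below shows the effect on a toy vocabulary; `raw_word_index`, `shift_raw_review`, and the sample review are hypothetical.

```python
# Toy stand-in for imdb_word_index.json (hypothetical values; the real
# file maps a large vocabulary to frequency ranks).
raw_word_index = {"the": 1, "and": 2, "a": 3, "movie": 4, "great": 5}

INDEX_FROM = 3  # offset that leaves room for the special tokens below

# Same construction as in the post: shift every rank, then add the specials.
word_index = {word: rank + INDEX_FROM for word, rank in raw_word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3
reverse_word_index = {index: word for word, index in word_index.items()}


def decode_review(encoded):
    """Look up each (already shifted) index; unknown indices become '?'."""
    return " ".join(reverse_word_index.get(i, "?") for i in encoded)


def shift_raw_review(raw, start_char=1, index_from=INDEX_FROM):
    """Add the start marker and the offset to a review stored as raw ranks."""
    return [start_char] + [rank + index_from for rank in raw]


raw_review = [1, 4, 5]  # "the movie great" expressed as raw ranks

# Decoding raw ranks against the shifted dictionary mislabels every word.
unshifted_text = decode_review(raw_review)
# Shifting first restores the intended words.
shifted_text = decode_review(shift_raw_review(raw_review))

print(unshifted_text)  # <START> the and
print(shifted_text)    # <START> the movie great
```

If the assumption holds for the real archive, applying a shift like `shift_raw_review` to `train_data[1]` before calling `decode_review` should produce readable text. Note that `load_data` also shuffles the examples, so the review at index 1 may differ from the one shown in other tutorials.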
