English Sentiment Analysis

yongquanfengjie

Article link: https://blog.csdn.net/yongquanfengjie/article/details/106656989


import numpy as np
import tensorflow as tf
from string import punctuation
from collections import Counter

# Preview the project data and introduce the network structure.

with open('../datas/sentiment/reviews.txt', 'r') as f:
    reviews = f.read()
with open('../datas/sentiment/labels.txt', 'r') as f:
    labels = f.read()
print(reviews[:200])  # preview the first 200 characters of the raw text


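# Data preprocessing
# todo 1: remove all punctuation (collect the characters that are not punctuation, then join them back into text)
all_text = ''.join([c for c in reviews if c not in punctuation])

# todo 2: split the text into individual reviews, using '\n' as the separator
reviews = all_text.split('\n')
all_text = ' '.join(reviews)
# Split the text into a list of individual words
words = all_text.split()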


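# todo 1: build the vocabulary dictionary {word: integer}. The input vectors are padded with 0 later, so the integer codes start at 1 (not 0).
#      2: convert all of the text to integers and store the result in a new list: reviews_ints.

counts = Counter(words)
# Sort the words by count, most frequent first
vocab = sorted(counts, key=counts.get, reverse=True)
# Build the dictionary {word: integer}
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}

# Convert each review from a list of words to a list of integers
reviews_ints = []
for each in reviews:
    reviews_ints.append([vocab_to_int[word] for word in each.split()])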

# todo: encode the labels as numbers: positive == 1 and negative == 0
labels = labels.split('\n')
labels = np.array([1 if each == 'positive' else 0 for each in labels])


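# todo: there is one problem:
"""
One review has length 0, and the longest review has 2514 words, which is too long. So every review is brought to a length of 200:
   1. reviews shorter than 200 words are padded with 0 on the left;
   2. reviews longer than 200 words are truncated to their first 200 words.
"""
review_lens = Counter([len(x) for x in reviews_ints])
print("Number of zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))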

# todo: remove the zero-length reviews from the reviews_ints list.
# Get the indices of the reviews whose length is not 0
non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]
# Keep only the reviews at those indices
reviews_ints = [reviews_ints[ii] for ii in non_zero_idx]
labels = np.array
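
The listing describes padding short reviews on the left with zeros and keeping only the first 200 words of long ones, but it stops before that step. Below is a minimal sketch of how the whole preprocessing chain, including that padding step, might look on toy data. The toy strings, the helper name `pad_features`, its `seq_len` parameter, and the line that filters `labels` by `non_zero_idx` are assumptions made for illustration, not the author's code.

```python
import numpy as np
from collections import Counter
from string import punctuation

# Hypothetical stand-ins for reviews.txt and labels.txt (not real dataset rows).
raw_reviews = "great movie, great cast!\n\nbad plot.\n"
raw_labels = "positive\nnegative\nnegative\n"

# Strip punctuation, then split into reviews and into words.
all_text = ''.join(c for c in raw_reviews if c not in punctuation)
reviews = all_text.split('\n')
words = ' '.join(reviews).split()

# Most frequent word gets code 1; code 0 is reserved for padding.
counts = Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}
reviews_ints = [[vocab_to_int[w] for w in r.split()] for r in reviews]

# Encode labels: positive == 1, anything else == 0.
labels = np.array([1 if each == 'positive' else 0
                   for each in raw_labels.split('\n')])

# Drop zero-length reviews and (assumed) the labels at the same indices.
non_zero_idx = [ii for ii, r in enumerate(reviews_ints) if len(r) != 0]
reviews_ints = [reviews_ints[ii] for ii in non_zero_idx]
labels = np.array([labels[ii] for ii in non_zero_idx])


def pad_features(reviews_ints, seq_len):
    """Left-pad each review with 0 up to seq_len, or keep its first seq_len codes."""
    features = np.zeros((len(reviews_ints), seq_len), dtype=int)
    for i, row in enumerate(reviews_ints):
        truncated = row[:seq_len]
        # Write the codes into the right-hand end so the zeros stay on the left.
        features[i, seq_len - len(truncated):] = truncated
    return features


# seq_len=3 keeps the toy output readable; the article uses 200.
features = pad_features(reviews_ints, seq_len=3)
print(features)
```

On the real data the call would presumably be `pad_features(reviews_ints, seq_len=200)`, giving one row of 200 integers per review. Slicing with `row[:seq_len]` before the assignment keeps the helper safe even if an empty review slips through.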
