Notes
The related files are available in my GitHub repository. If they help, a star would be appreciated.
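Steps
Read the review file and split the reviews by sentiment into two new files, one for positive reviews and one for negative reviews. The first 4,000 reviews go into the positive file and the remaining 8,000 go into the negative file.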
import csv


def separate_csv(file):
    """Split the reviews into a positive file and a negative file."""
    # Open both outputs once in "w" mode so that rerunning the script
    # overwrites the files instead of appending duplicate rows.
    with open(file, "r", encoding="utf-8", newline="") as f, \
            open("comments_1.csv", "w", encoding="utf-8", newline="") as pos, \
            open("comments_0.csv", "w", encoding="utf-8", newline="") as neg:
        reader = csv.reader(f)
        pos_writer = csv.writer(pos)
        neg_writer = csv.writer(neg)
        next(reader, None)  # skip the header row
        for n, row in enumerate(reader, start=1):
            # Rows 1-4000 are positive; every row after that is negative.
            if n <= 4000:
                pos_writer.writerow(row)
            else:
                neg_writer.writerow(row)
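Before moving on, it can help to confirm that the split produced the expected number of reviews in each file. The following is a minimal sketch, not part of the original code: `count_rows` and `check_split` are hypothetical helper names, and the default counts (4,000 and 8,000) are taken from the description above. Records are counted with `csv.reader` rather than by counting lines, because a review whose text contains a line break inside a quoted field spans several physical lines.

```python
import csv


def count_rows(path):
    """Count CSV records (not physical lines) in a file."""
    # newline="" lets the csv module handle line breaks inside quoted fields,
    # so a review that spans several lines still counts as one record.
    with open(path, "r", encoding="utf-8", newline="") as f:
        return sum(1 for _ in csv.reader(f))


def check_split(pos_path="comments_1.csv", neg_path="comments_0.csv",
                expected_pos=4000, expected_neg=8000):
    """Return True if both files hold the expected number of reviews."""
    return (count_rows(pos_path) == expected_pos
            and count_rows(neg_path) == expected_neg)
```

Calling `check_split()` after `separate_csv(...)` returns `True` only when both files match the expected sizes. If the dataset has different proportions, pass the actual counts as arguments.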
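Next, we use the jieba word-segmentation tool to extract the 50 most frequent words from the positive reviews and from the negative reviews, and write each list to its own file.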
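For comparison, a plain frequency count can be sketched with `collections.Counter`. This is an illustration under stated assumptions, not the author's method: the function that follows ranks keywords with jieba's `textrank`, which scores words by graph centrality rather than by raw count. The names `top_words`, `tokenize`, and `min_len` are hypothetical. The default tokenizer splits on whitespace so the sketch runs without jieba installed; for Chinese text one would presumably pass `jieba.lcut` instead.

```python
from collections import Counter


def top_words(texts, tokenize=str.split, top_k=50, min_len=2):
    """Return the top_k most common tokens across texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Skip very short tokens, which are mostly particles and punctuation.
        counts.update(tok for tok in tokenize(text) if len(tok) >= min_len)
    return counts.most_common(top_k)
```

With jieba available, `top_words(reviews, tokenize=jieba.lcut)` would count segmented Chinese words in the same way, where `reviews` is any iterable of review strings.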
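import csv

import pandas as pd
from jieba.analyse import textrank


def jieba_get_high_frequency_words():
    """Use jieba to extract the high-frequency words from the positive and negative reviews."""
    col_name = ['ID', 'comment']
    csvpd = pd.read_csv("comments_1.csv", names=col_name)['comment']
    data = ''.join(csvpd)
    with open("high_frequency_word_1.csv", "w", encoding="utf-8", newline="") as f:
        csvwriter = csv.writer(f)
        # textrank yields (keyword, weight) pairs; i is the 1-based rank.
        for i, (keyword, weight) in enumerate(textrank(data, topK=50, withWeight=True), start=1):
            csvwriter.writerow([i, keyword])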