Hot-Review Analysis of 《风犬少年的天空》 - CSDN Blog

Article link: https://blog.csdn.net/qq_42965915/article/details/108952384

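This post analyzes viewer ratings and comments for the popular Bilibili coming-of-age series 《风犬少年的天空》 and finds that most viewers gave it the top score of 10, showing strong enthusiasm for the show. The word cloud shows high-frequency words such as '青春' (youth) and '感动' (moving), which point to the resonance the series evoked. In the small share of negative reviews, however, keywords such as '天天' (every day) and '一直' (all the time) suggest that some viewers may have been put off by excessive promotion. The analysis suggests that the use of dialect and the push-recommendation strategy may be the reasons for the low scores.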


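A coming-of-age series called 《风犬少年的天空》 has been very popular on Bilibili lately. I have watched up to episode 9 so far (and am eagerly waiting for more), and it is really good, so I scraped its reviews: more than 10,000 in total. There will surely be far more than ten thousand once the series finishes airing, but never mind that; let's first look at the attitude of viewers on Bilibili.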
If you need the source code, leave a comment with your email address.
Let's get started!

Import the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import jieba
import re
from collections import Counter
from wordcloud import WordCloud, ImageColorGenerator
from PIL import Image

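# Use the SimHei font so that Chinese labels render correctly in matplotlib
plt.rcParams['font.sans-serif'] = ['SimHei']

# Raw string so the backslashes in the Windows path are not treated as escapes
os.chdir(r'D:\学习笔记\python\爬虫\风犬少年的天空热评')

# read_excel does not accept an `encoding` argument in current pandas, so it is dropped
data = pd.read_excel('shortviews.xlsx', sheet_name='views')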

Rating distribution

# Count the reviews for each score
remark = data.groupby(data['score'])[['score']].count()
# Share of each score in the total, rounded to three decimals
remark['rate'] = round(remark['score'] / remark['score'].sum(), 3)
remark.columns = ['freq', 'rate']
remark

score   freq   rate
2        992  0.094
4        187  0.018
6        198  0.019
8        591  0.056
10      8619  0.814

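# Select the style before creating the figure so it applies to the whole figure
plt.style.use('ggplot')
plt.figure(figsize=(12, 6))
# Highlight the 10-point bar in brown; the other scores are grey
plt.bar(x=remark.index, height=remark['freq'], color=['grey', 'grey', 'grey', 'grey', 'brown'])
plt.grid(False)
plt.title('评分情况', fontdict=dict(fontsize=30))  # "Rating distribution"
plt.xlabel('评分', fontsize=18)  # "Score"
plt.ylabel('计数', fontsize=18)  # "Count"
plt.tick_params(labelsize=16)
plt.show()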

[Figure: bar chart of review counts per score]

As the chart shows, 10-point ratings alone account for 81% of all reviews, so Bilibili users clearly love the series!
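The 81% figure can be checked by hand from the counts in the table. The sketch below copies those counts into a plain dictionary (the variable names are mine, not from the original code) and recomputes the shares, including the combined share of the two lowest scores that the negative-review section later filters on.

```python
# Rating counts copied from the output table in this post (score -> reviews).
freq = {2: 992, 4: 187, 6: 198, 8: 591, 10: 8619}

total = sum(freq.values())

# Share of each score, rounded to three decimals like the `rate` column.
rate = {score: round(n / total, 3) for score, n in freq.items()}

# Combined share of scores below 5 (that is, 2 and 4).
low_share = round((freq[2] + freq[4]) / total, 3)

print(total, rate[10], low_share)
```

So roughly one review in nine falls into the group treated as negative, while about four in five give full marks.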

Word cloud of all reviews

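# Concatenate all review texts into one string (skip missing values)
txt = ''.join(data['content'].dropna().astype(str).tolist())

# Strip punctuation, the letters B/b, spaces, and a few very common characters
txt = re.sub('[，‘“”；’()（）？！。Bb【】的了 是看也就]', '', txt)

# Segment the text into words with jieba
segments = jieba.lcut(txt)

# Word frequencies
count = Counter(segments)

# (word, frequency) pairs sorted from most to least frequent
res = count.most_common()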

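# The background image is used as the mask that shapes the word cloud
image = Image.open('bg.jpg')
img = np.array(image)

wc = WordCloud(
        background_color="#fff",  # white background (the default is black)
        width=990,                # image width (ignored when a mask is given)
        height=440,               # image height (ignored when a mask is given)
        margin=10,                # margin around the words
        max_font_size=100,
        random_state=30,
        font_path='C:/Windows/Fonts/simkai.ttf',  # system font that supports Chinese
        mask=img
    ).generate_from_frequencies(count)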

# wc.to_image().show()

wc.to_file('wc.png')
plt.figure(figsize=(25,25))
plt.imshow(wc)
plt.axis('off')
plt.show()

[Figure: word cloud of all reviews, shaped by the mask image]

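In my view, Bilibili users show varying degrees of affection for the series. A large part of the reason is probably that some storyline stirred up memories and resonated with viewers; after all, it is 2020, and given the country's emphasis on and inclusiveness in education over the years, the vast majority of people have attended high school.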

# Keep only words longer than one character
graphara = {_key: _value for _key, _value in count.items() if len(_key) > 1}

wc = WordCloud(
        background_color="#fff",  # white background (the default is black)
        width=990,                # image width
        height=440,               # image height
        margin=10,                # margin around the words
        max_font_size=100,
        random_state=30,
        font_path='C:/Windows/Fonts/simkai.ttf',  # system font that supports Chinese
    ).generate_from_frequencies(graphara)

# wc.to_image().show()
# wc.to_file('wc1.png')
plt.figure(figsize=(12, 8))
plt.imshow(wc)
plt.axis('off')
plt.show()

[Figure: word cloud of words longer than one character]

青春 (youth), 感动 (moving), 真实 (real), 搞笑 (funny), 现实 (reality), 遗憾 (regret)... draw your own conclusions!
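A word cloud only shows frequencies visually, so it can help to list the top words with their counts as well. The sketch below applies the same length filter as the code above and then calls `Counter.most_common`. The `tokens` list is a made-up stand-in for the output of `jieba.lcut`, not data from the scraped reviews.

```python
from collections import Counter

# Toy token list standing in for jieba.lcut(txt); hypothetical, not real data.
tokens = ['青春', '感动', '青春', '好', '真实', '青春', '感动', '好']

count = Counter(tokens)

# Same filter as in the post: keep only words longer than one character.
multi_char = Counter({w: n for w, n in count.items() if len(w) > 1})

# The two most frequent remaining words with their counts.
top = multi_char.most_common(2)
print(top)
```

With the real `count` object from the post, `most_common(20)` would give a ranked list to read alongside the image.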

Negative reviews

# Negative reviews: scores below 5 (that is, 2 or 4)
dislike = data[data['score'] < 5]

# Same cleaning, segmentation, and counting as for the full review set
txt1 = ''.join(dislike['content'].dropna().astype(str).tolist())
txt1 = re.sub('[，‘“”；’()（）？！。Bb【】的了 是看也就]', '', txt1)
segments = jieba.lcut(txt1)
count = Counter(segments)
# Keep only words longer than one character
dis_graphara = {_key: _value for _key, _value in count.items() if len(_key) > 1}
wc = WordCloud(
        background_color="#fff",  # white background (the default is black)
        width=990,                # image width
        height=440,               # image height
        margin=10,                # margin around the words
        max_font_size=100,
        random_state=30,
        font_path='C:/Windows/Fonts/simkai.ttf',  # system font that supports Chinese
    ).generate_from_frequencies(dis_graphara)

# wc.to_image().show()
plt.figure(figsize=(18, 8))
plt.imshow(wc)
plt.axis('off')
plt.show()
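One caveat about the cleaning step used for both word clouds: the regular expression deletes individual characters such as 的, 了, and 看 before segmentation, so it also removes them from inside longer words. The sketch below shows that effect on a short sample string and contrasts it with a possible alternative, filtering whole tokens after segmentation. The `STOPWORDS` set and the hand-written `tokens` list are my assumptions for illustration; this is not the approach used in the post.

```python
import re

# Character class copied from the cleaning step in this post.
PATTERN = '[，‘“”；’()（）？！。Bb【】的了 是看也就]'

# Character-level deletion also hits characters inside longer words.
sample = '真的好看'
char_level = re.sub(PATTERN, '', sample)
print(char_level)

# Hypothetical alternative: segment first, then drop whole-token stopwords.
# `tokens` is a hand-written stand-in for jieba.lcut(sample), not real output.
STOPWORDS = {'的', '了', '是', '看', '也', '就'}
tokens = ['真的', '好看']
kept = [t for t in tokens if t not in STOPWORDS]
print(kept)
```

Filtering at the token level keeps words like 好看 intact, which may change which words dominate the clouds.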