Douban Movie Word Cloud

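# Import modules
import jieba  # Chinese word segmentation
from matplotlib import pyplot as plt  # plotting / data visualization
from wordcloud import WordCloud  # word cloud generation
import numpy as np  # array (matrix) operations
import sqlite3  # SQLite database access
from PIL import Image  # image processing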
# Prepare the text for the word cloud
con = sqlite3.connect('movie.bd')
cur = con.cursor()  # get a cursor
sql = 'select instroduction from movie250'
data = cur.execute(sql)  # execute the SQL query
text = " "
for item in data:
    text = text + item[0]
print(text)
cur.close()  # close the cursor, then the database connection
con.close()
# Word segmentation
cut = jieba.cut(text)
string = ' '.join(cut)
print(string)
print(len(string))
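# Prepare the word cloud
img = Image.open('6.jpg')  # image used as the mask shape
img_array = np.array(img)  # convert the image to an array
wc = WordCloud(
    background_color='white',
    mask=img_array,
    font_path="STXINWEI.TTF"  # a font with Chinese glyphs
)
wc.generate_from_text(string)  # feed in the segmented text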

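# Draw the figure
fig = plt.figure(1)
plt.rcParams['font.sans-serif'] = 'SimHei'  # set the font
plt.imshow(wc)  # render the word cloud
plt.axis('off')  # hide the axes
plt.savefig('8.png', dpi=800)  # save before show(); saving afterwards can write a blank image
plt.show()  # display the generated word cloud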
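
The text-collection step above builds the string by repeated concatenation and would fail on a NULL cell. A minimal sketch of an alternative follows. The helper name `load_text` is hypothetical, and the in-memory table and its rows are made-up placeholders that only mimic the `movie250` layout used in the script.

```python
import sqlite3


def load_text(con, sql):
    """Join the first column of every row returned by `sql` with spaces."""
    rows = con.execute(sql).fetchall()
    # Skip NULL or empty cells so str.join never receives None.
    return " ".join(row[0] for row in rows if row[0])


# Demo on an in-memory database with placeholder rows (not real Douban data).
demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table movie250 (instroduction text)")
demo_con.executemany(
    "insert into movie250 values (?)",
    [("placeholder summary one",), (None,), ("placeholder summary two",)],
)
demo_text = load_text(demo_con, "select instroduction from movie250")
demo_con.close()
print(demo_text)
```

Passing the connection in, rather than a file path, keeps the helper usable with both a real database file and an in-memory one.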

The steps for crawling Douban movie comments with Python and turning them into a word cloud are as follows:

1. Import the required libraries and modules

```python
import requests
from bs4 import BeautifulSoup
import jieba
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
```

2. Fetch the page source and parse it

```python
url = 'https://movie.douban.com/subject/26363254/comments?status=P'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
```

3. Extract the comment text and segment it into words

```python
comments = soup.find_all('span', class_='short')
comment_text = ''
for comment in comments:
    comment_text += comment.text
words = jieba.cut(comment_text)
```

4. Count word frequencies and generate the word cloud

```python
word_counts = {}
for word in words:
    if len(word) == 1:
        continue
    else:
        word_counts[word] = word_counts.get(word, 0) + 1
wordcloud = WordCloud(font_path='msyh.ttc', background_color='white', max_words=200, max_font_size=100, width=800, height=600)
wordcloud.generate_from_frequencies(word_counts)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```

5. Generate a word cloud shaped and colored by an image

```python
mask = np.array(Image.open('movie.png'))
image_colors = ImageColorGenerator(mask)
wordcloud = WordCloud(font_path='msyh.ttc', background_color='white', max_words=200, max_font_size=100, width=800, height=600, mask=mask)
wordcloud.generate_from_frequencies(word_counts)
plt.imshow(wordcloud.recolor(color_func=image_colors), interpolation='bilinear')
plt.axis('off')
plt.show()
```
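
The word-frequency step above skips single-character tokens by hand. A rough sketch of the same idea using `collections.Counter` is shown here, with an optional stop-word set added as an assumption (the original has no stop-word filtering). The function name `count_words` and its parameters are hypothetical.

```python
from collections import Counter


def count_words(tokens, stopwords=frozenset(), min_len=2):
    """Count tokens, skipping short tokens and any listed stop words."""
    counts = Counter()
    for tok in tokens:
        tok = tok.strip()  # drop surrounding whitespace left by segmentation
        if len(tok) < min_len or tok in stopwords:
            continue
        counts[tok] += 1
    return dict(counts)


# Any iterable of strings works, for example the output of jieba.cut(...).
sample_tokens = ["电影", "的", "电影", "好看", " ", "一个"]
sample_counts = count_words(sample_tokens, stopwords={"一个"})
print(sample_counts)
```

The returned dictionary maps each word to its count, which is the shape that `WordCloud.generate_from_frequencies` accepts.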
