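This post first collects the customer reviews of the AGM X1 phone (the phone Wu Jing uses in Wolf Warrior 2) from AliExpress, then analyzes them with WordArt (https://wordart.com/create), a good, publicly available word-frequency tool.
1. Getting the review data
(1) The Python code used to collect the reviews is shown below: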
# -*- coding: utf-8 -*-
"""
Created on Tue Aug 15 16:44:15 2017
@author: Administrator
"""
import random
import time
import urllib.request

import pymysql
from bs4 import BeautifulSoup


def crawl(url):
    """Download one page of reviews, print them, and store them in MySQL."""
    # Decode the response bytes; str() on bytes would keep the b'...' wrapper and escape sequences
    html = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
    soup = BeautifulSoup(html, 'lxml')
    # Review timestamps
    times = soup.find_all(attrs={"class": "r-time"})
    # Review text: every <span> inside the buyer-feedback blocks
    feedbacks = soup.find_all(attrs={"class": "buyer-feedback"})
    spans = [span for block in feedbacks for span in block.find_all('span')]

    # Database operations
    # Open the database connection (once per page)
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='123456',
                                 database='comment',
                                 charset='utf8mb4')
    try:
        # Get a cursor
        with connection.cursor() as cursor:
            # Parameterized SQL statement
            sql = "insert into `agm` (`commentTime`,`commentContent`) values (%s,%s)"
            # zip() stops at the shorter list, so a page with fewer than 10 reviews does not raise IndexError
            for time_tag, span in zip(times, spans):
                comment_time = time_tag.get_text()
                print(comment_time)
                comment_content = span.get_text()
                print(comment_content)
                # Execute the SQL statement
                cursor.execute(sql, (comment_time, comment_content))
        # Commit the inserts
        connection.commit()
    finally:
        connection.close()


for i in range(1, 26):
    print("Downloading page {}...".format(i))
    # AliExpress product review URL
    url = "https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32789025522&ownerMemberId=224795258&companyId=234539103&memberType=seller&startValidDate=&i18n=true&page=" + str(i)
    crawl(url)
    # Wait a random 11 to 16 seconds between pages
    t = random.randint(11, 16)
    print("Sleeping for {}s".format(t))
    time.sleep(t)
(2) The collected data looks like this:
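The insert statement in the script implies a table `agm` with two columns, `commentTime` and `commentContent`, but the post does not show the table definition. As a minimal sketch, the snippet below creates a table of that shape and dumps the stored review texts as plain text that can be pasted into WordArt. It uses an in-memory SQLite database as a stand-in so that it runs without a MySQL server; the column types, the `id` column, the helper name `export_reviews`, and the placeholder rows are all assumptions made for illustration.

```python
import sqlite3

# Assumed layout: the post only shows the two column names, not their types.
SCHEMA = """
CREATE TABLE agm (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    commentTime    TEXT,
    commentContent TEXT
)
"""


def export_reviews(connection):
    """Return all stored review texts joined into one string, one per line."""
    rows = connection.execute(
        "SELECT commentContent FROM agm ORDER BY id"
    ).fetchall()
    # Skip empty reviews so they do not add blank lines to the output
    return "\n".join(text.strip() for (text,) in rows if text and text.strip())


# In-memory SQLite stands in for the MySQL database used by the crawler.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Placeholder rows, not real review data.
conn.executemany(
    "INSERT INTO agm (commentTime, commentContent) VALUES (?, ?)",
    [
        ("placeholder time 1", "placeholder review one"),
        ("placeholder time 2", "   "),
        ("placeholder time 3", "placeholder review two"),
    ],
)
conn.commit()
wordart_text = export_reviews(conn)
print(wordart_text)
```

With MySQL the same `SELECT` would run through a `pymysql` cursor; only the connection setup and the `%s` placeholder style differ.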
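2. Word-frequency analysis with WordArt
First, the reviews of 20 users were imported into WordArt. After some useless characters were removed, the preliminary result looked like the figure below: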
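The post does not say which characters were removed. As one possible pre-processing step before pasting text into WordArt, the sketch below decodes HTML entities, replaces punctuation and symbols with spaces, collapses whitespace, and lower-cases the result. The function name `clean_review` and the exact cleaning rules are assumptions, and the sample sentence is made up.

```python
import html
import re


def clean_review(text):
    """Normalize one review for word-frequency analysis."""
    # Turn HTML entities such as &amp; back into plain characters
    text = html.unescape(text)
    # Replace everything except word characters, whitespace and apostrophes with a space
    text = re.sub(r"[^\w\s']", " ", text)
    # \w also matches the underscore, so remove it separately
    text = text.replace("_", " ")
    # Collapse runs of whitespace and lower-case the result
    return re.sub(r"\s+", " ", text).strip().lower()


# Made-up sample sentence, not a real review.
sample = "Good   phone!!! Fast delivery &amp; nice packaging :)"
cleaned = clean_review(sample)
print(cleaned)
```

Lower-casing matters here because otherwise "Good" and "good" would be counted as two different words.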
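After the image's shape, font, layout and other parameters were set, the result looked like the figure below:
The figure makes it easy to see at a glance which words appear most often in the reviews.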
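To cross-check the picture with actual numbers, the word frequencies can also be counted locally. The following is a minimal sketch using `collections.Counter`; the stop-word list, the helper name `top_words`, and the sample reviews are assumptions for illustration, and WordArt's own tokenization and filtering may differ.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "is", "and", "it", "very", "i", "to"}


def top_words(reviews, n=3):
    """Return the n most frequent non-stop-words across all reviews."""
    counts = Counter()
    for review in reviews:
        # Words are runs of letters or apostrophes, compared case-insensitively
        words = re.findall(r"[a-z']+", review.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample reviews, not taken from the crawled data.
reviews = [
    "The phone is very good",
    "Good phone and fast delivery",
    "Fast delivery, the phone is waterproof",
]
print(top_words(reviews))
```

The returned list of `(word, count)` pairs is sorted from most to least frequent, which corresponds to the largest words in the word cloud.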