Data Visualization Case Study: Analyzing the 2021 Top 100 Chinese Brands with Python

Author: wmy0217_

Last modified: 2022-08-16 09:43:25

First published: 2021-07-23 20:39:35

Article link: https://blog.csdn.net/wmy0217_/article/details/119043829

This was a final project assigned by my teacher. I am writing it up here in the hope that it helps others.

Website to Scrape

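The data comes from Maigoo (maigoo.com); the exact page URL appears in the scraping code below.
(The site published the 2021 list of China's 500 most valuable brands on 2021-05-20.)
This analysis uses only the top 100 entries.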

How to Scrape

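The scraper uses the requests and BeautifulSoup libraries: requests fetches the page, and BeautifulSoup parses it. I inspected the page's HTML in the browser's developer tools to find where the needed fields sit, then extracted them with BeautifulSoup's find_all function. Finally, the scraped rows are written to a CSV file that opens in Excel.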
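The parsing step can be tried without touching the network. The following is a minimal sketch, assuming the table rows are plain tr elements with td cells; the HTML fragment and its values are placeholders, not data from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the real page's table
html = """
<table>
  <tr><td>rank</td><td>brand</td><td>industry</td></tr>
  <tr><td>1</td><td>Brand A</td><td>bank</td></tr>
  <tr><td>2</td><td>Brand B</td><td>tech</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # get_text(strip=True) returns the cell text even when the cell has nested tags
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # skip rows that contain no td cells
        rows.append(cells)

for row in rows:
    print(row)
```

Each table row becomes one list of strings, which is the shape csv.writer expects for writerow.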

Code

The code is split into two files: main does the scraping, and wmy does the visualization.
main:

import csv

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent so the request looks like it comes from a normal browser
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36'}
URL = 'https://www.maigoo.com/news/593320.html'
# 1 header row + 100 brands, matching the original cutoff of 505 cells (5 cells per row)
MAX_ROWS = 101

r = requests.get(URL, headers=HEADERS, timeout=10)
r.encoding = 'utf-8'
soup = BeautifulSoup(r.text, 'html.parser')

# 'w' mode so that re-running the script does not append duplicate rows;
# utf-8-sig writes a BOM so that Excel detects the encoding
with open("2021中国品牌价值top100榜.csv", 'w', newline='', encoding='utf-8-sig') as fp:
    writer = csv.writer(fp)
    written = 0
    for tr in soup.find_all('tr'):
        if written >= MAX_ROWS:
            break
        # Columns: rank, brand name, industry, headquarters, brand value / growth rate
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if len(cells) != 5:
            continue  # skip rows that do not have exactly five td cells
        print(*cells)
        writer.writerow(cells)
        written += 1
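Two details of the CSV step are easy to check in isolation: opening a file in 'a' mode appends on every run, whereas 'w' mode overwrites, and the utf-8-sig encoding adds a byte-order mark that must also be stripped when reading. The sketch below uses a temporary file and placeholder rows (the column names are illustrative, not the site's), and reads the file back with csv.DictReader.

```python
import csv
import os
import tempfile

rows = [
    ["rank", "brand", "industry"],  # header row (placeholder column names)
    ["1", "Brand A", "bank"],       # placeholder data row
]

path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Write the same rows twice in 'w' mode: the second run overwrites the first,
# so the file still holds one header row and one data row
for _ in range(2):
    with open(path, "w", newline="", encoding="utf-8-sig") as fp:
        csv.writer(fp).writerows(rows)

# Reading with utf-8-sig strips the BOM, so the first column name stays clean
with open(path, newline="", encoding="utf-8-sig") as fp:
    records = list(csv.DictReader(fp))

print(len(records))
print(records[0]["brand"])
```

With 'a' mode in place of 'w', the second pass would add the header and data row again, and a later read by column name would treat the repeated header as data.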

wmy:

import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
from wordcloud import WordCloud
from PIL import Image
import numpy as np

# Use a font with Chinese glyphs so labels render, and keep the minus sign
# from showing up as a box
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

data = pd.read_csv('2021中国品牌价值top100榜.csv')

# Count how many companies fall into each industry
c = Counter(data['行业'])

# Pie chart. Note: Counter keeps insertion order, so [0:10] takes the first ten
# industries that appear in the ranking, not the ten most frequent ones;
# c.most_common(10) would give the latter.
label = list(c.keys())
value = list(c.values())
plt.pie(value[0:10], labels=label[0:10], startangle=180, autopct="%1.2f%%", shadow=True)
plt.title("Share of 10 industries")
plt.show()

# Count brands by headquarters location
b = Counter(data['总部所在地'])
print(b)

# Bar chart of the first ten locations (same order-of-appearance caveat as above)
heng = list(b.keys())    # x axis: locations
zong = list(b.values())  # y axis: number of brands
plt.bar(heng[0:10], zong[0:10])
plt.title("Headquarters locations (10 shown)")
plt.show()

# Extract the annual growth rate: the text after '/', with the last
# character dropped (presumably '%')
d = []
for cell in data['品牌价值(人民币)/年增长率']:
    x = cell.find('/')
    if x != -1:
        d.append(cell[x + 1:-1])
print(d)

# Convert the first five rates to signed integers (integer part only, as before)
zeng = []
for rate in d[0:5]:
    magnitude = int(rate[1:].split('.')[0])  # rate[0] is the sign character
    zeng.append(-magnitude if rate[0] == '-' else magnitude)

# Bar chart of the annual growth rate of the top five brands
list_pin = list(data['品牌名称'])
plt.bar(list_pin[0:5], zeng[0:5])
plt.title("Annual brand value growth rate, top 5")
plt.show()

# Word cloud built from the brand names, separated by spaces
text = ' '.join(str(name) for name in list_pin)
print(text)

mask_pic = np.array(Image.open(r'D:\33.png'))  # mask image; path is specific to my machine
word = WordCloud(
    font_path='C:/Windows/Fonts/simfang.ttf',  # a Chinese font installed on my machine
    mask=mask_pic,             # shape the cloud with the mask image
    background_color='white',  # background color
    max_font_size=150,         # largest font size
    max_words=1000,            # maximum number of words shown
).generate(text)

image = word.to_image()
image.show()
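The growth-rate parsing above relies on character positions, which breaks if a value has a different number of digits. As an alternative, here is a minimal sketch using a regular expression. It assumes each cell looks like "<value>/<signed rate>%"; the sample strings and the helper name parse_growth are hypothetical, not taken from the scraped data.

```python
import re

# Matches the part after the slash: optional sign, digits, optional decimals, then '%'
GROWTH_RE = re.compile(r"/\s*([+-]?\d+(?:\.\d+)?)%")


def parse_growth(cell):
    """Return the growth rate in percent as a float, or None if the cell has no rate."""
    match = GROWTH_RE.search(str(cell))
    return float(match.group(1)) if match else None


# Hypothetical cell values, only to exercise the helper
samples = ["100亿/+12.5%", "80亿/-3.2%", "60亿"]
parsed = [parse_growth(s) for s in samples]
print(parsed)
```

Unlike the integer conversion in the script, this keeps the decimal part and returns None for cells without a rate, so the caller decides how to handle missing values.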

Results

Scraping output (main)

[Image: screenshot of the scraping output]

Visualization output (wmy)

[Image: screenshot of the visualization output]

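Analysis

From the analysis above:
1. Most of the top 100 brands fall into banking (21.33%), technology (16.00%), insurance (13.33%), retail (10.67%), and media and culture (9.33%). These percentages are shares within the ten industries plotted in the pie chart (the first ten to appear in the ranking), not shares of all 100 brands.
2. Most of the top 100 brands are headquartered in Beijing, Guangdong, Shanghai, and Hong Kong.
3. The three most valuable brands are Industrial and Commercial Bank of China (ICBC), WeChat, and China Construction Bank.
4. Among the top 5, Tencent QQ has the highest annual growth rate, followed by WeChat. ICBC, China Construction Bank, and Huawei show negative growth.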
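The percentages in item 1 come from the pie chart's autopct labels, which express each slice as a share of the slices actually plotted. To get shares of all brands for the most frequent industries instead, Counter.most_common can be combined with the overall total. This is a small sketch on placeholder labels, not the real data.

```python
from collections import Counter

# Placeholder industry labels, not the scraped data
industries = ["bank", "bank", "tech", "bank", "retail", "tech"]

counts = Counter(industries)
total = sum(counts.values())  # all items, not only the ones that get plotted

# most_common(n) sorts by frequency, unlike slicing keys() in insertion order
top = counts.most_common(2)
shares = {name: round(100 * n / total, 2) for name, n in top}
print(shares)
```

Passing the counts from most_common to plt.pie would still label slices relative to the plotted subset, so shares of the whole list are best computed separately as shown here.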