Data Visualization Analysis Experiment

Latest recommended article published on 2024-09-29 09:43:24

努力coding的米羊羊

Category: Course Experiments. Tags: information visualization, python, matplotlib, pandas

Article link: https://blog.csdn.net/m0_62766582/article/details/134475345

1. Using the statements below, convert the contents of the JSON file into a DataFrame, then display it and inspect it carefully.

Import the required packages

import pandas as pd

import json

import matplotlib.pyplot as plt

Read in the data

with open('viewing_infos.json','r',encoding='UTF-8') as f:   # the with block closes the file automatically

    df=pd.read_json(f,orient='records')   # read from the file handle instead of a raw JSON string

df.head()
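To make the `orient='records'` layout concrete, here is a minimal sketch that parses a small in-memory JSON string. The sample values are made up for illustration and are not taken from viewing_infos.json; only the column names follow the ones used later in this post.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in "records" layout: a list with one object per row.
sample_json = '''[
  {"broadcastDate": "d1", "dongfang_rating": 1.0, "zhejiang_rating": 0.9},
  {"broadcastDate": "d2", "dongfang_rating": 1.2, "zhejiang_rating": 1.1}
]'''

# Wrapping the string in StringIO gives read_json a file-like object,
# which avoids passing a literal JSON string (deprecated in newer pandas).
sample_df = pd.read_json(StringIO(sample_json), orient="records")

print(sample_df.shape)
print(list(sample_df.columns))
```

Each JSON object becomes one row, and the object keys become the column names.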

Check whether the data contains any null values

df.isnull().values.any()

The call returns False, so the data contains no null values.
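If a check like this ever reports nulls, a per-column count shows where they are. The sketch below uses a made-up frame (the column names `rating` and `name` are hypothetical) just to show both calls side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, for illustration only.
demo = pd.DataFrame({"rating": [1.0, np.nan, 0.8], "name": ["a", "b", "c"]})

has_null = bool(demo.isnull().values.any())   # True if any cell is missing
null_per_column = demo.isnull().sum()         # number of missing cells per column

print(has_null)
print(null_per_column.to_dict())
```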

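2. The file contains viewing information for two TV stations, Dongfang TV (东方卫视) and Zhejiang TV (浙江卫视).

1) In a single figure, draw a line chart comparing the ratings of the two stations.

2) In a single figure, draw a bar chart comparing the ratings of the two stations.

3) In a single figure, draw subplots containing the rating line chart and bar chart of each station.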

Preparation before plotting: enable display of Chinese characters and the minus sign, and set the theme style

import numpy as np

import matplotlib.pyplot as plt

import matplotlib



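matplotlib.rcParams['font.family']='SimHei'     # set the font to SimHei (黑体) so that Chinese text renders

matplotlib.rcParams['font.sans-serif'] = ['SimHei']

matplotlib.rcParams['axes.unicode_minus']=False  # render the minus sign correctly with this font

plt.style.use('ggplot')  # use the ggplot theme style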
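The font settings only help if SimHei is actually installed. Otherwise Matplotlib falls back to another font and Chinese labels may show up as empty boxes. A small sketch for checking this up front; `font_available` is a helper name introduced here, not part of Matplotlib.

```python
from matplotlib import font_manager


def font_available(name):
    """Return True if Matplotlib's font manager lists a font family called `name`."""
    return any(entry.name == name for entry in font_manager.fontManager.ttflist)


# DejaVu Sans ships with Matplotlib, so this should print True.
print(font_available("DejaVu Sans"))
```

Calling `font_available('SimHei')` before plotting tells you whether the rcParams above will take effect on your machine.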

# Draw the line chart

plt.figure(figsize=(20,8))

plt.plot(df['broadcastDate'],df['dongfang_rating'],label='东方卫视')

plt.plot(df['broadcastDate'],df['zhejiang_rating'],label='浙江卫视')

plt.xticks(rotation=45)

plt.legend()

plt.show()

plt.xticks can set the rotation angle of the axis tick labels.

# Draw the bar chart

plt.figure(figsize=(20,8))

x=np.arange(len(df['broadcastDate']))   # one numeric position per broadcast date

plt.bar(x=x,height=df['dongfang_rating'],width=0.3,label='东方卫视')

plt.bar(x=x+0.3,height=df['zhejiang_rating'],width=0.3,label='浙江卫视')

plt.xticks(x+0.15,df['broadcastDate'],rotation=45)   # center each date label under its pair of bars

plt.legend()

plt.show()
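The side-by-side bars work by shifting the second series by one bar width. The same idea generalizes to any number of series. Below is a minimal sketch; `grouped_bar_positions` is a hypothetical helper, not a Matplotlib function.

```python
import numpy as np


def grouped_bar_positions(n_categories, n_series, width):
    """Return (positions, tick_centers) for side-by-side, center-aligned bars.

    positions[i] holds the x coordinates of series i; tick_centers holds the
    middle of each group, which is where the category label belongs.
    """
    base = np.arange(n_categories)
    positions = [base + i * width for i in range(n_series)]
    tick_centers = base + width * (n_series - 1) / 2
    return positions, tick_centers


positions, tick_centers = grouped_bar_positions(n_categories=3, n_series=2, width=0.3)
print(positions[1])
print(tick_centers)
```

With two series and a width of 0.3, the second series sits at `x + 0.3` and the group center at `x + 0.15`.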

# Draw the line and bar charts together

plt.figure(figsize=(20,8))

x=np.arange(len(df['broadcastDate']))   # shared numeric positions, so lines and bars use the same axis

plt.plot(x,df['dongfang_rating'],label='东方卫视')

plt.plot(x,df['zhejiang_rating'],label='浙江卫视')

plt.bar(x=x,height=df['dongfang_rating'],width=0.3,label='东方卫视')

plt.bar(x=x+0.3,height=df['zhejiang_rating'],width=0.3,label='浙江卫视')

plt.xticks(x,df['broadcastDate'],rotation=45)

plt.legend()

plt.show()
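The code above overlays everything on one set of axes. Task 3) asks for subplots, so here is one possible layout as a sketch: one row per station, with the line chart on the left and the bar chart on the right. `plot_rating_subplots` and the demo values are assumptions made for illustration; only the column names come from this post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_rating_subplots(frame, date_col, rating_cols):
    """Build a grid with one row per rating column: line chart left, bar chart right."""
    x = np.arange(len(frame))
    fig, axes = plt.subplots(len(rating_cols), 2,
                             figsize=(14, 4 * len(rating_cols)), squeeze=False)
    for row, col in enumerate(rating_cols):
        axes[row, 0].plot(x, frame[col], marker="o")
        axes[row, 0].set_title(f"{col} (line)")
        axes[row, 1].bar(x, frame[col], width=0.5)
        axes[row, 1].set_title(f"{col} (bar)")
        for ax in axes[row]:
            ax.set_xticks(x)
            ax.set_xticklabels(frame[date_col], rotation=45)
    fig.tight_layout()
    return fig, axes


# Made-up values purely to exercise the function.
demo = pd.DataFrame({
    "broadcastDate": ["d1", "d2", "d3"],
    "dongfang_rating": [1.0, 1.2, 0.9],
    "zhejiang_rating": [0.8, 1.1, 1.0],
})
fig, axes = plot_rating_subplots(demo, "broadcastDate",
                                 ["dongfang_rating", "zhejiang_rating"])
```

In a notebook, drop the `matplotlib.use("Agg")` line, pass `df` instead of `demo`, and finish with `plt.show()`.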

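3. The file contains information about the actors in the TV series.

1) Draw a bar chart of the age distribution of actors by gender.

2) Draw a pie chart of the actors' weight distribution.

3) Draw a radar chart of the male actors' heights.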

Read in the data

with open('actor_infos.json','r',encoding='UTF-8') as f:   # the with block closes the file automatically

    data=pd.read_json(f,orient='records')   # read from the file handle instead of a raw JSON string

data.head(20)

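The actor data has only nine records, and only the blood type field contains null values. Since this does not affect the analysis in this experiment, the nulls are left untouched.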

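Looking up the cast shows that only Qiao Xin (乔欣) and Zuo Xiaoqing (左小青) are actresses. Both are shorter than 175, while all the male actors are taller than 175, so height is used as the condition for classifying the actors' gender. This shortcut only holds for this particular dataset.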

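Iterate over every row of the DataFrame:

if the height is greater than 175, mark the actor as male (1); otherwise mark the actor as female (0)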

lst=[]

for index,row in data.iterrows():

    if row['height']>175:

        lst.append(1)

    else:

        print(row['height'])

        lst.append(0)

data['sex']=lst
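The same labeling can be done without an explicit loop. A minimal sketch using `np.where`, with made-up heights rather than the values from actor_infos.json:

```python
import numpy as np
import pandas as pd

# Hypothetical heights; the real values come from actor_infos.json.
demo = pd.DataFrame({"name": ["A", "B", "C"], "height": [182, 168, 175]})

# 1 marks "taller than 175" (treated as male in this post), 0 otherwise.
demo["sex"] = np.where(demo["height"] > 175, 1, 0)

print(demo["sex"].tolist())
```

A height of exactly 175 is labeled 0, matching the `>175` condition in the loop.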

# Prepare the data

male_age=[]

male_labels=[]



male_height=[]

female_age=[]

female_labels=[]

for index,row in data.iterrows():

   

    if row['sex']==1:

        male_age.append(2023-row['birth_day'])

        male_labels.append(row['name'])

        male_height.append(row['height'])

    else:

        female_age.append(2023-row['birth_day'])

        female_labels.append(row['name'])
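The per-gender lists can also be built with boolean masks. The sketch below assumes, as the loop does, that `birth_day` holds the birth year as an integer and that 2023 is the reference year. The rows are made up for illustration.

```python
import pandas as pd

REFERENCE_YEAR = 2023  # same reference year as the loop above

# Hypothetical rows; only the column names follow the post.
demo = pd.DataFrame({
    "name": ["A", "B", "C"],
    "birth_day": [1990, 1985, 1993],
    "sex": [1, 0, 1],
})

demo["age"] = REFERENCE_YEAR - demo["birth_day"]   # age computed for every row at once
male = demo[demo["sex"] == 1]                      # rows labeled male
female = demo[demo["sex"] == 0]                    # rows labeled female

print(male["age"].tolist())
print(female["age"].tolist())
```

`male["name"]` and `male["height"]` then play the role of `male_labels` and `male_height`.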

# Bar charts of the age distribution of actors by gender

plt.figure(figsize=(20,8))

x=np.arange(len(male_age))

plt.bar(x=x,height=male_age,width=0.3,label='男性',tick_label=male_labels)

plt.legend()

plt.show()

plt.figure(figsize=(5,4))

x=np.arange(len(female_age))

plt.bar(x=x,height=female_age,width=0.2,label='女性',tick_label=female_labels)

plt.legend()

plt.show()

# Pie chart of the actors' weight distribution

plt.figure(figsize=(20,8))

weight=data['weight'].str.replace('kg','').astype(float)   # strip the 'kg' unit and convert to numbers

plt.pie(x=weight.values,labels=data['name'],autopct='%1.1f%%')

plt.title('2022年饼图')

plt.legend()

plt.show()

The autopct parameter of the pie chart displays each slice's percentage.
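Besides a format string such as `'%1.1f%%'`, `autopct` also accepts a function that receives the percentage and returns the label text. The sketch below uses that to show the absolute weight next to the percentage; `make_autopct` is a helper name introduced here, and the numbers are made up.

```python
def make_autopct(values):
    """Return a function usable as autopct that shows percentage and absolute value."""
    total = sum(values)

    def autopct(pct):
        absolute = pct * total / 100.0   # convert the percentage back to the original value
        return f"{pct:.1f}%\n({absolute:.0f} kg)"

    return autopct


formatter = make_autopct([50, 60, 90])   # hypothetical weights in kg
print(formatter(25.0))
print('%1.1f%%' % 25.0)                  # what the plain format string produces
```

It would be passed as `plt.pie(x=values, labels=labels, autopct=make_autopct(values))`.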

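Draw the radar chart. Note that the labels, angles, and data must all be closed, that is, the first element is appended again at the end so the polygon returns to its starting point.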

height=np.array(male_height)

male_labels=np.array(male_labels)

angles = np.linspace(0,2*np.pi,len(height),endpoint=False)

height=np.concatenate((height,[height[0]]))

angles=np.concatenate((angles,[angles[0]]))

male_labels=np.concatenate((male_labels,[male_labels[0]]))

fig = plt.figure(facecolor="white")       # facecolor sets the background color of the figure

plt.subplot(111,polar=True)



plt.plot(angles,height,'o-',color ='g',linewidth=2,label='男性身高雷达图')   # 'o-' leaves the color to the color argument

plt.fill(angles,height,facecolor='g',alpha=0.25)

plt.thetagrids(angles*180/np.pi,male_labels)          # set the angular tick labels

plt.figtext(0.52,0.95,'男性身高雷达图',ha='center')   # add the radar chart title

plt.grid(True)

plt.legend()

plt.show()
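The closing step is easy to get wrong, because the angles, values, and labels must all end up with the same length. A small sketch that wraps it in one function; `close_radar` is a hypothetical helper and the heights are made up.

```python
import numpy as np


def close_radar(values, labels):
    """Return (angles, values, labels), each with its first element repeated at the end."""
    values = np.asarray(values, dtype=float)
    # Evenly spaced angles around the circle, one per data point.
    angles = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
    closed_values = np.concatenate((values, values[:1]))
    closed_angles = np.concatenate((angles, angles[:1]))
    closed_labels = list(labels) + [labels[0]]
    return closed_angles, closed_values, closed_labels


angles_c, values_c, labels_c = close_radar([180, 178, 185], ["x", "y", "z"])
print(len(angles_c), len(values_c), len(labels_c))
```

The three results can be passed to `plt.plot`, `plt.fill`, and `plt.thetagrids` in the same way as the arrays above.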