Week 11 in-class exercise: comparing the average profit per film genre for Universal Pictures and Paramount Pictures, 2010-2016

rua◍'ㅅ'◍rua

Last modified 2023-12-25 11:36:36

First published 2023-12-24 14:42:56

Article link: https://blog.csdn.net/weixin_73122330/article/details/135181722

Using the TMDB 5000 Movie Dataset, this article analyzes the film profits of Universal Pictures and Paramount Pictures for 2010-2016 and uses a chart to compare the two studios' average profit in each film genre.

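To give film production decisions better data support, the assignment takes the TMDB 5000 Movie Dataset as its subject and asks you to pick a suitable chart type and complete the following data visualization task:

Compare the average profit per film genre for Universal Pictures and Paramount Pictures over 2010-2016.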

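import pandas as pd
import matplotlib.pyplot as plt             # plotting library

import warnings
# Suppress warnings. pandas often warns that a call is deprecated and will change in a future version; this setting hides those messages.
warnings.filterwarnings('ignore')

plt.rcParams['font.family']='SimHei'        # use the SimHei font so Chinese text renders correctly
plt.rcParams['axes.unicode_minus']=False    # render the minus sign correctly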

df = pd.read_excel('第十一周课堂作业（预处理之后的数据）.xlsx')

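df.dropna(inplace=True)
# Keep only the films released in 2010-2016
df = df[(df['year'] >= 2010)&(df['year'] <= 2016)]
df.reset_index(drop=True,inplace=True)      # renumber rows from 0; drop=True discards the old index

# profit = revenue - budget gives the profit column
# (already computed during preprocessing)
#df['profit'] = df['revenue'] - df['budget']

# Inspect df

#df.info()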
# Label each film by studio: 'U' (Universal), 'P' (Paramount), 'UP' (both); drop every other film
for i in range(0,df.shape[0]):
    companies = df.loc[i,'production_companies']
    if 'Universal Pictures' in companies and 'Paramount Pictures' in companies:
        df.loc[i,'production_companies'] = 'UP'
    elif 'Universal Pictures' in companies:
        df.loc[i,'production_companies'] = 'U'
    elif 'Paramount Pictures' in companies:
        df.loc[i,'production_companies'] = 'P'
    else:
        df.drop(i,inplace=True)             # neither studio: remove the row
df.reset_index(drop=True,inplace=True)      # renumber the remaining rows from 0

#df.to_excel('data_2.xlsx')
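
The same labelling can be done without a row-by-row loop. The following is a minimal sketch on made-up sample rows; the `films` frame, the `studio` column, and the assumption that `production_companies` is plain text containing the studio names are all illustrative, not taken from the preprocessed spreadsheet.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real column is assumed to hold studio names as text.
films = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "production_companies": [
        "Universal Pictures|Legendary Pictures",
        "Paramount Pictures",
        "Universal Pictures|Paramount Pictures",
        "Warner Bros.",
    ],
})

# regex=False treats the studio name as a literal substring.
has_u = films["production_companies"].str.contains("Universal Pictures", regex=False)
has_p = films["production_companies"].str.contains("Paramount Pictures", regex=False)

# np.select takes the first condition that matches, so the joint case goes first.
films["studio"] = np.select([has_u & has_p, has_u, has_p], ["UP", "U", "P"], default="")

# Keep only films made by at least one of the two studios.
films = films[films["studio"] != ""].reset_index(drop=True)
print(films[["title", "studio"]])
```

Writing the label to a separate column also leaves the original `production_companies` text available for later checks.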

# Build the list of genres that appear in the data
genres_set = set()
for genre in df['genres'].str.split('|'):
    for item in genre:
        genres_set.add(item)
        
genres_list = list(genres_set)
genres_list.sort()
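
The sorted genre list can also be built in one expression. This is a small sketch on a made-up `genres` Series, assuming the same `|`-separated format as the `genres` column above.

```python
import pandas as pd

# Hypothetical sample values in the same '|'-separated format.
genres = pd.Series(["Action|Adventure", "Comedy", "Action|Comedy|Drama"])

# split -> one list per film; explode -> one row per genre; unique -> distinct names.
genres_list = sorted(genres.str.split("|").explode().unique())
print(genres_list)
```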

# Number of genres listed for each film
n = []
for i in range(0,df.shape[0]):
    # df.shape[0] is the number of rows in df (104 here)
    n.append(len(df.loc[i,'genres'].split('|')))
 
# df.loc[i,'genres'].split('|') takes the 'genres' value in row i (a '|'-separated string) and splits it into a list
# len(...) is the length of that list
# n.append(...) appends that length to n
 
for genre in genres_list:
    for i in range(0,df.shape[0]):
        # loop over every row of df (104 rows)
        if genre in df.loc[i,'genres']:
            # the genre taken from genres_list appears in row i's 'genres' value
            df.loc[i,genre] = df.loc[i,'profit']/n[i]
            # so the genre column in row i gets row i's profit divided by that film's genre count n[i]
#df.to_excel('profit2.xlsx')
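
To make the splitting rule concrete, here is a worked sketch on three made-up films. Each film's profit is divided equally among its genres, and a genre's average is the mean of the shares it received. The `films` frame, the `share` column, and the numbers are illustrative assumptions; `explode` plus `groupby` stands in for the nested loops above.

```python
import pandas as pd

# Hypothetical films: A has 2 genres, B has 1, C has 3.
films = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["Action|Adventure", "Action", "Comedy|Drama|Action"],
    "profit": [100.0, 60.0, 90.0],
})

per_genre = films.assign(genre=films["genres"].str.split("|"))
# Share per genre = profit / number of genres listed for the film.
per_genre["share"] = per_genre["profit"] / per_genre["genre"].str.len()
# One row per (film, genre) pair, then average the shares within each genre.
per_genre = per_genre.explode("genre")
avg_share = per_genre.groupby("genre")["share"].mean()
print(avg_share)
```

Here Action receives 50 from A, 60 from B, and 30 from C, so its average is 140/3, while Adventure averages 50 and Comedy and Drama average 30 each.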

# Average profit per genre for Universal Pictures, in units of 10,000
df_Uni = df[(df['production_companies'] == 'U') | (df['production_companies'] == 'UP')]
genre_Uni = round(df_Uni.loc[:,genres_list].mean()/1.0e+4,0)
genre_Uni = genre_Uni.fillna(0)             # genres with no Universal film get 0
#genre_Uni.to_excel('genre_Uni.xlsx')

# Contents of genre_Uni

Action              8759.0
Adventure           4566.0
Animation          24940.0
Comedy              7605.0
Crime               2466.0
Documentary            0.0
Drama               2244.0
Family             21177.0
Fantasy             5833.0
History             1670.0
Horror              3650.0
Music              11434.0
Mystery              787.0
Romance             4537.0
Science Fiction     2205.0
Thriller            3005.0
War                 1847.0
Western                0.0
dtype: float64

# Average profit per genre for Paramount Pictures, in units of 10,000
df_Par = df[(df['production_companies'] == 'P') | (df['production_companies'] == 'UP')]
genre_Par = round(df_Par.loc[:,genres_list].mean()/1.0e+4,0)
#genre_Par.to_excel('genre_Par.xlsx')

# Contents of genre_Par

Action             11289.0
Adventure          12070.0
Animation           5406.0
Comedy              5705.0
Crime               4381.0
Documentary         1036.0
Drama               4804.0
Family              3223.0
Fantasy             4962.0
History              388.0
Horror             11008.0
Music               1036.0
Mystery             7409.0
Romance            10533.0
Science Fiction    13322.0
Thriller            6051.0
War                  388.0
Western             4679.0
dtype: float64

# Plot: line chart
fig,ax = plt.subplots(1,1,figsize=(12,8))
ax.plot(genre_Uni.index,genre_Uni.values,label='Universal Pictures')
ax.plot(genre_Par.index,genre_Par.values,label='Paramount Pictures')

# Write each value 400 units above its point; x is the genre's position on the axis
for x,(y_h,y_l) in enumerate(zip(genre_Uni.values,genre_Par.values)):
    ax.text(x,y_h + 400,y_h,ha = 'center',fontsize=12)
    ax.text(x,y_l + 400,y_l,ha = 'center',fontsize=12)
    
ax.set_title('Average profit by film genre for the two studios, 2010-2016')
ax.set_xlabel('Genre')
ax.set_ylabel('Average profit (10,000 USD)')   # the means were divided by 1.0e+4 above
ax.tick_params(direction='in',length=6,width=2,labelsize=12)
ax.xaxis.set_tick_params(labelrotation=45)
ax.legend(ncol=2)
plt.savefig('11周课内2010-2016两公司制作各电影类型平均利润.png')
plt.show()
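
Because genres are unordered categories, a grouped bar chart is another chart type that could suit the task. The following is a rough sketch, not the author's plot: `plot_grouped_bars` is a hypothetical helper, the three values per studio are copied from the outputs printed above, and the USD unit is an assumption about the dataset's currency.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_grouped_bars(uni, par, width=0.4):
    """Draw two Series that share an index as side-by-side bars."""
    pos = np.arange(len(uni))                      # one slot per genre
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.bar(pos - width / 2, uni.values, width, label="Universal Pictures")
    ax.bar(pos + width / 2, par.values, width, label="Paramount Pictures")
    ax.set_xticks(pos)
    ax.set_xticklabels(uni.index, rotation=45)
    ax.set_ylabel("Average profit (10,000 USD)")   # unit assumed
    ax.legend()
    return fig, ax


# Three genres taken from the printed genre_Uni and genre_Par outputs.
uni = pd.Series({"Action": 8759.0, "Animation": 24940.0, "Horror": 3650.0})
par = pd.Series({"Action": 11289.0, "Animation": 5406.0, "Horror": 11008.0})
fig, ax = plot_grouped_bars(uni, par)
```

Passing the full `genre_Uni` and `genre_Par` Series and then calling `plt.show()` would draw all 18 genres; both Series need the same index order, which holds here because both are built from `genres_list`.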