The data comes from Kaggle:
https://www.kaggle.com/gregorut/videogamesales
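Analysis plan
- Game genres
- 1. Top five games in each genre (by global sales, North America sales, Europe sales, Japan sales, and sales in other regions)
- 2. Publishers with the most games in each genre (top five)
- Regions
- 1. Sales trends in each region
- 2. Most popular genre in each region
- 3. Most popular publisher in each region
- 4. Most popular platform in each region
- Platforms
- 1. Top five games on each major platform
- 2. Most popular genre on each major platform
- 3. Publishers that contribute the most to each platform
- Publishers
- 1. Total sales of each publisher by year (per region)
- 2. Sales by genre
- 3. Sales by platform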
Import the required libraries. Because this is purely an analysis, three libraries cover almost everything.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Read the file
data = pd.read_csv(r'vgsales.csv')
data.head()
Meaning of each field:
Rank - Ranking of overall sales
Name - The game's name
Platform - Platform of the game's release (e.g. PC, PS4)
Year - Year of the game's release
Genre - Genre of the game
Publisher - Publisher of the game
NA_Sales - Sales in North America (in millions)
EU_Sales - Sales in Europe (in millions)
JP_Sales - Sales in Japan (in millions)
Other_Sales - Sales in the rest of the world (in millions)
Global_Sales - Total worldwide sales (in millions)
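The field list suggests that Global_Sales should roughly equal the sum of the four regional columns. The sketch below is one way to check that; it assumes this relationship holds up to rounding, which the dataset description does not state explicitly. The helper name `global_sales_mismatch`, the tolerance `tol`, and the tiny `sample` frame are made up for illustration and are not part of the original analysis.

```python
import pandas as pd

REGION_COLS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

def global_sales_mismatch(df, tol=0.02):
    """Return rows whose Global_Sales differs from the regional sum by more than tol (millions)."""
    regional_sum = df[REGION_COLS].sum(axis=1)
    return df[(df["Global_Sales"] - regional_sum).abs() > tol]

# Hypothetical two-row sample standing in for vgsales.csv.
sample = pd.DataFrame({
    "Name": ["Game A", "Game B"],
    "NA_Sales": [1.0, 2.0],
    "EU_Sales": [0.5, 1.0],
    "JP_Sales": [0.25, 0.0],
    "Other_Sales": [0.25, 0.5],
    "Global_Sales": [2.0, 4.0],  # Game B is deliberately inconsistent (regional sum is 3.5)
})

mismatches = global_sales_mismatch(sample)
print(mismatches["Name"].tolist())
```

On the real data the same call would be `global_sales_mismatch(data)`; a small tolerance is used because the published figures are rounded to two decimals.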
Inspect an overview of the data
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rank 16598 non-null int64
1 Name 16598 non-null object
2 Platform 16598 non-null object
3 Year 16327 non-null float64
4 Genre 16598 non-null object
5 Publisher 16540 non-null object
6 NA_Sales 16598 non-null float64
7 EU_Sales 16598 non-null float64
8 JP_Sales 16598 non-null float64
9 Other_Sales 16598 non-null float64
10 Global_Sales 16598 non-null float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB
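Year is missing in 271 of the 16,598 rows (about 1.6%) and Publisher in 58 (about 0.3%). Dropping the rows that contain missing values has little effect on the overall distribution of the data.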
data.dropna(inplace = True)
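Before dropping rows it can help to quantify how much would be lost. The following is a minimal sketch of that check; the helper names `missing_summary` and `share_of_rows_dropped` and the four-row `sample` frame are hypothetical and only stand in for the real `data`.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Per-column count and percentage of missing values."""
    counts = df.isna().sum()
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })

def share_of_rows_dropped(df):
    """Fraction of rows that dropna() would remove (rows with at least one missing value)."""
    return float(df.isna().any(axis=1).mean())

# Made-up sample; the values are illustrative only.
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2006.0, np.nan, 2010.0, 2012.0],
    "Publisher": ["X", "Y", None, "Z"],
    "Global_Sales": [1.0, 2.0, 3.0, 4.0],
})

summary = missing_summary(sample)
dropped = share_of_rows_dropped(sample)
cleaned = sample.dropna()  # returns a new frame instead of modifying in place
print(summary)
print(dropped, len(cleaned))
```

Calling `missing_summary(data)` on the real frame before `dropna` gives the per-column figures, and `share_of_rows_dropped(data)` gives the exact share of rows removed.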
Which game genres appear in the data:
# Game genres
data.Genre.unique()
array(['Sports', 'Platform', 'Racing', 'Role-Playing', 'Puzzle', 'Misc',
'Shooter', 'Simulation', 'Action', 'Fighting', 'Adventure',
'Strategy'], dtype=object)
Which game platforms appear in the data:
# Game platforms
data.Platform.unique()
array(['Wii', 'NES', 'GB', 'DS', 'X360', 'PS3', 'PS2', 'SNES', 'GBA',
'3DS', 'PS4', 'N64', 'PS', 'XB', 'PC', '2600', 'PSP', 'XOne', 'GC',
'WiiU', 'GEN', 'DC', 'PSV', 'SAT', 'SCD', 'WS', 'NG', 'TG16',
'3DO', 'GG', 'PCFX'], dtype=object)
Which game publishers appear in the data:
# Game publishers
data.Publisher.unique()
array(['Nintendo', 'Microsoft Game Studios', 'Take-Two Interactive',
'Sony Computer Entertainment', 'Activision', 'Ubisoft',
'Bethesda Softworks', 'Electronic Arts', 'Sega', 'SquareSoft',
'Atari', '505 Games', 'Capcom', 'GT Interactive',
'Konami Digital Entertainment'
], dtype=object)
There are 576 publishers in total, so only some of them are listed here.
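`unique()` returns the distinct values themselves, so the count has to be read off separately. If only the number of distinct values is needed, `nunique()` gives it directly. A small sketch, using a made-up `sample` frame in place of the real `data`:

```python
import pandas as pd

# Hypothetical sample standing in for the cleaned vgsales data.
sample = pd.DataFrame({
    "Genre": ["Sports", "Sports", "Racing", "Action"],
    "Platform": ["Wii", "Wii", "DS", "PS3"],
    "Publisher": ["Nintendo", "Nintendo", "Nintendo", "Take-Two Interactive"],
})

# Number of distinct values per column (missing values are not counted by default).
distinct_counts = sample[["Genre", "Platform", "Publisher"]].nunique()
print(distinct_counts)
```

On the real frame, `data[["Genre", "Platform", "Publisher"]].nunique()` would report all three counts in one call.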
Game genres
Top five games in each genre
# Top five by global sales (Sports as an example)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Sports',:].sort_values(by='Global_Sales',ascending=False).head()
# Top five by North America sales (Sports as an example)
data.pivot_table(index=['Genre','Name'],values = ['NA_Sales'],aggfunc='sum').loc['Sports',:].sort_values(by='NA_Sales',ascending=False).head(5)
# Top five by Europe sales (Sports as an example)
data.pivot_table(index=['Genre','Name'],values='EU_Sales',aggfunc='sum').loc['Sports',:].sort_values(by='EU_Sales',ascending=False).head()
# Top five by Japan sales (Sports as an example)
data.pivot_table(index=['Genre','Name'],values='JP_Sales',aggfunc='sum').loc['Sports',:].sort_values(by='JP_Sales',ascending=False).head()
# Top five by sales in other regions (Sports as an example)
data.pivot_table(index=['Genre','Name'],values='Other_Sales',aggfunc='sum').loc['Sports',:].sort_values(by='Other_Sales',ascending=False).head()
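The five queries above differ only in the sales column, so they can be folded into one small helper. This is a sketch rather than the method used in the original notebook: it swaps `pivot_table` for `groupby` plus `nlargest`, and the function name `top_games` and the `sample` frame are hypothetical. Sales are summed per title first because a title can appear on several rows, one per platform.

```python
import pandas as pd

def top_games(df, genre, sales_col="Global_Sales", n=5):
    """Return the n best-selling titles of one genre for the given sales column."""
    # Sum over rows with the same Name so multi-platform releases count once.
    totals = df.loc[df["Genre"] == genre].groupby("Name")[sales_col].sum()
    return totals.nlargest(n)

# Made-up sample; "Game A" appears on two platforms.
sample = pd.DataFrame({
    "Name": ["Game A", "Game A", "Game B", "Game C", "Game D"],
    "Platform": ["PS3", "X360", "Wii", "Wii", "DS"],
    "Genre": ["Sports", "Sports", "Sports", "Sports", "Racing"],
    "NA_Sales": [1.0, 2.0, 2.5, 0.5, 4.0],
    "Global_Sales": [3.0, 4.0, 6.0, 1.0, 9.0],
})

top_global = top_games(sample, "Sports", n=2)
top_na = top_games(sample, "Sports", sales_col="NA_Sales", n=2)
print(top_global)
print(top_na)
```

With the real data, `top_games(data, 'Sports', 'JP_Sales')` would correspond to the Japan query above. The helper returns a Series rather than the one-column DataFrame that `pivot_table` produces.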
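# Print the top five games of every genre (ranked by global sales)
# Build the pivot table once, then slice it per genre
genre_sales = data.pivot_table(index=['Genre','Name'],values='Global_Sales',aggfunc='sum')
for genre in data.Genre.unique():
    print(genre)
    print(genre_sales.loc[genre,:].sort_values(by='Global_Sales',ascending=False).head())
    print('*'*40)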
Sports
Global_Sales
Name
Wii Sports 82.74
Wii Sports Resort 33.00
Wii Fit 22.72
Wii Fit Plus 22.00
FIFA 15 19.02
****************************************
Platform
Global_Sales
Name
Super Mario Bros. 45.31
New Super Mario Bros. 30.01
New Super Mario Bros. Wii 28.62
Super Mario World 26.07
Super Mario Bros. 3 22.48
****************************************
Racing
Global_Sales
Name
Mario Kart Wii 35.82
Mario Kart DS 23.42
Gran Turismo 3: A-Spec 14.98
Need for Speed: Most Wanted 14.08
Mario Kart 7 12.21
****************************************
Role-Playing
Global_Sales
Name
Pokemon Red/Pokemon Blue 31.37
Pokemon Gold/Pokemon Silver 23.10
The Elder Scrolls V: Skyrim 19.28
Pokemon Diamond/Pokemon Pearl 18.36
Pokemon Ruby/Pokemon Sapphire 15.85
****************************************
Puzzle
Global_Sales
Name
Tetris 35.84
Brain Age 2: More Training in Minutes a Day 15.30
Dr. Mario 10.19
Pac-Man 9.03
Professor Layton and the Curious Village 5.26
****************************************
Misc
Global_Sales
Name
Wii Play 29.02
Minecraft 23.73
Kinect Adventures! 21.82
Brain Age: Train Your Brain in Minutes a Day 20.22
Guitar Hero III: Legends of Rock 16.40
****************************************
Shooter
Global_Sales
Name
Call of Duty: Modern Warfare 3 30.83
Call of Duty: Black Ops II 29.72
Call of Duty: Black Ops 29.40
Duck Hunt 28.31
Call of Duty: Ghosts 27.38
****************************************
Simulation
Global_Sales
Name
Nintendogs 24.76
The Sims 3 15.45
Animal Crossing: Wild World 12.27
Animal Crossing: New Leaf 9.09
Cooking Mama 5.72
****************************************
Action
Global_Sales
Name
Grand Theft Auto V 55.92
Grand Theft Auto: San Andreas 23.86
Grand Theft Auto IV 22.47
Grand Theft Auto: Vice City 16.19
FIFA Soccer 13 16.16
****************************************
Fighting
Global_Sales
Name
Super Smash Bros. Brawl 13.04
Super Smash Bros. for Wii U and 3DS 12.47
Mortal Kombat 8.40
WWE SmackDown vs Raw 2008 7.41
Tekken 3 7.16
****************************************
Adventure
Global_Sales
Name
Assassin's Creed 11.30
Super Mario Land 2: 6 Golden Coins 11.18
L.A. Noire 5.95
Zelda II: The Adventure of Link 4.38
Rugrats: Search For Reptar 3.34
*