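1. Project introduction:
The dataset used in this project comes from Kaggle; the dataset page does not state an explicit task. Given the rapid growth of the domestic (Chinese) game industry, however, we can analyze the sales data by region, publisher, genre, and other factors, and offer practical suggestions for increasing video game sales.
2. Problem definition
(1) How the video game industry has developed in recent years
(2) Video game market analysis: the most popular games, genres, platforms, publishers, etc.
(3) Which game genres the top publishers dominate
(4) [Advanced] Forecast yearly video game sales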
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
3. Inspecting and cleaning the data
data = pd.read_csv('vgsales.csv')
display('{}records in the dataset'.format(len(data)))
data.head(5)
'16598records in the dataset'
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
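Observations:
- The dataset contains 16,598 records.
- The dataset has 11 fields:
  - Rank: overall sales rank
  - Name: game title
  - Platform: platform the game runs on
  - Year: release year
  - Genre: game genre
  - Publisher: game publisher
  - NA_Sales: sales in North America (millions of units)
  - EU_Sales: sales in Europe (millions of units)
  - JP_Sales: sales in Japan (millions of units)
  - Other_Sales: sales in the rest of the world (millions of units)
  - Global_Sales: total worldwide sales (millions of units)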
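As a quick sanity check on the field descriptions above, the sketch below tests whether Global_Sales is approximately the sum of the four regional columns. It re-types the five rows shown by data.head(5) instead of reading the file, and the 0.02 tolerance is an assumption chosen to absorb rounding in the published figures.

```python
import pandas as pd

# The five rows printed by data.head(5), re-typed so the check is self-contained.
sample = pd.DataFrame({
    'Name': ['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii',
             'Wii Sports Resort', 'Pokemon Red/Pokemon Blue'],
    'NA_Sales': [41.49, 29.08, 15.85, 15.75, 11.27],
    'EU_Sales': [29.02, 3.58, 12.88, 11.01, 8.89],
    'JP_Sales': [3.77, 6.81, 3.79, 3.28, 10.22],
    'Other_Sales': [8.46, 0.77, 3.31, 2.96, 1.00],
    'Global_Sales': [82.74, 40.24, 35.82, 33.00, 31.37],
})

region_cols = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

# Absolute gap between the reported global total and the sum of the regions.
sample['diff'] = (sample['Global_Sales'] - sample[region_cols].sum(axis=1)).abs()

# Assumed tolerance: each column is rounded to 0.01, so small gaps are expected.
consistent = sample['diff'] <= 0.02

print(sample[['Name', 'diff']].round(2))
print('all rows consistent:', bool(consistent.all()))
```

The same lines can be run on the full frame by replacing `sample` with `data`; rows that fail the check would be worth inspecting before any aggregation.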
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
Rank 16598 non-null int64
Name 16598 non-null object
Platform 16598 non-null object
Year 16327 non-null float64
Genre 16598 non-null object
Publisher 16540 non-null object
NA_Sales 16598 non-null float64
EU_Sales 16598 non-null float64
JP_Sales 16598 non-null float64
Other_Sales 16598 non-null float64
Global_Sales 16598 non-null float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB
data.isnull().sum()
Rank 0
Name 0
Platform 0
Year 271
Genre 0
Publisher 58
NA_Sales 0
EU_Sales 0
JP_Sales 0
Other_Sales 0
Global_Sales 0
dtype: int64
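- Overall data quality is good; only the Year and Publisher fields have missing values.
- Year is missing in 271 rows and Publisher in 58 rows.
- Because the dataset is large, dropping the rows with missing values has little effect on the analysis (at most 329 of 16,598 rows, roughly 2%).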
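Before dropping anything, it can help to measure how many rows are actually affected, since a row missing both Year and Publisher is only removed once. The following is a minimal sketch on a toy frame; the names and values in `toy` are hypothetical and are not taken from vgsales.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same kinds of gaps as the real file.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Year': [2006.0, np.nan, 2008.0, np.nan, 2010.0],
    'Publisher': ['P1', 'P2', None, None, 'P3'],
})

# Missing values per column (what data.isnull().sum() reports).
missing_per_column = toy.isnull().sum()

# Rows with at least one missing value: these are the rows dropna() removes.
rows_with_gap = toy.isnull().any(axis=1)
n_dropped = int(rows_with_gap.sum())
share_dropped = n_dropped / len(toy)

print(missing_per_column)
print('rows dropna() would remove: {} ({:.0%})'.format(n_dropped, share_dropped))

# Drop the incomplete rows and renumber the index from 0.
cleaned = toy.dropna().reset_index(drop=True)
```

Because the per-column counts can overlap, their sum may overstate the number of rows removed; counting rows with `any(axis=1)` gives the exact figure.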
data.dropna(inplace=True)
data.reset_index(drop=True,inplace=True)
data.head(10)
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
| 5 | 6 | Tetris | GB | 1989.0 | Puzzle | Nintendo | 23.20 | 2.26 | 4.22 | 0.58 | 30.26 |
| 6 | 7 | New Super Mario Bros. | DS | 2006.0 | Platform | Nintendo | 11.38 | 9.23 | 6.50 | 2.90 | 30.01 |
| 7 | 8 | Wii Play | Wii | 2006.0 | Misc | Nintendo | 14.03 | 9.20 | 2.93 | 2.85 | 29.02 |
| 8 | 9 | New Super Mario Bros. Wii | Wii | 2009.0 | Platform | Nintendo | 14.59 | 7.06 | 4.70 | 2.26 | 28.62 |
| 9 | 10 | Duck Hunt | NES | 1984.0 | Shooter | Nintendo | 26.93 | 0.63 | 0.28 | 0.47 | 28.31 |
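Year still displays as a float (for example 2006.0) because the column contained NaN when it was read. Once the incomplete rows are gone, it can be cast to an integer type. This cast is an optional step that is not part of the original notebook; the sketch below uses a toy frame with hypothetical values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row has no release year.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Year': [2006.0, np.nan, 1985.0],
})

# Same cleaning steps as above: drop incomplete rows, renumber the index.
cleaned = toy.dropna().reset_index(drop=True)

# With no NaN left, the float column can be converted to integers safely.
cleaned['Year'] = cleaned['Year'].astype('int64')

print(cleaned.dtypes)
print(cleaned)
```

Casting before the NaN rows are removed would raise an error, so the order of the two steps matters.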
data.describe()
| | Rank | Year | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|
| count | 16291.000000 | 16291.000000 | 16291.000000 | 16291.000000 | 16291.000000 | 16291.000000 | 16291.000000 |
| mean | 8290.190228 | 2006.405561 | 0.265647 | 0.147731 | 0.078833 | 0.048426 | 0.540910 |
| std | 4792.654450 | 5.832412 | 0.822432 | 0.509303 | 0.311879 | 0.190083 | 1.567345 |
| min | 1.000000 | 1980.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 |
| 25% | 4132.500000 | 2003.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.060000 |
| 50% | 8292.000000 | 2007.000000 | 0.080000 | 0.020000 | 0.000000 | 0.010000 | 0.170000 |
| 75% | 12439.500000 | 2010.000000 | 0.240000 | 0.110000 | 0.040000 | 0.040000 | 0.480000 |
| max | 16600.000000 | 2020.000000 | 41.490000 | 29.020000 | 10.220000 | 10.570000 | 82.740000 |
data.describe(include='O')
|
Name |
Platform |
Genre |
Publisher |
c |