数据集简介
本文的数据集来自Kaggle网站游戏数据集,样本量为81312,20个字段。链接如下:board-game-data
分析目的:选出最具潜力的棋牌游戏
数据分析过程
1、游戏类型分布情况
--SELECT
type,
COUNT(type)
FROM
`games`
GROUP BY
type;
解释:得到两种类型的玩家分布情况
2、发行时间
SELECT
yearpublished,
COUNT(yearpublished)
FROM
`games`
GROUP BY
yearpublished;
解释:选择了游戏发行数量最多的5个年份分别是2014、2012、2011、2009、2010,发行量集中于3200。
3、平均在线人数分布
SELECT
(
CASE
WHEN avgplayers > 0
AND avgplayers <= 2 THEN '0~2' WHEN avgplayers > 2
AND avgplayers <= 4 THEN '2~4' WHEN avgplayers > 4
AND avgplayers < 8 THEN
'5~8' ELSE '8个以上'
END
) AS avgplayers_distribution,
COUNT( id ) AS avgplayers_number,
CONCAT( ROUND( COUNT( id ) / 70065 * 100, 2 ), '%' ) AS avgplayers_rate
FROM
( SELECT id, ROUND( SUM( minplayers + maxplayers ) / 2, 2 ) AS avgplayers FROM games GROUP BY id ) test
GROUP BY
(
CASE
WHEN avgplayers > 0
AND avgplayers <= 2 THEN '0~2' WHEN avgplayers > 2
AND avgplayers <= 4 THEN '2~4' WHEN avgplayers > 4
AND avgplayers < 8 THEN
'5~8' ELSE '8个以上'
END
);
解释:棋盘类游戏玩家人数大多出分布在2~4人,接下来就是一个或者两个人
4、游戏平均时长分布
SELECT
(
CASE
WHEN avgplaytime > 0
AND avgplaytime <= 30 THEN '0~30' WHEN avgplaytime > 30
AND avgplaytime <= 60 THEN '30~60' WHEN avgplaytime > 60
AND avgplaytime <= 90 THEN '60~90' WHEN avgplaytime > 90
AND avgplaytime <= 120 THEN '90~120' WHEN avgplaytime > 120
AND avgplaytime <= 180 THEN '120~180' WHEN avgplaytime > 180
AND avgplaytime <= 240 THEN
'180~240' ELSE '240以上'
END
) AS avgplaytime_distribution,
COUNT( id ) AS avgplaytime_number,
CONCAT( ROUND( COUNT( id ) / 70065 * 100, 2 ), "%" ) AS avgplaytime_rate
FROM
( SELECT ROUND( SUM( minplaytime + maxplaytime ) / 2, 2 ) AS avgplaytime, id FROM games GROUP BY id ) test
GROUP BY
(
CASE
WHEN avgplaytime > 0
AND avgplaytime <= 30 THEN '0~30' WHEN avgplaytime > 30
AND avgplaytime <= 60 THEN '30~60' WHEN avgplaytime > 60
AND avgplaytime <= 90 THEN '60~90' WHEN avgplaytime > 90
AND avgplaytime <= 120 THEN '90~120' WHEN avgplaytime > 120
AND avgplaytime <= 180 THEN '120~180' WHEN avgplaytime > 180
AND avgplaytime <= 240 THEN
'180~240' ELSE '240以上'
END
)
ORDER BY
avgplaytime_rate DESC;
解释:平均游戏时长只要集中在在0-30h,30h-60h,240以上
5、玩家最小年龄段统计
SELECT
minage,
COUNT(id)
FROM
games
WHERE
minage IS NOT NULL
AND minage > 0
GROUP BY
minage
ORDER BY
COUNT(id) DESC;
解释:低年龄段集中在5-12岁
6、玩家平均评分
SELECT ROUND(average_rating,2) AS average_rating,COUNT(average_rating) as rate_number
FROM games
WHERE average_rating>0
GROUP BY average_rating
ORDER BY rate_number DESC;
解释:5分与6分最多
7、最具潜力的游戏
SELECT DISTINCT name,total_owners,total_wishers,average_rating,total_comments,
CONCAT(ROUND((total_wanters/total_owners)*100,1),'%') AS wishers_accounting
FROM games
ORDER BY total_owners+0 DESC;
-- 想要购买的人数比例超过2%
SELECT DISTINCT name,total_owners,total_wishers,average_rating,total_comments,
CONCAT(ROUND((total_wanters/total_owners)*100,1),'%') AS wishers_accounting
FROM games
WHERE ROUND((total_wanters/total_owners)*100,1)> 2
AND
total_owners>10000
ORDER BY total_owners ASC ,wishers_accounting+0 DESC;
解释:7wonders与Agricola潜在购买人数最多,同时评价最好,看好后续发展