分析背景
对某电商2016年3月-5月的用户及订单信息进行统计,需要根据已知数据进行简单分析。本文主要进行代码练习。
分析要求
1、统计不同月份的下单人数
2、统计用户三月份的回购率和复购率
3、统计男女用户消费频次是否有差异
4、统计多次消费的用户,第一次和最后一次消费间隔是多少天
5、统计不同年龄段,用户的消费金额是否有差异
6、统计消费的二八法则,消费的top20%用户,贡献了多少消费额
使用工具:MySQL,EXCEL
一、导入数据
用show columns查看数据字典,了解数据结构
涉及表:
二、数据分析
1、统计不同月份的下单人数
SELECT YEAR
( paidtime ),
MONTH ( paidtime ),
count( DISTINCT userid ) AS cou
FROM
orderinfo
WHERE
ispaid = '已支付'
AND paidtime <> '0000-00-00 00:00:00'
GROUP BY
YEAR ( paidtime ),
MONTH ( paidtime );
2、统计用户三月份的回购率和复购率
复购率:当月购买了多次的用户占当月用户的比例
回购率:上月购买用户中有多少用户本月又再次购买
复购率:
a、先筛选出3月份的消费情况
SELECT
*
FROM
orderinfo
WHERE
isPaid = "已支付"
AND MONTH ( paidTime ) = "03";
b、统计一下每个用户在3月份消费了多少次
SELECT
userid,
count( 1 ) AS cons
FROM
orderinfo
WHERE
isPaid = "已支付"
AND MONTH ( paidTime ) = "03"
GROUP BY
userid;
c、对购买次数做一个判断,统计出来那些消费了多次(大于1次)的用户数
SELECT
count( 1 ) AS userid_cons,
sum( IF ( cons > 1, 1, 0 ) ) AS fugou_cons,
sum( IF ( cons > 1, 1, 0 ) ) / count( 1 ) AS fugou_rate
FROM
( SELECT userid, count( 1 ) AS cons FROM orderinfo WHERE isPaid = "已支付" AND MONTH ( paidTime ) = "03" GROUP BY userid ) a;
回购率:
3月份的回购率 = 3月用户中4月又再次购买的人数 / 3月的用户总数
a、统计每年每月的一个用户消费情况
SELECT
userid,
date_format( paidTime, '%Y-%m-01' ) AS month_dt,
count( 1 ) AS cons
FROM
orderinfo
WHERE
isPaid = "已支付"
GROUP BY
userid,
date_format( paidTime, '%Y-%m-01' );
b、相邻月份进行关联,能关联上的用户说明就是回购
SELECT
*
FROM
(
SELECT
userid,
date_format( paidTime, '%Y-%m-01' ) AS month_dt,
count( 1 ) AS cons
FROM
orderinfo
WHERE
isPaid = "已支付"
GROUP BY
userid,
date_format( paidTime, '%Y-%m-01' )
) a
LEFT JOIN (
SELECT
userid,
date_format( paidTime, '%Y-%m-01' ) AS month_dt,
count( 1 ) AS cons
FROM
orderinfo
WHERE
isPaid = "已支付"
GROUP BY
userid,
date_format( paidTime, '%Y-%m-01' )
) b ON a.userid = b.userid
AND date_sub( b.month_dt, INTERVAL 1 MONTH ) = a.month_dt;
c、统计每个月份的消费人数情况及格得到回购率
SELECT
a.month_dt,
count( a.userid ),
count( b.userid ),
count( b.userid ) / count( a.userid )
FROM
(
SELECT
userid,
date_format( paidTime, '%Y-%m-01' ) AS month_dt,
count( 1 ) AS cons
FROM
orderinfo
WHERE
isPaid = "已支付"
GROUP BY
userid,
date_format( paidTime, '%Y-%m-01' )
) a
LEFT JOIN (
SELECT
userid,
date_format( paidTime, '%Y-%m-01' ) AS month_dt,
count( 1 ) AS cons
FROM
orderinfo
WHERE
isPaid = "已支付"
GROUP BY
userid,
date_format( paidTime, '%Y-%m-01' )
) b ON a.userid = b.userid
AND date_sub( b.month_dt, INTERVAL 1 MONTH ) = a.month_dt
GROUP BY
a.month_dt;
3、统计男女用户消费频次是否有差异
a、统计每个用户的消费次数,注意要带性别
SELECT
a.userid,
sex,
count( 1 ) AS cons
FROM
orderinfo a
INNER JOIN ( SELECT * FROM userinfo WHERE sex <> '' ) b ON a.userid = b.userid
GROUP BY
a.userid,
sex;
b、对性别做一个消费次数平均计算
SELECT
sex,
avg( cons ) AS avg_cons
FROM
(
SELECT
a.userid,
sex,
count( 1 ) AS cons
FROM
orderinfo a
INNER JOIN ( SELECT * FROM userinfo WHERE sex <> '' ) b ON a.userid = b.userid
GROUP BY
a.userid,
sex
) a
GROUP BY
sex;
4、统计多次消费的用户,第一次和最后一次消费间隔是多少天
a、取出多次消费的用户
SELECT
userid
FROM
orderinfo
WHERE
isPaid = "已支付"
GROUP BY
userid
HAVING
count( 1 ) > 1;
b、取出第一次和最后一次的时间
SELECT
userid,
min( paidTime ),
max( paidTime ),
datediff( max( paidTime ), min( paidTime ) )
FROM
orderinfo
WHERE
isPaid = "已支付"
GROUP BY
userid
HAVING
count( 1 ) > 1;
5、统计不同年龄段,用户的消费金额是否有差异
a、计算每个用户的年龄,并对年龄进行分层:0-10:1,11-20:2,21-30:3
SELECT
userid,
birth,
now( ),
timestampdiff( YEAR, birth, now( ) ) AS age
FROM
userinfo
WHERE
birth > '1900-00-00';
SELECT
userid,
birth,
now( ),
ceil( timestampdiff( YEAR, birth, now( ) ) / 10 ) AS age
FROM
userinfo
WHERE
birth > '1901-00-00';
b、关联订单信息,获取不同年龄段的一个消费频次和消费金额
SELECT
a.userid,
age,
count( 1 ) AS cons,
sum( price ) AS prices
FROM
orderinfo a
INNER JOIN (
SELECT
userid,
birth,
now( ),
ceil( timestampdiff( YEAR, birth, now( ) ) / 10 ) AS age
FROM
userinfo
WHERE
birth > '1901-00-00'
) b ON a.userid = b.userid
GROUP BY
a.userid,
age;
c、再对年龄分层进行聚合,得到不同年龄层的消费情况
SELECT
age,
avg( cons ),
avg( prices )
FROM
(
SELECT
a.userid,
age,
count( 1 ) AS cons,
sum( price ) AS prices
FROM
orderinfo a
INNER JOIN (
SELECT
userid,
birth,
now( ),
ceil( timestampdiff( YEAR, birth, now( ) ) / 10 ) AS age
FROM
userinfo
WHERE
birth > '1901-00-00'
) b ON a.userid = b.userid
GROUP BY
a.userid,
age
) a
GROUP BY
age;
6、统计消费的二八法则,消费的top20%用户,贡献了多少消费额
a、统计每个用户的消费金额,并进行一个降序排序
SELECT
userid,
sum( price ) AS total_price
FROM
orderinfo a
WHERE
isPaid = "已支付"
GROUP BY
userid;
b、统计一下一共有多少用户,以及总消费金额是多少
SELECT
count( 1 ) AS cons,
sum( total_price ) AS all_price
FROM
( SELECT userid, sum( price ) AS total_price FROM orderinfo a WHERE isPaid = "已支付" GROUP BY userid ) a;
c、取出前20%的用户进行金额统计
SELECT
count( 1 ) AS cons,
sum( total_price ) AS all_price
FROM
( SELECT userid, sum( price ) AS total_price FROM orderinfo a WHERE isPaid = "已支付" GROUP BY userid ORDER BY total_price DESC LIMIT 17000 ) b;