项目背景:
某快消品电商公司自行研发了一款APP用于吸引用户并销售自己的产品,但是由于公司刚刚成立,尚未搭建完善的指标体系,导致每日得到的数据多且杂乱,很难抓到业务重点,随之导致问题归因困难,难以做出有效的数据分析报告。
项目目的:
结合公司情况搭建指标体系,根据指标体系列出公司各业务部门所需要的统计表,编写相应的SQL代码对所需报告进行输出。
项目正文:
指标体系搭建:
确定公司核心指标
首先,确定电商公司的成单大致流程图:
并确定大致的部门业务流:
公司的业务流以促使成单为核心,通过渠道投放,站内运营,产品研发,售后服务等多部门业务协同合作推动销售。因此,我们可以将公司的商业目标定为:持续的销售收入。
而在用户层面上,用户感受到的价值来源于公司的产品所带来的体验。因此,结合公司的商业目标和用户价值,可以列出一下四个备选的核心指标,并且其相互关系也可以列出:
针对这四个核心指标,需要通过多个标准进行评估择优选择,评估方式如下:
综合考量,我们选择将成单量作为公司的核心指标
核心指标拆解
在确定好核心指标后,需要对核心指标拆解,使其能够在平时的工作中可量化,易归因。
拆解模型选择:
考虑到公司刚刚开始进行产品运作,本身没有很强的品牌效应和私域流量,因此首先排除AIPL模型。其次,虽然人货场和AARRR模型都能分析B2C行业,但是AARRR更适合将指标拆解到各个部门,因此这里选用AARRR模型作为核心指标拆解模型。
按链路式方法,我们可以将成单量拆解如下:
针对拆解后的每个关键指标,可以汇总出各部门所需要的报表:
对不同的表格,整理出需要统计的主要指标:
以上,我们通过对业务流的梳理和核心指标的拆解确定了所需要统计的报表,接下来就是代码实现。
通过MySQL提取统计报表
源数据:
原始数据记录了用户从安装APP开始,包括APP启动,页面点击,页面浏览,创建订单,支付订单,APP退出等的所有埋点信息。数据较为杂乱,需要利用SQL进行清洗提取方便业务部门查看。
新用户获取表
主要数据源:安装用户
DROP VIEW IF EXISTS view_install;
CREATE VIEW view_install AS (
SELECT *
FROM event_ok
WHERE event = 'AppInstall'
AND date IN (
'2020-09-01',
DATE_SUB( '2020-09-01', INTERVAL 1 DAY ),
DATE_SUB( '2020-09-01', INTERVAL 7 DAY ),
DATE_SUB( '2020-09-01', INTERVAL 14 DAY ))
);
补充数据源:开启APP用户
DROP VIEW IF EXISTS view_start;
CREATE VIEW view_start AS (
SELECT *
FROM event_ok
WHERE event = '$AppStart'
AND date >= DATE_SUB( '2020-09-01', INTERVAL 14 DAY )
AND date <= '2020-09-01'
);
将主要数据源和补充数据源连接,条件是用户在安装APP后,1天或7天或14天有启动记录:
DROP VIEW IF EXISTS view_install_start;
CREATE VIEW view_install_start AS (
SELECT DISTINCT
view_install.user_id AS user_id,
view_install.downloadchannel AS downloadchannel,
view_install.date AS install_date,
view_start.Date AS action_date
FROM
view_install LEFT JOIN view_start
ON
view_install.user_id = view_start.user_id
AND (
DATE_ADD( view_install.date, INTERVAL 1 DAY ) = view_start.Date
OR DATE_ADD( view_install.date, INTERVAL 7 DAY ) = view_start.Date
OR DATE_ADD( view_install.date, INTERVAL 14 DAY ) = view_start.Date
));
以日期和下载渠道为统计维度,计算1,7,14天留存用户数及留存用户率
DROP VIEW IF EXISTS view_retention;
CREATE VIEW view_retention AS
(
SELECT install_date,ifnull( downloadChannel, 'iOS' ),count(DISTINCT user_id) AS u_cnt,
COUNT(DISTINCT IF (DATEDIFF(action_date,install_date)=1,user_id,NULL)) AS retention_cnt_1,
IF (count(DISTINCT user_id)=0,0,COUNT(DISTINCT IF (DATEDIFF(action_date,install_date)=1,user_id,NULL))/count(DISTINCT user_id)) AS retention_pp_1,
COUNT(DISTINCT IF (DATEDIFF(action_date,install_date)=7,user_id,NULL)) AS retention_cnt_7,
IF (count(DISTINCT user_id)=0,0,COUNT(DISTINCT IF (DATEDIFF(action_date,install_date)=7,user_id,NULL))/count(DISTINCT user_id)) AS retention_pp_7,
COUNT(DISTINCT IF (DATEDIFF(action_date,install_date)=14,user_id,NULL)) AS retention_cnt_14,
IF (count(DISTINCT user_id)=0,0,COUNT(DISTINCT IF (DATEDIFF(action_date,install_date)=14,user_id,NULL))/count(DISTINCT user_id)) AS retention_pp_14
FROM view_install_start
GROUP BY install_date,downloadchannel
);
最后通过窗口函数汇总所有渠道的用户留存数据:
DROP TABLE IF EXISTS tb_new_user;
CREATE TABLE tb_new_user AS
(
SELECT
*,sum(u_cnt) over (PARTITION BY install_date) AS u_cnt_total,
sum(retention_cnt_1) over (PARTITION BY install_date) AS retention_cnt_total_1,
IF (sum(u_cnt) over (PARTITION BY install_date)=0,0,sum(retention_cnt_1) over (PARTITION BY install_date)/sum(u_cnt) over (PARTITION BY install_date)) AS rention_pp_modi,
sum(retention_cnt_7) over (PARTITION BY install_date) AS retention_cnt_total_7,
IF (sum(u_cnt) over (PARTITION BY install_date)=0,0,sum(retention_cnt_7) over (PARTITION BY install_date)/sum(u_cnt) over (PARTITION BY install_date)) AS rention_pp_modi_7,
sum(retention_cnt_14) over (PARTITION BY install_date) AS retention_cnt_total_14,
IF (sum(u_cnt) over (PARTITION BY install_date)=0,0,sum(retention_cnt_14) over (PARTITION BY install_date)/sum(u_cnt) over (PARTITION BY install_date)) AS rention_pp_modi_14
FROM view_retention
);
用户活跃表
先抽取当天,1,7,14天前打开过app的用户数据作为主要数据源
Drop view if exists view_old_u;
CREATE VIEW view_old_u AS (
SELECT user_id, $is_first_day, Date
FROM event_ok
WHERE
`event` = '$AppStart'
AND date IN (
'2020-09-01', DATE_SUB( '2020-09-01', INTERVAL 1 DAY ), DATE_SUB( '2020-09-01', INTERVAL 7 DAY ), DATE_SUB( '2020-09-01', INTERVAL 14 DAY ))
GROUP BY user_id, $is_first_day, Date
);
再选择近14天内有过打开APP行为的用户作为补充数据源
DROP VIEW IF EXISTS view_action;
CREATE VIEW view_action AS (
SELECT * FROM event_ok
WHERE `event` = '$AppStart'
AND date >= DATE_SUB( '2020-09-01', INTERVAL 14 DAY )
AND date <= '2020-09-01'
);
将主要数据源和补充数据源连接,条件是用户在启动APP后,1天或7天或14天有启动记录:
DROP VIEW IF EXISTS view_old_u_action;
CREATE VIEW view_old_u_action AS (
SELECT DISTINCT
view_old_u.user_id AS user_id,
view_old_u.date AS start_date,
view_old_u.$is_first_day AS first_day,
view_action.Date AS action_date
FROM
view_old_u
LEFT JOIN view_action ON view_old_u.user_id = view_action.user_id
AND date_add( view_old_u.Date, INTERVAL 1 DAY ) = view_action.Date
OR date_add( view_old_u.Date, INTERVAL 7 DAY ) = view_action.Date
OR date_add( view_old_u.Date, INTERVAL 14 DAY ) = view_action.Date
);
最后得到活跃用户的留存情况表:
DROP TABLE IF EXISTS tb_old_action_retention;
CREATE TABLE tb_old_action_retention as (
SELECT start_date, COUNT(DISTINCT user_id) AS dau,
COUNT(DISTINCT IF (first_day=1,user_id,NULL)) AS first_retention,
IF (count(DISTINCT user_id)=0,0,COUNT(DISTINCT IF (first_day=1,user_id,NULL))/count(DISTINCT user_id)) AS first_retention_pp,
COUNT(DISTINCT IF (DATEDIFF(action_date,start_date)=1,user_id,NULL)) AS action_retention_1,
IF (count(DISTINCT user_id)=0,0,COUNT(DISTINCT IF (DATEDIFF(action_date,start_date)=1,user_id,NULL))/count(DISTINCT user_id)) AS action_retention_pp_1,
COUNT(DISTINCT IF (DATEDIFF(action_date,start_date)=7,user_id,NULL)) AS action_retention_7,
IF (count(DISTINCT user_id)=0,0,COUNT(DISTINCT IF (DATEDIFF(action_date,start_date)=7,user_id,NULL))/count(DISTINCT user_id)) AS action_retention_pp_7,
COUNT(DISTINCT IF (DATEDIFF(action_date,start_date)=14,user_id,NULL)) AS action_retention_14,
IF (count(DISTINCT user_id)=0,0,COUNT(DISTINCT IF (DATEDIFF(action_date,start_date)=14,user_id,NULL))/count(DISTINCT user_id)) AS action_retention_pp_14
FROM view_old_u_action
GROUP BY start_date
);
活跃用户在线时长表
CREATE TABLE tb_action_duration AS
SELECT
start_date, count( DISTINCT tb_start.user_id ) AS dau, sum( duration ) AS total_duration,
IF(count( DISTINCT tb_start.user_id )= 0,0,sum( duration )/ count( DISTINCT tb_start.user_id )) AS duration_mean
FROM(
SELECT user_id, Date AS start_date FROM event_ok
WHERE `event` = '$AppStart' AND date = '2020-09-01' GROUP BY user_id, Date ) AS tb_start
LEFT JOIN (
SELECT user_id, date AS end_date, sum( $event_duration ) AS duration FROM event_ok
WHERE `event` = '$AppEnd' AND date = '2020-09-01' GROUP BY user_id, date
) AS tb_end ON tb_start.user_id = tb_end.user_id
AND start_date = end_date
GROUP BY start_date;
收入统计表:
DROP TABLE IF EXISTS tb_total_amount;
CREATE TABLE tb_total_amount AS (
SELECT date AS start_date, count(DISTINCT user_id) AS pay_user, sum(actual_amount) AS total_amount,
IF (count(DISTINCT user_id)=0,0,sum(actual_amount)/count(DISTINCT user_id)) AS amount_mean
FROM event_ok
WHERE EVENT='PayOrderSucceed' AND date='2020-09-01'
);
渠道付费转化率表
接着,我们需要关注不同渠道的收入组成部分,以确定在不同渠道投放的广告力度。
首先以不同渠道的用户作为主要数据源
DROP VIEW IF EXISTS view_u_channel;
CREATE VIEW view_u_channel AS (
SELECT
user_id, ifnull( DownloadChannel, 'iOS' ) AS channel, date AS Install_date
FROM event_ok
WHERE
EVENT = 'AppInstall'
AND (
date = DATE_SUB( '2020-09-01', INTERVAL 1 DAY )
OR date = DATE_SUB( '2020-09-01', INTERVAL 7 DAY )
OR date = DATE_SUB( '2020-09-01', INTERVAL 14 DAY ))
GROUP BY 1,2,3
)
接着以14天内支付成功的用户作为补充数据源
DROP VIEW IF EXISTS view_u_pay;
CREATE VIEW view_u_pay AS (
SELECT
user_id, date AS pay_date, sum( actual_amount ) AS actual_amount_total,
CASE WHEN actual_amount > 200 THEN 1 ELSE - 1 END AS income_type
FROM event_ok
WHERE
EVENT = 'PayOrderSucceed'
AND date >= DATE_SUB( '2020-09-01', INTERVAL 14 DAY )
AND date <= '2020-09-01'
GROUP BY 1,4,2
)
将两张表连接:
DROP VIEW IF EXISTS view_u_channel_pay;
CREATE VIEW view_u_channel_pay AS (
SELECT
view_u_channel.channel AS channel,
count( DISTINCT view_u_channel.user_id ) AS pay_user,
sum( view_u_pay.actual_amount_total ) AS total_amount,
IF(count( DISTINCT view_u_channel.user_id ) = 0,0,sum( view_u_pay.actual_amount_total )/ count( DISTINCT view_u_channel.user_id )) AS amount_mean,
view_u_channel.Install_date AS install_date,
min( view_u_pay.pay_date ) AS pay_start_date,
max( view_u_pay.pay_date ) AS pay_end_date
FROM
view_u_channel LEFT JOIN view_u_pay
ON view_u_channel.user_id = view_u_pay.user_id
WHERE
view_u_pay.actual_amount_total > 0
GROUP BY
channel, install_date
ORDER BY
install_date, pay_user DESC
)
最后可以得到1,7,14天不同渠道的付费人数以及付费转化率
DROP TABLE IF EXISTS tb_channel_pay_rate;
CREATE TABLE tb_channel_pay_rate AS (
SELECT
appinstall.channel AS channel, num, pay_user,
appinstall.install_date AS install_date, pay_start_date, pay_end_date,
IF ( num = 0, 0, pay_user / num ) AS pay_rate,total_amount,amount_mean
FROM(
SELECT
date AS install_date,
ifnull( DownloadChannel, 'IOS' ) AS channel,
count( DISTINCT user_id ) num
FROM
event_ok
WHERE
EVENT = 'AppInstall'
AND (
date = DATE_SUB( '2020-09-01', INTERVAL 1 DAY )
OR date = DATE_SUB( '2020-09-01', INTERVAL 7 DAY )
OR date = DATE_SUB( '2020-09-01', INTERVAL 14 DAY ))
GROUP BY
channel, install_date ) appinstall
LEFT JOIN view_u_channel_pay
ON appinstall.channel = view_u_channel_pay.channel
AND appinstall.install_date = view_u_channel_pay.install_date
)
日报
最后,将上面表链接汇总为日报
drop table if exists tb_daily_report;
create table tb_daily_report as
(
select
case when tb_old_action_retention.start_date = '2020-09-01' then '当日数据'
when date_add(tb_old_action_retention.start_date,INTERVAL 1 day) = '2020-09-01' then '1日数据'
when date_add(tb_old_action_retention.start_date,INTERVAL 7 day) = '2020-09-01' then '7日数据'
when date_add(tb_old_action_retention.start_date,INTERVAL 14 day) = '2020-09-01' then '14日数据'
END AS '数据源',
tb_old_action_retention.start_date as '对应日期',
u_cnt_total as '新增用户数',
retention_cnt_total_1 as '新用户次日留存人数',
concat( round(100 *if(u_cnt_total =0,0,retention_cnt_total_1/u_cnt_total),2),'%') as "新用户次日留存率",
retention_cnt_total_7 as "新用户7日留存人数",
concat( round(100 *if(u_cnt_total=0,0,retention_cnt_total_7/u_cnt_total),2),'%') as "新用户7日留存率",
retention_cnt_total_14 as "新用户14日留存人数",
concat( round(100 *if(u_cnt_total=0,0,retention_cnt_total_14/u_cnt_total),2),'%') as "新用户14日留存率",
-- tb_old_action_retention 内
tb_old_action_retention.dau as "DAU",
first_retention as "首次活跃人数",
concat(round(100 * if(tb_old_action_retention.dau =0,0,first_retention/tb_old_action_retention.dau),2),'%') as "首次活跃用户占比",
action_retention_1 as "活跃用户次日留存人数",
concat( round(100 *if(tb_old_action_retention.dau=0,0,action_retention_1/tb_old_action_retention.dau),2),'%') as "活跃用户次日留存率",
action_retention_7 as "活跃用户7日留存人数",
concat( round(100 *if(tb_old_action_retention.dau=0,0,action_retention_7/tb_old_action_retention.dau),2),'%') as "活跃用户7日留存率",
action_retention_14 as "活跃用户14日留存人数",
concat( round(100 *if(tb_old_action_retention.dau=0,0,action_retention_14/tb_old_action_retention.dau),2),'%') as "活跃用户14日留存率",
-- tb_action_duration
if(tb_action_duration.dau=0,0,total_duration/tb_action_duration.dau/60) as "平均单日使用时长(分)",
-- tb_total_amount
pay_user as "总付费人数",
total_amount as "总付费金额",
if(pay_user=0,0,total_amount/pay_user) as "客单价"
from tb_old_action_retention
left join
(select DISTINCT install_date,
u_cnt_total, -- 安装用户总数
retention_cnt_total_1, -- 次日留存用户总数
retention_cnt_total_7, -- 7日留存用户总书
retention_cnt_total_14 -- 14日留存用户总书
from tb_new_user) as new_user on tb_old_action_retention.start_date = new_user.install_date
left join tb_action_duration on tb_action_duration.start_date = new_user.install_date
left join tb_total_amount on tb_total_amount.start_date = new_user.install_date);
在具体使用时,可以编写存储过程来更改日期,或者通过Celery做到每日定时更新。
小结:
以上部分完成了从指标体系的搭建,到通过SQL进行数据提取和清洗,最终实现统计报表需求的全过程。这将为企业提供统一的,可量化的,易归因问题的数据源。当公司出现经验问题,或者想指定未来决策时,可通过各个报表相互连接并进行可视化的方法完成。