一个产品的用户留存关系到该产品是否健康的发展
实现效果:
表详情:
注册表:d_user_register201704 以月份分表 (uid 唯一)
CREATE TABLE `d_user_register201704` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uid` bigint(20) NOT NULL DEFAULT '0' COMMENT '用户id',
`inviteUid` bigint(20) NOT NULL,
`name` varchar(32) NOT NULL DEFAULT '',
`account` varchar(32) NOT NULL DEFAULT '' COMMENT '用户名',
`systemType` smallint(3) NOT NULL DEFAULT '0' COMMENT '用户注册系统标识(ios-0 android-1...)',
`authorizationType` smallint(3) NOT NULL DEFAULT '0' COMMENT '用户注册第三方授权标识',
`time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `time` (`time`) USING BTREE,
KEY `uid` (`uid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
登录表:d_user_register201704 同样以月份分表(0 代表登录 用户每登录一次就会插入一条数据 很大的数据冗余 数据大约300W条左右)
实现思路:
首先将要查时间区间的每一天所对应的用户留存数据计算出来(考虑到分表我们使用union 进行联查)
实现sql
select count(uid) as onl,date_format(time,'%H') as hour from d_user_login201704 where id in
(select * from ((select min(id) from d_user_login201704 where type=0 and time>= '2017-04-09 00:00:00'
and time<='2017-04-09 23:59:59' group by(uid))
as tmptable)) group by hour;
笔者考虑了好长时间也没有想到除了循环查询 能更好的查询办法
下面依次循环 要查询的日期 考虑到分表 我们使用union 日期唯一所有使用union 想要去重就使用union all
select count(DISTINCT(uid)) as liu,date_format(time,'%Y%m%d') as day from d_user_login201704 where uid in (select uid from `d_user_register201704`
where time BETWEEN '2017-04-08 00:00:00' and '2017-04-08 23:59:59') and time BETWEEN '2017-04-09' and '2017-05-09' group by day union
select count(DISTINCT(uid)) as liu,date_format(time,'%Y%m%d') as day from d_user_login201705 where uid in (select uid from `d_user_register201704`
where time BETWEEN '2017-04-08 00:00:00' and '2017-04-08 23:59:59') and time BETWEEN '2017-04-09' and '2017-05-09' group by day
下面贴出代码:
php 控制器
/**
* 用户留存
* @param Request $request
* @return $this
*/
public function userKeepView(Request $request){
$data['start_time'] = $request->input('start_time', date('Y-m-d',strtotime("-15 day") ));
$data['end_time'] = $request->input('end_time', date('Y-m-d'));
$start_month = date('Ym',strtotime($data['start_time']));
$end_month = date('Ym',strtotime($data['end_time']));
$table_prefix = 'd_user_register';
//获得查询表集合
$select_set = [];
for($i=$start_month;$i<=$end_month;$i++){
$select_set[] = intval($i);
}
//获得已有表集合
$table_prefix_length = strlen($table_prefix);
$tables = DB::connection('log')->select("show tables like 'd_user_register%'");
$register_tables_set = [];
foreach($tables as $key=>$value){
$value = (array)$value;
$table = $value['Tables_in_dingdlog (d_user_register%)'];
$res = (int)substr($table,$table_prefix_length);
array_push($register_tables_set,$res);
}
$register_tables_set = array_filter($register_tables_set);
sort($register_tables_set);
//获取最终查询表交集
$register_tables = array_values(array_intersect($register_tables_set,$select_set));
//循环查询
$user_keep_data = [];
$first_table = current($register_tables);
$end_table = end($register_tables);
//锁定存在表时间区间
$first = date('Y-m-d 00:00:00',strtotime($first_table.'01'));
$end = date('Y-m-d 23:59:59',strtotime($end_table."01 +1month -1day"));
if(strtotime($data['start_time'])<strtotime($first)){
$data['start_time'] = $first;
}
if(strtotime($data['end_time'])>strtotime($end)){
$data['end_time'] = $end;
}
//循环时间区间以天查询
for($i=strtotime($data['start_time']);$i<=strtotime($data['end_time']);$i+=24*3600){
//当前时间
$self_time = date('Y-m-d',$i);
$self_next_time = date('Y-m-d',$i+24*3600);
$self_month = date('Ym',$i);
//30天查询
$self_next_month = date('Y-m-d',strtotime($self_time.'+ 31 day'));
$select_next_table = date('Ym',strtotime('+1month -1day'));
//先将本天的用户留存率算出来
$uid_set = DB::connection('log')->table($table_prefix.$self_month)->where('time','>=',$self_time.' 00:00:00')
->where('time','<=',$self_time.' 23:59:59')->lists('uid');
$self_total = count($uid_set);
$date[] = date('Y.m.d',$i);
$total[] = $self_total;
if(in_array($select_next_table,$register_tables_set)){
//使用union连接上查询
$result = DB::connection('log')->table('d_user_login'.$select_next_table)
->select(DB::raw("count(DISTINCT(uid)) as liu,date_format(time,'%Y%m%d') as day"))
->whereBetween('time',[$self_next_time,$self_next_month])->where('type','=','0')->whereIn('uid',$uid_set)->groupby('day');
$user_keep_data [] = DB::connection('log')->table('d_user_login'.$self_month)
->select(DB::raw("count(DISTINCT(uid)) as liu,date_format(time,'%Y%m%d') as day"))
->whereBetween('time',[$self_next_time,$self_next_month])->where('type','=','0')->whereIn('uid',$uid_set)->groupby('day')
->union($result)->get();
}else{
$user_keep_data [] = DB::connection('log')->table('d_user_login'.$self_month)
->select(DB::raw("count(DISTINCT(uid)) as liu,date_format(time,'%Y%m%d') as day"))
->whereBetween('time',[$self_next_time,$self_next_month])->where('type','=','0')->whereIn('uid',$uid_set)->groupby('day')->get();
}
}
rsort($date);
$total = array_reverse($total,false);
$user_keep_data = array_reverse($user_keep_data,false);
return view('chart/userKeepView')->with('user_keep_data', $user_keep_data)
->with('total', $total)
->with('date',$date)
->with('data',$data);
}
html
<!DOCTYPE html>
<html lang="zh-cn">
@include('layouts.head')
<body>
<div class="container-fluid">
<div class="panel panel-default">
<div class="panel-heading">
用户留存
</div>
<div class="panel-body">
<form action="{{action('Home\ChartController@userKeepView')}}" method="post" role="form">
<div class="row">
<div class="col-md-6 form-inline">
<div class="form-group form-inline">
<label for="">开始时间</label>
<input type="text" name="start_time" class="form-control form_date" value="{{$data['start_time']}}" readonly placeholder="请输入id"/>
</div>
<div class="form-group form-inline">
<label for="">结束时间</label>
<input type="text" name="end_time" class="form-control form_date" value="{{$data['end_time']}}" readonly placeholder="请输入id"/>
</div>
</div>
@if(session('msg'))
<p style="color:red">{{session('msg')}}</p>
@endif
<div class="col-md-3">
<label for=""> </label>
<div class="form-group">
<input type="submit" class="btn btn-sm btn-primary" value="查询"/>
</div>
</div>
</div>
</form>
</div>
</div>
<table class="table table-responsive table-bordered">
<thead>
<th>日期</th>
<th>当天注册人数</th>
<th>次日留存</th>
<th>2日留存</th>
<th>3日留存</th>
<th>4日留存</th>
<th>5日留存</th>
<th>6日留存</th>
<th>7日留存</th>
<th>14日留存</th>
<th>30日留存</th>
</thead>
<tbody>
@if($user_keep_data)
@foreach($user_keep_data as $key=>$value)
<tr>
<td>{{$date[$key]}}</td>
<td>{{$total[$key]}}</td>
@foreach($value as $k=>$v)
@if($k<7||$k==13||$k==29)
<td>{{number_format($v->liu/$total[$key]*100,2,'.','')}}%</td>
@endif
@endforeach
</tr>
@endforeach
@else
<tr>
<td colspan="4">暂无数据</td>
</tr>
@endif
</tbody>
</table>
</div>
</body>
<script type="text/javascript">
$('.form_date').datetimepicker({
language: 'zh-CN', /*加载日历语言包,可自定义*/
weekStart: 1, /*星期*/
todayBtn: 0, /*当天*/
autoclose: true,//选中之后自动隐藏日期选择框
todayHighlight: 1, /*今天高亮显示*/
startView: 2, /*4年3月2日1小时*/
minView: 2, /*0分1小时2天*/
format: 'yyyy-mm-dd',
forceParse: 0,
showMeridian:true
});
</script>
</html>
explain 测试sql :
explain select count(DISTINCT(uid)) as liu,date_format(time,'%Y%m%d') as day from d_user_login201704
where uid in (select uid from `d_user_register201704` where time BETWEEN '2017-04-08 00:00:00' and '2017-04-08 23:59:59')
and time BETWEEN '2017-04-09' and '2017-05-09' group by day union select count(DISTINCT(uid)) as liu,date_format(time,'%Y%m%d') as day from
d_user_login201705 where uid in (select uid from `d_user_register201704` where time BETWEEN '2017-04-08 00:00:00' and '2017-04-08 23:59:59')
and time BETWEEN '2017-04-09' and '2017-05-09' group by day
由于 login5 表没有建立索引 所以没有使用到 (笔者在 register 表 uid 以及 login 表的 uid 和 time 上建立了索引)
查询197ms 循环得出结果需要2S 左右 需要优化的地方还有很多!!