【Question】
I have two tables: 1) users(id, registerdate) and 2) user_answer(userid, answer, updated_date).
I want the count of zero usage per day, that is, how many users register on a given day but do not answer. The results should look like this:
Date      registedCount  notAnsweredCount
15-09-02  20             10
15-09-01  20             10
15-08-31  12             4
Sample data: the users table holds ((1,'15-09-01'),(2,'15-09-01'),(3,'15-09-01')) and the user_answer table holds ((1,0,'15-09-01')). Three users registered on Sep 01, 2015, but only one of them has answered a question. So the result will be (Date => 15-09-01, registedCount => 3, notAnsweredCount => 2).
Someone posted the answer below; the original poster said it looked close but gave no further feedback.
SELECT date_range.aDay,
       COUNT(DISTINCT users.id) AS registedCount,
       SUM(IF(users.id IS NOT NULL AND user_answer.userid IS NULL, 1, 0)) AS notAnsweredCount
FROM (
    SELECT DATE_ADD('2015-09-01', INTERVAL units.aCnt + tens.aCnt * 10 DAY) AS aDay
    FROM (
        SELECT 0 AS aCnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
        UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) units
    CROSS JOIN (
        SELECT 0 AS aCnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
        UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
    ) tens
) date_range
LEFT OUTER JOIN users ON date_range.aDay = users.registerdate
LEFT OUTER JOIN user_answer ON users.id = user_answer.userid
GROUP BY date_range.aDay
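To see what this query returns on the sample data from the question, here is a minimal sketch that runs an equivalent query in SQLite through Python's sqlite3 module. It is a port, not the original MySQL: DATE_ADD is replaced by SQLite's date() with a '+N day' modifier, IF by CASE WHEN, and the two digit subqueries by a small CTE named digits (a name introduced here for illustration). Dates are written as 'YYYY-MM-DD' text, which is also an assumption.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, registerdate TEXT);
CREATE TABLE user_answer (userid INTEGER, answer INTEGER, updated_date TEXT);
INSERT INTO users VALUES (1, '2015-09-01'), (2, '2015-09-01'), (3, '2015-09-01');
INSERT INTO user_answer VALUES (1, 0, '2015-09-01');
""")

# SQLite port of the MySQL query: 100 consecutive days starting 2015-09-01,
# left-joined to users by registration date and then to user_answer by user id.
query = """
WITH digits(n) AS (
    VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
),
date_range(aDay) AS (
    SELECT date('2015-09-01', '+' || (units.n + tens.n * 10) || ' day')
    FROM digits AS units CROSS JOIN digits AS tens
)
SELECT date_range.aDay,
       COUNT(DISTINCT users.id) AS registedCount,
       SUM(CASE WHEN users.id IS NOT NULL AND user_answer.userid IS NULL
                THEN 1 ELSE 0 END) AS notAnsweredCount
FROM date_range
LEFT JOIN users ON date_range.aDay = users.registerdate
LEFT JOIN user_answer ON users.id = user_answer.userid
GROUP BY date_range.aDay
ORDER BY date_range.aDay
"""
rows = conn.execute(query).fetchall()
print(rows[0])  # first day of the range
```

Note that the join to user_answer is on the user id only, so this query counts registrants who have no answer at all, not registrants who did not answer on that particular day.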
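【Answer】
Two difficulties need to be solved: the data has to be grouped by a specified date sequence rather than by a field in the tables, and the ids registered each day have to be set-differenced against the userids that answered each day. Doing this in SQL alone is hard to follow; SPL can be used alongside SQL, and the code reads clearly when written step by step: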
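Parameter settings: argBegin and argEnd, the start and end dates of the query range (both are referenced in the script).
SPL script: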
  | A |
1 | $select id,registerdate from users where registerdate>=? and registerdate<=?; argBegin,argEnd |
2 | $select userid,updated_date from user_answer where updated_date>=? and updated_date<=?; argBegin,argEnd |
3 | =periods(argBegin,argEnd) |
4 | =A1.align@a(A3,registerdate).(~.(id)) |
5 | =A2.align@a(A3,updated_date).(~.(userid)) |
6 | =A3.new(~:Date,A4(#).len():registedCount,(A4(#)\A5(#)).len():notAnsweredCount) |
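A1, A2: Retrieve the data from the two tables with SQL.
A3: Use the periods function to generate the date sequence from the parameters.
A4, A5: Use the align function to align each set to the specified sequence (one group per date).
A6: Generate a new table sequence; "\" denotes set difference.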
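For readers who do not use SPL, the steps A3 to A6 can be sketched in plain Python. This is an illustration of the logic only, not SPL's implementation: the helper names periods, align_ids and zero_usage are hypothetical, rows are assumed to be (id, date) tuples already fetched from the database, and ids are held in sets so that "\" maps to Python's set difference.

```python
from datetime import date, timedelta

def periods(begin, end):
    """Every day from begin to end inclusive (stand-in for A3)."""
    return [begin + timedelta(days=i) for i in range((end - begin).days + 1)]

def align_ids(rows, days):
    """For each day, the set of ids whose date falls on it (stand-in for A4/A5)."""
    by_day = {d: set() for d in days}
    for ident, day in rows:
        if day in by_day:  # rows outside the date range are ignored
            by_day[day].add(ident)
    return [by_day[d] for d in days]

def zero_usage(users, answers, begin, end):
    """One (day, registedCount, notAnsweredCount) tuple per day (stand-in for A6)."""
    days = periods(begin, end)
    registered = align_ids(users, days)
    answered = align_ids(answers, days)
    # r - a is the set difference: registered that day but no answer that day.
    return [(d, len(r), len(r - a)) for d, r, a in zip(days, registered, answered)]

# Sample data from the question.
users = [(1, date(2015, 9, 1)), (2, date(2015, 9, 1)), (3, date(2015, 9, 1))]
answers = [(1, date(2015, 9, 1))]
result = zero_usage(users, answers, date(2015, 8, 31), date(2015, 9, 2))
for row in result:
    print(row)
```

As in the SPL script, the difference is taken day by day, so a user counts as not answered when there is no answer from them dated on their registration day.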