The consecutive-transaction problem
In e-commerce, logistics, and banking you frequently meet requirements such as: 1. the total amount of each run of consecutive transactions; 2. the number of consecutive login days; 3. the start and end dates of each consecutive login run; 4. the number of days between runs.
create table deal_tb(
 id      string comment 'user ID'
,datestr string comment 'transaction date'
,amount  string comment 'transaction amount'
) row format delimited fields terminated by ',';
id,datestr,amount
1,2019-02-08,6214.23
1,2019-02-08,6247.32
1,2019-02-09,85.63
1,2019-02-09,967.36
1,2019-02-10,85.69
1,2019-02-12,769.85
1,2019-02-13,943.86
1,2019-02-14,538.42
1,2019-02-15,369.76
1,2019-02-16,369.76
1,2019-02-18,795.15
1,2019-02-19,715.65
1,2019-02-21,537.71
2,2019-02-08,6214.23
2,2019-02-08,6247.32
2,2019-02-09,85.63
2,2019-02-09,967.36
2,2019-02-10,85.69
2,2019-02-12,769.85
2,2019-02-13,943.86
2,2019-02-14,943.18
2,2019-02-15,369.76
2,2019-02-18,795.15
2,2019-02-19,715.65
2,2019-02-21,537.71
3,2019-02-08,6214.23
3,2019-02-08,6247.32
3,2019-02-09,85.63
3,2019-02-09,967.36
3,2019-02-10,85.69
3,2019-02-12,769.85
3,2019-02-13,943.86
3,2019-02-14,276.81
3,2019-02-15,369.76
3,2019-02-16,369.76
3,2019-02-18,795.15
3,2019-02-19,715.65
3,2019-02-21,537.71
Each row represents one transaction.
Daily transaction totals
-- no detail-level columns are required, so the finest grain is one row per user per day
SELECT
id
,datestr
,sum(amount)
FROM deal_tb
GROUP BY id,datestr
1 2019-02-08 12461.55
1 2019-02-09 1052.99
1 2019-02-10 85.69
1 2019-02-12 769.85
1 2019-02-13 943.86
1 2019-02-14 538.42
1 2019-02-15 369.76
1 2019-02-16 369.76
1 2019-02-18 795.15
1 2019-02-19 715.65
1 2019-02-21 537.71
2 2019-02-08 12461.55
2 2019-02-09 1052.99
2 2019-02-10 85.69
2 2019-02-12 769.85
2 2019-02-13 943.86
2 2019-02-14 943.18
2 2019-02-15 369.76
2 2019-02-18 795.15
2 2019-02-19 715.65
2 2019-02-21 537.71
3 2019-02-08 12461.55
3 2019-02-09 1052.99
3 2019-02-10 85.69
3 2019-02-12 769.85
3 2019-02-13 943.86
3 2019-02-14 276.81
3 2019-02-15 369.76
3 2019-02-16 369.76
3 2019-02-18 795.15
3 2019-02-19 715.65
3 2019-02-21 537.71
1. Total amount of each run of consecutive transactions
Analysis:
① We need to identify consecutive transactions, where "consecutive" means:
there is at least one transaction on each of one or more successive days
② How the dates behave:
a run grows from one start date by repeatedly adding one day
For user 1, subtracting each row's sequence number from its date:

id  datestr      sum_amount   offset   datestr - offset
1   2019-02-08   12461.55     -1       02-07
1   2019-02-09   1052.99      -2       02-07
1   2019-02-10   85.69        -3       02-07
1   2019-02-12   769.85       -4       02-08
1   2019-02-13   943.86       -5       02-08
1   2019-02-14   538.42       -6       02-08
1   2019-02-15   369.76       -7       02-08
1   2019-02-16   369.76       -8       02-08

(An offset tied to the date itself, e.g. -7, -8, -9, -11, maps every row to the same day, 02-01, and cannot tell the runs apart; subtracting the row number keeps each consecutive run on one anchor date and starts a new anchor at every break.)
③ If the dates are consecutive, then subtracting a consecutive sequence number from each of them yields one and the same start date; if the subtraction yields different start dates, the dates are not consecutive.
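The trick in ③ can be sketched outside SQL. This is a minimal Python illustration of the date-minus-row-number ("gaps and islands") idea, using a few of user 1's dates from the sample data; `island_keys` is a helper name invented here for illustration.

```python
from datetime import date, timedelta

def island_keys(dates):
    """Subtract each date's 1-based rank from the date itself.
    Dates belonging to one consecutive run collapse to the same anchor date."""
    return [d - timedelta(days=rank) for rank, d in enumerate(sorted(dates), start=1)]

# a few of user 1's distinct trade dates from the sample
dates = [date(2019, 2, 8), date(2019, 2, 9), date(2019, 2, 10),
         date(2019, 2, 12), date(2019, 2, 13)]
keys = island_keys(dates)
# the first three dates share anchor 2019-02-07, the last two share 2019-02-08
```

Grouping by the anchor date is then exactly what `GROUP BY id, start_day` does in the SQL.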
CREATE TABLE deal_start_day (
id STRING
,datestr STRING
,sum_amount double -- sum(amount) returns a double; int would truncate the decimals
,start_day STRING
)
WITH deal_tb_sum AS (
SELECT
id
,datestr
,sum(amount) as sum_amount
FROM deal_tb
GROUP BY id,datestr
)
-- note: a WITH ... AS clause cannot be combined with CREATE TABLE ... AS SELECT here, so the table is filled with a separate INSERT
INSERT INTO TABLE deal_start_day
SELECT
id
,datestr
,sum_amount
,date_sub(datestr,pm) as start_day
FROM (
SELECT
id
,datestr
,sum_amount
,ROW_NUMBER() OVER(PARTITION BY id ORDER BY datestr) AS pm
FROM deal_tb_sum
)T1
+-----+-------------+-------------+-------------+
| id  | datestr     | sum_amount  | start_day   |
+-----+-------------+-------------+-------------+
| 1   | 2019-02-08  | 12461.55    | 2019-02-07  |
| 1   | 2019-02-09  | 1052.99     | 2019-02-07  |
| 1   | 2019-02-10  | 85.69       | 2019-02-07  |
| 1   | 2019-02-12  | 769.85      | 2019-02-08  |
| 1   | 2019-02-13  | 943.86      | 2019-02-08  |
| 1   | 2019-02-14  | 538.42      | 2019-02-08  |
| 1   | 2019-02-15  | 369.76      | 2019-02-08  |
| 1   | 2019-02-16  | 369.76      | 2019-02-08  |
| 1   | 2019-02-18  | 795.15      | 2019-02-09  |
| 1   | 2019-02-19  | 715.65      | 2019-02-09  |
| 1   | 2019-02-21  | 537.71      | 2019-02-10  |
| 2   | 2019-02-08  | 12461.55    | 2019-02-07  |
| 2   | 2019-02-09  | 1052.99     | 2019-02-07  |
| 2   | 2019-02-10  | 85.69       | 2019-02-07  |
| 2   | 2019-02-12  | 769.85      | 2019-02-08  |
| 2   | 2019-02-13  | 943.86      | 2019-02-08  |
| 2   | 2019-02-14  | 943.18      | 2019-02-08  |
| 2   | 2019-02-15  | 369.76      | 2019-02-08  |
| 2   | 2019-02-18  | 795.15      | 2019-02-10  |
| 2   | 2019-02-19  | 715.65      | 2019-02-10  |
| 2   | 2019-02-21  | 537.71      | 2019-02-11  |
| 3   | 2019-02-08  | 12461.55    | 2019-02-07  |
| 3   | 2019-02-09  | 1052.99     | 2019-02-07  |
| 3   | 2019-02-10  | 85.69       | 2019-02-07  |
| 3   | 2019-02-12  | 769.85      | 2019-02-08  |
| 3   | 2019-02-13  | 943.86      | 2019-02-08  |
| 3   | 2019-02-14  | 276.81      | 2019-02-08  |
| 3   | 2019-02-15  | 369.76      | 2019-02-08  |
| 3   | 2019-02-16  | 369.76      | 2019-02-08  |
| 3   | 2019-02-18  | 795.15      | 2019-02-09  |
| 3   | 2019-02-19  | 715.65      | 2019-02-09  |
| 3   | 2019-02-21  | 537.71      | 2019-02-10  |
+-----+-------------+-------------+-------------+
Total amount of each run of consecutive transactions
-- approach 1: GROUP BY collapses each run to one row
SELECT
id
,start_day
,sum(sum_amount) AS lxjyze -- total amount of the run
FROM deal_start_day
GROUP BY id,start_day;

-- approach 2: a window SUM keeps the daily detail rows
SELECT
id
,datestr
,sum_amount
,start_day
,SUM(sum_amount) OVER(PARTITION BY id,start_day) AS lxjyze -- total amount of the run
FROM deal_start_day
2. Number of consecutive login days
-- approach 1: GROUP BY
SELECT
id
,start_day
,sum(sum_amount) AS lxjyze -- total amount of the run
,count(*) AS lxdlts -- number of consecutive days
FROM deal_start_day
GROUP BY id,start_day;

-- approach 2: window functions keep the detail rows
SELECT
id
,datestr
,sum_amount
,start_day
,SUM(sum_amount) OVER(PARTITION BY id,start_day) AS lxjyze -- total amount of the run
,count(*) OVER(PARTITION BY id,start_day) AS lxdlts -- number of consecutive days
FROM deal_start_day
ORDER BY id,datestr
+-----+-------------+------------------+----------------+
| id  | datestr     | lx_total_amount  | lx_total_days  |
+-----+-------------+------------------+----------------+
| 1   | 2019-02-08  | 13600            | 3              |
| 1   | 2019-02-09  | 13600            | 3              |
| 1   | 2019-02-10  | 13600            | 3              |
| 1   | 2019-02-16  | 2991             | 5              |
| 1   | 2019-02-12  | 2991             | 5              |
| 1   | 2019-02-13  | 2991             | 5              |
| 1   | 2019-02-14  | 2991             | 5              |
| 1   | 2019-02-15  | 2991             | 5              |
| 1   | 2019-02-18  | 1510             | 2              |
| 1   | 2019-02-19  | 1510             | 2              |
| 1   | 2019-02-21  | 537              | 1              |
| 2   | 2019-02-08  | 13600            | 3              |
| 2   | 2019-02-09  | 13600            | 3              |
| 2   | 2019-02-10  | 13600            | 3              |
| 2   | 2019-02-13  | 3026             | 4              |
| 2   | 2019-02-15  | 3026             | 4              |
| 2   | 2019-02-14  | 3026             | 4              |
| 2   | 2019-02-12  | 3026             | 4              |
| 2   | 2019-02-19  | 1510             | 2              |
| 2   | 2019-02-18  | 1510             | 2              |
| 2   | 2019-02-21  | 537              | 1              |
| 3   | 2019-02-08  | 13600            | 3              |
| 3   | 2019-02-09  | 13600            | 3              |
| 3   | 2019-02-10  | 13600            | 3              |
| 3   | 2019-02-13  | 2730             | 5              |
| 3   | 2019-02-12  | 2730             | 5              |
| 3   | 2019-02-14  | 2730             | 5              |
| 3   | 2019-02-15  | 2730             | 5              |
| 3   | 2019-02-16  | 2730             | 5              |
| 3   | 2019-02-18  | 1510             | 2              |
| 3   | 2019-02-19  | 1510             | 2              |
| 3   | 2019-02-21  | 537              | 1              |
+-----+-------------+------------------+----------------+
3. Start and end dates of each consecutive login run
Intermediate rows for user 1 (id, datestr, sum_amount, start_day, run total, run days):
1 2019-02-08 12461 2019-02-07 13598 3
1 2019-02-09 1052 2019-02-07 13598 3
1 2019-02-10 85 2019-02-07 13598 3
1 2019-02-12 769 2019-02-08 2988 5
1 2019-02-13 943 2019-02-08 2988 5
1 2019-02-14 538 2019-02-08 2988 5
1 2019-02-15 369 2019-02-08 2988 5
1 2019-02-16 369 2019-02-08 2988 5
1 2019-02-18 795 2019-02-09 1510 2
1 2019-02-19 715 2019-02-09 1510 2
1 2019-02-21 537 2019-02-10 537 1
SELECT
id
,datestr
,sum_amount
,start_day
,MIN(datestr) OVER(PARTITION BY id,start_day) lxdl_start_day -- first day of the run
,MAX(datestr) OVER(PARTITION BY id,start_day) lxdl_end_day -- last day of the run
FROM deal_start_day
ORDER BY id,datestr
4. Days between runs: LEAD()
1. LEAD() is a window function that provides access to a row at a given physical offset after the current row. With LEAD() you can read the next row's value (or the row after that, and so on) from the current row, which makes it useful for comparing the current row with subsequent rows.
2. Date comparison function: datediff
Syntax: datediff(string enddate, string startdate)
Return type: int
Description: returns the number of days from startdate to enddate.
Here it expresses the number of days between the end of one run and the start of the next.
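As a sanity check on the two functions, here is a small Python model of LEAD(col, 1, default) and datediff, applied to user 1's run end and start dates from the sample; the default argument stands in for current_date, pinned to 2022-09-21 as in the sample output.

```python
from datetime import date

def lead(values, offset=1, default=None):
    """Rough Python analogue of SQL LEAD(col, offset, default)
    within a single ordered partition."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

def datediff(end, start):
    """Hive datediff(enddate, startdate): days from start to end."""
    return (end - start).days

# user 1's run end dates and run start dates, from the sample
run_end = [date(2019, 2, 10), date(2019, 2, 16), date(2019, 2, 19), date(2019, 2, 21)]
run_start = [date(2019, 2, 8), date(2019, 2, 12), date(2019, 2, 18), date(2019, 2, 21)]
next_start = lead(run_start, 1, date(2022, 9, 21))  # default plays the role of current_date
gaps = [datediff(n, e) for e, n in zip(run_end, next_start)]
# gaps: [2, 2, 2, 1308]
```

The last gap (run end 2019-02-21 to the pinned "today" 2022-09-21) reproduces the 1308 seen in the jg_days column.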
Per-row output with each run's first and last day (id, datestr, sum_amount, start_day, lxdl_start_day, lxdl_end_day):
1 2019-02-08 12461 2019-02-07 2019-02-08 2019-02-10
1 2019-02-09 1052 2019-02-07 2019-02-08 2019-02-10
1 2019-02-10 85 2019-02-07 2019-02-08 2019-02-10
1 2019-02-12 769 2019-02-08 2019-02-12 2019-02-16
1 2019-02-13 943 2019-02-08 2019-02-12 2019-02-16
1 2019-02-14 538 2019-02-08 2019-02-12 2019-02-16
1 2019-02-15 369 2019-02-08 2019-02-12 2019-02-16
1 2019-02-16 369 2019-02-08 2019-02-12 2019-02-16
1 2019-02-18 795 2019-02-09 2019-02-18 2019-02-19
1 2019-02-19 715 2019-02-09 2019-02-18 2019-02-19
1 2019-02-21 537 2019-02-10 2019-02-21 2019-02-21

Grouped output of max_min_datestr (id, start_day, max_datestr, min_datestr):
1 2019-02-07 2019-02-10 2019-02-08
1 2019-02-08 2019-02-16 2019-02-12
1 2019-02-09 2019-02-19 2019-02-18
1 2019-02-10 2019-02-21 2019-02-21
WITH max_min_datestr AS (
SELECT
id
,start_day
,max(datestr) as max_datestr
,min(datestr) as min_datestr
FROM deal_start_day
GROUP BY id,start_day
)
SELECT
id
,start_day
,max_datestr
,nex_day
,datediff(nex_day,max_datestr) as jgsj
FROM(
SELECT
id
,start_day
,max_datestr
,LEAD(min_datestr,1,current_date) OVER(PARTITION BY id ORDER BY start_day) as nex_day -- current_date as the default, so the last run also yields a gap
FROM max_min_datestr
)T1
+-----+-------------+-------------+----------+
| id  | lx_end_day  | nex_day     | jg_days  |
+-----+-------------+-------------+----------+
| 1   | 2019-02-10  | 2019-02-12  | 2        |
| 1   | 2019-02-16  | 2019-02-18  | 2        |
| 1   | 2019-02-19  | 2019-02-21  | 2        |
| 1   | 2019-02-21  | 2022-09-21  | 1308     |
| 2   | 2019-02-10  | 2019-02-12  | 2        |
| 2   | 2019-02-15  | 2019-02-18  | 3        |
| 2   | 2019-02-19  | 2019-02-21  | 2        |
| 2   | 2019-02-21  | 2022-09-21  | 1308     |
| 3   | 2019-02-10  | 2019-02-12  | 2        |
| 3   | 2019-02-16  | 2019-02-18  | 2        |
| 3   | 2019-02-19  | 2019-02-21  | 2        |
| 3   | 2019-02-21  | 2022-09-21  | 1308     |
+-----+-------------+-------------+----------+
WITH a AS(
SELECT
id
,datestr
,SUM(amount) as total_amount
FROM deal_tb
GROUP BY id,datestr
),
b AS(
SELECT
id
,datestr
,total_amount
,row_number() over(PARTITION by id ORDER BY datestr) as px
FROM a
),
c AS(
SELECT
id
,datestr
,total_amount
,date_sub(datestr,px) as start_days
FROM b),
d AS(
SELECT
id
,datestr
,start_days
,floor(SUM(total_amount) over(partition by id,start_days)) as lx_total_amount
,count(*) over(partition by id,start_days) as lx_total_days
FROM c
),
e AS(
SELECT
id
,start_days
,MIN(datestr) over(partition by id,start_days) as lx_start_day
,MAX(datestr) over(partition by id,start_days) as lx_end_day
FROM d),
f AS(
SELECT
id
,start_days
,lx_end_day
,lx_start_day
FROM e
GROUP BY id ,lx_end_day,lx_start_day,start_days)
SELECT
id
,lx_end_day
,nex_day
,DATEDIFF(nex_day,lx_end_day) as jg_days
FROM
(
SELECT
id
,lx_end_day
,LEAD(lx_start_day,1,current_date) OVER(PARTITION BY id ORDER BY lx_start_day) as nex_day
FROM f)g
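As a cross-check of the combined query, a compact Python version of the same pipeline (daily totals → anchor date → per-run aggregates) over user 2's sample daily totals; `runs` is an illustrative helper name, and int() mirrors the floor() in stage d.

```python
from datetime import date, timedelta
from collections import defaultdict

def runs(daily_totals):
    """daily_totals: {date: amount} for one user.
    Returns (anchor, run_total, run_days, first_day, last_day) per run,
    like stages b-e of the combined query."""
    groups = defaultdict(list)
    for rank, d in enumerate(sorted(daily_totals), start=1):
        groups[d - timedelta(days=rank)].append(d)  # gaps-and-islands anchor
    out = []
    for anchor in sorted(groups):
        ds = groups[anchor]
        total = sum(daily_totals[d] for d in ds)
        out.append((anchor, int(total), len(ds), min(ds), max(ds)))
    return out

# user 2's daily totals from the sample
user2 = {date(2019, 2, 8): 12461.55, date(2019, 2, 9): 1052.99,
         date(2019, 2, 10): 85.69, date(2019, 2, 12): 769.85,
         date(2019, 2, 13): 943.86, date(2019, 2, 14): 943.18,
         date(2019, 2, 15): 369.76, date(2019, 2, 18): 795.15,
         date(2019, 2, 19): 715.65, date(2019, 2, 21): 537.71}
result = runs(user2)
# four runs: (13600, 3 days), (3026, 4 days), (1510, 2 days), (537, 1 day)
```

These totals and day counts match user 2's rows in the lx_total_amount / lx_total_days result above.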
-
1. Create an external table named user_log_consumption with location '/user_log_consumption', and load the data into it. Write the DDL. (4 points)
-- DDL
create external table if not exists user_log_consumption(
day_t string,
userid string,
price double
)
row format delimited fields terminated by ','
stored as textfile
location '/user_log_consumption';
-- load the data
load data local inpath '/usr/local/soft/data/user_log_consumption.txt' into table user_log_consumption;
-
2. Convert the external table user_log_consumption into an internal (managed) table. (4 points)
alter table user_log_consumption set tblproperties('EXTERNAL'='FALSE');
-
3. How many users logged in on three consecutive days? Also compute each user's consecutive-transaction total, number of consecutive login days, run start and end dates, and gap days. Provide the SQL. (10 points)
-- daily totals -> run anchor (dt_start) via row_number -> per-run aggregates
select userid
      ,dt_start
      ,count(*) as days_cnt
      ,min(day_t) as start_date
      ,max(day_t) as end_date
      ,lag(dt_start, 1, dt_start) over (partition by userid order by dt_start) as lag_start_dt
      ,datediff(dt_start, lag(dt_start, 1, dt_start) over (partition by userid order by dt_start)) as interval_day
from (
    select userid
          ,day_t
          ,price
          ,row_number() over (partition by userid order by day_t) as rnk
          ,date_sub(day_t, row_number() over (partition by userid order by day_t)) as dt_start
    from (
        select day_t
              ,userid
              ,sum(price) as price
        from user_log_consumption
        group by userid, day_t
    ) t1
) t2
group by userid, dt_start;
-- users with days_cnt >= 3 are the "three consecutive days" users
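The intent of question 3 ("how many users logged in on three consecutive days") can be checked with a short Python sketch; scanning sorted distinct dates for run lengths mirrors what count(*) per dt_start group computes in the SQL. `users_with_streak` is an illustrative helper name.

```python
from datetime import date

def users_with_streak(logins, min_days=3):
    """Count users whose distinct login dates contain a run of at
    least min_days consecutive days."""
    qualified = 0
    for user, dates in logins.items():
        ds = sorted(set(dates))
        run = best = 1 if ds else 0
        for prev, cur in zip(ds, ds[1:]):
            run = run + 1 if (cur - prev).days == 1 else 1
            best = max(best, run)
        if best >= min_days:
            qualified += 1
    return qualified

# user 1's and user 2's distinct dates from the deal_tb sample
logins = {
    "1": [date(2019, 2, d) for d in (8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21)],
    "2": [date(2019, 2, d) for d in (8, 9, 10, 12, 13, 14, 15, 18, 19, 21)],
}
```

On this sample, user 1's longest run is 5 days and user 2's is 4, so both pass the three-day threshold but only user 1 passes a five-day one.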
The no-data problem
Requirement: if a device reports no violation data for a stretch of time, the device may be abnormal.
Possible kinds of abnormality:
① the device fails for some interval during use
② the device has been faulty all along and was never repaired
Note: a device may legitimately capture no data during some periods, so there is no fixed number of silent days that proves a device is abnormal.
Solution:
Use a boxplot to process the data.
Boxplot:
① sort the data
② read quartile values off the sorted data and use them to judge outliers
③ upper-quartile position: total row count * 3/4
④ lower-quartile position: total row count * 1/4
⑤ IQR, the spread of the box: upper-quartile value - lower-quartile value
⑥ outlier fences (1.5 is a rule-of-thumb multiplier and can be tuned to the actual data):
1> upper fence = upper quartile + 1.5 * IQR
2> lower fence = lower quartile - 1.5 * IQR
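The fence arithmetic in ⑥ can be sketched in Python. Note that, like the SQL in this section, the quartiles here are taken by rank position (count * 1/4 and count * 3/4) rather than by interpolation; `boxplot_fences` and the sample gap values are illustrative.

```python
def boxplot_fences(values, k=1.5):
    """Lower/upper fences using rank-position quartiles;
    k = 1.5 is the conventional whisker multiplier."""
    s = sorted(values)
    n = len(s)
    q1 = s[n // 4]        # lower quartile, by position
    q3 = s[n * 3 // 4]    # upper quartile, by position
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# illustrative gap values, in days
gaps = [1, 1, 2, 2, 3, 3, 5, 5, 9, 13, 21, 720]
lower, upper = boxplot_fences(gaps)
# upper fence: 13 + 1.5 * (13 - 2) = 29.5; only the 720-day gap exceeds it
```

Gaps above the upper fence are the candidates for "device probably abnormal".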
Actual processing flow
① For each device, compute the gap between each report and the next; for a device that has stopped reporting, also take the gap from its last report up to today.
Compute each device's gaps:
-- approach 1: LEAD
-- columns: sbbh = device id, wfsj = report (violation) date, sjc = gap in days
CREATE TABLE sbbh_wfsjc AS
SELECT
sbbh
,wfsj
,lead_wfsj
,datediff(lead_wfsj,wfsj) as sjc
FROM(
SELECT
sbbh
,wfsj
,lead(wfsj,1,CURRENT_DATE) OVER(PARTITION BY sbbh ORDER BY wfsj) AS lead_wfsj
FROM base_vio_sbbh
) T1
-- approach 2: self-join on row_number
WITH rm_tb AS (
SELECT
sbbh
,wfsj
,ROW_NUMBER() OVER(PARTITION BY sbbh ORDER BY wfsj) AS rm
FROM base_vio_sbbh
)
SELECT
T1.sbbh
,T1.wfsj
,IF(T2.wfsj IS NULL,CAST(CURRENT_DATE AS string),T2.wfsj) AS last_day
FROM rm_tb T1
LEFT JOIN rm_tb T2 ON T1.sbbh = T2.sbbh AND T1.rm = T2.rm -1
+---------------------+------------------+-----------------------+-----------------+
| sbbh_wfsjc.sbbh     | sbbh_wfsjc.wfsj  | sbbh_wfsjc.lead_wfsj  | sbbh_wfsjc.sjc  |
+---------------------+------------------+-----------------------+-----------------+
| 451000000000071113  | 2020-04-13       | 2020-04-22            | 9               |
| 451000000000071113  | 2020-04-22       | 2020-05-13            | 21              |
| 451000000000071113  | 2020-05-13       | 2020-05-18            | 5               |
| 451000000000071113  | 2020-05-18       | 2020-06-14            | 27              |
| 451000000000071113  | 2020-06-14       | 2020-06-15            | 1               |
| 451000000000071113  | 2020-06-15       | 2020-06-16            | 1               |
| 451000000000071113  | 2020-06-16       | 2020-06-18            | 2               |
| 451000000000071113  | 2020-06-18       | 2020-06-23            | 5               |
| 451000000000071113  | 2020-06-23       | 2020-06-28            | 5               |
| 451000000000071113  | 2020-06-28       | 2020-06-29            | 1               |
| 451000000000071113  | 2020-06-29       | 2020-07-02            | 3               |
| 451000000000071113  | 2020-07-02       | 2020-07-07            | 5               |
| 451000000000071113  | 2020-07-07       | 2020-07-09            | 2               |
| 451000000000071113  | 2020-07-09       | 2020-07-22            | 13              |
| 451000000000071113  | 2020-07-22       | 2020-07-26            | 4               |
| 451000000000071113  | 2020-07-26       | 2020-07-28            | 2               |
| 451000000000071113  | 2020-07-28       | 2020-07-30            | 2               |
| 451000000000071113  | 2020-07-30       | 2020-08-01            | 2               |
| 451000000000071113  | 2020-08-01       | 2020-08-02            | 1               |
| 451000000000071113  | 2020-08-02       | 2020-08-03            | 1               |
| 451000000000071113  | 2020-08-03       | 2020-08-05            | 2               |
| 451000000000071113  | 2020-08-05       | 2020-08-06            | 1               |
| 451000000000071113  | 2020-08-06       | 2020-08-08            | 2               |
| 451000000000071113  | 2020-08-08       | 2020-08-09            | 1               |
| 451000000000071113  | 2020-08-09       | 2020-08-11            | 2               |
| 451000000000071113  | 2020-08-11       | 2020-08-12            | 1               |
| 451000000000071113  | 2020-08-12       | 2020-08-13            | 1               |
| 451000000000071113  | 2020-08-13       | 2020-08-14            | 1               |
| 451000000000071113  | 2020-08-14       | 2020-08-19            | 5               |
| 451000000000071113  | 2020-08-19       | 2020-08-20            | 1               |
| 451000000000071113  | 2020-08-20       | 2020-08-21            | 1               |
| 451000000000071113  | 2020-08-21       | 2020-08-28            | 7               |
| 451000000000071113  | 2020-08-28       | 2020-09-02            | 5               |
| 451000000000071113  | 2020-09-02       | 2020-09-03            | 1               |
| 451000000000071113  | 2020-09-03       | 2020-09-05            | 2               |
| 451000000000071113  | 2020-09-05       | 2020-09-11            | 6               |
| 451000000000071113  | 2020-09-11       | 2020-09-24            | 13              |
| 451000000000071113  | 2020-09-24       | 2020-09-27            | 3               |
| 451000000000071113  | 2020-09-27       | 2020-09-28            | 1               |
| 451000000000071113  | 2020-09-28       | 2020-09-30            | 2               |
| 451000000000071113  | 2020-09-30       | 2020-10-01            | 1               |
| 451000000000071113  | 2020-10-01       | 2020-10-02            | 1               |
| 451000000000071113  | 2020-10-02       | 2022-09-22            | 720             |
+---------------------+------------------+-----------------------+-----------------+
Use the boxplot to pick out the anomalous gaps
-- per device: total gap count, plus the rank positions of the upper and lower quartiles
WITH sx_4 AS(
SELECT
sbbh
,count(*) as sbbh_zs
,floor(count(*) / 4) AS s_4 -- with ORDER BY sjc DESC, rank n/4 lands on the upper quartile
,floor(count(*) / 4 * 3) AS x_4 -- rank 3n/4 lands on the lower quartile
FROM sbbh_wfsjc
GROUP BY sbbh
)
, sjc_rm AS (
SELECT
sbbh
,wfsj
,lead_wfsj
,sjc
,ROW_NUMBER() OVER(PARTITION BY sbbh ORDER BY sjc DESC) rm
FROM sbbh_wfsjc
)
INSERT INTO TABLE sx_4_value
SELECT
T1.x_4
,T1.s_4
,T2.*
FROM sx_4 T1 JOIN sjc_rm T2 ON T1.sbbh = T2.sbbh

Intermediate output of sx_4:
+---------------------+----------+------+------+
| sbbh                | sbbh_zs  | s_4  | x_4  |
+---------------------+----------+------+------+
| 451000000000071113  | 43       | 10   | 32   |
+---------------------+----------+------+------+
-- target table (must exist before the INSERT above runs)
CREATE TABLE sx_4_value (
x_4 int -- lower-quartile rank position (32 in the sample)
,s_4 int -- upper-quartile rank position (10 in the sample)
,sbbh STRING
,wfsj STRING
,lead_wfsj STRING
,sjc int
,rm int
)
+-----------------+-----------------+---------------------+------------------+-----------------------+-----------------+----------------+
| sx_4_value.x_4  | sx_4_value.s_4  | sx_4_value.sbbh     | sx_4_value.wfsj  | sx_4_value.lead_wfsj  | sx_4_value.sjc  | sx_4_value.rm  |
+-----------------+-----------------+---------------------+------------------+-----------------------+-----------------+----------------+
| 32              | 10              | 451000000000071113  | 2020-10-02       | 2022-09-22            | 720             | 1              |
| 32              | 10              | 451000000000071113  | 2020-05-18       | 2020-06-14            | 27              | 2              |
| 32              | 10              | 451000000000071113  | 2020-04-22       | 2020-05-13            | 21              | 3              |
| 32              | 10              | 451000000000071113  | 2020-09-11       | 2020-09-24            | 13              | 4              |
| 32              | 10              | 451000000000071113  | 2020-07-09       | 2020-07-22            | 13              | 5              |
| 32              | 10              | 451000000000071113  | 2020-04-13       | 2020-04-22            | 9               | 6              |
| 32              | 10              | 451000000000071113  | 2020-08-21       | 2020-08-28            | 7               | 7              |
| 32              | 10              | 451000000000071113  | 2020-09-05       | 2020-09-11            | 6               | 8              |
| 32              | 10              | 451000000000071113  | 2020-05-13       | 2020-05-18            | 5               | 9              |
| 32              | 10              | 451000000000071113  | 2020-08-14       | 2020-08-19            | 5               | 10             |
| 32              | 10              | 451000000000071113  | 2020-06-23       | 2020-06-28            | 5               | 11             |
| 32              | 10              | 451000000000071113  | 2020-08-28       | 2020-09-02            | 5               | 12             |
| 32              | 10              | 451000000000071113  | 2020-06-18       | 2020-06-23            | 5               | 13             |
| 32              | 10              | 451000000000071113  | 2020-07-02       | 2020-07-07            | 5               | 14             |
| 32              | 10              | 451000000000071113  | 2020-07-22       | 2020-07-26            | 4               | 15             |
| 32              | 10              | 451000000000071113  | 2020-06-29       | 2020-07-02            | 3               | 16             |
| 32              | 10              | 451000000000071113  | 2020-09-24       | 2020-09-27            | 3               | 17             |
| 32              | 10              | 451000000000071113  | 2020-07-26       | 2020-07-28            | 2               | 18             |
| 32              | 10              | 451000000000071113  | 2020-09-28       | 2020-09-30            | 2               | 19             |
| 32              | 10              | 451000000000071113  | 2020-07-07       | 2020-07-09            | 2               | 20             |
| 32              | 10              | 451000000000071113  | 2020-06-16       | 2020-06-18            | 2               | 21             |
| 32              | 10              | 451000000000071113  | 2020-09-03       | 2020-09-05            | 2               | 22             |
| 32              | 10              | 451000000000071113  | 2020-08-09       | 2020-08-11            | 2               | 23             |
| 32              | 10              | 451000000000071113  | 2020-08-06       | 2020-08-08            | 2               | 24             |
| 32              | 10              | 451000000000071113  | 2020-08-03       | 2020-08-05            | 2               | 25             |
| 32              | 10              | 451000000000071113  | 2020-07-30       | 2020-08-01            | 2               | 26             |
| 32              | 10              | 451000000000071113  | 2020-07-28       | 2020-07-30            | 2               | 27             |
| 32              | 10              | 451000000000071113  | 2020-08-01       | 2020-08-02            | 1               | 28             |
| 32              | 10              | 451000000000071113  | 2020-09-02       | 2020-09-03            | 1               | 29             |
| 32              | 10              | 451000000000071113  | 2020-08-20       | 2020-08-21            | 1               | 30             |
| 32              | 10              | 451000000000071113  | 2020-08-19       | 2020-08-20            | 1               | 31             |
| 32              | 10              | 451000000000071113  | 2020-08-13       | 2020-08-14            | 1               | 32             |
| 32              | 10              | 451000000000071113  | 2020-08-12       | 2020-08-13            | 1               | 33             |
| 32              | 10              | 451000000000071113  | 2020-08-11       | 2020-08-12            | 1               | 34             |
| 32              | 10              | 451000000000071113  | 2020-08-08       | 2020-08-09            | 1               | 35             |
| 32              | 10              | 451000000000071113  | 2020-09-27       | 2020-09-28            | 1               | 36             |
| 32              | 10              | 451000000000071113  | 2020-08-02       | 2020-08-03            | 1               | 37             |
| 32              | 10              | 451000000000071113  | 2020-06-28       | 2020-06-29            | 1               | 38             |
| 32              | 10              | 451000000000071113  | 2020-06-15       | 2020-06-16            | 1               | 39             |
| 32              | 10              | 451000000000071113  | 2020-06-14       | 2020-06-15            | 1               | 40             |
| 32              | 10              | 451000000000071113  | 2020-09-30       | 2020-10-01            | 1               | 41             |
| 32              | 10              | 451000000000071113  | 2020-10-01       | 2020-10-02            | 1               | 42             |
| 32              | 10              | 451000000000071113  | 2020-08-05       | 2020-08-06            | 1               | 43             |
+-----------------+-----------------+---------------------+------------------+-----------------------+-----------------+----------------+
Read off the quartile values and compute the upper fence
WITH value_table AS (
SELECT
sbbh
,MAX(IF(x_4=rm,sjc,-1)) as x_4_value -- gap value at the lower-quartile rank
,MAX(IF(s_4=rm,sjc,-1)) as s_4_value -- gap value at the upper-quartile rank
,(MAX(IF(s_4=rm,sjc,-1)) - MAX(IF(x_4=rm,sjc,-1)))*1.5 + MAX(IF(s_4=rm,sjc,-1)) as sby_value -- upper fence = Q3 + 1.5 * IQR
FROM sx_4_value
WHERE x_4 = rm OR s_4 = rm
GROUP BY sbbh
)
SELECT
T1.*
FROM sx_4_value T1 JOIN value_table T2 ON T1.sbbh = T2.sbbh
WHERE T2.sby_value < T1.sjc -- keep only the gaps above the upper fence
+---------+---------+---------------------+-------------+---------------+---------+--------+
| t1.x_4  | t1.s_4  | t1.sbbh             | t1.wfsj     | t1.lead_wfsj  | t1.sjc  | t1.rm  |
+---------+---------+---------------------+-------------+---------------+---------+--------+
| 32      | 10      | 451000000000071113  | 2020-10-02  | 2022-09-22    | 720     | 1      |
| 32      | 10      | 451000000000071113  | 2020-05-18  | 2020-06-14    | 27      | 2      |
| 32      | 10      | 451000000000071113  | 2020-04-22  | 2020-05-13    | 21      | 3      |
| 32      | 10      | 451000000000071113  | 2020-09-11  | 2020-09-24    | 13      | 4      |
| 32      | 10      | 451000000000071113  | 2020-07-09  | 2020-07-22    | 13      | 5      |
| 32      | 10      | 451000000000071113  | 2020-04-13  | 2020-04-22    | 9       | 6      |
+---------+---------+---------------------+-------------+---------------+---------+--------+