Hive: consecutive-login problems and no-data statistics

The consecutive-transaction problem

In e-commerce, logistics, and banking, requirements like the following come up frequently: 1. the total amount of a user's consecutive transactions; 2. the number of consecutive login days; 3. the start and end dates of each consecutive-login streak; 4. the number of gap days between streaks.

create table deal_tb(
id string COMMENT 'user ID'
,datestr string COMMENT 'transaction date'
,amount string COMMENT 'transaction amount'
)row format delimited fields terminated by ',';

id    datestr      amount
1,2019-02-08,6214.23 
1,2019-02-08,6247.32 
1,2019-02-09,85.63 
1,2019-02-09,967.36 
1,2019-02-10,85.69 
1,2019-02-12,769.85 
1,2019-02-13,943.86 
1,2019-02-14,538.42
1,2019-02-15,369.76
1,2019-02-16,369.76
1,2019-02-18,795.15
1,2019-02-19,715.65
1,2019-02-21,537.71
2,2019-02-08,6214.23 
2,2019-02-08,6247.32 
2,2019-02-09,85.63 
2,2019-02-09,967.36 
2,2019-02-10,85.69 
2,2019-02-12,769.85 
2,2019-02-13,943.86 
2,2019-02-14,943.18
2,2019-02-15,369.76
2,2019-02-18,795.15
2,2019-02-19,715.65
2,2019-02-21,537.71
3,2019-02-08,6214.23 
3,2019-02-08,6247.32 
3,2019-02-09,85.63 
3,2019-02-09,967.36 
3,2019-02-10,85.69 
3,2019-02-12,769.85 
3,2019-02-13,943.86 
3,2019-02-14,276.81
3,2019-02-15,369.76
3,2019-02-16,369.76
3,2019-02-18,795.15
3,2019-02-19,715.65
3,2019-02-21,537.71
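
The sample rows can be loaded from a local text file; the path below is only an example, adjust it to wherever the file actually lives:

load data local inpath '/usr/local/soft/data/deal_tb.txt' into table deal_tb;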

Each row represents a single transaction.

Daily transaction totals

-- There is no requirement to join back to transaction-level detail,
-- so the finest granularity of the data is one day
SELECT
id
,datestr
,sum(amount) as sum_amount
FROM deal_tb
GROUP BY id,datestr;

Result:
1       2019-02-08      12461.55
1       2019-02-09      1052.99
1       2019-02-10      85.69
1       2019-02-12      769.85
1       2019-02-13      943.86
1       2019-02-14      538.42
1       2019-02-15      369.76
1       2019-02-16      369.76
1       2019-02-18      795.15
1       2019-02-19      715.65
1       2019-02-21      537.71
2       2019-02-08      12461.55
2       2019-02-09      1052.99
2       2019-02-10      85.69
2       2019-02-12      769.85
2       2019-02-13      943.86
2       2019-02-14      943.18
2       2019-02-15      369.76
2       2019-02-18      795.15
2       2019-02-19      715.65
2       2019-02-21      537.71
3       2019-02-08      12461.55
3       2019-02-09      1052.99
3       2019-02-10      85.69
3       2019-02-12      769.85
3       2019-02-13      943.86
3       2019-02-14      276.81
3       2019-02-15      369.76
3       2019-02-16      369.76
3       2019-02-18      795.15
3       2019-02-19      715.65
3       2019-02-21      537.71

1. Total amount of a user's consecutive transactions

Analysis:

① We need to identify consecutive transactions, where "consecutive" means:

there is at least one transaction on each of one or more consecutive days.

② How to think about the dates:

each date in a streak can be seen as one fixed start date plus a growing offset.

For example, for user 1 (rn is the row number within the user, ordered by date):

id  datestr      sum_amount   rn   date_sub(datestr, rn)
1   2019-02-08   12461.55     1    2019-02-07
1   2019-02-09   1052.99      2    2019-02-07
1   2019-02-10   85.69        3    2019-02-07
1   2019-02-12   769.85       4    2019-02-08
1   2019-02-13   943.86       5    2019-02-08
1   2019-02-14   538.42       6    2019-02-08
1   2019-02-15   369.76       7    2019-02-08
1   2019-02-16   369.76       8    2019-02-08

③ If the dates are consecutive, then subtracting an increasing sequence (1, 2, 3, ...) from them yields the same start date for every row; if the derived start dates differ, the dates are not consecutive.
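
This can be sanity-checked with a small self-contained query; the explode(array(...)) subquery just fabricates four dates inline:

SELECT
dt
,row_number() over (order by dt) as rn
,date_sub(dt, row_number() over (order by dt)) as start_day
FROM (
SELECT explode(array('2019-02-08','2019-02-09','2019-02-10','2019-02-12')) as dt
) t;
-- the three consecutive dates all map to start_day 2019-02-07,
-- while 2019-02-12 maps to 2019-02-08 and therefore opens a new group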

CREATE TABLE deal_start_day (
id string
,datestr string
,sum_amount double  -- double rather than int: the amounts carry decimals
,start_day string
);

WITH deal_tb_sum AS (
SELECT
id
,datestr
,sum(amount) as sum_amount
FROM deal_tb
GROUP BY id,datestr
)
-- a WITH ... AS clause cannot be combined with CREATE TABLE AS,
-- hence the separate DDL above plus this INSERT
INSERT INTO TABLE deal_start_day
SELECT
id
,datestr
,sum_amount
,date_sub(datestr,pm) as start_day
FROM (
SELECT
id
,datestr
,sum_amount
,ROW_NUMBER() OVER(PARTITION BY id ORDER BY datestr) AS pm
FROM deal_tb_sum
)T1;
+-----+-------------+-------------+-------------+--+
| id  | datestr     | sum_amount  | start_day   |
+-----+-------------+-------------+-------------+--+
| 1   | 2019-02-08  | 12461.55    | 2019-02-07  |
| 1   | 2019-02-09  | 1052.99     | 2019-02-07  |
| 1   | 2019-02-10  | 85.69       | 2019-02-07  |
| 1   | 2019-02-12  | 769.85      | 2019-02-08  |
| 1   | 2019-02-13  | 943.86      | 2019-02-08  |
| 1   | 2019-02-14  | 538.42      | 2019-02-08  |
| 1   | 2019-02-15  | 369.76      | 2019-02-08  |
| 1   | 2019-02-16  | 369.76      | 2019-02-08  |
| 1   | 2019-02-18  | 795.15      | 2019-02-09  |
| 1   | 2019-02-19  | 715.65      | 2019-02-09  |
| 1   | 2019-02-21  | 537.71      | 2019-02-10  |
| 2   | 2019-02-08  | 12461.55    | 2019-02-07  |
| 2   | 2019-02-09  | 1052.99     | 2019-02-07  |
| 2   | 2019-02-10  | 85.69       | 2019-02-07  |
| 2   | 2019-02-12  | 769.85      | 2019-02-08  |
| 2   | 2019-02-13  | 943.86      | 2019-02-08  |
| 2   | 2019-02-14  | 943.18      | 2019-02-08  |
| 2   | 2019-02-15  | 369.76      | 2019-02-08  |
| 2   | 2019-02-18  | 795.15      | 2019-02-10  |
| 2   | 2019-02-19  | 715.65      | 2019-02-10  |
| 2   | 2019-02-21  | 537.71      | 2019-02-11  |
| 3   | 2019-02-08  | 12461.55    | 2019-02-07  |
| 3   | 2019-02-09  | 1052.99     | 2019-02-07  |
| 3   | 2019-02-10  | 85.69       | 2019-02-07  |
| 3   | 2019-02-12  | 769.85      | 2019-02-08  |
| 3   | 2019-02-13  | 943.86      | 2019-02-08  |
| 3   | 2019-02-14  | 276.81      | 2019-02-08  |
| 3   | 2019-02-15  | 369.76      | 2019-02-08  |
| 3   | 2019-02-16  | 369.76      | 2019-02-08  |
| 3   | 2019-02-18  | 795.15      | 2019-02-09  |
| 3   | 2019-02-19  | 715.65      | 2019-02-09  |
| 3   | 2019-02-21  | 537.71      | 2019-02-10  |
+-----+-------------+-------------+-------------+--+

Computing the consecutive-transaction total:

-- aggregated: one row per streak
SELECT
id
,start_day
,sum(sum_amount) as lxjyze  -- consecutive-transaction total
FROM deal_start_day
GROUP BY id,start_day;

-- windowed: keeps the per-day rows
SELECT
id
,datestr
,sum_amount
,start_day
,SUM(sum_amount) OVER(PARTITION BY id,start_day) AS lxjyze
FROM deal_start_day;
2. Number of consecutive login days

-- aggregated: one row per streak
SELECT
id
,start_day
,sum(sum_amount) as lxjyze  -- consecutive-transaction total
,count(*) as lxdlts         -- consecutive login days
FROM deal_start_day
GROUP BY id,start_day;

-- windowed: keeps the per-day rows
SELECT
id
,datestr
,sum_amount
,start_day
,SUM(sum_amount) OVER(PARTITION BY id,start_day) AS lxjyze
,count(*) OVER(PARTITION BY id,start_day) AS lxdlts
FROM deal_start_day
ORDER BY id,datestr;

+-----+-------------+------------------+----------------+--+
| id  | datestr     | lx_total_amount  | lx_total_days  |
+-----+-------------+------------------+----------------+--+
| 1   | 2019-02-08  | 13600            | 3              |
| 1   | 2019-02-09  | 13600            | 3              |
| 1   | 2019-02-10  | 13600            | 3              |
| 1   | 2019-02-16  | 2991             | 5              |
| 1   | 2019-02-12  | 2991             | 5              |
| 1   | 2019-02-13  | 2991             | 5              |
| 1   | 2019-02-14  | 2991             | 5              |
| 1   | 2019-02-15  | 2991             | 5              |
| 1   | 2019-02-18  | 1510             | 2              |
| 1   | 2019-02-19  | 1510             | 2              |
| 1   | 2019-02-21  | 537              | 1              |
| 2   | 2019-02-08  | 13600            | 3              |
| 2   | 2019-02-09  | 13600            | 3              |
| 2   | 2019-02-10  | 13600            | 3              |
| 2   | 2019-02-13  | 3026             | 4              |
| 2   | 2019-02-15  | 3026             | 4              |
| 2   | 2019-02-14  | 3026             | 4              |
| 2   | 2019-02-12  | 3026             | 4              |
| 2   | 2019-02-19  | 1510             | 2              |
| 2   | 2019-02-18  | 1510             | 2              |
| 2   | 2019-02-21  | 537              | 1              |
| 3   | 2019-02-08  | 13600            | 3              |
| 3   | 2019-02-09  | 13600            | 3              |
| 3   | 2019-02-10  | 13600            | 3              |
| 3   | 2019-02-13  | 2730             | 5              |
| 3   | 2019-02-12  | 2730             | 5              |
| 3   | 2019-02-14  | 2730             | 5              |
| 3   | 2019-02-15  | 2730             | 5              |
| 3   | 2019-02-16  | 2730             | 5              |
| 3   | 2019-02-18  | 1510             | 2              |
| 3   | 2019-02-19  | 1510             | 2              |
| 3   | 2019-02-21  | 537              | 1              |
+-----+-------------+------------------+----------------+--+

3. Start and end dates of consecutive logins

Recap of the windowed output for user 1 (id, datestr, sum_amount, start_day, lxjyze, lxdlts; amounts were truncated to integers in this run):

1       2019-02-08      12461   2019-02-07      13598   3
1       2019-02-09      1052    2019-02-07      13598   3
1       2019-02-10      85      2019-02-07      13598   3
1       2019-02-12      769     2019-02-08      2988    5
1       2019-02-13      943     2019-02-08      2988    5
1       2019-02-14      538     2019-02-08      2988    5
1       2019-02-15      369     2019-02-08      2988    5
1       2019-02-16      369     2019-02-08      2988    5
1       2019-02-18      795     2019-02-09      1510    2
1       2019-02-19      715     2019-02-09      1510    2
1       2019-02-21      537     2019-02-10      537     1

SELECT
id
,datestr
,sum_amount
,start_day
,MIN(datestr) OVER(PARTITION BY id,start_day) lxdl_start_day  -- streak start
,MAX(datestr) OVER(PARTITION BY id,start_day) lxdl_end_day    -- streak end
FROM deal_start_day
ORDER BY id,datestr;

4. Gap days: LEAD()

1. LEAD() (originally documented for SQL Server, and available in Hive as well) is a window function that gives access to a row at a specified physical offset after the current row. For example, LEAD() lets the current row read the value of the next row, or the row after that, and so on. It is useful for comparing the current row's value with values in subsequent rows.
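
A minimal sketch of LEAD() in Hive, reusing deal_start_day from above (next_start_day is NULL on each user's last row because no default value is supplied):

SELECT
id
,start_day
,LEAD(start_day,1) OVER(PARTITION BY id ORDER BY start_day) as next_start_day
FROM deal_start_day;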

2. Date comparison function: datediff

Syntax: datediff(string enddate, string startdate)

Return type: int

Description: returns the number of days obtained by subtracting the start date from the end date.
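
For example:

SELECT datediff('2019-02-12','2019-02-10');  -- returns 2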

Here it measures how many days separate one streak's end date from the next streak's start date.

Windowed output (id, datestr, sum_amount, start_day, lxdl_start_day, lxdl_end_day):

1       2019-02-08      12461   2019-02-07      2019-02-08      2019-02-10
1       2019-02-09      1052    2019-02-07      2019-02-08      2019-02-10
1       2019-02-10      85      2019-02-07      2019-02-08      2019-02-10
1       2019-02-12      769     2019-02-08      2019-02-12      2019-02-16
1       2019-02-13      943     2019-02-08      2019-02-12      2019-02-16
1       2019-02-14      538     2019-02-08      2019-02-12      2019-02-16
1       2019-02-15      369     2019-02-08      2019-02-12      2019-02-16
1       2019-02-16      369     2019-02-08      2019-02-12      2019-02-16
1       2019-02-18      795     2019-02-09      2019-02-18      2019-02-19
1       2019-02-19      715     2019-02-09      2019-02-18      2019-02-19
1       2019-02-21      537     2019-02-10      2019-02-21      2019-02-21

Collapsed to one row per streak (id, start_day, max_datestr, min_datestr):

1       2019-02-07      2019-02-10      2019-02-08
1       2019-02-08      2019-02-16      2019-02-12
1       2019-02-09      2019-02-19      2019-02-18
1       2019-02-10      2019-02-21      2019-02-21
WITH max_min_datestr AS (
SELECT
id
,start_day
,max(datestr) as max_datestr
,min(datestr) as min_datestr
FROM deal_start_day
GROUP BY id,start_day
)
SELECT
id
,start_day
,max_datestr
,nex_day
,datediff(nex_day,max_datestr) as jgsj  -- gap days; the last streak is compared against current_date
FROM(
SELECT
id
,start_day
,max_datestr
,LEAD(min_datestr,1,current_date) OVER(PARTITION BY id ORDER BY start_day) as nex_day
FROM max_min_datestr
)T1;

+-----+-------------+-------------+----------+--+
| id  | lx_end_day  | nex_day     | jg_days  |
+-----+-------------+-------------+----------+--+
| 1   | 2019-02-10  | 2019-02-12  | 2        |
| 1   | 2019-02-16  | 2019-02-18  | 2        |
| 1   | 2019-02-19  | 2019-02-21  | 2        |
| 1   | 2019-02-21  | 2022-09-21  | 1308     |
| 2   | 2019-02-10  | 2019-02-12  | 2        |
| 2   | 2019-02-15  | 2019-02-18  | 3        |
| 2   | 2019-02-19  | 2019-02-21  | 2        |
| 2   | 2019-02-21  | 2022-09-21  | 1308     |
| 3   | 2019-02-10  | 2019-02-12  | 2        |
| 3   | 2019-02-16  | 2019-02-18  | 2        |
| 3   | 2019-02-19  | 2019-02-21  | 2        |
| 3   | 2019-02-21  | 2022-09-21  | 1308     |
+-----+-------------+-------------+----------+--+

Putting it all together in one query:

WITH a AS(
SELECT
id
,datestr
,SUM(amount) as total_amount
FROM deal_tb
GROUP BY id,datestr
),
b AS(
SELECT
id
,datestr
,total_amount
,row_number() over(PARTITION BY id ORDER BY datestr) as px
FROM a
),
c AS(
SELECT
id
,datestr
,total_amount
,date_sub(datestr,px) as start_days
FROM b
),
d AS(
SELECT
id
,datestr
,start_days
,floor(SUM(total_amount) over(partition by id,start_days)) as lx_total_amount
,count(*) over(partition by id,start_days) as lx_total_days
FROM c
),
e AS(
SELECT
id
,start_days
,MIN(datestr) over(partition by id,start_days) as lx_start_day
,MAX(datestr) over(partition by id,start_days) as lx_end_day
FROM d
),
f AS(
SELECT
id
,start_days
,lx_end_day
,lx_start_day
FROM e
GROUP BY id,lx_end_day,lx_start_day,start_days
)
SELECT
id
,lx_end_day
,nex_day
,DATEDIFF(nex_day,lx_end_day) as jg_days
FROM
(
SELECT
id
,lx_end_day
-- the ORDER BY is required; without it the "next" row is nondeterministic
,LEAD(lx_start_day,1,current_date) OVER(PARTITION BY id ORDER BY lx_start_day) as nex_day
FROM f)g;
  • 1. Create an external table named user_log_consumption, stored at location '/user_log_consumption', and load the data into it. Write the CREATE statement. (4 points)

-- DDL
create external table if not exists user_log_consumption(
day_t string,
userid string,
price double
)
row format delimited fields terminated by ','
stored as textfile
location '/user_log_consumption';

-- load the data
load data local inpath '/usr/local/soft/data/user_log_consumption.txt' into table user_log_consumption;

  • 2. Convert the external table user_log_consumption into an internal (managed) table. (4 points)

    alter table user_log_consumption set tblproperties('EXTERNAL'='FALSE');
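
    To verify the change took effect, DESCRIBE FORMATTED reports the table type:

    describe formatted user_log_consumption;  -- Table Type should now read MANAGED_TABLE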

  • 3. How many users logged in on three consecutive days? Also report each user's consecutive-transaction total, number of consecutive login days, streak start and end dates, and gap days. Provide the SQL. (10 points)

    select userid
           ,dt_start
           ,count(*) as days_cnt
           ,sum(price) as total_amount  -- consecutive-transaction total
           ,min(day_t) as start_date
           ,max(day_t) as end_date
           ,lag(dt_start, 1, dt_start) over (partition by userid order by dt_start) as lag_start_dt
           ,datediff(dt_start, lag(dt_start, 1, dt_start) over (partition by userid order by dt_start)) as interval_day
    from(
        select userid
          ,day_t
          ,price
          ,row_number() over (partition by userid order by day_t) as rnk
          ,date_sub(day_t,row_number() over (partition by userid order by day_t)) as dt_start
        from(
            select day_t
                ,userid
                ,sum(price) as price
            from user_log_consumption
            group by userid,day_t
        )t1
    )t2
    group by userid,dt_start;
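
    The query above returns one row per streak. To answer the "how many users logged in on three consecutive days" part directly, one way (a sketch reusing the same derivation) is to keep only streaks of three or more days and count distinct users:

    select count(distinct userid) as users_3days
    from(
        select userid
              ,dt_start
              ,count(*) as days_cnt
        from(
            select userid
                  ,day_t
                  ,date_sub(day_t,row_number() over (partition by userid order by day_t)) as dt_start
            from(
                select day_t,userid,sum(price) as price
                from user_log_consumption
                group by userid,day_t
            )t1
        )t2
        group by userid,dt_start
        having count(*) >= 3
    )t3;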

The no-data statistics requirement

No-data statistics

Requirement: if a device reports no violation data over some period, the device may be abnormal.

Possible kinds of abnormality:

① The device had problems during some interval of its service life.

② The device has had a standing problem that was never fixed.

Note: a device may also legitimately collect no data during certain periods, so there is no fixed number of no-data days that by itself proves a device is abnormal.

Solution:

A box plot can be used to screen the data.

Box plot:

① The data must first be sorted.

② Reference values are then read off the sorted data and used to flag outliers.

③ Upper quartile: the value at position 3/4 × N in the sorted data (ascending).

④ Lower quartile: the value at position 1/4 × N.

⑤ IQR (the span of the box): upper-quartile value minus lower-quartile value.

⑥ Flagging outliers (the 1.5 below is a rule-of-thumb coefficient and can be tuned to the actual data); a worked example follows this list:

1> upper edge = upper quartile + 1.5 × IQR

2> lower edge = lower quartile - 1.5 × IQR
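
For example, with the device data used later in this section: if the upper-quartile gap is 5 days and the lower-quartile gap is 1 day, then IQR = 5 - 1 = 4, the upper edge is 5 + 1.5 × 4 = 11, and any gap longer than 11 days is flagged as abnormal.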

The actual processing flow

① For each device, compute the gap between consecutive data points; a device that has reported nothing since its last record also needs a gap, measured against the current date.

Compute each device's gaps (in the code below, sbbh is the device ID, wfsj the violation-report date, and sjc the gap in days):

-- Approach 1: LEAD
CREATE TABLE sbbh_wfsjc AS
SELECT
sbbh
,wfsj
,lead_wfsj
,datediff(lead_wfsj,wfsj) as sjc
FROM(
SELECT
sbbh
,wfsj
-- for each device's last row, default to the current date
,lead(wfsj,1,CURRENT_DATE) OVER(PARTITION BY sbbh ORDER BY wfsj) AS lead_wfsj
FROM base_vio_sbbh
) T1;

-- Approach 2: self join on row numbers
WITH rm_tb AS (
SELECT
sbbh
,wfsj
,ROW_NUMBER() OVER(PARTITION BY sbbh ORDER BY wfsj) AS rm
FROM base_vio_sbbh
)
SELECT
T1.sbbh
,T1.wfsj
,IF(T2.wfsj IS NULL,CAST(CURRENT_DATE AS string),T2.wfsj) AS last_day
-- datediff(last_day, T1.wfsj) would then give the same sjc as approach 1
FROM rm_tb T1
LEFT JOIN rm_tb T2 ON T1.sbbh = T2.sbbh AND T1.rm = T2.rm - 1;

+---------------------+----------------------+-----------------------+-----------------+--+
|   sbbh_wfsjc.sbbh   |   sbbh_wfsjc.wfsj    | sbbh_wfsjc.lead_wfsj  | sbbh_wfsjc.sjc  |
+---------------------+----------------------+-----------------------+-----------------+--+
| 451000000000071113  | 2020-04-13           | 2020-04-22            | 9               |
| 451000000000071113  | 2020-04-22           | 2020-05-13            | 21              |
| 451000000000071113  | 2020-05-13           | 2020-05-18            | 5               |
| 451000000000071113  | 2020-05-18           | 2020-06-14            | 27              |
| 451000000000071113  | 2020-06-14           | 2020-06-15            | 1               |
| 451000000000071113  | 2020-06-15           | 2020-06-16            | 1               |
| 451000000000071113  | 2020-06-16           | 2020-06-18            | 2               |
| 451000000000071113  | 2020-06-18           | 2020-06-23            | 5               |
| 451000000000071113  | 2020-06-23           | 2020-06-28            | 5               |
| 451000000000071113  | 2020-06-28           | 2020-06-29            | 1               |
| 451000000000071113  | 2020-06-29           | 2020-07-02            | 3               |
| 451000000000071113  | 2020-07-02           | 2020-07-07            | 5               |
| 451000000000071113  | 2020-07-07           | 2020-07-09            | 2               |
| 451000000000071113  | 2020-07-09           | 2020-07-22            | 13              |
| 451000000000071113  | 2020-07-22           | 2020-07-26            | 4               |
| 451000000000071113  | 2020-07-26           | 2020-07-28            | 2               |
| 451000000000071113  | 2020-07-28           | 2020-07-30            | 2               |
| 451000000000071113  | 2020-07-30           | 2020-08-01            | 2               |
| 451000000000071113  | 2020-08-01           | 2020-08-02            | 1               |
| 451000000000071113  | 2020-08-02           | 2020-08-03            | 1               |
| 451000000000071113  | 2020-08-03           | 2020-08-05            | 2               |
| 451000000000071113  | 2020-08-05           | 2020-08-06            | 1               |
| 451000000000071113  | 2020-08-06           | 2020-08-08            | 2               |
| 451000000000071113  | 2020-08-08           | 2020-08-09            | 1               |
| 451000000000071113  | 2020-08-09           | 2020-08-11            | 2               |
| 451000000000071113  | 2020-08-11           | 2020-08-12            | 1               |
| 451000000000071113  | 2020-08-12           | 2020-08-13            | 1               |
| 451000000000071113  | 2020-08-13           | 2020-08-14            | 1               |
| 451000000000071113  | 2020-08-14           | 2020-08-19            | 5               |
| 451000000000071113  | 2020-08-19           | 2020-08-20            | 1               |
| 451000000000071113  | 2020-08-20           | 2020-08-21            | 1               |
| 451000000000071113  | 2020-08-21           | 2020-08-28            | 7               |
| 451000000000071113  | 2020-08-28           | 2020-09-02            | 5               |
| 451000000000071113  | 2020-09-02           | 2020-09-03            | 1               |
| 451000000000071113  | 2020-09-03           | 2020-09-05            | 2               |
| 451000000000071113  | 2020-09-05           | 2020-09-11            | 6               |
| 451000000000071113  | 2020-09-11           | 2020-09-24            | 13              |
| 451000000000071113  | 2020-09-24           | 2020-09-27            | 3               |
| 451000000000071113  | 2020-09-27           | 2020-09-28            | 1               |
| 451000000000071113  | 2020-09-28           | 2020-09-30            | 2               |
| 451000000000071113  | 2020-09-30           | 2020-10-01            | 1               |
| 451000000000071113  | 2020-10-01           | 2020-10-02            | 1               |
| 451000000000071113  | 2020-10-02           | 2022-09-22            | 720             |
+---------------------+----------------------+-----------------------+-----------------+--+

Picking out abnormal gaps with the box plot

-- Per device: total row count, plus the rank positions of the two quartiles.
-- The ranks (see sjc_rm below) order sjc DESCENDING, rank 1 = largest gap,
-- so the upper quartile sits at rank N/4 and the lower quartile at rank 3N/4.
WITH sx_4 AS(
SELECT
sbbh
,count(*) as sbbh_zs            -- rows per device
,floor(count(*) / 4) AS s_4     -- upper-quartile rank
,floor(count(*) / 4 * 3) AS x_4 -- lower-quartile rank
FROM sbbh_wfsjc
GROUP BY sbbh
)
+---------------------+----------+------+------+--+
|        sbbh         | sbbh_zs  | s_4  | x_4  |
+---------------------+----------+------+------+--+
| 451000000000071113  | 43       | 10   | 32   |
+---------------------+----------+------+------+--+
, sjc_rm AS (
SELECT
sbbh
,wfsj
,lead_wfsj
,sjc
,ROW_NUMBER() OVER(PARTITION BY sbbh ORDER BY sjc DESC) rm  -- rank gaps largest-first
FROM sbbh_wfsjc
)
INSERT INTO TABLE sx_4_value
SELECT
T1.x_4
,T1.s_4
,T2.*
FROM sx_4 T1 JOIN sjc_rm T2 ON T1.sbbh = T2.sbbh;
-- target table, created before running the INSERT above
CREATE TABLE sx_4_value (
x_4 int   -- 10 in the sample output below
,s_4 int  -- 32 in the sample output below
,sbbh STRING
,wfsj STRING
,lead_wfsj STRING
,sjc int
,rm int
);

+-----------------+-----------------+---------------------+------------------+-----------------------+-----------------+----------------+--+
| sx_4_value.x_4  | sx_4_value.s_4  |   sx_4_value.sbbh   | sx_4_value.wfsj  | sx_4_value.lead_wfsj  | sx_4_value.sjc  | sx_4_value.rm  |
+-----------------+-----------------+---------------------+------------------+-----------------------+-----------------+----------------+--+
| 10              | 32              | 451000000000071113  | 2020-10-02       | 2022-09-22            | 720             | 1              |
| 10              | 32              | 451000000000071113  | 2020-05-18       | 2020-06-14            | 27              | 2              |
| 10              | 32              | 451000000000071113  | 2020-04-22       | 2020-05-13            | 21              | 3              |
| 10              | 32              | 451000000000071113  | 2020-09-11       | 2020-09-24            | 13              | 4              |
| 10              | 32              | 451000000000071113  | 2020-07-09       | 2020-07-22            | 13              | 5              |
| 10              | 32              | 451000000000071113  | 2020-04-13       | 2020-04-22            | 9               | 6              |
| 10              | 32              | 451000000000071113  | 2020-08-21       | 2020-08-28            | 7               | 7              |
| 10              | 32              | 451000000000071113  | 2020-09-05       | 2020-09-11            | 6               | 8              |
| 10              | 32              | 451000000000071113  | 2020-05-13       | 2020-05-18            | 5               | 9              |
| 10              | 32              | 451000000000071113  | 2020-08-14       | 2020-08-19            | 5               | 10             |
| 10              | 32              | 451000000000071113  | 2020-06-23       | 2020-06-28            | 5               | 11             |
| 10              | 32              | 451000000000071113  | 2020-08-28       | 2020-09-02            | 5               | 12             |
| 10              | 32              | 451000000000071113  | 2020-06-18       | 2020-06-23            | 5               | 13             |
| 10              | 32              | 451000000000071113  | 2020-07-02       | 2020-07-07            | 5               | 14             |
| 10              | 32              | 451000000000071113  | 2020-07-22       | 2020-07-26            | 4               | 15             |
| 10              | 32              | 451000000000071113  | 2020-06-29       | 2020-07-02            | 3               | 16             |
| 10              | 32              | 451000000000071113  | 2020-09-24       | 2020-09-27            | 3               | 17             |
| 10              | 32              | 451000000000071113  | 2020-07-26       | 2020-07-28            | 2               | 18             |
| 10              | 32              | 451000000000071113  | 2020-09-28       | 2020-09-30            | 2               | 19             |
| 10              | 32              | 451000000000071113  | 2020-07-07       | 2020-07-09            | 2               | 20             |
| 10              | 32              | 451000000000071113  | 2020-06-16       | 2020-06-18            | 2               | 21             |
| 10              | 32              | 451000000000071113  | 2020-09-03       | 2020-09-05            | 2               | 22             |
| 10              | 32              | 451000000000071113  | 2020-08-09       | 2020-08-11            | 2               | 23             |
| 10              | 32              | 451000000000071113  | 2020-08-06       | 2020-08-08            | 2               | 24             |
| 10              | 32              | 451000000000071113  | 2020-08-03       | 2020-08-05            | 2               | 25             |
| 10              | 32              | 451000000000071113  | 2020-07-30       | 2020-08-01            | 2               | 26             |
| 10              | 32              | 451000000000071113  | 2020-07-28       | 2020-07-30            | 2               | 27             |
| 10              | 32              | 451000000000071113  | 2020-08-01       | 2020-08-02            | 1               | 28             |
| 10              | 32              | 451000000000071113  | 2020-09-02       | 2020-09-03            | 1               | 29             |
| 10              | 32              | 451000000000071113  | 2020-08-20       | 2020-08-21            | 1               | 30             |
| 10              | 32              | 451000000000071113  | 2020-08-19       | 2020-08-20            | 1               | 31             |
| 10              | 32              | 451000000000071113  | 2020-08-13       | 2020-08-14            | 1               | 32             |
| 10              | 32              | 451000000000071113  | 2020-08-12       | 2020-08-13            | 1               | 33             |
| 10              | 32              | 451000000000071113  | 2020-08-11       | 2020-08-12            | 1               | 34             |
| 10              | 32              | 451000000000071113  | 2020-08-08       | 2020-08-09            | 1               | 35             |
| 10              | 32              | 451000000000071113  | 2020-09-27       | 2020-09-28            | 1               | 36             |
| 10              | 32              | 451000000000071113  | 2020-08-02       | 2020-08-03            | 1               | 37             |
| 10              | 32              | 451000000000071113  | 2020-06-28       | 2020-06-29            | 1               | 38             |
| 10              | 32              | 451000000000071113  | 2020-06-15       | 2020-06-16            | 1               | 39             |
| 10              | 32              | 451000000000071113  | 2020-06-14       | 2020-06-15            | 1               | 40             |
| 10              | 32              | 451000000000071113  | 2020-09-30       | 2020-10-01            | 1               | 41             |
| 10              | 32              | 451000000000071113  | 2020-10-01       | 2020-10-02            | 1               | 42             |
| 10              | 32              | 451000000000071113  | 2020-08-05       | 2020-08-06            | 1               | 43             |
+-----------------+-----------------+---------------------+------------------+-----------------------+-----------------+----------------+--+

Read off the values at the two quartile ranks and compute the upper edge

WITH value_table AS (
SELECT
sbbh
,MAX(IF(x_4=rm,sjc,-1)) as x_4_value  -- gap at the lower-quartile rank
,MAX(IF(s_4=rm,sjc,-1)) as s_4_value  -- gap at the upper-quartile rank
-- upper edge = Q3 + 1.5 * IQR
,(MAX(IF(s_4=rm,sjc,-1)) - MAX(IF(x_4=rm,sjc,-1)))*1.5 + MAX(IF(s_4=rm,sjc,-1)) as sby_value
FROM sx_4_value
WHERE x_4 = rm OR s_4 = rm
GROUP BY sbbh
)
SELECT
T1.*
FROM sx_4_value T1 JOIN value_table T2 ON T1.sbbh = T2.sbbh
WHERE T2.sby_value < T1.sjc;  -- keep only gaps above the upper edge

+---------+---------+---------------------+-------------+---------------+---------+--------+--+
| t1.x_4  | t1.s_4  | t1.sbbh             | t1.wfsj     | t1.lead_wfsj  | t1.sjc  | t1.rm  |
+---------+---------+---------------------+-------------+---------------+---------+--------+--+
| 10      | 32      | 451000000000071113  | 2020-10-02  | 2022-09-22    | 720     | 1      |
| 10      | 32      | 451000000000071113  | 2020-05-18  | 2020-06-14    | 27      | 2      |
| 10      | 32      | 451000000000071113  | 2020-04-22  | 2020-05-13    | 21      | 3      |
| 10      | 32      | 451000000000071113  | 2020-09-11  | 2020-09-24    | 13      | 4      |
| 10      | 32      | 451000000000071113  | 2020-07-09  | 2020-07-22    | 13      | 5      |
| 10      | 32      | 451000000000071113  | 2020-04-13  | 2020-04-22    | 9       | 6      |
+---------+---------+---------------------+-------------+---------------+---------+--------+--+
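
Hive also ships a built-in percentile() UDAF for integral columns, so the quartiles can be pulled without the manual ranking; a minimal sketch against the same sbbh_wfsjc table (results may differ slightly from the rank-based method above because percentile() interpolates between values):

SELECT
sbbh
,percentile(sjc, 0.25) as q1  -- lower quartile
,percentile(sjc, 0.75) as q3  -- upper quartile
,percentile(sjc, 0.75) + 1.5 * (percentile(sjc, 0.75) - percentile(sjc, 0.25)) as upper_edge
FROM sbbh_wfsjc
GROUP BY sbbh;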
