【SQL】利用HAVING取众数和中位数

本篇文章介绍如何使用HAVING取数据集合的众数和中位数

参考《SQL进阶教程》

1. 创建测试表及测试数据
CREATE TABLE graduates (
  name varchar(255),
  income number(10) 
);

INSERT INTO graduates VALUES ('桑普森', '400000');
INSERT INTO graduates VALUES ('迈克', '30000');
INSERT INTO graduates VALUES ('怀特', '20000');
INSERT INTO graduates VALUES ('阿诺德', '20000');
INSERT INTO graduates VALUES ('史密斯', '20000');
INSERT INTO graduates VALUES ('劳伦斯', '15000');
INSERT INTO graduates VALUES ('哈德逊', '15000');
INSERT INTO graduates VALUES ('肯特', '10000');
INSERT INTO graduates VALUES ('贝克', '10000');
INSERT INTO graduates VALUES ('斯科特', '10000');

COMMIT;
2. 取income众数
select income,count(*) cnt 
from graduates
group by income
having count(*) = (
select max(cnt) from (select count(*) cnt from graduates group by income) tmp
);

需求分析:

例子中要取income的众数,也就是出现次数最多income.思路如下:

(1)确定众数值出现的次数

(2)根据次数获取众数对应的值

(3)整合SQL

2.1 确定众数值出现的次数
SQL> select max(cnt) from (select count(*) cnt from graduates group by income) tmp;

  MAX(CNT)
----------
         3
2.2 根据次数获取众数对应的值
SQL> select income,count(*) cnt 
  2  from graduates
  3  group by income
  4  having count(*) = 3;

    INCOME        CNT
---------- ----------
     10000          3
     20000          3 
2.3 整合SQL
SQL> select income,count(*) cnt 
  2  from graduates
  3  group by income
  4  having count(*) = (
  5  select max(cnt) from (select count(*) cnt from graduates group by income) tmp
  6  );

    INCOME        CNT
---------- ----------
     10000          3
     20000          3
3. 取income中位数
select AVG(tmp.income)
from (
select T1.income from graduates T1,graduates T2
group by T1.income
having sum(case when T1.income >= T2.income then 1 else 0 end) >= count(*)/2
and sum(case when T1.income <= T2.income then 1 else 0 end) >= count(*)/2
) tmp;

需求分析:

数学中取中位数:

把集合中的元素按升序排列后,取位于中间的元素. 如果元素个数为偶数,则取中间两个元素的平均值.

SQL实现思路: 把集合元素按大小排序后,分为上下两部分,然后取两部分交集,再取平均值

实现步骤:

(1) 假设有相同的两个集合T1,T2,取两集合的笛卡尔积

(2) 如果T1中income 大于 T2中的income,则flag记为 1,小于则记为 0

(3) 区分上下部分

  • 对T1中每个income对应的flag取和sum_flag.

  • 对于T1中每一个income,如果sum_flag ≥ count(flag)/2,则判断此income值属于上半部分(较大部分),否则 属于下半部分(较小部分)

    • count(flag)/2 有什么用

      count(flag)/2表示T1集合的中间位置

    • 为什么是≥,而不是>

      因为当元素个数为奇数时使用">"没有问题,但当元素个数为偶数时,上下部分没有交集,所以使用 “≥”.

(4) 取上下部分的交集,并取平均值

(5) 整合SQL

3.1 假设有相同的两个集合T1,T2,取两集合的笛卡尔积
SQL> select t1.income,t2.income from GRADUATES t1,GRADUATES t2;

    INCOME     INCOME
---------- ----------
    400000     400000
    400000      30000
    400000      20000
    400000      20000
    400000      20000
    400000      15000
    400000      15000
    400000      10000
    400000      10000
    400000      10000
     30000     400000
    中间略...
100 rows selected.    
3.2 如果T1中income 大于等于 T2中的income,则flag记为 1,小于则记为 0
SQL> select 
  2  t1.income AS income_t1,
  3  t2.income AS income_t2,
  4  (CASE WHEN t1.income >= t2.income THEN 1 ELSE 0 END) AS flag
  5  from GRADUATES t1,GRADUATES t2;
  
取部分输出结果...
20000     400000          0
20000      30000          0
20000      20000          1
20000      20000          1
20000      20000          1
20000      15000          1
20000      15000          1
20000      10000          1
20000      10000          1
20000      10000          1
...

3.3 区分上下部分
  • 上半部分

    SQL> select t1.income AS income
      2  from GRADUATES t1,GRADUATES t2
      3  GROUP BY t1.income
      4  HAVING SUM(CASE WHEN t1.income >= t2.income THEN 1 ELSE 0 END) >= COUNT(*)/2;
    
        INCOME
    ----------
         30000
         15000
        400000
         20000
    
  • 下半部分

    
    SQL> select t1.income AS income
      2  from GRADUATES t1,GRADUATES t2
      3  GROUP BY t1.income
      4  HAVING SUM(CASE WHEN t1.income <= t2.income THEN 1 ELSE 0 END) >= COUNT(*)/2;
    
        INCOME
    ----------
         10000
         15000
         20000
    
3.4 取上下部分的交集,并取平均值
SQL> SELECT avg(tmp.income) FROM (
  2  select t1.income AS income
  3  from GRADUATES t1,GRADUATES t2
  4  GROUP BY t1.income
  5  HAVING SUM(CASE WHEN t1.income >= t2.income THEN 1 ELSE 0 END) >= COUNT(*)/2
  6  INTERSECT
  7  select t1.income AS income
  8  from GRADUATES t1,GRADUATES t2
  9  GROUP BY t1.income
 10  HAVING SUM(CASE WHEN t1.income <= t2.income THEN 1 ELSE 0 END) >= COUNT(*)/2
 11  ) tmp;

AVG(TMP.INCOME)
---------------
          17500
3.5 整合SQL
SQL> select AVG(tmp.income)
  2  from (
  3  select T1.income from graduates T1,graduates T2
  4  group by T1.income
  5  having sum(case when T1.income >= T2.income then 1 else 0 end) >= count(*)/2
  6  and sum(case when T1.income <= T2.income then 1 else 0 end) >= count(*)/2
  7  ) tmp;

AVG(TMP.INCOME)
---------------
          17500
  • 7
    点赞
  • 22
    收藏
    觉得还不错? 一键收藏
  • 5
    评论
评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值