Doris–基础–13.5–SQl函数–聚合函数
1、APPROX_COUNT_DISTINCT(expr)
返回类似于 COUNT(DISTINCT col) 结果的近似值聚合函数。
它比 COUNT 和 DISTINCT 组合的速度更快,并使用固定大小的内存,因此对于高基数的列可以使用更少的内存。
MySQL [test_db]> select city,approx_count_distinct(user_id) from table11 group by city;
+--------+----------------------------------+
| city | approx_count_distinct(`user_id`) |
+--------+----------------------------------+
| 广州 | 1 |
| 上海 | 1 |
| 北京 | 2 |
| 深圳 | 1 |
+--------+----------------------------------+
4 rows in set (0.01 sec)
2、AVG([DISTINCT] expr)
用于返回选中字段的平均值
可选字段DISTINCT参数可以用来返回去重平均值
MySQL [test_db]> SELECT date, AVG(cost) FROM table11 group by date;
+------------+--------------------+
| date | avg(`cost`) |
+------------+--------------------+
| 2017-10-01 | 40.666666666666664 |
| 2017-10-02 | 115 |
+------------+--------------------+
2 rows in set (0.00 sec)
MySQL [test_db]> SELECT date, AVG(distinct cost) FROM table11 group by date;
+------------+----------------------+
| date | avg(DISTINCT `cost`) |
+------------+----------------------+
| 2017-10-01 | 40.666666666666664 |
| 2017-10-02 | 115 |
+------------+----------------------+
2 rows in set (0.01 sec)
3、BITMAP_UNION
3.1、创建表
CREATE TABLE `pv_bitmap` (
`dt` int(11) NULL COMMENT "",
`page` varchar(10) NULL COMMENT "",
`user_id` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `page`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`) BUCKETS 2;
注:当数据量很大时,最好为高频率的 bitmap_union 查询建立对应的 rollup 表
ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);
3.2、插入数据
ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);
insert into pv_bitmap values(20191206, 'xiaoxiang', to_bitmap(101));
insert into pv_bitmap values(20191206, 'waimai', to_bitmap(101));
insert into pv_bitmap values(20191207, 'xiaoxiang', to_bitmap(101));
insert into pv_bitmap values(20191207, 'waimai', to_bitmap(102));
insert into pv_bitmap values(20191208, 'xiaoxiang', to_bitmap(101));
insert into pv_bitmap values(20191208, 'waimai', to_bitmap(101));
3.3、查询数据
BITMAP_UNION(expr) : 计算两个 Bitmap 的并集,返回值是序列化后的 Bitmap 值
MySQL [test_db]> select bitmap_count(bitmap_union(user_id)) from pv_bitmap;
+---------------------------------------+
| bitmap_count(bitmap_union(`user_id`)) |
+---------------------------------------+
| 2 |
+---------------------------------------+
1 row in set (0.02 sec)
BITMAP_COUNT(expr) : 计算 Bitmap 的基数值
MySQL [test_db]> select bitmap_union_count(user_id) from pv_bitmap;
+-------------------------------+
| bitmap_union_count(`user_id`) |
+-------------------------------+
| 2 |
+-------------------------------+
1 row in set (0.01 sec)
BITMAP_UNION_COUNT(expr): 和 BITMAP_COUNT(BITMAP_UNION(expr)) 等价
BITMAP_UNION_INT(expr) : 和 COUNT(DISTINCT expr) 等价 (仅支持 TINYINT,SMALLINT 和 INT)
4、COUNT([DISTINCT] expr)
用于返回满足要求的行的数目
MySQL [test_db]> select count(*) from table11 group by date;
+----------+
| count(*) |
+----------+
| 3 |
| 2 |
+----------+
2 rows in set (0.01 sec)
MySQL [test_db]> select count(date) from table11 group by date;
+---------------+
| count(`date`) |
+---------------+
| 3 |
| 2 |
+---------------+
2 rows in set (0.00 sec)
MySQL [test_db]> select count(distinct date) from table11 group by date;
+------------------------+
| count(DISTINCT `date`) |
+------------------------+
| 1 |
| 1 |
+------------------------+
2 rows in set (0.01 sec)
5、HLL_UNION_AGG(hll)
HLL是基于HyperLogLog算法的工程实现,用于保存HyperLogLog计算过程的中间结果
它只能作为表的value列类型、通过聚合来不断的减少数据量,以此来实现加快查询的目的
基于它得到的是一个估算结果,误差大概在1%左右,hll列是通过其它列或者导入数据里面的数据生成的
导入的时候通过hll_hash函数来指定数据中哪一列用于生成hll列,它常用于替代count distinct,通过结合rollup在业务上用于快速计算uv等
这里没有数据,我就没测试了
select HLL_UNION_AGG(uv_set) from test_uv;
6、MAX(expr)
返回expr表达式的最大值
MySQL [test_db]> select max(cost) from table11 group by date;
+-------------+
| max(`cost`) |
+-------------+
| 100 |
| 200 |
+-------------+
2 rows in set (0.00 sec)
7、MIN(expr)
返回expr表达式的最小值
MySQL [test_db]> select min(cost) from table11 group by date;
+-------------+
| min(`cost`) |
+-------------+
| 2 |
| 30 |
+-------------+
2 rows in set (0.00 sec)
8、SUM(expr)
用于返回选中字段所有值的和
MySQL [test_db]> select sum(cost) from table11 group by date;
+-------------+
| sum(`cost`) |
+-------------+
| 122 |
| 230 |
+-------------+
2 rows in set (0.00 sec)