Grouping__ID behaves differently across Hive versions

Latest recommended article published on 2023-05-11 19:35:01

javartisan

Category: Hive

Article link: https://blog.csdn.net/dax1n/article/details/104308886

GROUPING SETS clause

Hive 2.3.0 and later

Grouping__ID function

When a column is rolled up in an aggregate row, its value is displayed as NULL. This is ambiguous when the column itself contains NULL values: a NULL in the output may mean "this column was aggregated" or it may be an actual NULL value from the data. The GROUPING__ID function is the way to tell the two apart.

This function returns a bitvector indicating, for each grouping column, whether it is present in the row or not. For each column, the bit is "1" in a result row if that column has been aggregated in that row; otherwise it is "0". This can be used to distinguish aggregate rows from rows that simply contain NULLs in the data.
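The bitvector can be unpacked with ordinary bit operations. The following is a minimal Python sketch, not Hive code: the helper name `decode_grouping_id` is hypothetical, and the bit order (first GROUP BY column in the highest bit) is an assumption inferred from the example results in this post, where a row grouped by key alone gets the value 1.

```python
def decode_grouping_id(grouping_id, columns):
    """Map each GROUP BY column to True if it is aggregated (rolled up) in the row.

    Assumes the Hive 2.3.0+ convention described above (bit 1 = aggregated)
    and that the first GROUP BY column occupies the most significant bit.
    """
    n = len(columns)
    flags = {}
    for i, col in enumerate(columns):
        # Shift the bit belonging to column i down to position 0 and mask it.
        bit = (grouping_id >> (n - 1 - i)) & 1
        flags[col] = bool(bit)
    return flags


# GROUPING__ID = 1 with GROUP BY key, value: only "value" is aggregated.
print(decode_grouping_id(1, ["key", "value"]))
```

A value of 0 decodes to no aggregated columns, and 3 decodes to both columns aggregated, which corresponds to the grand-total row of a ROLLUP.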

Consider the following example:

Column 1 (key)	Column 2 (value)
1	NULL
1	1
2	2
3	3
3	NULL
4	5

The following query:

SELECT key, value, GROUPING__ID, count(*)
FROM T1
GROUP BY key, value WITH ROLLUP;

will have the following results:

Column 1 (key)	Column 2 (value)	GROUPING__ID	count(*)
NULL	NULL	3	6
1	NULL	0	1
1	NULL	1	2
1	1	0	1
2	NULL	1	1
2	2	0	1
3	NULL	0	1
3	NULL	1	2
3	3	0	1
4	NULL	1	1
4	5	0	1

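Note that the third column is a bitvector over the grouping columns: a bit is 1 when the column is aggregated in that row, and 0 when the column is selected.
For the first row, neither column is selected (both are aggregated), which explains the value 3 (binary 11).
For the second row, both columns are selected (and the second column happens to be NULL in the data), which explains the value 0.
For the third row, only the first column is selected and the second is aggregated, which explains the value 1 (binary 01).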
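To check which output rows come from real NULLs and which come from the rollup, the query can be imitated in a few lines of Python. This is only a rough simulation under the Hive 2.3.0+ convention described above, not how Hive executes the query; `ROWS` and `rollup_counts` are names made up for this sketch, and `None` stands in for SQL NULL.

```python
from collections import Counter

# Example table T1 from the text as (key, value) pairs; None stands for NULL.
ROWS = [(1, None), (1, 1), (2, 2), (3, 3), (3, None), (4, 5)]


def rollup_counts(rows):
    """Imitate GROUP BY key, value WITH ROLLUP with count(*).

    Keys of the returned Counter are (key, value, grouping_id) tuples.
    """
    counts = Counter()
    for key, value in rows:
        counts[(key, value, 0)] += 1    # both columns kept   -> bits 00
        counts[(key, None, 1)] += 1     # value aggregated    -> bits 01
        counts[(None, None, 3)] += 1    # both aggregated     -> bits 11
    return counts


result = rollup_counts(ROWS)
for (key, value, grouping_id), n in result.items():
    print(key, value, grouping_id, n)
```

The two rows for key 1 with a NULL value differ only in the grouping ID: the one with ID 0 counts the single source row whose value really is NULL, while the one with ID 1 counts every source row with key 1.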

Before Hive 2.3.0

Grouping__ID function (before Hive 2.3.0)

Supplementary note: before Hive 2.3.0, a bit value of 1 in the binary form of GROUPING__ID means that the column is used for grouping in that row, and 0 means that it is not.

The Grouping__ID function was fixed in Hive 2.3.0, so its behavior before that release is different (this is expected). For each column, the function would return "0" if and only if that column had been aggregated in that row; otherwise the value is "1".
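The inverted meaning of each bit can be put side by side with the newer convention. The Python sketch below only models the per-column flag described in this post; the function name `column_flags` and its `pre_2_3` parameter are invented for illustration, and how Hive packs the flags into a single integer is deliberately left out.

```python
def column_flags(aggregated, pre_2_3=False):
    """Return the per-column GROUPING__ID flag for one result row.

    aggregated: booleans in GROUP BY order, True if the column is
                aggregated (rolled up) in this row.
    pre_2_3:    True for the behavior before Hive 2.3.0, where the
                flag is 0 for an aggregated column and 1 otherwise.
    """
    if pre_2_3:
        return [0 if agg else 1 for agg in aggregated]
    return [1 if agg else 0 for agg in aggregated]


# A row grouped by key only: key is kept, value is aggregated.
row = [False, True]
print(column_flags(row))                # Hive 2.3.0 and later
print(column_flags(row, pre_2_3=True))  # before Hive 2.3.0
```

The same row therefore carries opposite flags on the two sides of the 2.3.0 fix, so any filter on GROUPING__ID needs to be revisited when a query moves between versions.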

Hence the following query:

SELECT key, value, GROUPING__ID, count(*)
FROM T1
GROUP BY key, value WITH ROLLUP;

will have the following results.