select "11+++++++++++++++++++++++++++++++++商品点击各维度聚合表++++++++++++++++++++++++++++++++h_temp_goods_hit_grouping"; select "------ 127 27941970 goods_id,create_month "; select "------ 191 13887202 goods_id,create_quarter "; select "------ 231 19263357 goods_id,provice_id, area_id "; select "------ 251 4623627 goods_id,sex_id "; select "------ 253 7459504 goods_id,skin_id "; select "------ 254 7428975 goods_id,age_id "; select "------ 255 2183881 goods_id "; select "------ 415 43941 brand_id,create_quarter "; select "------ 479 3474 brand_id "; set hive.execution.engine=tez; set hive.exec.dynamic.partition.mode = nonstrict; set hive.exec.dynamic.partition=true; insert overwrite table ${toDB}.h_temp_goods_hit_grouping partition(GROUPING__ID) select goods_id, create_month, create_quarter, brand_id, provice_id, area_id, sex_id, skin_id, age_id, count(*) hit_num, GROUPING__ID from ${toDB}.h_temp_goods_hit GROUP BY goods_id, ---------0/1 0:表示该列有值, 1:表示该列没值 create_month, ------- 0/1 create_quarter, brand_id, provice_id, area_id, sex_id, skin_id, age_id grouping sets (brand_id, (brand_id,create_quarter), (goods_id), (goods_id,create_month), (goods_id,create_quarter), (goods_id,age_id), (goods_id,provice_id, area_id), (goods_id,sex_id), (goods_id,skin_id));
带上条件GROUPING__ID=127
9个字段
goods_id, create_month, create_quarter, brand_id, provice_id, area_id, sex_id, skin_id, age_id
表示 001111111
goods_id | create_month | create_quarter | brand_id | provice_id | area_id | sex_id | skin_id | age_id |
0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
得到的结果如下,表示前两个字段非空
根据前面的grouping sets 可以看到没有(brand_id,age_id)的聚合结果集,
二进制表示 111011110 带上条件GROUPING__ID=478,其中brand_id是最小粒度
goods_id | create_month | create_quarter | brand_id | provice_id | area_id | sex_id | skin_id | age_id |
1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
没有结果集,
注意:
1、grouping sets 只能用于 group by 之后。
2、grouping sets 中可以包含多种粒度,粒度之间用逗号连接。
3、grouping sets 中的所有字段,都必须出现在 group by 中,相当于 group by 后面的字段是最细粒度。
4、如果 select 中的字段,没有包含在某个 grouping set 中,那么这个粒度下的这个字段值为 NULL。
4、不同的粒度,可以使用内置变量 grouping__id 进行区分。
由于Grouping_set中没有brand_id,age_id的组合情况,所以查询当然不存在了
所以终于搞明白 这个Grouping_set和Grouping_id的对应关系了