问题1:
在使用 Cube 或者 RollUP的时候,如果分组条件>=5个的时候,会出现一个特殊的问题
当分组条件个数<=4的时候:(可以正常运行且没有错误)
SELECT
nvl (wm,'ALL'),
nvl (product_id, 'ALL'),
nvl (
count(
distinct(
IF (
event_id IN ('BS00A003', 'BW00H010'),
product_id,
null
)
)
),
0
) share_commodity_num--商品被分享次数
FROM
gds.xx
WHERE
dt >= '${hivevar:sdate}'
AND dt <= '${hivevar:edate}'
AND length(shop_id) > 1
AND wm is not null
AND wm !='-'
AND category_id_1 is not null and category_id_1 !='-'
AND category_id_2 is not null and category_id_2 !='-'
AND category_id_3 is not null and category_id_3 !='-'
AND category_id_4 is not null and category_id_4 !='-'
GROUP BY
wm,
category_id_1,
category_id_2,
product_id with cube;
当分组条件个数>=5的时候:(会提示如下错误)
SELECT
nvl (wm,'ALL'),
nvl (product_id, 'ALL'),
nvl (
count(
distinct(
IF (
event_id IN ('BS00A003', 'BW00H010'),
product_id,
null
)
)
),
0
) share_commodity_num--商品被分享次数
FROM
gds.xx
WHERE
dt >= '${hivevar:sdate}'
AND dt <= '${hivevar:edate}'
AND length(shop_id) > 1
AND wm is not null
AND wm !='-'
AND category_id_1 is not null and category_id_1 !='-'
AND category_id_2 is not null and category_id_2 !='-'
AND category_id_3 is not null and category_id_3 !='-'
AND category_id_4 is not null and category_id_4 !='-'
GROUP BY
wm,
category_id_1,
category_id_2,
category_id_3,
product_id with cube;
Error while compiling statement: FAILED: SemanticException [Error 10210]: Grouping sets aggregations (with rollups or cubes) are not allowed if aggregation function parameters overlap with the aggregation functions columns
我现在解决这个问题的方式就是通过修改 sql语句的写法来处理,可以使用子查询(一般不能满足需求) 或者 Grouping Sets(可以满足需求,但是组合条件会比较多)
情况2:当from 的表需要通过表连接来得到的时候,即使cube或者rollup或者grouping sets 都会出现问题,这种情况下只能将表连接得到的表先保存到一个临时表中,在查询临时表当做数据源
问题2:
当Cube 或者 RollUp 的分组条件>=5的时候,会提示你在聚合函数中不能使用 distinct 会提示如下错误信息:
An additional MR job is introduced since the cardinality of grouping sets is more than hive.new.job.grouping.set.cardinality.
This functionality is not supported with distincts.Either set hive.new.job.grouping.set.cardinality to a high number (higher than the number of rows per input row due to grouping sets in the query),or rewrite the query to not use distincts. The number of rows per input row due to grouping sets is 32
解决方法:可以通过修改 hive.new.job.grouping.set.cardinality 配置,或者在聚合中不用distinct来解决。我现在解决的方式还是通过 Grouping Sets 来解决,因为这个错误只会发生在 Cube和RollUp的时候,在Grouping Sets的时候没有这个错误