Hive Cube RollUP 分组问题

问题1:

   在使用 Cube 或者 RollUP的时候,如果分组条件>=5个的时候,会出现一个特殊的问题

   当分组条件个数<=4的时候:(可以正常运行且没有错误)

 

SELECT
    nvl (wm,'ALL'),
    nvl (product_id, 'ALL'),
    nvl (
      count(
        distinct(
        IF (
          event_id IN ('BS00A003', 'BW00H010'),
          product_id,
          null
        )
        )
      ),
      0
    ) share_commodity_num--商品被分享次数
FROM
    gds.xx                                                        
WHERE
    dt >= '${hivevar:sdate}'
AND dt <= '${hivevar:edate}'
AND length(shop_id) > 1
AND wm is not null
AND wm !='-'
AND category_id_1 is not null and category_id_1 !='-'
AND category_id_2 is not null and category_id_2 !='-'
AND category_id_3 is not null and category_id_3 !='-'
AND category_id_4 is not null and category_id_4 !='-'
GROUP BY
    wm,
    category_id_1,
    category_id_2,
    product_id with cube;
    

  当分组条件个数>=5的时候:(会提示如下错误)

 

SELECT
    nvl (wm,'ALL'),
    nvl (product_id, 'ALL'),
    nvl (
      count(
        distinct(
        IF (
          event_id IN ('BS00A003', 'BW00H010'),
          product_id,
          null
        )
        )
      ),
      0
    ) share_commodity_num--商品被分享次数
FROM
    gds.xx                                                        
WHERE
    dt >= '${hivevar:sdate}'
AND dt <= '${hivevar:edate}'
AND length(shop_id) > 1
AND wm is not null
AND wm !='-'
AND category_id_1 is not null and category_id_1 !='-'
AND category_id_2 is not null and category_id_2 !='-'
AND category_id_3 is not null and category_id_3 !='-'
AND category_id_4 is not null and category_id_4 !='-'
GROUP BY
    wm,
    category_id_1,
    category_id_2,
    category_id_3,
    product_id with cube;
    

   Error while compiling statement: FAILED: SemanticException [Error 10210]: Grouping sets aggregations (with rollups or cubes) are not allowed if aggregation function parameters overlap with the aggregation functions columns
   我现在解决这个问题的方式就是通过修改 sql语句的写法来处理,可以使用子查询(一般不能满足需求) 或者 Grouping Sets(可以满足需求,但是组合条件会比较多)

   情况2:当from 的表需要通过表连接来得到的时候,即使cube或者rollup或者grouping sets 都会出现问题,这种情况下只能将表连接得到的表先保存到一个临时表中,在查询临时表当做数据源

问题2:

   当Cube 或者 RollUp 的分组条件>=5的时候,会提示你在聚合函数中不能使用 distinct 会提示如下错误信息:

   An additional MR job is introduced since the cardinality of grouping sets is more than hive.new.job.grouping.set.cardinality.

   This functionality is not supported with distincts.Either set hive.new.job.grouping.set.cardinality to a high number (higher than the number of rows per input row due to grouping sets in the query),or rewrite the query to not use distincts. The number of rows per input row due to grouping sets is 32

解决方法:可以通过修改 hive.new.job.grouping.set.cardinality 配置,或者在聚合中不用distinct来解决。我现在解决的方式还是通过 Grouping Sets 来解决,因为这个错误只会发生在 Cube和RollUp的时候,在Grouping Sets的时候没有这个错误







  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 2
    评论
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值