Fixing a bug in Hive's vectorized execution engine optimization (Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to [B)

Project scenario:

As part of the project's scheduled tasks, I was developing the script that feeds the DWS-layer traffic aggregation table from the DWD-layer event detail table. Everything worked fine at first. The test script is as follows:
export HIVE_HOME=/opt/apps/apache-hive-3.1.2-bin/

datestr=`date -d'-1 day' +%Y-%m-%d`

if [ -n "$1" ]
then
datestr=$1
fi

${HIVE_HOME}/bin/hive -e "

INSERT INTO TABLE dws.app_pv_agg_session PARTITION(dt='${datestr}')
SELECT
    guid,
    sessionid as session_id,
    min(ts) as start_time,
    max(ts) as end_time,
    max(first_page) as first_page,
    max(last_page) as last_page,
    count(1) as pv_cnt,
    max(isnew) as isnew,
    max(hour) as hour,
    max(province) as province,
    max(city) as city,
    max(district) as district,
    max(devicetype) as device_type
FROM
    (
        SELECT
            guid,
            sessionid,
            ts,
            isnew,
            first_value(properties['pageid']) over(partition by sessionid order by ts) as first_page,
            last_value(properties['pageid']) over(partition by sessionid order by ts) as last_page,
            first_value(hour(from_unixtime(cast(ts/1000 as bigint)))) over(partition by sessionid order by ts) as hour,
            first_value(province) over(partition by sessionid order by ts) as province,
            first_value(city) over(partition by sessionid order by ts) as city,
            first_value(district) over(partition by sessionid order by ts) as district,
            devicetype
        FROM dwd.app_event_dtl
        WHERE dt = '${datestr}' AND eventid = 'pageView'
    ) O
GROUP BY guid,sessionid
"
if [ $? -eq 0 ]
then 
echo "execution succeeded, a mail has been sent........."
else
echo "execution failed, a mail has been sent........."
fi

Problem description:

During subsequent data loads, the job kept failing because an Integer value could not be cast to a byte array ([B is the JVM's internal name for byte[]).

The error, as shown in the screenshot:

Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to [B


Cause analysis:

Debugging turned up nothing wrong. I reworked the test script, and also queried and filtered the source data, and found nothing wrong on the data side either.

The reworked test script:

export HIVE_HOME=/opt/apps/apache-hive-3.1.2-bin/

datestr=`date -d'-1 day' +%Y-%m-%d`

if [ -n "$1" ]
then
datestr=$1
fi
${HIVE_HOME}/bin/hive -e "

INSERT INTO TABLE dws.app_pv_agg_session PARTITION (dt = '${datestr}')
SELECT guid,
       sessionid       as session_id,
       min(ts)         as start_time,
       max(ts)         as end_time,
       max(split(area,',')[3]) as first_page,
       max(last_page)  as last_page,
       count(1)        as pv_cnt,
       max(isnew)      as isnew,
       max(split(area,',')[4])       as hour,
       max(split(area,',')[0])   as province,
       max(split(area,',')[1])       as city,
       max(split(area,',')[2])   as district,
       max(devicetype) as device_type
FROM (
         SELECT guid,
                sessionid,
                ts,
                isnew,
                first_value(area) over (partition by sessionid order by ts)     as area,
                last_value(pageid) over (partition by sessionid order by ts)    as last_page,
                devicetype
         FROM (
                  SELECT
                    guid,
                    isnew,
                    properties['pageid'] as pageid,
                    concat_ws(',',
                        nvl(province,'UNKNOWN'),
                        nvl(city,'UNKNOWN'),
                        nvl(district,'UNKNOWN'),
                        nvl(lpad(properties['pageid'],5,'0'),'00000'),
                        nvl(lpad(hour(from_unixtime(cast((ts/1000) as bigint))),2,'0'),'00')
                    ) as area,
                    sessionid,
                    devicetype,
                    ts
                  FROM dwd.app_event_dtl
                  WHERE dt = '${datestr}'
                    AND eventid = 'pageView' AND properties['pageid'] is not null
              ) o1
     ) o2
GROUP BY guid, sessionid
"
if [ $? -eq 0 ]
then 
echo "execution succeeded, a mail has been sent........."
else
echo "execution failed, a mail has been sent........."
fi
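A note on what the reworked query does: instead of one window function per attribute, it packs province, city, district, the zero-padded pageid, and the zero-padded hour into a single comma-joined string (area) with concat_ws, carries that through one first_value() call, and unpacks it with split() in the outer query. A minimal sketch of the pack/unpack field mapping, with hypothetical sample values:

```shell
# Hypothetical sample of the packed 'area' string built by concat_ws above:
# province,city,district,zero-padded pageid,zero-padded hour
area="Guangdong,Shenzhen,Nanshan,00017,09"

# split(area,',')[0..4] in the SQL corresponds to these five fields in order:
IFS=',' read -r province city district first_page hour <<< "$area"
echo "province=$province first_page=$first_page hour=$hour"
```

Note that because pageid is zero-padded with lpad before packing, the unpacked first_page is the padded form ('00017'), not the raw pageid.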

Finally, by going through the logs and the query execution, it turned out to be a bug in Hive's vectorized execution engine optimization.
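To inspect how the plan hits the vectorization code path, Hive 2.3+ (including the 3.1.2 used here) can annotate the explain output with vectorization details. A hedged sketch, reusing the same table and partition as the script above:

```shell
# Sketch: ask Hive whether (and why) each operator in the plan is vectorized.
# EXPLAIN VECTORIZATION DETAIL lists the vectorized expressions chosen per
# operator, which helps pinpoint the operator that triggers the bad cast.
${HIVE_HOME}/bin/hive -e "
EXPLAIN VECTORIZATION DETAIL
SELECT guid, sessionid, count(1) AS pv_cnt
FROM dwd.app_event_dtl
WHERE dt = '${datestr}' AND eventid = 'pageView'
GROUP BY guid, sessionid;
"
```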



Solution:

There is currently no official fix for this bug, but for the problem it caused in this project, adding a single line lets the data be written normally:
set hive.vectorized.execution.enabled = false;
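In a scheduled script like the ones above, the property can be scoped to the single job rather than changed cluster-wide, by putting the set statement at the top of the statement passed to hive -e. A sketch:

```shell
# Sketch: disable vectorized execution for this job only,
# leaving the cluster-wide default untouched.
${HIVE_HOME}/bin/hive -e "
set hive.vectorized.execution.enabled = false;

INSERT INTO TABLE dws.app_pv_agg_session PARTITION(dt='${datestr}')
SELECT ...
"
```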

The official issue tracker suggests the same workaround.

After adding this line to the script, the problem was resolved and data was written normally into the DWS-layer traffic aggregation table.
