hive表导入数据:FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

hive程序报FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

欢迎使用Markdown编辑器

在进行拉链表练习的时候,练习代码如下:
数据

2019-08-21订单数据表(order)
1,2019-08-18,2019-08-18,创建
2,2019-08-18,2019-08-18,创建
3,2019-08-19,2019-08-21,支付
4,2019-08-19,2019-08-21,完成
5,2019-08-19,2019-08-20,支付
6,2019-08-20,2019-08-20,创建
7,2019-08-20,2019-08-21,支付
8,2019-08-21,2019-08-21,创建

2019-08-22订单数据表(order)
*1,2019-08-18,2019-08-22,支付
*2,2019-08-18,2019-08-22,完成
3,2019-08-19,2019-08-21,支付
4,2019-08-19,2019-08-21,完成
5,2019-08-19,2019-08-20,支付
*6,2019-08-20,2019-08-22,支付
7,2019-08-20,2019-08-21,支付
*8,2019-08-21,2019-08-22,支付
*9,2019-08-22,2019-08-22,创建
*10,2019-08-22,2019-08-22,支付


2019-08-23订单数据表
1,2019-08-18,2019-08-23,完成
2,2019-08-18,2019-08-22,完成
3,2019-08-19,2019-08-23,完成
4,2019-08-19,2019-08-21,完成
5,2019-08-19,2019-08-23,完成
6,2019-08-20,2019-08-22,支付
7,2019-08-20,2019-08-21,支付
8,2019-08-21,2019-08-23,完成
9,2019-08-22,2019-08-22,创建
10,2019-08-22,2019-08-22,支付
11,2019-08-23,2019-08-23,创建
12,2019-08-23,2019-08-23,创建
13,2019-08-23,2019-08-23,支付

建表

CREATE TABLE orders_20190821 (
orderid INT,
createtime STRING,
modifiedtime STRING,
status STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
stored AS textfile;

CREATE TABLE orders_20190822 (
orderid INT,
createtime STRING,
modifiedtime STRING,
status STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
stored AS textfile;

CREATE TABLE orders_20190823 (
orderid INT,
createtime STRING,
modifiedtime STRING,
status STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
stored AS textfile;

数据导入:
load data local inpath '/root/hive/order_2019-08-21.txt' into table orders_20190821;
load data local inpath '/root/hive/order_2019-08-22.txt' into table orders_20190822;
load data local inpath '/root/hive/order_2019-08-23.txt' into table orders_20190823;

ODS层设计:

在数据仓库的ODS层,有一张订单的增量数据表,按天分区,存放每天的增量数据:

CREATE TABLE ods_orders_inc (
orderid INT,
createtime STRING,
modifiedtime STRING,
status STRING
)PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
stored AS textfile;

DW层设计:

在数据仓库的DW层,有一张订单的历史数据拉链表,存放订单的历史状态数据:

CREATE TABLE dw_orders_his (
orderid INT,
createtime STRING,
modifiedtime STRING,
status STRING,
dw_start_date STRING,
dw_end_date STRING
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
stored AS textfile;

第一步,抽取全量数据到ODS:

INSERT overwrite TABLE ods_orders_inc PARTITION (day = '2019-08-20')
SELECT orderid,createtime,modifiedtime,status
FROM orders_20190821
WHERE createtime <= '2019-08-20';

第二步,从ODS刷新到DW:

INSERT overwrite TABLE dw_orders_his
SELECT orderid,createtime,modifiedtime,status,
createtime AS dw_start_date,
'9999-12-31' AS dw_end_date
FROM ods_orders_inc
WHERE day = '2019-08-20';

增量数据导入
从2019-08-22开始,需要每天正常刷新前一天(2019-08-21)的增量数据到历史表。

第一步,通过增量抽取,将2019-08-21的数据抽取到ODS:

INSERT overwrite TABLE ods_orders_inc PARTITION (day = '2019-08-21')
SELECT orderid,createtime,modifiedtime,status
FROM orders_20190821
WHERE modifiedtime = '2019-08-21' or createtime = '2019-08-21';

第二步,通过DW历史数据(数据日期为2019-08-20),和ODS增量数据(2019-08-21),刷新历史表:
先把数据放到一张临时表中:临时存储,防止出错有地方追溯。

DROP TABLE IF EXISTS dw_orders_his_tmp;
CREATE TABLE dw_orders_his_tmp AS 
SELECT orderid,
createtime,
modifiedtime,
status,
dw_start_date,
dw_end_date 
FROM (
    SELECT a.orderid,
    a.createtime,
    a.modifiedtime,
    a.status,
    a.dw_start_date,
    CASE WHEN b.orderid IS NOT NULL AND a.dw_end_date > '2019-08-21' THEN '2019-08-20' ELSE a.dw_end_date END AS dw_end_date 
    FROM dw_orders_his a 
    left outer join (SELECT * FROM ods_orders_inc WHERE day = '2019-08-21') b 
    ON (a.orderid = b.orderid) 
    UNION ALL 
    SELECT orderid,
    createtime,
    modifiedtime,
    status,
    modifiedtime AS dw_start_date,
    '9999-12-31' AS dw_end_date 
    FROM ods_orders_inc 
    WHERE day = '2019-08-21' 
) x 
ORDER BY orderid,dw_start_date;

最后overwrite到dw历史表:

INSERT overwrite TABLE dw_orders_his
SELECT * FROM dw_orders_his_tmp;

解决办法
修改yarn-site.xml中的yarn.nodemanager.resource.memory-mb值大小(2G->4G)
原先设置

<property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>2048</value>
</property>

修改后设置

<property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>4096</value>
</property>
hive 1.x版本 不支持 not in +子查询(SubQuery)

借鉴:https://www.jianshu.com/p/c6543b65cb77

深入了解:https://blog.csdn.net/qq_26442553/article/details/80143559

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值