Trafodion的Bulk Load是用Load into …命令实现的,而默认的不带任何options的Load在遇到数据不匹配的时候会自动退出,如果是大批量数据做Load的时候自动退出可能不是用户希望的结果,用户可能更希望能在遇到错误的时候继续加载数据或者是把错误信息存放到一个单独的日志文件中,这就是写本文的主要目的了,由于Trafodion官网手册里面并没有把这部分加进去,这里就写一篇短文介绍一下。涉及到的语法如下,
load [with continue on error] into table_name select ...
load [with log error rows to log_location] into table_name select ...
下面用一个具体的例子来描述这两种语法的用法及含义,
1 创建Hive表并插入测试数据,创建Trafodion表
hive> desc test4;
OK
a varchar(5)
b int
Time taken: 1.917 seconds, Fetched: 2 row(s)
hive> select * from test4;
OK
aaa 1
bbbb 2
ccccc 3
Time taken: 0.064 seconds, Fetched: 3 row(s)
SQL>showddl test_load_tbl;
CREATE TABLE TRAFODION.SEABASE.TEST_LOAD_TBL
(
A VARCHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, B INT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
2 测试不带option的Load,即遇到错误行停止
SQL>load into test_load_tbl select * from hive.hive.test4;
UTIL_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
*** ERROR[8402] A string overflow occurred during the evaluation of a character expression. Conversion of Source Type:VARCHAR(REC_BYTE_V_ASCII,5 BYTES,ISO88591) Source Value:ccccc to Target Type:VARCHAR(REC_BYTE_V_ASCII,4 BYTES,ISO88591). [2017-03-03 09:27:25]
3 测试load with continue on error,从下面结果发现,Hive源表有三条数据,因为有一条数据长度超过Varchar(4),所以被过滤掉,而另外两条数据则正常加载
SQL>load with continue on error into test_load_tbl select * from hive.hive.test4;
UTIL_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Rows Processed: 2
Task: PREPARATION Status: Ended ET: 00:00:02.635
Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: COMPLETION Status: Ended ET: 00:00:00.562
--- SQL operation complete.
SQL>select * from test_load_tbl;
A B
---- -----------
aaa 1
bbbb 2
--- 2 row(s) selected.
4 测试load with log error rows to,加载可以正常完成,从日志文件中能看到错误信息及错误行内容
SQL>load with log error rows to '/bulkload/logs' into test_load_tbl select * from hive.hive.test4;
UTIL_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Rows Processed: 2
Task: PREPARATION Status: Ended ET: 00:00:02.652
Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TEST_LOAD_TBL
Task: COMPLETION Status: Ended ET: 00:00:00.565
--- SQL operation complete.
[trafodion@n12 ~]$ hadoop fs -cat /bulkload/logs/ERR_20170303_013347/HIVE.TEST4_hive_scan_err_0
ccccc3
*** ERROR[8402] A string overflow occurred during the evaluation of a character expression. Conversion of Source Type:VARCHAR(REC_BYTE_V_ASCII,5 BYTES,ISO88591) Source Value:ccccc to Target Type:VARCHAR(REC_BYTE_V_ASCII,4 BYTES,ISO88591).