Sqoop同步clob字段类型数据问题
问题背景
需要通过sqoop将oracle中含有clob字段的表同步至hive中,字段长度过长的话会报错,Buffer too small for CLOB to CHAR
问题分析
我使用的表中clob字段实际长度是超过char(4000)的长度的,这样也就导致了报错,无法进行数据同步
处理方式
通过
to_char(substr(col,0,4000)),截取字段长度,再转为char类型,进行同步。但仍需注意的是,是否会存在截断过后数据丢失的情况
另附
我这边使用的是CDH集群上的hive组件,可能存在配置问题,–map-column-hive/java col string不生效
官方回答:
ORA-22835
Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: string, maximum: string)
cause
An attempt was made to convert CLOB to CHAR or BLOB to RAW, where the LOB size was bigger than the buffer limit for CHAR and RAW types. Note that widths are reported in characters if character length semantics are in effect for the column, otherwise widths are reported in bytes.
Action
Do one of the following:
Make the LOB smaller before performing the conversion, for example, by using SUBSTR on CLOB
Use DBMS_LOB.SUBSTR to convert CLOB to CHAR or BLOB to RAW.
https://docs.oracle.com/en/error-help/db/ora-22835/