这段时间碰见了一个奇怪的sqoop导入问题。我的业务目标是想将postgresql库里的某张表内的数据导入到hive里,而且在导入的时候需要做一步查询。但在导入的时候,围绕着"$CONDITION"这个参数,会有不同的运行结果,有的报错,有的查询结果为空,我们下面来看看:
在pgsql里的查询语句为:
# postgresql 查询语句
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo),json_each_text(reportinfo) from check_report where reportinfo::Text<>'null';
单独运行,该语句可以顺利查询出结果:
接下来就是sqoop语句,有多种版本,我们一个个来看:
版本一:不带参数“$CONDITIONS”,且用单引号
版本一sqoop语句:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb
–username xxxxxx --password xxxxxxxx
–query '
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>‘null’ '
–delete-target-dir
–target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze
–fields-terminated-by ‘\001’
–null-string ‘’
–null-non-string ‘’
-m1
# 版本一结果
19/05/28 14:28:02 INFO sqoop.Sqoop: Running Sqoop version:
19/05/28 14:28:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:28:03 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:28:03 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:28:03 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null ] must contain '$CONDITIONS' in WHERE clause.
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:481)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:470)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:616)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Time taken: 2 s
task IDE_TASK_6E9C5A22-CDCA-CF55-F327-FD20061225F3_20190528142801866 is complete.
直接报错,说需要$CONDITIONS
版本二:带参数“$CONDITIONS”,且用单引号
版本二sqoop语句:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb
–username xxxxxx --password xxxxxxxx
–query '
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>‘null’ and $CONDITIONS '
–delete-