An investigation into single vs. double quotes and the use of $CONDITIONS in sqoop --query

I recently ran into a strange Sqoop import problem. My goal was to import the data of a table in a PostgreSQL database into Hive, applying a query as part of the import. But depending on how the "$CONDITIONS" parameter is handled, the import behaves very differently: some variants fail with an error, and some run but return an empty result. Let's take a look:

The query in PostgreSQL is:

# PostgreSQL query
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo),json_each_text(reportinfo) from check_report where reportinfo::Text<>'null';

Run on its own, this statement returns results without any problem:
[Screenshot: the query executed in the PostgreSQL client, returning rows]
Next come the Sqoop commands. There are several versions, and we will go through them one by one.

Version 1: without "$CONDITIONS", query wrapped in single quotes

Version 1 sqoop command:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb \
--username xxxxxx --password xxxxxxxx \
--query 'select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' ' \
--delete-target-dir \
--target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze \
--fields-terminated-by '\001' \
--null-string '' \
--null-non-string '' \
-m 1

# Version 1 result
19/05/28 14:28:02 INFO sqoop.Sqoop: Running Sqoop version: 
19/05/28 14:28:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:28:03 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:28:03 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:28:03 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null ] must contain '$CONDITIONS' in WHERE clause.
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:481)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:470)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:616)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
	
	Time taken: 2 s 
task IDE_TASK_6E9C5A22-CDCA-CF55-F327-FD20061225F3_20190528142801866 is complete.

It fails immediately with an error saying the query must contain '$CONDITIONS' in the WHERE clause.
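Some background on why this check exists: Sqoop requires the literal token $CONDITIONS in every free-form --query so that it has a placeholder into which it can inject its own predicates. During code generation it substitutes (1 = 0) just to fetch the column metadata (you can see this in the logs of the later versions), and at run time each map task gets its own split range. Roughly, assuming a numeric split column such as --split-by id and -m 2, each mapper would run something like the following (column list shortened, boundary values entirely hypothetical):

# mapper 1 (hypothetical split range substituted for $CONDITIONS)
select id from check_report where reportinfo::Text<>'null' and ( id >= 1 AND id < 6000 )
# mapper 2 (hypothetical split range)
select id from check_report where reportinfo::Text<>'null' and ( id >= 6000 AND id <= 12000 )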

Version 2: with "$CONDITIONS", query wrapped in single quotes

Version 2 sqoop command:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb \
--username xxxxxx --password xxxxxxxx \
--query 'select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and $CONDITIONS ' \
--delete-target-dir \
--target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze \
--fields-terminated-by '\001' \
--null-string '' \
--null-non-string '' \
-m 1

# Version 2 result
19/05/28 14:37:36 INFO sqoop.Sqoop: Running Sqoop version: 
19/05/28 14:37:36 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:37:36 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:37:36 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:37:37 INFO manager.SqlManager: Executing SQL statement: select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null and  (1 = 0)
19/05/28 14:37:37 INFO manager.SqlManager: Executing SQL statement: select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null and  (1 = 0) 
19/05/28 14:37:37 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bfd/hadoop
Note: /tmp/sqoop-jupiter/compile/c4cfbf5cd9a3f0e1cc149ffec8265dcb/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/05/28 14:37:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-jupiter/compile/c4cfbf5cd9a3f0e1cc149ffec8265dcb/QueryResult.jar
19/05/28 14:37:39 INFO tool.ImportTool: sqoopImportType&tenantName=null&null&null
19/05/28 14:37:39 INFO tool.ImportTool: final outputPath folder: mid_yyts_report_analyze
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bfd/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bfd/hbase-1.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/05/28 14:37:40 INFO tool.ImportTool: Destination directory /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze deleted.
19/05/28 14:37:40 INFO mapreduce.ImportJobBase: Beginning query import.
19/05/28 14:37:41 INFO impl.TimelineClientImpl: Timeline service address: http://dthost25:8188/ws/v1/timeline/
19/05/28 14:37:42 INFO client.RMProxy: Connecting to ResourceManager at dthost25/192.168.103.25:8050
19/05/28 14:37:44 INFO db.DBInputFormat: Using read commited transaction isolation
19/05/28 14:37:45 INFO mapreduce.JobSubmitter: number of splits:1
19/05/28 14:37:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554176896418_0412
19/05/28 14:37:45 INFO impl.YarnClientImpl: Submitted application application_1554176896418_0412
19/05/28 14:37:45 INFO mapreduce.Job: The url to track the job: http://dthost25:8088/proxy/application_1554176896418_0412/
19/05/28 14:37:45 INFO mapreduce.Job: Running job: job_1554176896418_0412
19/05/28 14:37:54 INFO mapreduce.Job: Job job_1554176896418_0412 running in uber mode : false
19/05/28 14:37:54 INFO mapreduce.Job:  map 0% reduce 0%
19/05/28 14:38:04 INFO mapreduce.Job:  map 100% reduce 0%
19/05/28 14:38:04 INFO mapreduce.Job: Job job_1554176896418_0412 completed successfully
19/05/28 14:38:04 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=134608
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=12224
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=6112
		Total vcore-seconds taken by all map tasks=6112
		Total megabyte-seconds taken by all map tasks=12517376
	Map-Reduce Framework
		Map input records=0
		Map output records=0
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=298
		CPU time spent (ms)=1350
		Physical memory (bytes) snapshot=223698944
		Virtual memory (bytes) snapshot=2527227904
		Total committed heap usage (bytes)=563085312
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
19/05/28 14:38:04 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 23.2572 seconds (0 bytes/sec)
19/05/28 14:38:04 INFO mapreduce.ImportJobBase: Retrieved 0 records.


Time taken: 29 s 
task IDE_TASK_9DDD516A-D19E-C4AD-72BF-FD28C5837487_2019052814373417 is complete.

As the log shows, the statement that actually gets executed is "select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null and (1 = 0)".

The job runs to completion, yet zero records are imported: "INFO mapreduce.ImportJobBase: Transferred 0 bytes in 23.2572 seconds (0 bytes/sec)". Notice also that the single quotes around 'null' have disappeared: because the whole query is wrapped in single quotes, the shell swallows the nested quotes, so the filter compares against the SQL keyword null rather than the string 'null', and no rows match.
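A quick way to see what the shell actually hands to Sqoop is to echo the --query argument. This is only an illustrative shell experiment, with the column list shortened:

# Version 2 quoting: the whole query in single quotes, with 'null' also single-quoted.
# The shell consumes the nested quotes, so Sqoop receives <>null (matching no rows),
# while $CONDITIONS is passed through literally, which is why Sqoop accepts the query.
echo 'select id from check_report where reportinfo::Text<>'null' and $CONDITIONS'
# output: select id from check_report where reportinfo::Text<>null and $CONDITIONS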

Version 3: with "$CONDITIONS", query wrapped in double quotes

Version 3 sqoop command:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb \
--username xxxxxx --password xxxxxxxx \
--query "select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and $CONDITIONS " \
--delete-target-dir \
--target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze \
--fields-terminated-by '\001' \
--null-string '' \
--null-non-string '' \
-m 1

19/05/28 14:43:26 INFO sqoop.Sqoop: Running Sqoop version: 
19/05/28 14:43:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:43:26 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:43:26 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:43:27 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and ] must contain '$CONDITIONS' in WHERE clause.
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:481)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:470)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:616)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)



Time taken: 2 s 
task IDE_TASK_00C737BF-0D70-D88B-1864-FD2E1F561F47_20190528144325116 is complete.

The job fails, and the error is exactly the same as in Version 1: $CONDITIONS is said to be missing from the WHERE clause, even though I clearly added it. Look closely at the query echoed back in the error message: it ends with "...reportinfo::Text<>'null' and ". The inner quotes around 'null' survived this time, but $CONDITIONS itself vanished, because inside double quotes the shell expands $CONDITIONS as a (non-existent, hence empty) shell variable before Sqoop ever sees the query.
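Again, an echo makes the shell's behavior visible (illustrative only, column list shortened, assuming no CONDITIONS variable is set in the environment):

# Version 3 quoting: double quotes preserve the inner quotes around 'null',
# but the shell expands $CONDITIONS to an empty string before Sqoop sees it
echo "select id from check_report where reportinfo::Text<>'null' and $CONDITIONS"
# output: select id from check_report where reportinfo::Text<>'null' and

# Escaping the dollar sign keeps the token intact; this is what Version 4 does
echo "select id from check_report where reportinfo::Text<>'null' and \$CONDITIONS"
# output: select id from check_report where reportinfo::Text<>'null' and $CONDITIONS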

Version 4: with "\$CONDITIONS", query wrapped in double quotes

Note that this time the token is \$CONDITIONS rather than $CONDITIONS; the only difference is the extra backslash (see the echo example above).

Version 4 sqoop command:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb \
--username xxxxxx --password xxxxxxxx \
--query "select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and \$CONDITIONS " \
--delete-target-dir \
--target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze \
--fields-terminated-by '\001' \
--null-string '' \
--null-non-string '' \
-m 1

# Version 4 result
19/05/28 14:47:20 INFO sqoop.Sqoop: Running Sqoop version: 
19/05/28 14:47:20 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:47:20 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:47:20 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:47:20 INFO manager.SqlManager: Executing SQL statement: select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and  (1 = 0) 
19/05/28 14:47:20 INFO manager.SqlManager: Executing SQL statement: select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and  (1 = 0) 
19/05/28 14:47:20 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bfd/hadoop
Note: /tmp/sqoop-jupiter/compile/c1d1cbb98b7a2abc6ebd0273c2e8d60e/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/05/28 14:47:23 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-jupiter/compile/c1d1cbb98b7a2abc6ebd0273c2e8d60e/QueryResult.jar
19/05/28 14:47:23 INFO tool.ImportTool: sqoopImportType&tenantName=null&null&null
19/05/28 14:47:23 INFO tool.ImportTool: final outputPath folder: mid_yyts_report_analyze
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bfd/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bfd/hbase-1.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/05/28 14:47:24 INFO tool.ImportTool: Destination directory /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze deleted.
19/05/28 14:47:24 INFO mapreduce.ImportJobBase: Beginning query import.
19/05/28 14:47:25 INFO impl.TimelineClientImpl: Timeline service address: http://dthost25:8188/ws/v1/timeline/
19/05/28 14:47:25 INFO client.RMProxy: Connecting to ResourceManager at dthost25/192.168.103.25:8050
19/05/28 14:47:28 INFO db.DBInputFormat: Using read commited transaction isolation
19/05/28 14:47:28 INFO mapreduce.JobSubmitter: number of splits:1
19/05/28 14:47:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554176896418_0413
19/05/28 14:47:29 INFO impl.YarnClientImpl: Submitted application application_1554176896418_0413
19/05/28 14:47:29 INFO mapreduce.Job: The url to track the job: http://dthost25:8088/proxy/application_1554176896418_0413/
19/05/28 14:47:29 INFO mapreduce.Job: Running job: job_1554176896418_0413
19/05/28 14:47:38 INFO mapreduce.Job: Job job_1554176896418_0413 running in uber mode : false
19/05/28 14:47:38 INFO mapreduce.Job:  map 0% reduce 0%
19/05/28 14:47:47 INFO mapreduce.Job:  map 100% reduce 0%
19/05/28 14:47:47 INFO mapreduce.Job: Job job_1554176896418_0413 completed successfully
19/05/28 14:47:47 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=134612
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=848405
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=12992
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=6496
		Total vcore-seconds taken by all map tasks=6496
		Total megabyte-seconds taken by all map tasks=13303808
	Map-Reduce Framework
		Map input records=11854
		Map output records=11854
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=96
		CPU time spent (ms)=4530
		Physical memory (bytes) snapshot=215683072
		Virtual memory (bytes) snapshot=2517082112
		Total committed heap usage (bytes)=563085312
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=848405
19/05/28 14:47:47 INFO mapreduce.ImportJobBase: Transferred 828.5205 KB in 23.3241 seconds (35.522 KB/sec)
19/05/28 14:47:47 INFO mapreduce.ImportJobBase: Retrieved 11854 records.


Time taken: 29 s 
task IDE_TASK_8729A489-3760-383C-614C-FD31AEB71237_20190528144718353 is complete.

Remarkably, this time the job succeeds! The transferred volume is no longer zero, and the 11854 records retrieved are real, valid data.

Conclusion

In short, when using Sqoop with --query, it is best to wrap the query in double quotes, and whenever the query has a WHERE clause it must also include "and \$CONDITIONS", paying attention to the backslash: the double quotes keep the SQL string literals intact, while the backslash stops the shell from expanding $CONDITIONS before Sqoop receives it.
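For reference, here is a sketch of the command shape that finally worked, with host, credentials, database, table, columns, and paths replaced by placeholders to adapt (and -P used instead of --password, as the warning in the logs suggests):

sqoop import --connect jdbc:postgresql://<host>:5432/<database> \
--username <user> -P \
--query "select <columns> from <table> where <filter> and \$CONDITIONS" \
--delete-target-dir \
--target-dir /user/hive/warehouse/<hive_db>.db/<hive_table> \
--fields-terminated-by '\001' \
--null-string '' \
--null-non-string '' \
-m 1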
