Recently I ran into a strange Sqoop import problem. My goal was to import data from a table in a PostgreSQL database into Hive, running a query as part of the import. Depending on how the `$CONDITIONS` parameter is written, the job behaves differently: some versions fail outright, and some return an empty result. Let's walk through them:
The query in PostgreSQL is:
# PostgreSQL query
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo),json_each_text(reportinfo) from check_report where reportinfo::Text<>'null';
Run on its own, this statement returns results without any problem.
Next come the Sqoop commands. There are several versions; let's look at them one by one.
Version 1: without the "$CONDITIONS" parameter, query wrapped in single quotes
Version 1 Sqoop command:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb
--username xxxxxx --password xxxxxxxx
--query '
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' '
--delete-target-dir
--target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze
--fields-terminated-by '\001'
--null-string ''
--null-non-string ''
-m 1
# Version 1 result
19/05/28 14:28:02 INFO sqoop.Sqoop: Running Sqoop version:
19/05/28 14:28:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:28:03 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:28:03 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:28:03 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null ] must contain '$CONDITIONS' in WHERE clause.
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:481)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:470)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:616)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Time taken: 2 s
task IDE_TASK_6E9C5A22-CDCA-CF55-F327-FD20061225F3_20190528142801866 is complete.
It fails immediately: Sqoop demands $CONDITIONS. For a free-form --query import, Sqoop requires the literal token $CONDITIONS in the WHERE clause so that it can substitute its own split conditions when parallelizing the import.
Version 2: with the "$CONDITIONS" parameter, query wrapped in single quotes
Version 2 Sqoop command:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb
--username xxxxxx --password xxxxxxxx
--query '
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and $CONDITIONS '
--delete-target-dir
--target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze
--fields-terminated-by '\001'
--null-string ''
--null-non-string ''
-m 1
# Version 2 result
19/05/28 14:37:36 INFO sqoop.Sqoop: Running Sqoop version:
19/05/28 14:37:36 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:37:36 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:37:36 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:37:37 INFO manager.SqlManager: Executing SQL statement: select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null and (1 = 0)
19/05/28 14:37:37 INFO manager.SqlManager: Executing SQL statement: select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null and (1 = 0)
19/05/28 14:37:37 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bfd/hadoop
Note: /tmp/sqoop-jupiter/compile/c4cfbf5cd9a3f0e1cc149ffec8265dcb/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/05/28 14:37:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-jupiter/compile/c4cfbf5cd9a3f0e1cc149ffec8265dcb/QueryResult.jar
19/05/28 14:37:39 INFO tool.ImportTool: sqoopImportType&tenantName=null&null&null
19/05/28 14:37:39 INFO tool.ImportTool: final outputPath folder: mid_yyts_report_analyze
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bfd/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bfd/hbase-1.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/05/28 14:37:40 INFO tool.ImportTool: Destination directory /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze deleted.
19/05/28 14:37:40 INFO mapreduce.ImportJobBase: Beginning query import.
19/05/28 14:37:41 INFO impl.TimelineClientImpl: Timeline service address: http://dthost25:8188/ws/v1/timeline/
19/05/28 14:37:42 INFO client.RMProxy: Connecting to ResourceManager at dthost25/192.168.103.25:8050
19/05/28 14:37:44 INFO db.DBInputFormat: Using read commited transaction isolation
19/05/28 14:37:45 INFO mapreduce.JobSubmitter: number of splits:1
19/05/28 14:37:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554176896418_0412
19/05/28 14:37:45 INFO impl.YarnClientImpl: Submitted application application_1554176896418_0412
19/05/28 14:37:45 INFO mapreduce.Job: The url to track the job: http://dthost25:8088/proxy/application_1554176896418_0412/
19/05/28 14:37:45 INFO mapreduce.Job: Running job: job_1554176896418_0412
19/05/28 14:37:54 INFO mapreduce.Job: Job job_1554176896418_0412 running in uber mode : false
19/05/28 14:37:54 INFO mapreduce.Job: map 0% reduce 0%
19/05/28 14:38:04 INFO mapreduce.Job: map 100% reduce 0%
19/05/28 14:38:04 INFO mapreduce.Job: Job job_1554176896418_0412 completed successfully
19/05/28 14:38:04 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=134608
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=12224
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=6112
Total vcore-seconds taken by all map tasks=6112
Total megabyte-seconds taken by all map tasks=12517376
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=298
CPU time spent (ms)=1350
Physical memory (bytes) snapshot=223698944
Virtual memory (bytes) snapshot=2527227904
Total committed heap usage (bytes)=563085312
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
19/05/28 14:38:04 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 23.2572 seconds (0 bytes/sec)
19/05/28 14:38:04 INFO mapreduce.ImportJobBase: Retrieved 0 records.
Time taken: 29 s
task IDE_TASK_9DDD516A-D19E-C4AD-72BF-FD28C5837487_2019052814373417 is complete.
As you can see, the statement was executed as "select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>null and (1 = 0)". (The "(1 = 0)" shown here is the probe Sqoop substitutes during code generation; during the actual import the placeholder is replaced with real split conditions.)
The statement runs without error, but 0 records are imported: "INFO mapreduce.ImportJobBase: Transferred 0 bytes in 23.2572 seconds (0 bytes/sec)". Notice that the quotes around 'null' have disappeared: the whole query was already wrapped in single quotes, so the shell stripped the inner pair, and comparing against an unquoted null (SQL NULL) can never be true, hence the empty result.
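The vanishing quotes are pure shell behavior and can be reproduced without Sqoop at all. A minimal sketch (the WHERE fragment is abridged for illustration):

```shell
#!/bin/sh
# Inside an outer pair of single quotes, an inner 'null' does not nest:
# the shell sees three concatenated pieces and the inner quotes vanish,
# while $CONDITIONS stays literal because single quotes suppress expansion.
Q='select * from check_report where reportinfo::Text<>'null' and $CONDITIONS'
echo "$Q"
# prints: select * from check_report where reportinfo::Text<>null and $CONDITIONS
```

This matches the log above exactly: the placeholder survived, but 'null' lost its quotes.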
Version 3: with the "$CONDITIONS" parameter, query wrapped in double quotes
Version 3 Sqoop command:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb
--username xxxxxx --password xxxxxxxx
--query "
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and $CONDITIONS "
--delete-target-dir
--target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze
--fields-terminated-by '\001'
--null-string ''
--null-non-string ''
-m 1
# Version 3 result
19/05/28 14:43:26 INFO sqoop.Sqoop: Running Sqoop version:
19/05/28 14:43:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:43:26 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:43:26 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:43:27 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and ] must contain '$CONDITIONS' in WHERE clause.
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:481)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:470)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:616)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Time taken: 2 s
task IDE_TASK_00C737BF-0D70-D88B-1864-FD2E1F561F47_20190528144325116 is complete.
The statement fails, with exactly the same error as version 1: $CONDITIONS is missing, even though it was clearly added. Look at the query echoed in the error message: it ends with "and ]", so the placeholder was already gone by the time Sqoop saw it. Inside double quotes the shell expands $CONDITIONS as a variable, and since no such variable is set, it expands to an empty string.
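This, too, is reproducible with a plain echo (a sketch with an abridged WHERE clause):

```shell
#!/bin/sh
# No shell variable named CONDITIONS exists, so inside double quotes
# $CONDITIONS expands to an empty string before sqoop is ever invoked.
# The single quotes around null, however, are literal inside double quotes.
unset CONDITIONS
Q="select * from check_report where reportinfo::Text<>'null' and $CONDITIONS"
echo "$Q"
# prints (with a trailing space):
# select * from check_report where reportinfo::Text<>'null' and
```

Note the mirror image of version 2: this time 'null' keeps its quotes but the placeholder is swallowed, matching the error log above.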
Version 4: with the "$CONDITIONS" parameter escaped, query wrapped in double quotes
Note that this time the placeholder is written as \$CONDITIONS rather than $CONDITIONS: there is an extra backslash \ in front.
Version 4 Sqoop command:
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/xxxxxxdb
--username xxxxxx --password xxxxxxxx
--query "
select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and \$CONDITIONS "
--delete-target-dir
--target-dir /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze
--fields-terminated-by '\001'
--null-string ''
--null-non-string ''
-m 1
# Version 4 result
19/05/28 14:47:20 INFO sqoop.Sqoop: Running Sqoop version:
19/05/28 14:47:20 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/28 14:47:20 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/28 14:47:20 INFO tool.CodeGenTool: Beginning code generation
19/05/28 14:47:20 INFO manager.SqlManager: Executing SQL statement: select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and (1 = 0)
19/05/28 14:47:20 INFO manager.SqlManager: Executing SQL statement: select id,check_code,lastupdatetime,check_num,report_name,json_object_keys(reportinfo) from check_report where reportinfo::Text<>'null' and (1 = 0)
19/05/28 14:47:20 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bfd/hadoop
Note: /tmp/sqoop-jupiter/compile/c1d1cbb98b7a2abc6ebd0273c2e8d60e/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/05/28 14:47:23 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-jupiter/compile/c1d1cbb98b7a2abc6ebd0273c2e8d60e/QueryResult.jar
19/05/28 14:47:23 INFO tool.ImportTool: sqoopImportType&tenantName=null&null&null
19/05/28 14:47:23 INFO tool.ImportTool: final outputPath folder: mid_yyts_report_analyze
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bfd/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bfd/hbase-1.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/05/28 14:47:24 INFO tool.ImportTool: Destination directory /user/hive/warehouse/lmj_test.db/mid_yyts_report_analyze deleted.
19/05/28 14:47:24 INFO mapreduce.ImportJobBase: Beginning query import.
19/05/28 14:47:25 INFO impl.TimelineClientImpl: Timeline service address: http://dthost25:8188/ws/v1/timeline/
19/05/28 14:47:25 INFO client.RMProxy: Connecting to ResourceManager at dthost25/192.168.103.25:8050
19/05/28 14:47:28 INFO db.DBInputFormat: Using read commited transaction isolation
19/05/28 14:47:28 INFO mapreduce.JobSubmitter: number of splits:1
19/05/28 14:47:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554176896418_0413
19/05/28 14:47:29 INFO impl.YarnClientImpl: Submitted application application_1554176896418_0413
19/05/28 14:47:29 INFO mapreduce.Job: The url to track the job: http://dthost25:8088/proxy/application_1554176896418_0413/
19/05/28 14:47:29 INFO mapreduce.Job: Running job: job_1554176896418_0413
19/05/28 14:47:38 INFO mapreduce.Job: Job job_1554176896418_0413 running in uber mode : false
19/05/28 14:47:38 INFO mapreduce.Job: map 0% reduce 0%
19/05/28 14:47:47 INFO mapreduce.Job: map 100% reduce 0%
19/05/28 14:47:47 INFO mapreduce.Job: Job job_1554176896418_0413 completed successfully
19/05/28 14:47:47 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=134612
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=848405
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=12992
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=6496
Total vcore-seconds taken by all map tasks=6496
Total megabyte-seconds taken by all map tasks=13303808
Map-Reduce Framework
Map input records=11854
Map output records=11854
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=96
CPU time spent (ms)=4530
Physical memory (bytes) snapshot=215683072
Virtual memory (bytes) snapshot=2517082112
Total committed heap usage (bytes)=563085312
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=848405
19/05/28 14:47:47 INFO mapreduce.ImportJobBase: Transferred 828.5205 KB in 23.3241 seconds (35.522 KB/sec)
19/05/28 14:47:47 INFO mapreduce.ImportJobBase: Retrieved 11854 records.
Time taken: 29 s
task IDE_TASK_8729A489-3760-383C-614C-FD31AEB71237_20190528144718353 is complete.
Remarkably, this time the job succeeded! The record count is no longer 0 (11854 records retrieved), and the imported data is real and valid.
Conclusion
In short, when using Sqoop with a free-form --query, it is best to wrap the query in double quotes, and if there is a WHERE clause you must include "\$CONDITIONS", with the backslash "\". Single quotes do keep the $ intact, but they break any single-quoted string literals inside the SQL; double quotes keep the SQL literals intact, and the backslash stops the shell from expanding the placeholder before Sqoop ever sees it.
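All four behaviors come down to shell quoting and can be checked side by side without running Sqoop; a minimal sketch:

```shell
#!/bin/sh
unset CONDITIONS   # sqoop, not the shell, is supposed to substitute this

# Single quotes: $CONDITIONS survives, but the nested quotes around null do not.
echo 'where reportinfo::Text<>'null' and $CONDITIONS'
# prints: where reportinfo::Text<>null and $CONDITIONS

# Double quotes, unescaped: the shell swallows $CONDITIONS entirely.
echo "where reportinfo::Text<>'null' and $CONDITIONS"
# prints: where reportinfo::Text<>'null' and

# Double quotes plus \$: both the inner quotes and the placeholder reach sqoop.
echo "where reportinfo::Text<>'null' and \$CONDITIONS"
# prints: where reportinfo::Text<>'null' and $CONDITIONS
```

Only the last form delivers both the SQL string literal and the untouched placeholder to Sqoop, which is exactly why version 4 is the one that works.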