Sqoop Java API Import Example

This post shows how to use the Sqoop Java API to import data from an Oracle database into HDFS. Along the way, a compilation error came up because the classpath used for Sqoop's dynamic compilation was missing the hadoop-common jar. Analysing the logs and the Sqoop source led to two fixes: modify java.class.path, or add the --hadoop-mapred-home option. After the adjustment, the import completed successfully.

Environment:

Linux + JDK 1.7

Sqoop 1.4.6-cdh5.5.2

hadoop-core 2.6.0-mr1-cdh5.5.2

hadoop-common 2.6.0-cdh5.5.2

hadoop-mapreduce-client-core 2.6.0-cdh5.5.2

Requirement:

Import a table from Oracle into HDFS.

Implementation:

First, assemble the Sqoop argument list:

String[] args = new String[] {
    // Oracle connection info
    "--connect", "jdbc:oracle:thin:@***:1522/**",
    "-username", "***",
    "-password", "***",
    // Free-form query; $CONDITIONS is replaced by Sqoop with its split conditions
    "--query", "select * from TABLE_NAME where $CONDITIONS and create_date>=date'2017-05-01' and create_date ..."
    // ... the rest of the query and argument list is truncated in the original post
};
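For reference, a free-form --query import normally also needs a target directory and a parallelism setting: Sqoop requires --target-dir when --query is used, plus either --split-by or a single mapper. A minimal sketch of a fuller argument list; the target directory and split column below are hypothetical and not from the original post:

String[] args = new String[] {
    "--connect", "jdbc:oracle:thin:@***:1522/**",
    "--username", "***",
    "--password", "***",
    "--query", "select * from TABLE_NAME where $CONDITIONS and create_date >= date'2017-05-01'",
    "--target-dir", "/user/sqoop/table_name",   // hypothetical HDFS output directory
    "--split-by", "ID",                          // hypothetical split column (or use "-m", "1" instead)
    "--fields-terminated-by", "\t"
};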

Then run the Sqoop job:

String[] expandArguments = OptionsFileUtil.expandArguments(args);

SqoopTool tool = SqoopTool.getTool("import");

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://nameservice1"); // HDFS service address

Configuration loadPlugins = SqoopTool.loadPlugins(conf);

Sqoop sqoop = new Sqoop((com.cloudera.sqoop.tool.SqoopTool) tool, loadPlugins);
int res = Sqoop.runSqoop(sqoop, expandArguments);
if (res == 0) {
    log.info("import succeeded");
}
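The classes above come from the Sqoop and Hadoop artifacts listed in the environment section. A minimal set of imports the snippet assumes (the log variable is whatever logging facade the application already uses):

import org.apache.hadoop.conf.Configuration;
import org.apache.sqoop.Sqoop;
import org.apache.sqoop.tool.SqoopTool;
import org.apache.sqoop.util.OptionsFileUtil;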

Once coding was complete, the job was deployed to the test environment, where Sqoop failed during its dynamic compilation step:

2017-07-26 15:10:15 [ERROR] [http-0.0.0.0-8080-6] [org.apache.sqoop.tool.ImportTool.run(ImportTool.java:613)] Encountered IOException running import job: java.io.IOException: Error returned by javac

at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:217)

at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)

at org.apache.sqoop.Sqoop.run(Sqoop.java:143)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)

Unless specially configured, the output of the dynamic compilation step is not routed through log4j, so the actual javac errors have to be located in the system log:

/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:7: error: package org.apache.hadoop.io does not exist

import org.apache.hadoop.io.BytesWritable;

^

/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:8: error: package org.apache.hadoop.io does not exist

import org.apache.hadoop.io.Text;

^

/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:9: error: package org.apache.hadoop.io does not exist

import org.apache.hadoop.io.Writable;

^

/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:37: error: cannot access Writable

public class QueryResult extends SqoopRecord  implements DBWritable, Writable {

From the above, the suspicion was that the classpath used for dynamic compilation did not include the hadoop-common jar. The following was found in CompilationManager:

private String findHadoopJars() {
    String hadoopMapRedHome = options.getHadoopMapRedHome();
    if (null == hadoopMapRedHome) {
        LOG.info("$HADOOP_MAPRED_HOME is not set");
        return Jars.getJarPathForClass(JobConf.class);
    }
    if (!hadoopMapRedHome.endsWith(File.separator)) {
        hadoopMapRedHome = hadoopMapRedHome + File.separator;
    }

    File hadoopMapRedHomeFile = new File(hadoopMapRedHome);
    LOG.info("HADOOP_MAPRED_HOME is " + hadoopMapRedHomeFile.getAbsolutePath());

    Iterator<File> filesIterator = FileUtils.iterateFiles(hadoopMapRedHomeFile,
        new String[] { "jar" }, true);
    StringBuilder sb = new StringBuilder();
    while (filesIterator.hasNext()) {
        File file = filesIterator.next();
        String name = file.getName();
        if (name.startsWith("hadoop-common")
            || name.startsWith("hadoop-mapreduce-client-core")
            || name.startsWith("hadoop-core")) {
            sb.append(file.getAbsolutePath());
            sb.append(File.pathSeparator);
        }
    }

    if (sb.length() < 1) {
        LOG.warn("HADOOP_MAPRED_HOME appears empty or missing");
        return Jars.getJarPathForClass(JobConf.class);
    }

    String s = sb.substring(0, sb.length() - 1);
    LOG.debug("Returning jar file path " + s);
    return s;
}

Since the hadoopMapRedHome option was not set in the configuration, this method can only return the jar that contains JobConf.class, i.e. the hadoop-core jar. Turning on DEBUG logging to verify this produced the following:

2017-07-26 15:10:14 [INFO] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.findHadoopJars(CompilationManager.java:85)] $HADOOP_MAPRED_HOME is not set

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:171)] Current sqoop classpath = :/usr/local/tomcat6/bin/bootstrap.jar

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:195)] Adding source file: /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/QueryResult.java

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:199)] Invoking javac with args:

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -sourcepath

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -d

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -classpath

2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   :/usr/local/tomcat6/bin/bootstrap.jar:/var/www/webapps/***/WEB-INF/lib/hadoop-core-2.6.0-mr1-cdh5.5.2.jar:/var/www/webapps/***/WEB-INF/lib/sqoop-1.4.6-cdh5.5.2.jar

Sure enough, the jars were missing. CompilationManager assembles the javac classpath as follows:

String curClasspath = System.getProperty("java.class.path");
LOG.debug("Current sqoop classpath = " + curClasspath);

args.add("-sourcepath");
args.add(jarOutDir);
args.add("-d");
args.add(jarOutDir);
args.add("-classpath");
args.add(curClasspath + File.pathSeparator + coreJar + sqoopJar);
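A quick way to confirm that the jars are present on the webapp classloader even though they are missing from java.class.path is to ask the JVM where it loaded the relevant classes from. A small diagnostic sketch, not from the original post:

// Print which jar each class was actually loaded from.
// Writable lives in hadoop-common, JobConf in hadoop-core (MR1).
System.out.println(org.apache.hadoop.io.Writable.class
        .getProtectionDomain().getCodeSource().getLocation());
System.out.println(org.apache.hadoop.mapred.JobConf.class
        .getProtectionDomain().getCodeSource().getLocation());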

The missing jars can be added in either of two ways:

1. Modify java.class.path directly:

String curClasspath = System.getProperty("java.class.path");
curClasspath = curClasspath
    + File.pathSeparator + "/var/www/webapps/***/WEB-INF/lib/hadoop-common-2.6.0-cdh5.5.2.jar"
    + File.pathSeparator + "/var/www/webapps/***/WEB-INF/lib/hadoop-mapreduce-client-core-2.6.0-cdh5.5.2.jar";
System.setProperty("java.class.path", curClasspath);

2. Add a configuration option (not tried):

--hadoop-mapred-home
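The second option was not tried in the original post, but based on findHadoopJars() above it would presumably be passed as an extra pair in the argument array, pointing at a directory that contains the hadoop jars, which the method then scans recursively. A hedged sketch, reusing the webapp lib directory seen in the logs above:

String[] args = new String[] {
    // ... connection and query arguments as before ...
    // findHadoopJars() scans this directory recursively for jars whose names start with
    // hadoop-common, hadoop-mapreduce-client-core or hadoop-core.
    "--hadoop-mapred-home", "/var/www/webapps/***/WEB-INF/lib"
};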

After applying the first approach, the import ran successfully:

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1547)] Job complete: job_local703153215_0001

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:566)] Counters: 18

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)]   File System Counters

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of bytes read=15015144

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of bytes written=15688984

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of read operations=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of large read operations=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of write operations=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of bytes read=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of bytes written=1536330810

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of read operations=40

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of large read operations=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of write operations=36

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)]   Map-Reduce Framework

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Map input records=3272909

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Map output records=3272909

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Input split bytes=455

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Spilled Records=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     CPU time spent (ms)=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Physical memory (bytes) snapshot=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Virtual memory (bytes) snapshot=0

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Total committed heap usage (bytes)=4080271360

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:184)] Transferred 1.4308 GB in 71.5332 seconds (20.4822 MB/sec)

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:186)] Retrieved 3272909 records.

That completes the Sqoop Java API import demo.
