Configuring a shell-sqoop script in a Hue workflow

Earlier posts covered solving Sqoop's plaintext-password problem and importing sharded (split database/table) MySQL data with Sqoop. This post details the problems encountered when configuring that shell-sqoop script in a Hue workflow, using the script from the previous post as the example.

1. Configuring the Hue workflow

(Screenshot: the shell action configured in the Hue workflow editor)
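For reference, here is a minimal sketch of the kind of shell-sqoop script this workflow invokes; the JDBC URL, credentials path, and table names are placeholders, not the actual values from the previous post:

#!/bin/bash
# Hypothetical shell-sqoop script: import one shard of a sharded MySQL table.
# The password is read from an HDFS password file rather than passed in plaintext.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/db_0 \
  --username etl_user \
  --password-file /user/admin/.mysql.pwd \
  --table order_0 \
  --target-dir /data/ods/order/db_0 \
  --num-mappers 1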

2. Problems encountered

2.1 The MySQL driver cannot be loaded

The error:

ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
	at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:874)
	at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:762)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:785)
	at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:288)
	at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:259)
	at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:245)
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:333)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1858)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1657)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

Solution: since the import is launched from a shell script, Sqoop loads the MySQL driver from SQOOP_HOME/lib. Simply copy the MySQL driver JAR into Sqoop's lib directory on every node and restart the service. The steps:

# Download the driver
wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz/from/http://cdn.mysql.com/
# Copy the driver into SQOOP_HOME/lib
cp /usr/share/java/mysql-connector-java-5.1.46.jar /data/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/sqoop/lib
# Distribute it to every node; xsync is a custom distribution script (see the earlier Hadoop post), scp works just as well
xsync /data/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/sqoop/lib/mysql-connector-java-5.1.46.jar
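
To verify the driver is now picked up, a quick connectivity check can be run on any node; the host and credentials below are placeholders:

# Should print the database list instead of "Could not load db driver class"
sqoop list-databases \
  --connect jdbc:mysql://mysql-host:3306 \
  --username etl_user \
  --password-file /user/admin/.mysql.pwd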

2.2 File-not-found error when the job is scheduled

The error:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/user/admin/.staging/job_1590901453442_0029/job.splitmetainfo
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1580)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1444)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1402)
	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1366)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1142)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1573)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1569)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1502)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://nameservice1/user/admin/.staging/job_1590901453442_0029/job.splitmetainfo
	at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1269)
	at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1261)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1261)
	at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1575)
	... 17 more

Solution: when Hue runs a shell action, it dispatches the script to a random node in the cluster. The job staging files are created under the submitting Hue user, but the dispatched script is not told which user to run as, so it cannot locate those files and reports that they do not exist. The fix is to set the following environment variable on the shell action:

HADOOP_USER_NAME=${wf:user()}

(Screenshot: setting the HADOOP_USER_NAME environment variable on the shell action in Hue)
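In the workflow.xml that Hue generates, this corresponds to an env-var entry on the shell action. A rough sketch, where the action and script names are hypothetical:

<action name="shell-sqoop">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>sqoop_import.sh</exec>
        <!-- run the dispatched script as the submitting Hue user -->
        <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
        <file>sqoop_import.sh#sqoop_import.sh</file>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
</action>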

2.3 Oozie scheduling error: java.io.IOException: output.properties data exceeds its limit [2048]

The shell script submits too much work in one run, and the information it produces exceeds the maximum the Oozie launcher allows per action, 2K (the default).

Solution: in the CDH cluster, add the following to the Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml, then restart Oozie:

<property>
    <name>oozie.action.max.output.data</name>
    <value>204800</value>
</property>
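
Raising the limit to 204800 (200K) is usually enough for a batch of Sqoop jobs. An alternative worth considering is to shrink what gets captured in the first place: with capture-output enabled, Oozie reads the script's stdout into output.properties, so routing verbose Sqoop console output to a file keeps it small. A hypothetical sketch, with placeholder paths:

# Hypothetical: keep the action's captured stdout under the limit by
# routing Sqoop's console output to a local log file instead
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/db_0 \
  --username etl_user \
  --password-file /user/admin/.mysql.pwd \
  --table order_0 \
  --target-dir /data/ods/order/db_0 \
  >> /tmp/sqoop_import.log 2>&1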

(Screenshot: the oozie-site.xml safety valve entry in Cloudera Manager)
