Hadoop - Submitting a job with many dependencies (JAR files)

I want to write a sort of "bootstrap" class that watches an MQ for incoming messages and submits map/reduce jobs to Hadoop. These jobs rely heavily on some external libraries. At the moment I have an implementation of these jobs, packaged as a ZIP file with bin, lib, and log folders (I'm using the maven-assembly-plugin to tie things together).

Now I want to provide small wrappers for the Mapper and Reducer that will use parts of the existing application.

As far as I have learned, when a job is submitted, Hadoop finds the JAR file that contains the mapper/reducer classes and copies it over the network to the data nodes that will process the data. But it's not clear how I tell Hadoop to copy all the dependencies as well.

I could use the maven-shade-plugin to create an uber-jar containing the job and its dependencies, and another jar for the bootstrap (which would be executed with the hadoop shell script).
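
For reference, a minimal maven-shade-plugin configuration for building such an uber-jar might look like the sketch below; the mainClass value com.example.JobDriver is a placeholder, not something from the original post:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.4.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- Set Main-Class in the manifest; replace with your own driver class -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>com.example.JobDriver</mainClass>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>

The resulting jar bundles all compile-scope dependencies, so Hadoop only has to ship a single file to the cluster.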

Please advise.

Solution

One way is to put the required jars in the distributed cache. Another alternative is to install all the required jars on the Hadoop nodes and tell the TaskTrackers about their location. I would suggest going through this post; it discusses the same issue.
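
As a minimal sketch of the distributed-cache approach, assuming the Hadoop 2.x Job API and a hypothetical HDFS directory /apps/myjob/lib that already holds the dependency jars (neither is from the original post):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class JobSubmitter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "my-job");
            job.setJarByClass(JobSubmitter.class);

            // Hypothetical HDFS directory holding the external dependency jars
            Path libDir = new Path("/apps/myjob/lib");
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus status : fs.listStatus(libDir)) {
                // Each jar is shipped through the distributed cache and added
                // to the classpath of every map and reduce task
                job.addFileToClassPath(status.getPath());
            }

            // ... set mapper, reducer, input/output paths, then submit
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Alternatively, if the driver implements Tool and is launched through ToolRunner, the same effect is available from the command line via the generic -libjars option (hadoop jar myjob.jar com.example.JobDriver -libjars dep1.jar,dep2.jar), with no extra driver code.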
