Hadoop jobs with third-party dependencies: solving the Hadoop MapReduce jar dependency problem

One of the disadvantages of setting up a Hadoop development environment in Eclipse is that I had been relying on Eclipse to take care of job submission for me, so I had never worried about doing it by hand. I had also been developing mostly on a single-node cluster (i.e. my laptop), which meant I never needed to submit a job to an actual cluster, in this case a remote one. Furthermore, the first MapReduce programs I wrote and ran on the cluster (more on those to follow) did not depend on third-party jars. However, the program I am working on now depends on a third-party XML parser, which in turn depends on another jar.

As it turns out, I had to specify three external jars every time I submitted a job. I knew there was a -libjars option, as I had seen it mentioned somewhere (including in the hadoop command help when you don't supply all the arguments for a command), but I had not paid attention to it since I did not need it at the time. Googling around, I found a suggestion to copy the jars into the lib folder of the Hadoop installation. That seems like a good solution until you think about a multi-node cluster, where you would have to copy the libraries to every node. And what if you do not have complete control of the cluster? Would you even have write permission on its lib folder?
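For the record, -libjars only takes effect if the driver lets Hadoop's GenericOptionsParser see the command line, which in practice means implementing Tool and launching through ToolRunner. Below is a minimal sketch of such a driver; the class name, job name, and paths are made up for illustration, and on very old Hadoop releases new Job(conf, name) would replace Job.getInstance.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver; class, job, and path names are placeholders.
    public class XmlParseDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() already reflects any -libjars / -D options that
            // GenericOptionsParser stripped from the command line.
            Job job = Job.getInstance(getConf(), "xml parse");
            job.setJarByClass(XmlParseDriver.class);
            // setMapperClass / setReducerClass / output types would go here.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner invokes GenericOptionsParser, which is what makes
            // -libjars (and -D, -files, etc.) work.
            System.exit(ToolRunner.run(new Configuration(), new XmlParseDriver(), args));
        }
    }

With a driver like that, the three jars can be passed as one comma-separated list at submission time, along the lines of

    hadoop jar xmljob.jar XmlParseDriver -libjars xmlparser.jar,parser-dep.jar,other-dep.jar /input /output

where the jar and class names are placeholders for whatever your project actually uses.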

Luckily, I bumped into a solution suggested by Doug Cutting as an answer to someone with a similar predicament: create a "lib" folder in your project and copy all the external jars into it. According to Doug, Hadoop will look for third-party jars in this folder. It works great!
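To make it concrete, the point is that the dependencies end up inside the job JAR itself under a lib/ directory, so the packaged archive looks roughly like this (file and package names are placeholders):

    xmljob.jar
    |-- META-INF/MANIFEST.MF
    |-- com/example/XmlParseDriver.class
    |-- com/example/XmlParseMapper.class
    |-- lib/
        |-- xmlparser.jar
        |-- parser-dep.jar

When the job runs, Hadoop ships the job JAR to each task node, unpacks it, and adds everything under lib/ to the task classpath, which is why nothing needs to be copied into the cluster's own installation.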

Hadoop: The Definitive Guide also describes how to handle jar packaging; it is worth looking up. The relevant passage says:

"Any dependent JAR files must be packaged into the lib directory of the job JAR file. (This is similar to a Java web application archive, or WAR file, except that in a WAR file the JARs go into the WEB-INF/lib subdirectory.)"
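Assuming the compiled classes live in build/classes and the third-party jars have been copied into a lib/ directory at the project root (both paths are assumptions; adjust them to your layout), a job JAR with that structure can be built with the standard jar tool:

    jar cvf xmljob.jar -C build/classes . lib

The "-C build/classes ." part puts the compiled classes at the root of the archive, and the trailing lib argument adds the lib directory, and the jars inside it, under lib/ in the JAR, which is exactly the layout the book describes.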
