Deploying LZO on a Hadoop Cluster

LZO decompresses quickly, so I plan to adopt this compression format in production (even though its disk-space savings are modest).

The deployment steps below still have some problems and need further debugging and verification; I will publish the corrected procedure later. If you follow these steps in the meantime and run into issues, please work around them yourself for now.

apache-ant-1.8.3-bin.tar.gz — the Ant build tool; the version must be newer than 1.7, otherwise some build-file attributes are unsupported
kevinweil-hadoop-lzo-6bb1b7f.tar.gz — used to build the hadoop-lzo-0.4.15.jar file
hadoop-gpl-compression-0.1.0-rc0.tar.gz — an older alternative to the above

Note that hadoop-gpl-compression-0.1.0-rc0.tar.gz and kevinweil-hadoop-lzo-6bb1b7f.tar.gz serve the same purpose, so install only one of them; kevinweil-hadoop-lzo-6bb1b7f.tar.gz is the recommended choice.

lzo-2.06.tar.gz — source for building the LZO shared library

 

1. Install Ant (straightforward)

tar -zxf apache-ant-1.8.3-bin.tar.gz

Edit /etc/profile:

vim /etc/profile

and add the following:

export ANT_HOME="/home/hadoop/apache-ant-1.8.3"
export PATH=$PATH:$ANT_HOME/bin

Save and exit, then reload the profile:

source /etc/profile
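
To confirm that Ant is now on the PATH, print its version (any 1.8.x output is fine):

ant -version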

 

2. Build and install the LZO shared library

tar -zxvf lzo-2.06.tar.gz
cd lzo-2.06
./configure --enable-shared
make
make install

The library files are installed under /usr/local/lib by default.

Copy the LZO libraries from /usr/local/lib to /usr/lib (32-bit OS) or /usr/lib64 (64-bit OS).
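
For example, on a 64-bit system (a sketch assuming the default library name liblzo2 produced by this build):

cp /usr/local/lib/liblzo2.* /usr/lib64/
ldconfig   # refresh the dynamic linker cache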

 

3. Build the hadoop-lzo jar

tar -zxvf kevinweil-hadoop-lzo-6bb1b7f.tar.gz 

cd kevinweil-hadoop-lzo-6bb1b7f

ant compile-native tar

If all goes well, hadoop-lzo-0.4.15.jar will be generated under kevinweil-hadoop-lzo-6bb1b7f/build.

Copy this jar into the lib directory of the Hadoop installation:

cp hadoop-lzo-0.4.15.jar /home/hadoop/hadoop-0.20.205.0/lib

Next, go into kevinweil-hadoop-lzo-6bb1b7f/build/native/Linux-amd64-64/lib and copy all of its files into /home/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64.
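
A plain cp does the job (run from the kevinweil-hadoop-lzo-6bb1b7f source directory, assuming the build layout above):

cp build/native/Linux-amd64-64/lib/* /home/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/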

Alternatively, use this single command:

tar -cBf - -C build/native/ . | tar -xBvf - -C /home/hadoop/hadoop-0.20.205.0/lib/native/

Then set ownership:

cd /home/hadoop/hadoop-0.20.205.0/lib

chown -R hadoop. native/

 

4. Set up hadoop-gpl-compression-0.1.0-rc0.tar.gz (only if you chose this package instead of kevinweil-hadoop-lzo)

tar -zxf hadoop-gpl-compression-0.1.0-rc0.tar.gz

cd hadoop-gpl-compression-0.1.0

This directory contains hadoop-gpl-compression-0.1.0.jar; copy it into the lib directory of the Hadoop installation.

Then copy the contents of its lib/native directory into lib/native under the Hadoop installation.
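
Concretely, that amounts to something like this (run from the hadoop-gpl-compression-0.1.0 directory, assuming the same install path as above):

cp hadoop-gpl-compression-0.1.0.jar /home/hadoop/hadoop-0.20.205.0/lib/
cp -r lib/native/* /home/hadoop/hadoop-0.20.205.0/lib/native/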

 

 

5. Edit the Hadoop configuration files core-site.xml and mapred-site.xml

Add the following to core-site.xml. LzopCodec here handles .lzo files written in the lzop container format, while LzoCodec (referenced below) is the raw codec, used in this setup for intermediate map output:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec</value>
</property>

<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
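
After restarting the cluster, one way to sanity-check the codec wiring (a sketch assuming the lzop command-line tool is installed and a local file /tmp/test.log exists) is to compress a file, upload it, and let hadoop fs -text decompress it through the configured codecs:

lzop /tmp/test.log                  # produces /tmp/test.log.lzo
hadoop fs -put /tmp/test.log.lzo /tmp/
hadoop fs -text /tmp/test.log.lzo   # decodes via io.compression.codecs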

 

mapred-site.xml gets the following (this enables LZO compression of intermediate map output; note that JAVA_LIBRARY_PATH must point at the native library directory of this installation):

<property>
  <name>mapred.map.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.child.env</name>
  <value>JAVA_LIBRARY_PATH=/home/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64</value>
</property>

<property>
  <name>mapred.map.output.compress.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
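
hadoop-lzo also ships an indexer that makes large .lzo files splittable across map tasks; a usage sketch (jar name from step 3, HDFS path hypothetical):

hadoop jar /home/hadoop/hadoop-0.20.205.0/lib/hadoop-lzo-0.4.15.jar \
  com.hadoop.compression.lzo.LzoIndexer /tmp/test.log.lzo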


 

After deployment, running Hive produced the following error:

Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:197)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:981)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:973)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:816)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:48)
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:193)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:981)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:973)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:890)
        ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:193)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:981)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:973)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:816)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzopCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:38)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:193)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:981)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:973)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:890)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)

 

Someone online writes that the hadoop-0.20.205 release has a bug: it does not automatically pick up the native libraries under the lib directory, so the library path has to be set by hand.

Fix: add the following line to $HADOOP_HOME/bin/hadoop:

JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64

 

Besides that error, hadoop-0.20.205 has a second problem: the lzo jar cannot be found. The classpath-loading logic changed in this release and no longer adds every jar under $HADOOP_HOME/lib by default, so adding the following line to conf/hadoop-env.sh resolves it:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/lib/hadoop-lzo-0.4.15.jar
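
To confirm the jar actually lands on the classpath after this change, bin/hadoop's classpath subcommand can be used:

hadoop classpath | tr ':' '\n' | grep -i lzo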

 

The current failure is indeed caused by the jar not being loaded, so I am debugging this in the test environment and will post the final result once I have it.

After testing, this fix resolved the missing-jar problem, but loading the native library still fails and needs further investigation.
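
For the remaining native-library problem, two things worth checking (a sketch; exact log wording varies by hadoop-lzo revision) are whether the libgplcompression files built in step 3 are actually present, and whether the task logs show the native loader succeeding rather than falling back to the pure-Java path:

ls -l /home/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/libgplcompression*
grep -ri "native-lzo" /home/hadoop/hadoop-0.20.205.0/logs/ | tail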
