Building and Installing Hadoop 2.4
I. Building the source
1. Download the required tool packages:
java-1.6.0-27, maven-3.0.5 (the Maven version matters: the 3.2 series has problems and the build fails), cmake-2.8.12.1, protobuf-2.5.0,
zlib-devel-1.2.7-2.1.2.x86_64 (not needed if zlib/gzip is already installed on the machine).
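Before starting, it can save time to confirm these tools are actually on PATH. A minimal sketch (check_tool is our own helper, not part of any of these packages):

```shell
#!/bin/sh
# Print whether each build prerequisite can be found on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING"
  fi
}
for t in java mvn cmake protoc gcc; do
  check_tool "$t"
done
```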
2. Installation.
1. Install the JDK. (Steps omitted.)
2. Install cmake. Unpack the tarball and build:
1) cd /root/hadoop-install-tools
2) tar xzf cmake-2.8.12.1.tar.gz
3) cd cmake-2.8.12.1
4) ./configure
5) make
6) make install
3. Install protobuf:
1) cd /root/hadoop-install-tools
2) tar xzf protobuf-2.5.0.tar.gz
3) cd protobuf-2.5.0
4) ./configure --prefix=/root/protobuf
5) make
6) make install
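Because protobuf was installed under a non-standard prefix (/root/protobuf), the Hadoop build will not find protoc unless that prefix is exported first. A sketch:

```shell
# Put the custom protobuf prefix on PATH and the loader path so the
# Hadoop build can invoke protoc.
export PATH=/root/protobuf/bin:$PATH
export LD_LIBRARY_PATH=/root/protobuf/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
# Should report "libprotoc 2.5.0" once the install above has completed:
command -v protoc >/dev/null 2>&1 && protoc --version || echo "protoc not on PATH yet"
```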
4. Install Maven. Unpack the downloaded binary tarball into /usr, rename the directory to maven, then set the environment variables:
vi /etc/profile
Add the following:
export MVN_HOME=/usr/maven
export PATH=$MVN_HOME/bin:$PATH
Run: source /etc/profile to make the variables take effect.
Then run mvn -version to confirm:
Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 21:51:28+0800)
Maven home: /usr/maven
Java version: 1.6.0_27, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_27/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-431.el6.x86_64", arch: "amd64", family: "unix"
(When the network is slow, you can copy a pre-downloaded Tomcat tarball into hadoop-2.4.0-src/hadoop-hdfs-project/hadoop-hdfs-httpfs/downloads so the build does not have to fetch it.)
5. Build the source:
mvn package -Pdist -DskipTests -Dtar
To also build the native libraries (Native Libraries), use: mvn package -Pdist,native -DskipTests -Dtar
On success, the jar files are placed in the various target subdirectories; you can use find from the Hadoop source root to locate them.
The build also produces the Hadoop binary package hadoop-2.4.0.tar.gz under the hadoop-dist/target subdirectory of the source tree:
/root/hadoop-install-tools/hadoop-2.4.0-src/hadoop-dist/target
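The find step mentioned above can be sketched like this (list_targets is our own helper, and the HADOOP_SRC default matches the paths used in this guide):

```shell
# Locate every target/ directory the build produced.
# HADOOP_SRC defaults to the source path used in this guide; adjust to yours.
HADOOP_SRC=${HADOOP_SRC:-/root/hadoop-install-tools/hadoop-2.4.0-src}
list_targets() {
  find "$1" -type d -name target 2>/dev/null
}
list_targets "$HADOOP_SRC" || true  # prints nothing until a build has run
```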
II. Installation
1. Copy the binary package produced by the build into /usr and unpack it:
tar -xzf hadoop-2.4.0.tar.gz
Rename the unpacked directory to hadoop: mv hadoop-2.4.0 hadoop
2. Configure the environment variables:
vi /etc/profile
export HADOOP_HOME=/usr/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Run: source /etc/profile to make the variables take effect.
Run hadoop version to verify the installation:
[root@localhost usr]# hadoop version
Hadoop 2.4.0
Subversion Unknown -r Unknown
Compiled by root on 2014-08-04T10:27Z
Compiled with protoc 2.5.0
From source with checksum 375b2832a6641759c6eaf6e3e998147
This command was run using /usr/hadoop/share/hadoop/common/hadoop-common-2.4.0.jar
III. Starting in pseudo-distributed mode.
Some basic setup is needed before starting.
1. Passwordless SSH login. (ssh and rsync must be installed first.)
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
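The two commands above are not idempotent: rerunning them appends the key a second time. A guarded variant, as a sketch (append_key is our own helper name):

```shell
# Append a public key to authorized_keys only if it is not already there.
append_key() {
  # $1 = public key file, $2 = authorized_keys file
  touch "$2"
  grep -qxF "$(cat "$1")" "$2" || cat "$1" >> "$2"
  chmod 600 "$2"  # sshd refuses keys in group/world-writable files
}
```

Call it as append_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys; running it twice still leaves a single copy of the key. Afterwards, ssh localhost should log in without a password prompt.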
2. Set the Hadoop configuration parameters (etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml).
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Edit etc/hadoop/hadoop-env.sh and point JAVA_HOME at the JDK directory:
export JAVA_HOME=/usr/java/jdk1.6.0_27
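The same edit can be done non-interactively; this is only a sketch (set_java_home is our own helper, and the JDK path is the one used throughout this guide):

```shell
# Rewrite the JAVA_HOME line in hadoop-env.sh in place (GNU sed).
set_java_home() {
  # $1 = path to hadoop-env.sh, $2 = JDK directory
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$2|" "$1"
}
```

For example: set_java_home /usr/hadoop/etc/hadoop/hadoop-env.sh /usr/java/jdk1.6.0_27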
3. Before the first start, the file system must be formatted, via hadoop namenode -format or hdfs namenode -format.
4. Start the HDFS services with start-dfs.sh.
Use jps to check whether the HDFS processes are running:
[root@localhost hadoop]# jps
5782 NameNode
5897 DataNode
6168 Jps
6046 SecondaryNameNode
Then check in a web browser that the NameNode started correctly. Default address: NameNode - http://localhost:50070/
If the page does not load, the firewall may be on; stop it with: service iptables stop
You can test that HDFS is usable with hadoop commands like the following:
[root@localhost hadoop]# hadoop fs -ls /
[root@localhost hadoop]# hadoop fs -mkdir /zxb
[root@localhost hadoop]# hadoop fs -mkdir /zxb
mkdir: `/zxb': File exists
[root@localhost hadoop]# hadoop fs -ls /zxb
[root@localhost hadoop]# hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2014-08-04 19:23 /zxb
You can also import data into the file system:
[root@localhost etc]# hdfs dfs -put hadoop/ /zxb/input
[root@localhost etc]# hdfs dfs -ls /zxb/input
Found 25 items
-rw-r--r-- 1 root supergroup 3589 2014-08-04 19:25 /zxb/input/capacity-scheduler.xml
-rw-r--r-- 1 root supergroup 1335 2014-08-04 19:25 /zxb/input/configuration.xsl
5. Set up single-node YARN. (etc/hadoop/mapred-site.xml, etc/hadoop/yarn-site.xml)
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
6. Start or stop YARN:
sbin/start-yarn.sh
sbin/stop-yarn.sh
Check YARN's status in a web browser: ResourceManager - http://localhost:8088/
7. Test:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep /zxb/input /zxb/output3/ 'dfs[a-z.]+'
The command above also runs and produces results when YARN has not been started; in that mode it falls back to the default MRv1-style local execution.
With YARN configured and started, the job runs on YARN instead.
References:
http://blog.chinaunix.net/uid-20682147-id-4219103.html
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/NativeLibraries.html