1. Environment: four VMware virtual machines, each running CentOS 6.5 x86_64:
(1) ip: 192.168.169.10, hostname: master    # Hadoop cluster master, 1 core / 1 GB RAM
(2) ip: 192.168.169.11, hostname: slave1    # Hadoop cluster slave1, 1 core / 2 GB RAM
(3) ip: 192.168.169.12, hostname: slave2    # Hadoop cluster slave2, 1 core / 2 GB RAM
(4) ip: 192.168.169.13, hostname: slave3    # Hadoop cluster slave3, 1 core / 2 GB RAM
2. First make sure every host can reach every other host (they can ping each other). Steps:
(1) Set the hostname and gateway: vi /etc/sysconfig/network
(2) Configure the network interface: vi /etc/sysconfig/network-scripts/ifcfg-eth0
(3) Map IPs to hostnames: vi /etc/hosts; you can use scp to copy the finished hosts file from this machine to the other nodes. Sample contents for all three files are sketched below.
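A minimal sketch of these three files as they might look on the master node; the GATEWAY and NETMASK values are assumptions for this example network and should be adjusted to your own VMware setup:
# /etc/sysconfig/network (on master)
NETWORKING=yes
HOSTNAME=master
GATEWAY=192.168.169.2          # assumed VMware NAT gateway
# /etc/sysconfig/network-scripts/ifcfg-eth0 (on master)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.169.10
NETMASK=255.255.255.0          # assumed /24 netmask
# /etc/hosts (identical on all four nodes)
192.168.169.10 master
192.168.169.11 slave1
192.168.169.12 slave2
192.168.169.13 slave3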
3. Configure passwordless SSH login:
(1) Generate a key pair on the master host: ssh-keygen -t rsa -P ''
(2) Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3) Set the permissions on the authorized_keys file: chmod 600 ~/.ssh/authorized_keys
(4) Make sure /etc/ssh/sshd_config contains the following settings:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
(5) Restart the SSH service: service sshd restart
(6) Copy the public key to the other hosts in the cluster: scp ~/.ssh/id_rsa.pub root@192.168.169.13:~/
(7) Copy /etc/ssh/sshd_config to the other hosts in the cluster: scp /etc/ssh/sshd_config root@192.168.169.12:/etc/ssh/
(8) On each of the other hosts, append id_rsa.pub to the authorized keys: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
(9) Set the permissions on the authorized_keys file: chmod 600 ~/.ssh/authorized_keys
(10) Remove the copied public key: rm -rf ~/id_rsa.pub
(11) Restart the SSH service: service sshd restart
Note: steps (8) through (11) are performed on the other hosts; repeat steps (6) and (7) for each slave.
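To confirm the setup worked, passwordless login from master to each slave should now succeed, for example:
ssh slave1 hostname    # should print "slave1" without asking for a password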
4. Install the JDK on every node, into /usr/java/:
(1) Run the installer: rpm -ivh jdk-7u25-linux-x64.rpm
(2) Configure the environment variables: append the JDK settings below to the end of /etc/profile, save, then run source /etc/profile to reload the profile, and run java -version to check that the change took effect.
Variables to add:
export JAVA_HOME=/usr/java/jdk1.7.0_25
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
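If the variables are in effect, java -version should report the newly installed JDK, roughly like this (the exact build numbers may differ):
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)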
5. Install and configure Hadoop 2.2.0, using the binary release downloaded from the Apache website:
(1) Hadoop directory layout:
Hadoop install directory:
/usr/hadoop/hadoop-2.2.0
Data storage directory, created on every node to hold cluster data:
/usr/hadoop/storage/hadoop-2.2.0/hdfs
Directory created only on the master node to hold the file system metadata:
/usr/hadoop/storage/hadoop-2.2.0/hdfs/name
Directory created on each slave node to hold the actual data blocks:
/usr/hadoop/storage/hadoop-2.2.0/hdfs/data
Log directory on every node:
/usr/hadoop/storage/hadoop-2.2.0/logs
Temporary directory on every node:
/usr/hadoop/storage/hadoop-2.2.0/tmp
(2) Create these directories on every node (except /usr/hadoop/storage/hadoop-2.2.0/hdfs/name, which is created only on the master node):
mkdir -p /usr/hadoop/
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/hdfs
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/hdfs/name
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/hdfs/data
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/logs
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/tmp
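Since passwordless SSH is already in place, the slave-side directories can also be created from master in a single loop instead of logging in to each node (just a convenience sketch; running the mkdir commands above on each node works equally well):
for h in slave1 slave2 slave3; do
  ssh root@$h "mkdir -p /usr/hadoop/storage/hadoop-2.2.0/hdfs/data /usr/hadoop/storage/hadoop-2.2.0/logs /usr/hadoop/storage/hadoop-2.2.0/tmp"
done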
(3) Unpack hadoop-2.2.0.tar.gz: tar -zvxf hadoop-2.2.0.tar.gz
(4) On every node, add the Hadoop root directory to the environment variables:
export HADOOP_HOME=/usr/hadoop/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_LOG_DIR=/usr/hadoop/storage/hadoop-2.2.0/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
(5) On every node, reload the profile: source /etc/profile
(6) The files to configure are: core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, hadoop-env.sh, yarn-env.sh, mapred-env.sh
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/hadoop/storage/hadoop-2.2.0/tmp/hadoop-${user.name}</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/hadoop/storage/hadoop-2.2.0/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/hadoop/storage/hadoop-2.2.0/hdfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>5</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
hadoop-env.sh, yarn-env.sh, mapred-env.sh:
Set JAVA_HOME in each of these scripts; if the line is commented out, remove the comment.
export JAVA_HOME=/usr/java/jdk1.7.0_25
Edit the slaves file: add one slave hostname per line, as shown below.
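With the hostnames used in this cluster, the slaves file (under the Hadoop configuration directory, e.g. /usr/hadoop/hadoop-2.2.0/etc/hadoop/slaves) would contain:
slave1
slave2
slave3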
(7) Sync the /usr/hadoop/hadoop-2.2.0 directory to the other nodes:
scp -r /usr/hadoop/hadoop-2.2.0/ root@slave1:/usr/hadoop/
scp -r /usr/hadoop/hadoop-2.2.0/ root@slave2:/usr/hadoop/
scp -r /usr/hadoop/hadoop-2.2.0/ root@slave3:/usr/hadoop/
(8) Disable the firewall on every node:
chkconfig iptables off
chkconfig ip6tables off
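Note that chkconfig only prevents the firewall from starting at boot; to stop it in the current session as well (without a reboot), you can additionally run:
service iptables stop
service ip6tables stop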
(9) Format the namenode:
hadoop namenode -format
(10) Start the HDFS cluster: start-dfs.sh
If the following prompt appears:
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 45:83:90:4f:7f:6b:d7:1b:22:be:70:4a:38:67:92:c3.
Are you sure you want to continue connecting (yes/no)?
type yes and the cluster will start normally.
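Once HDFS is up, one quick way to check that all three datanodes have registered with the namenode is:
hdfs dfsadmin -report    # should list 3 live datanodes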
(11) Start YARN: start-yarn.sh
(12) Run the jps command to list all running Java processes.
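With this configuration you should see roughly the following (process IDs will vary):
on master: NameNode, SecondaryNameNode, ResourceManager, Jps
on each slave: DataNode, NodeManager, Jps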
(13) Visit the web UI to check the namenode status:
http://192.168.169.10:50070/
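As an optional end-to-end smoke test, you can run one of the example MapReduce jobs bundled with the binary release, e.g. the pi estimator (kept small here since these VMs only have 1-2 GB of RAM):
hadoop jar /usr/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 10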
(14) Stop YARN: stop-yarn.sh
(15) Stop HDFS: stop-dfs.sh
This is a test cluster I built by hand purely to learn Hadoop; I make no guarantee that it is highly available or entirely correct.
Note: this setup does not build Hadoop 2.2.0 from source, so there will be a native-library issue: the Apache binary package ships 32-bit native libraries by default, which causes problems on a 64-bit system. I describe how to solve this in a separate article:
http://blog.csdn.net/xiaolinzi007/article/details/41127231