Part 1: Preparing the Installation Environment
1. Install VMware Workstation 12.
2. Create the virtual machines with Red Hat RHEL 6.6.
The three hosts map to each other in /etc/hosts:
[hadoop@master ~]$ more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.150.30 master TST-RHEL66-00
192.168.150.31 slave1 TST-RHEL66-01
192.168.150.32 slave2 TST-RHEL66-02
[hadoop@master ~]$
2. Set up passwordless SSH login between the virtual machines.
## (Note: there is no space between "ssh" and "-keygen".) Just press Enter at every prompt.
[hadoop@master ~]$ cd
[hadoop@master ~]$ pwd
/home/hadoop
[hadoop@master ~]$ ssh-keygen -t rsa
## Change into the .ssh directory (cd ~/.ssh); you will see the two generated files id_rsa and id_rsa.pub
[hadoop@master ~]$ cd .ssh/
[hadoop@master .ssh]$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
## Copy the public key into authorized_keys
[hadoop@master .ssh]$ cp id_rsa.pub authorized_keys
## Copy the authorized_keys file from the master into /home/hadoop/.ssh/ on each slave
[hadoop@master .ssh]$scp authorized_keys slave1:~/.ssh/
[hadoop@master .ssh]$scp authorized_keys slave2:~/.ssh/
## Fix the permissions on the .ssh directory and on authorized_keys (mandatory; otherwise a password is still required)
sudo chmod 644 ~/.ssh/authorized_keys
sudo chmod 700 ~/.ssh
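The two chmod commands above can be double-checked with `stat`; a minimal sketch (paths assume the hadoop user's home directory):

```shell
# Enforce and verify the modes sshd expects for key-based login
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
stat -c '%a' ~/.ssh                   # prints 700
stat -c '%a' ~/.ssh/authorized_keys   # prints 644
```

If sshd still asks for a password, group- or world-writable directories are the usual cause; check /var/log/secure on the target host.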
Part 2: Hadoop 2.x Stable Release Media
http://mirrors.cnnic.cn/apache/
http://mirrors.cnnic.cn/apache/hadoop/core/stable/hadoop-2.7.2.tar.gz
1. Upload and extract the tarball, then create a symlink
# tar xzvf hadoop-2.7.2.tar.gz
# chown -R hadoop:hadoop hadoop-2.7.2   ## -R applies the ownership recursively to all subdirectories
# ln -s hadoop-2.7.2 hadoop2
2. Configure environment variables
Configure the environment variables on all three hosts by adding the following to the hadoop user's .bashrc:
# User specific aliases and functions
export JAVA_HOME=/usr/java/latest
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
export HADOOP_DEV_HOME=/home/hadoop/hadoop2
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
Send it to the other two hosts:
[hadoop@master .ssh]$scp .bashrc slave1:~
[hadoop@master .ssh]$scp .bashrc slave2:~
3. Hadoop configuration files
Edit hadoop-env.sh and mapred-env.sh:
Configure hadoop-env.sh
Configure mapred-env.sh
Edit yarn-env.sh and the slaves file:
~/hadoop2/etc/hadoop/yarn-env.sh
Configure ~/hadoop2/etc/hadoop/slaves
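The screenshots for these files did not survive. In most 2.x setups the only mandatory change in hadoop-env.sh, mapred-env.sh, and yarn-env.sh is an explicit JAVA_HOME, and slaves simply lists one worker hostname per line; a sketch under those assumptions:

```shell
# hadoop-env.sh / mapred-env.sh / yarn-env.sh: set JAVA_HOME explicitly
# (the value inherited from the login shell is not always visible to the daemons)
export JAVA_HOME=/usr/java/latest

# ~/hadoop2/etc/hadoop/slaves would contain one worker host per line:
#   slave1
#   slave2
```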
Edit core-site.xml:
Create the Hadoop working directory. (This is the temporary working directory; it defaults to /tmp, whose contents are lost on reboot, so point it at a separate directory, /hadoop2.)
Edit ~/hadoop2/etc/hadoop/core-site.xml
fs.defaultFS points at the NameNode (its IP or hostname)
hadoop.tmp.dir is Hadoop's temporary directory, the /hadoop2/tmp just created as root
In hadoop.proxyuser.hadoop.hosts, the middle ".hadoop." segment is the user name; here it is hadoop. If you run Hadoop as a different user, substitute that name, e.g. hadoop.proxyuser.userhadoop.hosts
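Since the original screenshot is missing, here is a minimal core-site.xml sketch consistent with the notes above; the 9000 port in fs.defaultFS is an assumption (any free port works), and the proxyuser wildcards are illustrative:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.150.30:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop2/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
```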
Edit hdfs-site.xml:
Create the Hadoop working directories. (In production, the Hadoop directories should be mounted on a dedicated disk or disk array.)
Edit ~/hadoop2/etc/hadoop/hdfs-site.xml
dfs.replication is the number of replicas; set it to 2 here (the default is 3)
dfs.webhdfs.enabled enables monitoring HDFS over the web
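A matching hdfs-site.xml sketch; the name/data directory properties are assumptions chosen to line up with the /hadoop2/dfs/data directory created later:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop2/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop2/dfs/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
```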
Part 3: Hadoop 2.x Cluster Configuration and Installation
Edit ~/hadoop2/etc/hadoop/mapred-site.xml
The official Hadoop 2.7.2 documentation has no vcores setting; the screenshot here came from Hadoop 2.2.0.
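In place of the lost screenshot, a minimal mapred-site.xml sketch: the one property Hadoop 2.x actually requires is mapreduce.framework.name, which switches MapReduce onto YARN; everything else is optional tuning:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```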
Edit ~/hadoop2/etc/hadoop/yarn-site.xml
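Likewise, a minimal yarn-site.xml sketch: pointing the ResourceManager at the master and enabling the shuffle auxiliary service are the two settings a basic cluster needs:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```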
Copy the configuration to the other nodes
## Copy .bashrc and the Hadoop installation directory to slave1 and slave2
$ scp .bashrc slave1:~
$ scp .bashrc slave2:~
$ scp -r hadoop-2.7.2 slave1:~
$ scp -r hadoop-2.7.2 slave2:~
## Create the hadoop2 symlink on slave1 and slave2
$ ln -s hadoop-2.7.2 hadoop2
## Create the Hadoop working directories on slave1 and slave2
# mkdir -p /hadoop2/dfs/data
# chown -R hadoop:hadoop /hadoop2/
Start the HDFS cluster (this brings the HDFS file system online):
If passwordless SSH is configured, start-dfs.sh starts the whole distributed file system:
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the HDFS processes can be started with a utility script.
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
Log in to the web console to check the HDFS cluster status:
http://192.168.150.30:50070
Part 4: Problems You May Hit When Starting HDFS
1. The startup log warns "unable to load native-hadoop library"
The native library bundled with Hadoop is compiled for 32-bit; our Red Hat 6.6 is a 64-bit system, so either recompile the library or download a 64-bit build.
2. Formatting more than once leaves the DataNode process unable to start, with no output in the master's log
Running ${HADOOP_PREFIX}/bin/hdfs namenode -format repeatedly makes the NameNode's clusterID diverge from the DataNodes'.
[hadoop@master ~]$ head /hadoop2/tmp/dfs/namesecondary/current/VERSION
#Thu Aug 04 18:39:53 PDT 2016
namespaceID=1226758419
clusterID=CID-48e0bfd8-5722-4b9d-9da2-79bc13fd8388
cTime=0
storageType=NAME_NODE
blockpoolID=BP-722559016-192.168.150.30-1470359389680
layoutVersion=-63
[hadoop@master ~]$
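The mismatch is easy to confirm by comparing the clusterID line of the two VERSION files. A sketch using mock files (on a real cluster, read /hadoop2/dfs/name/current/VERSION on the master and /hadoop2/dfs/data/current/VERSION on a slave):

```shell
# Mock VERSION files standing in for the NameNode and DataNode copies
mkdir -p /tmp/cid-demo
printf 'clusterID=CID-48e0bfd8-5722-4b9d-9da2-79bc13fd8388\n' > /tmp/cid-demo/name_VERSION
printf 'clusterID=CID-stale-id-from-an-old-format\n' > /tmp/cid-demo/data_VERSION

nn_id=$(grep '^clusterID=' /tmp/cid-demo/name_VERSION)
dn_id=$(grep '^clusterID=' /tmp/cid-demo/data_VERSION)
if [ "$nn_id" != "$dn_id" ]; then
  echo "clusterID mismatch: clear the DataNode dirs and re-format"
fi
```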
Delete the files generated by formatting on the master node, then format again:
[hadoop@master hadoop2]$ rm -fr /hadoop2/tmp/*
[hadoop@master hadoop2]$ rm -fr /hadoop2/dfs/name/*
[hadoop@master hadoop2]$ rm -fr /hadoop2/dfs/data/*
[hadoop@master hadoop2]$ /home/hadoop/hadoop2/bin/hdfs namenode -format
[hadoop@master hadoop2]$ start-dfs.sh
Part 5: Starting the YARN Cluster
On the master, run:
[hadoop@master sbin]$ pwd
/home/hadoop/hadoop2/sbin
[hadoop@master sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.7.2/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/hadoop/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/hadoop/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@master sbin]$
Check the startup logs to confirm the daemons came up cleanly:
[hadoop@master logs]$ pwd
/home/hadoop/hadoop2/logs
[hadoop@master logs]$ ll
total 160
-rw-rw-r-- 1 hadoop hadoop 56709 Aug  4 19:15 hadoop-hadoop-namenode-master.log
-rw-rw-r-- 1 hadoop hadoop   718 Aug  4 18:10 hadoop-hadoop-namenode-master.out
-rw-rw-r-- 1 hadoop hadoop   718 Aug  4 18:03 hadoop-hadoop-namenode-master.out.1
-rw-rw-r-- 1 hadoop hadoop 46001 Aug  4 18:39 hadoop-hadoop-secondarynamenode-master.log
-rw-rw-r-- 1 hadoop hadoop   718 Aug  4 18:10 hadoop-hadoop-secondarynamenode-master.out
-rw-rw-r-- 1 hadoop hadoop   718 Aug  4 18:03 hadoop-hadoop-secondarynamenode-master.out.1
-rw-rw-r-- 1 hadoop hadoop     0 Aug  4 18:03 SecurityAuth-hadoop.audit
-rw-rw-r-- 1 hadoop hadoop 34622 Aug  4 19:29 yarn-hadoop-resourcemanager-master.log
-rw-rw-r-- 1 hadoop hadoop  1524 Aug  4 19:29 yarn-hadoop-resourcemanager-master.out
[hadoop@slave1 logs]$ pwd
/home/hadoop/hadoop2/logs
[hadoop@slave1 logs]$ ll yarn-hadoop-nodemanager-slave1.*
-rw-rw-r-- 1 hadoop hadoop 28167 Aug  4 19:29 yarn-hadoop-nodemanager-slave1.log
-rw-rw-r-- 1 hadoop hadoop  1508 Aug  4 19:29 yarn-hadoop-nodemanager-slave1.out
[hadoop@slave1 logs]$ jps
2913 NodeManager
2773 DataNode
3029 Jps
Log in to the web console to check the ResourceManager status (8088 is the default ResourceManager web port):
http://192.168.150.30:8088
Log in to the web console to check the NodeManager status:
http://192.168.150.31:8042/node