Environment: three CentOS Linux virtual machines, JDK 1.8, Hadoop 2.6.0
Preparation
1. Configure the virtual machine environment (hostnames and network on all three nodes)
Verify: the machines can ping each other's IP addresses
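For reference, a minimal sketch of /etc/hosts for all three nodes (the IP addresses below are placeholders, substitute your own; the hostnames match the ones used in the configuration later):
# /etc/hosts on master, slave1 and slave2 (example addresses)
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2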
2. Configure the JDK environment variables
Verify: java -version
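Assuming the JDK is installed at /usr/local/jdk (the same path referenced in hadoop-env.sh below), the variables can be appended to /etc/profile roughly like this:
# append to /etc/profile on every node, then reload it with: source /etc/profile
export JAVA_HOME=/usr/local/jdk
export PATH=$PATH:$JAVA_HOME/bin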
3. Set up passwordless SSH: generate a key pair with ssh-keygen -t rsa, then push the public key to the other machines with ssh-copy-id <hostname>
Verify: ssh localhost
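A typical sequence on the master node might look like this (the same steps apply on the slaves if they also need passwordless access to each other):
ssh-keygen -t rsa      # accept the defaults at all prompts
ssh-copy-id master     # push the public key to every node, including the local one
ssh-copy-id slave1
ssh-copy-id slave2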
4. Download and extract Hadoop, and configure the Hadoop environment variables
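Assuming Hadoop is extracted to /usr/local/hadoop (the path used throughout the configuration below), the environment variables might be set like this:
# append to /etc/profile, then run: source /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin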
Go to the /usr/local/hadoop/etc/hadoop directory and edit the configuration files:
1.hadoop-env.sh
Set export JAVA_HOME=/usr/local/jdk
2.yarn-env.sh
Set export JAVA_HOME=/usr/local/jdk
3.core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
4.hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
5.mapred-site.xml (if the file does not exist, copy it from mapred-site.xml.template)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
6.yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
7.slaves: add the hostnames of the two slave nodes
slave1
slave2
With all the configuration files in place, copy the hadoop directory to the slave1 and slave2 nodes:
scp -r hadoop slave1:/usr/local
scp -r hadoop slave2:/usr/local
Format the NameNode on the master node (do this only once):
hdfs namenode -format
Start the cluster from the master node (the start scripts reach the slave daemons over SSH):
start-all.sh
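Note that start-all.sh is deprecated in Hadoop 2.x; it simply delegates to start-dfs.sh and start-yarn.sh, which can also be run separately from the master:
start-dfs.sh     # starts the NameNode, SecondaryNameNode and the DataNodes
start-yarn.sh    # starts the ResourceManager and the NodeManagers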
Processes on the master node (jps):
30498 NameNode
30733 SecondaryNameNode
19781 ResourceManager
30889 Jps
Processes on each slave node (jps):
5265 NodeManager
4787 DataNode
5527 Jps
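Optionally check the web UIs as well: the YARN ResourceManager at http://master:8088 (the address configured above) and, on the Hadoop 2.x default port, the NameNode at http://master:50070.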
Done!
A small problem I ran into
The first start-up went fine, but after rebooting the VMs the NameNode refused to start. The logs showed:
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
The first instinct is to reformat the NameNode:
hadoop namenode -format
but every reformat wipes out the previous data.
The proper fix is simple: these storage directories are all derived from hadoop.tmp.dir, whose default location sits under /tmp and may be cleaned out across reboots, so it is enough to override the default value of hadoop.tmp.dir in core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/javoft/Documents/hadoop/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
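One follow-up worth noting, assuming the path chosen above (adjust it to your own layout): the new directory must exist and be writable by the user running Hadoop, and the NameNode has to be formatted once more so that its metadata is recreated in the new location; after that it survives reboots.
mkdir -p /home/javoft/Documents/hadoop   # parent of hadoop.tmp.dir, must be writable by the Hadoop user
hdfs namenode -format                    # one-time reformat after moving the storage location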