I set up a three-node Hadoop cluster in virtual machines: one master and two slaves, all running CentOS 6.5.
I. Install the JDK
I installed JDK 1.8; the installation steps are omitted here.
II. Configure the hosts file
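The original leaves this section brief. As a sketch, the /etc/hosts entries on every node might look like the following (the IP addresses are placeholders; substitute the actual addresses of your virtual machines):

```shell
# /etc/hosts on every node (example IPs -- use your own)
192.168.1.100  master.hadoop
192.168.1.101  slave1.hadoop
192.168.1.102  slave2.hadoop
```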
III. Set up passwordless SSH login
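This section is also left brief in the original. A minimal sketch of the passwordless-SSH setup, run on the master and assuming the hostnames from the hosts file:

```shell
# Generate an RSA key pair on the master (press Enter to accept defaults)
ssh-keygen -t rsa

# Copy the public key to every node, including the master itself
ssh-copy-id root@master.hadoop
ssh-copy-id root@slave1.hadoop
ssh-copy-id root@slave2.hadoop

# Verify: this should log in and print the hostname without asking for a password
ssh slave1.hadoop hostname
```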
IV. Install Hadoop
1 Download Hadoop 2.6.4 and extract it; I extracted it under /home
tar -zxvf hadoop-2.6.4.tar.gz
2 Configure hadoop-env.sh
[root@Master hadoop-2.6.4]# vi etc/hadoop/hadoop-env.sh
Set JAVA_HOME and the Hadoop installation path here.
You can check the current Java path with echo $JAVA_HOME:
[root@Slave1 usr]# echo $JAVA_HOME
/usr/java/jdk1.8.0_92
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_92
export HADOOP_PREFIX=/home/hadoop-2.6.4
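To confirm that hadoop-env.sh picks up JAVA_HOME correctly, a quick sanity check from the Hadoop directory is:

```shell
cd /home/hadoop-2.6.4
# Should print the Hadoop version banner without a JAVA_HOME error
bin/hadoop version
```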
3 Configure core-site.xml
[root@Master hadoop-2.6.4]# vi etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master.hadoop:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/local/hadoop/tmp</value>
</property>
</configuration>
fs.defaultFS is the URI of the NameNode, in the form hdfs://hostname:port/
hadoop.tmp.dir is Hadoop's default temporary directory; this folder should be created in advance
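Since hadoop.tmp.dir must exist beforehand, it can be created on each node with:

```shell
# Create the Hadoop temporary directory (with any missing parent directories)
mkdir -p /home/local/hadoop/tmp
```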
4 Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master.hadoop:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master.hadoop:19888</value>
</property>
</configuration>
Note: replace the hostname with the one configured in your own hosts file
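Note that the JobHistory server configured above is not started by start-all.sh; in Hadoop 2.6 it is started separately on the master with the bundled daemon script:

```shell
cd /home/hadoop-2.6.4
# Start the MapReduce JobHistory server (web UI on port 19888 as configured above)
sbin/mr-jobhistory-daemon.sh start historyserver
```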
5 Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master.hadoop:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/local/hadoop/dfs/data</value>
</property>
</configuration>
dfs.replication is the number of replicas kept for each block; the default is 3. If this value is larger than the number of DataNodes in the cluster, errors will occur.
When setting the name.dir and data.dir paths, make sure there is enough free disk space on those paths; if space runs out, uploads to HDFS will fail.
The name and data directories here should not be created in advance; creating them beforehand can cause problems.
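Free space on the filesystem holding the dfs directories can be checked before formatting, for example:

```shell
# Show available space on the filesystem that will hold /home/local/hadoop/dfs
df -h /home
```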
6 Configure the masters and slaves files
[root@Master hadoop-2.6.4]# vi etc/hadoop/masters
master.hadoop
[root@Master hadoop-2.6.4]# vi etc/hadoop/slaves
slave1.hadoop
slave2.hadoop
7 Configure yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master.hadoop</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
8 Copy the configured /home/hadoop-2.6.4 directory to the two slave nodes
scp -r /home/hadoop-2.6.4 root@slave1.hadoop:/home/
scp -r /home/hadoop-2.6.4 root@slave2.hadoop:/home/
9 Configure environment variables
This must be done on every machine
[root@Master hadoop-2.6.4]# vi /etc/profile
#set hadoop path
export HADOOP_HOME=/home/hadoop-2.6.4
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Run source /etc/profile to apply the changes immediately
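After sourcing /etc/profile, the hadoop command should resolve from any directory:

```shell
# Should print /home/hadoop-2.6.4/bin/hadoop
which hadoop
# Should print the Hadoop version banner
hadoop version
```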
10 Format the NameNode and start the cluster
Formatting is needed only once, before the first start, and is run on the master node (not on every machine):
hdfs namenode -format
Then start all services:
start-all.sh
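Once start-all.sh finishes, running jps on each node shows which daemons came up. Roughly, the master should list NameNode, SecondaryNameNode and ResourceManager, and each slave should list DataNode and NodeManager:

```shell
# On the master
jps
# On each slave (via passwordless SSH)
ssh slave1.hadoop jps
ssh slave2.hadoop jps
```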