1) Disable the firewall and SELinux
# firewall-cmd --state
# systemctl stop firewalld.service
# systemctl disable firewalld.service
# vi /etc/selinux/config
Set SELINUX=disabled, then reboot.
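The SELinux edit can also be made without opening an editor. The following is a minimal sketch, assuming GNU sed (as shipped with CentOS 7); set_selinux_disabled is a hypothetical helper name, not a system command.

```shell
# Sketch: set SELINUX=disabled in an SELinux config file.
# set_selinux_disabled is a hypothetical helper, not a system command.
set_selinux_disabled() {
    conf="$1"
    # Keep a backup copy, then rewrite only the uncommented SELINUX= line.
    # The pattern does not touch SELINUXTYPE= because "=" must follow SELINUX.
    cp "$conf" "$conf.bak" &&
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}
```

Usage: set_selinux_disabled /etc/selinux/config, followed by a reboot as in the manual step.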
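2) Virtual machine IP configuration (static address, set in the interface's ifcfg file under /etc/sysconfig/network-scripts/)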
BOOTPROTO=static
IPADDR=192.168.220.20
NETMASK=255.255.255.0
GATEWAY=192.168.220.1
3) Hostname settings
# vi /etc/hosts
192.168.220.20 master
192.168.220.21 slave01
192.168.220.22 slave02
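If the hosts file is edited by script rather than by hand, it helps to avoid adding the same entry twice. A minimal sketch, assuming a plain "IP name" line format; add_host_entry is a hypothetical helper name.

```shell
# Sketch: append "IP hostname" to a hosts file only if the hostname is not
# already listed on an uncommented line. add_host_entry is a hypothetical helper.
add_host_entry() {
    ip="$1"
    name="$2"
    file="$3"
    # awk exits 0 when the name already appears as a hostname field (field 2 or later)
    if awk -v n="$name" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Usage: add_host_entry 192.168.220.20 master /etc/hosts, and likewise for slave01 and slave02.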
4) Uninstall the OpenJDK bundled with the system
# java -version
# rpm -qa | grep java
# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.71-2.b15.el7_2.x86_64
# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.71-2.b15.el7_2.x86_64
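5) Install the JDK and configure environment variables
# tar -xzvf jdk-7u79-linux-x64.tar.gz -C /usr/local/
# cd /usr/local
# mv jdk1.7.0_79 jdk1.7
# vi /etc/profile    (set the environment variables by adding the following 3 lines)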
export JAVA_HOME=/usr/local/jdk1.7
export CLASSPATH=/usr/local/jdk1.7/lib
export PATH=.:$JAVA_HOME/bin:$PATH
# source /etc/profile
# java -version
6) Install Hadoop and configure environment variables
# tar -xzvf hadoop-2.6.0-x64.tar.gz -C /usr/local/
# cd /usr/local
# mv hadoop-2.6.0 hadoop2.6
# vi /etc/profile    (set the environment variables:)
export JAVA_HOME=/usr/local/jdk1.7
export HADOOP_HOME=/usr/local/hadoop2.6
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=.:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
# source /etc/profile
# hadoop version
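If "hadoop version" or "java -version" is not found after sourcing /etc/profile, the usual cause is that the bin directories did not make it onto PATH. A small sketch for checking this; path_contains and report_path are hypothetical helper names.

```shell
# Sketch: check whether given directories are entries of $PATH.
# path_contains and report_path are hypothetical helpers.
path_contains() {
    # Wrap PATH in colons so each entry can be matched exactly
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

report_path() {
    for dir in "$@"; do
        if path_contains "$dir"; then
            echo "ok: $dir"
        else
            echo "not on PATH: $dir"
        fi
    done
}
```

Usage: report_path "$JAVA_HOME/bin" "$HADOOP_HOME/bin"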
7) Edit the Hadoop configuration files
Modify the following 7 configuration files under $HADOOP_HOME/etc/hadoop.
(1) core-site.xml: set the NameNode host and the Hadoop file system
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop2.6/tmp</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
</configuration>
(2) Edit hdfs-site.xml: set the number of data block replicas
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop2.6/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop2.6/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
(3) Edit mapred-site.xml (create it from the template first)
# cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
(4) Edit hadoop-env.sh: set JAVA_HOME
Add the local JDK path to hadoop-env.sh:
export JAVA_HOME=/usr/local/jdk1.7
(5) Edit yarn-env.sh: set JAVA_HOME and the native library path
Add the local JDK path to yarn-env.sh:
export JAVA_HOME=/usr/local/jdk1.7
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
(6) yarn-site.xml: contains the configuration for starting MapReduce.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
</configuration>
(7) slaves file
slave01
slave02
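8) Clone the master virtual machine to create the slaves
After cloning, give each slave its own IPADDR and hostname so that they match the /etc/hosts entries from step 3 (192.168.220.21 slave01, 192.168.220.22 slave02).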
9) Set up passwordless SSH login
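The notes give no commands for this step. One common way to do it, shown here as a sketch rather than the original author's procedure, is to generate a key pair on master and install the public key on every node with ssh-copy-id. It assumes the root account is used on all nodes; run and setup_passwordless_ssh are hypothetical helper names, and DRY_RUN=1 only prints the commands instead of executing them.

```shell
# Sketch: passwordless SSH from master to each node (assumes the root account).
# run and setup_passwordless_ssh are hypothetical helpers.
run() {
    # With DRY_RUN=1, print the command instead of executing it
    # (an empty argument such as the passphrase shows up as a blank in the printout)
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_passwordless_ssh() {
    key="$HOME/.ssh/id_rsa"
    # Generate an RSA key pair with an empty passphrase unless one already exists
    [ -f "$key" ] || run ssh-keygen -t rsa -N "" -f "$key"
    # Install the public key on each host; each call asks once for that host's password
    for host in "$@"; do
        run ssh-copy-id -i "$key.pub" "root@$host"
    done
}
```

Usage on master: setup_passwordless_ssh master slave01 slave02. Including master itself is a precaution, since the start scripts may also connect to the local node over ssh. Afterwards, "ssh slave01" should log in without a password prompt.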
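10) Initialize Hadoop
Hadoop only needs to be initialized and started on the master node; the system logs in to the slave nodes automatically to perform the related operations.
# hdfs namenode -format    (format the HDFS file system)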
INFO common.Storage: Storage directory /usr/local/hadoop2.6/hdfs/name has been successfully formatted.
11) Run Hadoop
Change to Hadoop's sbin directory ($HADOOP_HOME/sbin):
# start-dfs.sh
# start-yarn.sh
12) Quick verification
a. Master node: run jps; expect SecondaryNameNode, NameNode, ResourceManager.
b. Slave nodes: run jps; expect NodeManager, DataNode.
c. Browser: master:50070 (HDFS web UI), master:8088 (YARN web UI).
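The jps check can be scripted so that a missing daemon is reported by name. A minimal sketch, assuming jps prints one "<pid> <ClassName>" pair per line; check_daemons is a hypothetical helper name.

```shell
# Sketch: read `jps` output on stdin and report expected daemons that are missing.
# check_daemons is a hypothetical helper.
check_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in "$@"; do
        # Compare the second field exactly, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage on master: jps | check_daemons NameNode SecondaryNameNode ResourceManager
Usage on a slave: jps | check_daemons DataNode NodeManager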
13) Program verification
Run the wordcount program on the newly built distributed platform:
# echo "Hello World.Hello hadoop." > hello.txt
# hadoop fs -mkdir -p input
# hadoop fs -ls    (check that the directory was created)
# hadoop fs -put hello.txt input    (upload the local file to HDFS)
# hadoop fs -ls input    (check that the file was uploaded)
# hadoop fs -ls hdfs://master:9000/user/root/input    (view it directly by full URI)
# hadoop jar /usr/local/hadoop2.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output
# hadoop fs -ls output    (list the files produced by the run)
# hadoop fs -cat output/part-r-00000    (view the result)
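To know what to expect in part-r-00000, the same count can be approximated locally. This is only a sketch under the assumption that the example job splits words on whitespace, so "World.Hello" stays a single token; local_wordcount is a hypothetical helper name and does not involve Hadoop.

```shell
# Sketch: local approximation of wordcount.
# Splits stdin on whitespace, counts each token, prints "word<TAB>count".
# local_wordcount is a hypothetical helper.
local_wordcount() {
    tr -s '[:space:]' '\n' |
        sed '/^$/d' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}
```

Usage: echo "Hello World.Hello hadoop." | local_wordcount
This prints three tokens, each with a count of 1: Hello, World.Hello and hadoop. The Hadoop job output should list the same tokens if the whitespace assumption holds.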