Reference: http://www.cnblogs.com/bovenson/p/5760856.html
Prepare three virtual machines, configured with the following IP addresses and hostnames:
- 192.168.241.100 mini1
- 192.168.241.101 mini2
- 192.168.241.102 mini3
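So that these hostnames resolve, each machine's /etc/hosts should contain the mappings below (using the IPs listed above; adjust if your network differs):

```
192.168.241.100 mini1
192.168.241.101 mini2
192.168.241.102 mini3
```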
1. Install the JDK
Download the JDK, extract it to /opt, and configure the environment variables.
JDK download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html
tar -zxvf jdk-8u161-linux-x64.tar.gz
mv jdk1.8.0_161 /opt
vim /etc/profile
# Append the following to the end of /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_161
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
# Apply the changes
source /etc/profile
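To sanity-check the additions before touching the real /etc/profile, the same export lines can be sourced from a throwaway file first (a sketch assuming the install path used above):

```shell
# Write the export lines from this guide to a temp file, source it,
# and confirm JAVA_HOME resolves to the expected install directory.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export JAVA_HOME=/opt/jdk1.8.0_161
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
EOF
. "$profile"
echo "JAVA_HOME=$JAVA_HOME"
```

On the real machine, `java -version` afterwards should report 1.8.0_161.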
2. Install Hadoop
Download Hadoop, extract it, and configure the environment variables.
Hadoop download page: http://hadoop.apache.org/#Download+Hadoop
tar -zxvf hadoop-2.9.0.tar.gz
vim /etc/profile
# Append the following to the end of /etc/profile:
export HADOOP_HOME=/root/hadoop-2.9.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# Apply the changes
source /etc/profile
3. Configure Hadoop in cluster mode
The Hadoop configuration files live in the hadoop-2.9.0/etc/hadoop directory; the following files need to be edited:
- core-site.xml   core Hadoop settings
- hdfs-site.xml   HDFS settings
- mapred-site.xml MapReduce settings
- yarn-site.xml   YARN settings
3.1 Point the default file system at HDFS in core-site.xml
Here /home/hadoop/hdfsdata is the directory where HDFS stores its data; any path can be used.
vim core-site.xml
# Add the following configuration
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mini1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hdfsdata</value>
    </property>
</configuration>
3.2 Set the block replica count in hdfs-site.xml
vim hdfs-site.xml
# Number of replicas kept for each file block (default is 3; change as needed)
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
3.3 Copy mapred-site.xml.template to mapred-site.xml, then add the following MapReduce configuration to make YARN the resource manager
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
# Set the MapReduce framework
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
3.4 Configure YARN in yarn-site.xml
vim yarn-site.xml
# Set the YARN master node and the MapReduce auxiliary service
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>mini1</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
3.5 Set JAVA_HOME in hadoop-env.sh
Daemons launched on remote nodes over ssh do not source /etc/profile, so JAVA_HOME must also be set explicitly here:
# The java implementation to use.
export JAVA_HOME=/opt/jdk1.8.0_161
4. Start the Hadoop cluster
4.1 First, format the file system
hdfs namenode -format
(The older form `hadoop namenode -format` still works in 2.9.0 but is deprecated.)
After a successful format, the HDFS data directory looks like this:
[root@mini1 hdfsdata]# tree .
.
└── dfs
└── name
└── current
├── fsimage_0000000000000000000
├── fsimage_0000000000000000000.md5
├── seen_txid
└── VERSION
4.2 Start the NameNode
hadoop-daemon.sh start namenode
Once it is up, the HDFS web UI is reachable on port 50070 (http://mini1:50070).
4.3 Start the DataNodes
With the NameNode running, start a DataNode on each machine; every DataNode connects automatically to the NameNode configured in core-site.xml.
hadoop-daemon.sh start datanode
Once started, the DataNodes show up in the web UI.
5. Automating cluster startup
5.1 Automating HDFS startup
The file hadoop-2.9.0/etc/hadoop/slaves lists the machines on which DataNodes are started: vim etc/hadoop/slaves
mini1
mini2
mini3
Note: passwordless ssh login from mini1 to every machine (including mini1 itself) must be configured first.
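The passwordless login can be set up roughly as follows (a sketch, assuming the same user exists on all three nodes; ssh-copy-id prompts once for each node's password):

```shell
# Generate a passphrase-less RSA key pair. A temp directory is used here so
# the commands are safe to replay; on mini1 the real target is ~/.ssh/id_rsa.
keydir=$(mktemp -d)
ssh-keygen -t rsa -N '' -q -f "$keydir/id_rsa"
ls "$keydir"
# On the real cluster, run instead:
#   ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
#   for h in mini1 mini2 mini3; do ssh-copy-id "$h"; done
```

Afterwards, `ssh mini2` from mini1 should log in without a password prompt.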
The HDFS cluster can then be started and stopped with start-dfs.sh and stop-dfs.sh:
[hadoop@mini1 hadoop]$ start-dfs.sh
Starting namenodes on [mini1]
mini1: starting namenode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-namenode-mini1.out
mini3: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini3.out
mini1: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini1.out
mini2: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-secondarynamenode-mini1.out
[hadoop@mini1 hadoop]$ jps
4449 DataNode
4737 Jps
4310 NameNode
4620 SecondaryNameNode
[hadoop@mini1 hadoop]$ stop-dfs.sh
Stopping namenodes on [mini1]
mini1: stopping namenode
mini1: stopping datanode
mini2: stopping datanode
mini3: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
5.2 Starting the YARN cluster
The YARN cluster is started and stopped with start-yarn.sh and stop-yarn.sh:
[hadoop@mini1 hadoop]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-resourcemanager-mini1.out
mini2: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini2.out
mini3: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini3.out
mini1: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini1.out
[hadoop@mini1 hadoop]$ jps
4449 DataNode
4931 NodeManager
4310 NameNode
4811 ResourceManager
4620 SecondaryNameNode
5213 Jps
[hadoop@mini1 hadoop]$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
mini1: stopping nodemanager
mini3: stopping nodemanager
mini2: stopping nodemanager
mini1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
mini3: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
mini2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
[hadoop@mini1 hadoop]$
Logging in to the other machines now shows the DataNode and NodeManager daemons running:
[hadoop@mini2 ~]$ jps
2963 Jps
2501 DataNode
2828 NodeManager
[hadoop@mini3 ~]$ jps
2481 DataNode
2804 NodeManager
2939 Jps