Hadoop Cluster Setup
Virtual machine preparation
Create the virtual machines
Install three virtual machines on the same LAN; detailed tutorials are easy to find via Google.
I created three CentOS virtual machines here, with IPs 192.168.1.101 through 192.168.1.103.
Configure /etc/hosts
Configure /etc/hosts on all three hosts so that they can ping one another by hostname.
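As a sketch, the entries on each of the three machines might look like this (the hostnames node1–node3 are the ones used in the configuration files later in this guide):

```
192.168.1.101 node1
192.168.1.102 node2
192.168.1.103 node3
```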
Disable the firewall
Run the following commands to stop and disable the firewall, so that cluster communication is not blocked (disable alone only prevents it from starting at the next boot; stop shuts down the running service):
systemctl stop firewalld.service
systemctl disable firewalld.service
Passwordless SSH between the servers
- Generate a key pair on each server:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
- Copy the public key into the /root/.ssh/authorized_keys file on each of the other servers.
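For example, assuming the hostnames node1–node3 from /etc/hosts above, the public key can be distributed with ssh-copy-id (run on each of the three servers; you will be prompted for the password once per target):

```shell
# Appends ~/.ssh/id_rsa.pub to root's authorized_keys on each target host.
ssh-copy-id root@node1
ssh-copy-id root@node2
ssh-copy-id root@node3
```

Afterwards, `ssh root@node2` from any node should log in without a password.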
JDK installation
Get the accompanying software resources: click here
Configure the JDK environment variables
Insert the following configuration (for example, into /etc/profile):
export JAVA_HOME=/home/software/jdk1.8.0_301
export PATH=.:$PATH:$JAVA_HOME/bin
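Assuming the lines were added to /etc/profile, reload it and verify that the JDK is on the PATH:

```shell
source /etc/profile
java -version   # should report java version "1.8.0_301"
```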
HADOOP installation
Extract & configure environment variables
Get the accompanying software resources: click here
Configure the HADOOP environment variables
Insert the following configuration:
export HADOOP_HOME=/home/software/hadoop-2.7.4
export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
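After reloading the profile, the Hadoop binaries should resolve from any directory:

```shell
source /etc/profile
hadoop version   # should report Hadoop 2.7.4
```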
HADOOP configuration files
Configuration file directory: /home/software/hadoop-2.7.4/etc/hadoop
hadoop-env.sh
Set JAVA_HOME as follows:
export JAVA_HOME=/home/software/jdk1.8.0_301
hdfs-site.xml
Create the data storage directories
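The directories referenced by the dfs.namenode.name.dir and dfs.datanode.data.dir properties can be created with:

```shell
# NameNode metadata directory (only needed on node1)
mkdir -p /home/software/hadoop-2.7.4/data/namenode
# DataNode block storage directory (needed on every worker node)
mkdir -p /home/software/hadoop-2.7.4/data/datanode
```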
Configure hdfs-site.xml on node1 as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/software/hadoop-2.7.4/data/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/software/hadoop-2.7.4/data/namenode</value>
</property>
<!--
This property is wrong and should not be configured; see the following link for the reason:
https://stackoverflow.com/questions/34410181/why-i-cant-access-http-hadoop-master50070-when-i-define-dfs-namenode-http-ad
<property>
<name>dfs.namenode.http-address</name>
<value>node1:50070</value>
</property>
-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node2:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node1:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node1:8050</value>
</property>
</configuration>
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1/</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
</configuration>
slaves
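The slaves file lists the hostnames of the worker (DataNode/NodeManager) nodes, one per line. Assuming all three machines should act as workers (adjust to your layout), it might contain:

```
node1
node2
node3
```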
Copy the configuration from node1 to the other nodes
First extract Hadoop and configure the environment variables on the other nodes, then run the following commands on node1 to sync the configuration files to them:
scp /home/software/hadoop-2.7.4/etc/hadoop/* root@node2:/home/software/hadoop-2.7.4/etc/hadoop/
scp /home/software/hadoop-2.7.4/etc/hadoop/* root@node3:/home/software/hadoop-2.7.4/etc/hadoop/
Format the filesystem
Run the following command on the master node (node1) to format HDFS:
hdfs namenode -format
HADOOP cluster start & stop
Start
Run the following command on node1 to start the HADOOP cluster:
start-all.sh
Check the Hadoop processes on each node
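The running Hadoop daemons can be listed with jps on each node. With the configuration above, one would roughly expect the following (exact process placement depends on the slaves file):

```shell
jps
# node1: NameNode, ResourceManager (plus DataNode/NodeManager if node1 is a worker)
# node2: DataNode, NodeManager, SecondaryNameNode
#        (per dfs.namenode.secondary.http-address above)
# node3: DataNode, NodeManager
```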
Open the NameNode web UI at http://node1:50070 to verify that the cluster is up.
Stop the cluster
Run the following command on node1 to stop the cluster:
stop-all.sh