Hadoop Fully Distributed (HA) Cluster Setup
Environment Preparation
Install VMware Workstation (omitted)
Create the virtual machines in VMware Workstation (omitted)
Cluster installation plan:
Hostname | IP address | namenode | datanode | zookeeper | journalnode | zkfc | resourcemanager | nodemanager |
---|---|---|---|---|---|---|---|---|
hadoop1 | 192.168.8.2 | yes | yes | yes | yes | yes | yes | yes |
hadoop2 | 192.168.8.3 | yes | yes | yes | yes | yes | yes | yes |
hadoop3 | 192.168.8.4 | no | yes | yes | yes | no | no | yes |
1. Set the hostname
hostname hadoop1
echo hadoop1 > /etc/hostname
Note: run on all three machines, each with its own name (hadoop1, hadoop2, hadoop3)
2. Map IP addresses to hostnames
Run `ip a` to find the local IP address (e.g. 192.168.8.2), then add the mapping to /etc/hosts:
echo "192.168.8.2 hadoop1" >> /etc/hosts
Note: run on all three machines; every node needs the mappings for all three hosts, not just its own
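Since every node must resolve all three hostnames, the mappings can be appended in one shot. A minimal sketch; `HOSTS_FILE` is a stand-in so it can be tried against a scratch file before touching the real /etc/hosts:

```shell
# Append all three cluster mappings at once.
# HOSTS_FILE is a stand-in; on the real nodes it would be /etc/hosts.
HOSTS_FILE="${HOSTS_FILE:-/tmp/hosts}"
cat >> "$HOSTS_FILE" <<'EOF'
192.168.8.2 hadoop1
192.168.8.3 hadoop2
192.168.8.4 hadoop3
EOF
```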
3. Disable SELinux
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/sysconfig/selinux
Note: run on all three machines (the anchored pattern changes only the SELINUX= line, not the comments in the file)
4. Disable the firewall
systemctl disable firewalld
systemctl stop firewalld
Note: run on all three machines
5. Passwordless SSH login
ssh-keygen
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2
ssh-copy-id root@hadoop3
Note: run on the first machine; repeat on hadoop2 as well, since the sshfence method configured below requires the standby NameNode to reach the active one over SSH
6. Install Java (assumed to be preinstalled; check the version with the following command)
java -version
Note: run on all three machines
Installing ZooKeeper
Download the ZooKeeper package
Latest release: https://dlcdn.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz
Archived releases: https://archive.apache.org/dist/zookeeper/
Unpack the package and move it to /opt/zookeeper (the start command below assumes this path)
tar zxvf apache-zookeeper-3.8.0-bin.tar.gz
mv apache-zookeeper-3.8.0-bin /opt/zookeeper
Create the ZooKeeper configuration file (/opt/zookeeper/conf/zoo.cfg):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper
clientPort=2181
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888
mkdir -p /data/zookeeper
echo 1 > /data/zookeeper/myid
Note: run on all three machines (the number "1" is per host: hadoop1 → 1, hadoop2 → 2, hadoop3 → 3)
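Because the myid value differs per host, it can be derived from the hostname suffix (hadoop1 → 1, and so on). A sketch; `DATA_DIR` and the `host` fallback are stand-ins so it can be tried off-cluster:

```shell
# Derive the ZooKeeper myid from the hostname suffix (hadoop1 -> 1, ...).
# DATA_DIR stands in for /data/zookeeper; on a real node use host=$(hostname).
DATA_DIR="${DATA_DIR:-/tmp/zookeeper}"
host="${HOST_NAME:-hadoop1}"
mkdir -p "$DATA_DIR"
echo "${host#hadoop}" > "$DATA_DIR/myid"
```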
Start ZooKeeper
/opt/zookeeper/bin/zkServer.sh start
Note: run on all three machines
Installing Hadoop
Download the Hadoop package
Latest release: https://dlcdn.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
Archived releases: https://archive.apache.org/dist/hadoop/
Unpack the package
tar zxvf hadoop-3.3.3.tar.gz
# Move the unpacked tree to /usr/local/hadoop (later commands assume this path)
mv hadoop-3.3.3 /usr/local/hadoop
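Later steps use full paths to the `bin` and `sbin` scripts; optionally, an environment file can expose them on `PATH`. A sketch assuming the /usr/local/hadoop layout above (`PROFILE` is a stand-in for a file such as /etc/profile.d/hadoop.sh):

```shell
# Record HADOOP_HOME and extend PATH; PROFILE stands in for
# /etc/profile.d/hadoop.sh on the real nodes.
PROFILE="${PROFILE:-/tmp/hadoop-profile.sh}"
cat > "$PROFILE" <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
. "$PROFILE"
```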
Edit the configuration files
Change into the Hadoop installation directory (/usr/local/hadoop), then edit the following files.
1. etc/hadoop/workers
echo -e "hadoop1\nhadoop2\nhadoop3" > etc/hadoop/workers
Note: run only on the first machine (the installation is copied to the others later)
2. etc/hadoop/hadoop-env.sh
echo "export JAVA_HOME=$JAVA_HOME" >> etc/hadoop/hadoop-env.sh
Note: run only on the first machine; JAVA_HOME must be set and non-empty in the current shell when this runs, otherwise an empty value is written
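Since the `echo` above expands `$JAVA_HOME` at write time, a defensive variant can fail loudly instead of silently writing an empty value. A sketch; `HADOOP_ENV` and the fallback JDK path are stand-ins:

```shell
# Append JAVA_HOME to hadoop-env.sh, failing loudly if it is empty.
# HADOOP_ENV stands in for etc/hadoop/hadoop-env.sh; the fallback JDK
# path is an assumption -- adjust it to the JDK actually installed.
HADOOP_ENV="${HADOOP_ENV:-/tmp/hadoop-env.sh}"
: "${JAVA_HOME:=/usr/lib/jvm/java}"
[ -n "$JAVA_HOME" ] || { echo "JAVA_HOME is not set" >&2; exit 1; }
echo "export JAVA_HOME=$JAVA_HOME" >> "$HADOOP_ENV"
```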
3. etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
4. etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>cluster</value>
</property>
<property>
<name>dfs.ha.namenodes.cluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster.nn1</name>
<value>hadoop1:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster.nn1</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster.nn2</name>
<value>hadoop2:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster.nn2</name>
<value>hadoop2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/cluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop/journaldata</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.cluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
</configuration>
5. etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
</configuration>
6. etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop2</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop2:8088</value>
</property>
<property>
<name>hadoop.zk.address</name>
<value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>
</configuration>
Distribute the package: copy the Hadoop installation to the other machines
scp -r /usr/local/hadoop root@hadoop2:/usr/local
scp -r /usr/local/hadoop root@hadoop3:/usr/local
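The two copies can be folded into a loop; sketched here as a dry run that only prints the commands (drop the leading `echo` to actually copy):

```shell
# Dry-run distribution loop: prints one scp command per remaining node.
# Drop the leading "echo" to actually copy the tree.
for node in hadoop2 hadoop3; do
  echo scp -r /usr/local/hadoop "root@${node}:/usr/local"
done > /tmp/distribute.log
cat /tmp/distribute.log
```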
Start the JournalNodes
/usr/local/hadoop/bin/hdfs --daemon start journalnode
#Note: run on all three machines; the JournalNodes must be up before the NameNode is formatted
Format the ZKFC state in ZooKeeper (hadoop1)
/usr/local/hadoop/bin/hdfs zkfc -formatZK
#Note: run only on the first machine; the ZKFC daemons themselves are started later by start-all.sh on both NameNode hosts
Format and start the NameNode (hadoop1)
/usr/local/hadoop/bin/hdfs namenode -format
/usr/local/hadoop/bin/hdfs --daemon start namenode
#Note: run only on the first machine
Bootstrap the standby NameNode with the metadata (hadoop2)
/usr/local/hadoop/bin/hdfs namenode -bootstrapStandby
#Note: run only on the second machine
Start all services (hadoop1)
/usr/local/hadoop/sbin/start-all.sh
Verification
HDFS web UI: http://hadoop1:50070 (each NameNode's page shows whether it is active or standby)
YARN web UI: http://hadoop1:8088
Submit a MapReduce test job:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.3.jar pi 1 1
The job should finish successfully and print an estimated value of Pi.