Building a Hadoop Cluster with Docker
Author: hahally · Date: 2019.11.15 · Abstract: build a Hadoop cluster on an `ubuntu18` VM by pulling a `hadoop` image from Alibaba Cloud; the image already ships with a configured `jdk`.
Docker
Download and install
hahally@hahally:~$ sudo apt-get update # update the package index; the packages below allow apt to use repositories over HTTPS
hahally@hahally:~$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
Add Docker's official GPG key
hahally@hahally:~$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Set up the apt repository
hahally@hahally:~$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
hahally@hahally:~$ sudo apt-get update
Install Docker
hahally@hahally:~$ sudo apt-get install docker-ce
Check the Docker version
hahally@hahally:~$ sudo docker version
Pull the Hadoop image from Alibaba Cloud
hahally@hahally:~$ sudo mkdir -p /etc/docker
hahally@hahally:~$ sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://lqbkkmob.mirror.aliyuncs.com"]
}
EOF
hahally@hahally:~$ sudo systemctl daemon-reload
hahally@hahally:~$ sudo systemctl restart docker
hahally@hahally:~$ sudo docker pull registry.cn-beijing.aliyuncs.com/bitnp/docker-spark-hadoop
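Before relying on the mirror, it can be worth checking that the daemon.json written above is valid JSON. The sketch below uses a hypothetical `show_mirrors` helper and assumes `python3` is available on the host; `docker info` also reports the configured mirrors under "Registry Mirrors".

```shell
# Hypothetical helper: print the registry mirrors declared in a daemon.json,
# failing if the file is not valid JSON.
show_mirrors() {
    python3 - "$1" <<'PY'
import json, sys
for m in json.load(open(sys.argv[1]))["registry-mirrors"]:
    print(m)
PY
}

# On the host: show_mirrors /etc/docker/daemon.json
```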
List local images with sudo docker images
hahally@hahally:~$ sudo docker images
Start a container with docker run
hahally@hahally:~$ sudo docker run -it --name master -h master registry.cn-beijing.aliyuncs.com/bitnp/docker-spark-hadoop /bin/bash
hahally@hahally:~$ sudo docker start master # start the container
hahally@hahally:~$ sudo docker exec -i -t master /bin/bash # attach a shell to the running container
Configuration inside the container
After entering the container, install the ssh packages
[root@master local]# yum -y install openssh-clients
[root@master local]# yum -y install openssh-server
SSH service configuration
[root@master local]# /usr/sbin/sshd # the first start complains if the host keys are missing
[root@master local]# ssh-keygen -A # generate the missing host keys
[root@master local]# /usr/sbin/sshd # start the ssh daemon
[root@master local]# ssh-keygen -t rsa # generate an RSA key pair
[root@master local]# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys # append the public key to authorized_keys
[root@master local]# cat /root/.ssh/authorized_keys # print the authorized keys
[root@master local]# vi /etc/ssh/sshd_config # edit the config file as follows
Port 22
PermitRootLogin yes
PubkeyAuthentication yes
PasswordAuthentication yes
ChallengeResponseAuthentication no
UsePAM yes
PrintLastLog no
[root@master local]# vi /etc/ssh/ssh_config # skip interactive host-key confirmation prompts
StrictHostKeyChecking no
Hadoop cluster configuration
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/core-site.xml
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/hdfs-site.xml
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/mapred-site.xml
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/yarn-site.xml
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/slaves
[root@master local]# vi /etc/hosts
[root@master local]# vi /etc/profile
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.7.5/tmp</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop-2.7.5/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop-2.7.5/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>disable HDFS permission checking</description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
slaves
slave1
slave2
hosts (IP mappings)
172.17.0.2 master
172.17.0.3 slave1
172.17.0.4 slave2
profile (append the environment variables at the end of the file)
export JAVA_HOME=/usr/local/jdk1.8.0_162
export HADOOP_HOME=/usr/local/hadoop-2.7.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Install some extra software
[root@master local]# yum install git # git, handy for deploying code to the cluster
[root@master local]# yum install net-tools # networking utilities such as ifconfig
Building the Hadoop cluster
All of the configuration above was done in a single container, the namenode master. To avoid repeating the work, commit the current master container as an image and start three containers from it: master, slave1, and slave2. When starting the master container, use -p to map ports to the host so the containers can be reached from the public network.
hahally@hahally:~$ sudo docker commit master hadoop:hadoop
hahally@hahally:~$ sudo docker images # verify the freshly committed hadoop image exists
hahally@hahally:~$ sudo docker rm master # remove (or `docker rename`) the old master container to avoid a name clash
hahally@hahally:~$ sudo docker run -it -p 9000:9000 -p 9001:9001 --name master -h master hadoop:hadoop /bin/bash
hahally@hahally:~$ sudo docker run -it --name slave1 -h slave1 hadoop:hadoop /bin/bash
hahally@hahally:~$ sudo docker run -it --name slave2 -h slave2 hadoop:hadoop /bin/bash
hahally@hahally:~$ sudo docker start master slave1 slave2
hahally@hahally:~$ sudo docker exec -i -t master /bin/bash
Every container start regenerates the /etc/hosts file, so add the hosts entries above again in each of the three containers.
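Since these entries have to be re-added after every restart, an idempotent snippet saves some typing. This is a sketch: `add_cluster_hosts` is a hypothetical helper, and the IPs assume Docker's default bridge addressing (verify each container's address with `hostname -i`).

```shell
# Hypothetical helper: append the cluster mappings to a hosts file,
# skipping any entry that is already present.
add_cluster_hosts() {
    local hosts_file="$1" entry
    for entry in "172.17.0.2 master" "172.17.0.3 slave1" "172.17.0.4 slave2"; do
        grep -qxF "$entry" "$hosts_file" || echo "$entry" >> "$hosts_file"
    done
}

# Inside each container: add_cluster_hosts /etc/hosts
```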
Start the ssh service in all three containers (the /root/.ssh/authorized_keys file in each container must contain the public keys of all three containers).
[root@master local]# /usr/sbin/sshd
[root@slave1 local]# /usr/sbin/sshd
[root@slave2 local]# /usr/sbin/sshd
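Combining the three public keys into one authorized_keys file can be done by hand with cat, or sketched as a merge step. `merge_authorized_keys` is a hypothetical helper; in practice the key files can be shuttled between containers with `docker cp`.

```shell
# Hypothetical helper: combine several id_rsa.pub files into one
# authorized_keys file, dropping duplicate keys.
merge_authorized_keys() {
    local out="$1"; shift
    sort -u "$@" > "$out"   # one key per line; duplicates collapse
}

# e.g. merge_authorized_keys authorized_keys master.pub slave1.pub slave2.pub
```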
Once the three containers can ssh to one another without a password, the cluster can be started:
[root@master local]# hdfs namenode -format # format the namenode
[root@master local]# /usr/local/hadoop-2.7.5/sbin/start-all.sh # start the cluster
[root@master local]# jps # check which daemons came up
192 NameNode
562 ResourceManager
824 Jps
392 SecondaryNameNode