Setting up a Hadoop Cluster with Docker

  Author    hahally
  Date      2019.11.15
  Abstract  Hadoop cluster setup
The host is an `ubuntu18` virtual machine; we pull the `hadoop` image from Aliyun. The image already has the `jdk` installed and configured.

Installing Docker

hahally@hahally:~$ sudo apt-get update   # update the package index; the packages below let apt use repositories over HTTPS
hahally@hahally:~$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

Add Docker's official GPG key

hahally@hahally:~$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
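
You can verify the key was added correctly by checking its fingerprint; per Docker's install docs, the fingerprint should end in 0EBF CD88:

hahally@hahally:~$ sudo apt-key fingerprint 0EBFCD88   # prints the Docker release key if it was imported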

Set up the installation source

hahally@hahally:~$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
hahally@hahally:~$ sudo apt-get update

Install Docker

hahally@hahally:~$ sudo apt-get install docker-ce

Check the Docker version

hahally@hahally:~$ sudo docker version
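
As an end-to-end check, run the hello-world image; it confirms the daemon can pull and run containers:

hahally@hahally:~$ sudo docker run hello-world   # prints a welcome message on success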

Pull the Hadoop image from Aliyun

hahally@hahally:~$ sudo mkdir -p /etc/docker
hahally@hahally:~$ sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://lqbkkmob.mirror.aliyuncs.com"]
}
EOF
hahally@hahally:~$ sudo systemctl daemon-reload
hahally@hahally:~$ sudo systemctl restart docker   # restart so the registry mirror takes effect
hahally@hahally:~$ sudo docker pull registry.cn-beijing.aliyuncs.com/bitnp/docker-spark-hadoop
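
To confirm the mirror configuration was picked up, docker info lists it under Registry Mirrors:

hahally@hahally:~$ sudo docker info | grep -A 1 'Registry Mirrors'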

List images with sudo docker images

hahally@hahally:~$ sudo docker images

Start a container with docker run

hahally@hahally:~$ sudo docker run -it --name master -h master registry.cn-beijing.aliyuncs.com/bitnp/docker-spark-hadoop /bin/bash
hahally@hahally:~$ sudo docker start master                  # start the container
hahally@hahally:~$ sudo docker exec -i -t master /bin/bash   # attach a shell to the running container
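
Inside the container, a quick look confirms what the image ships with (the paths match those used throughout this guide):

[root@master local]# java -version   # the image comes with JDK 1.8 preinstalled
[root@master local]# ls /usr/local   # hadoop-2.7.5 and jdk1.8.0_162 live here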

Configuration inside the container

Once inside the container, install the SSH client and server:

[root@master local]# yum -y install openssh-clients
[root@master local]# yum -y install openssh-server

SSH service configuration

[root@master local]# /usr/sbin/sshd                  # first start fails because host keys are missing
[root@master local]# ssh-keygen -A                   # generate the missing host keys
[root@master local]# /usr/sbin/sshd                  # now the daemon starts
[root@master local]# ssh-keygen -t rsa               # generate a key pair for root
[root@master local]# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
[root@master local]# cat /root/.ssh/authorized_keys  # print the authorized keys
[root@master local]# vi /etc/ssh/sshd_config         # edit the server config
Port 22
PermitRootLogin yes
PubkeyAuthentication yes
PasswordAuthentication yes
ChallengeResponseAuthentication no
UsePAM yes
PrintLastLog no
[root@master local]# vi /etc/ssh/ssh_config          # client config: skip host-key confirmation prompts
StrictHostKeyChecking no
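
As a quick sanity check (assuming sshd wrote its default pid file under /var/run), restart the daemon so the edited config takes effect, then try a key-based login to localhost:

[root@master local]# kill $(cat /var/run/sshd.pid) && /usr/sbin/sshd   # restart sshd with the new config
[root@master local]# ssh localhost hostname                            # should print "master" with no password prompt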

Hadoop cluster configuration

[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/core-site.xml
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/hdfs-site.xml
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/mapred-site.xml
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/yarn-site.xml
[root@master local]# vi /usr/local/hadoop-2.7.5/etc/hadoop/slaves
[root@master local]# vi /etc/hosts
[root@master local]# vi /etc/profile

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.7.5/tmp</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop-2.7.5/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop-2.7.5/hdfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
        <description>disable HDFS permission checking</description>
    </property>
</configuration>
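
The tmp, name, and data directories referenced above may not exist in the image yet. Hadoop creates most of them itself on format or startup, but creating them up front is a harmless precaution:

[root@master local]# mkdir -p /usr/local/hadoop-2.7.5/tmp
[root@master local]# mkdir -p /usr/local/hadoop-2.7.5/hdfs/name /usr/local/hadoop-2.7.5/hdfs/data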

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
     <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
     <property>
         <name>yarn.resourcemanager.address</name>
         <value>master:8032</value>
     </property>
     <property>
         <name>yarn.resourcemanager.scheduler.address</name>
         <value>master:8030</value>
     </property>
     <property>
         <name>yarn.resourcemanager.resource-tracker.address</name>
         <value>master:8031</value>
     </property>
     <property>
         <name>yarn.resourcemanager.admin.address</name>
         <value>master:8033</value>
     </property>
     <property>
         <name>yarn.resourcemanager.webapp.address</name>
         <value>master:8088</value>
     </property>
     <property>
         <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>
     </property>
</configuration>

slaves

slave1
slave2

hosts (IP mapping)

172.17.0.2    master
172.17.0.3    slave1
172.17.0.4    slave2
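
These 172.17.0.x addresses assume Docker's default bridge network assigns them in container start order; confirm each container's actual address from the host before relying on them:

hahally@hahally:~$ sudo docker inspect -f '{{.NetworkSettings.IPAddress}}' master   # prints e.g. 172.17.0.2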

profile (append these environment variables at the end of the file)

export JAVA_HOME=/usr/local/jdk1.8.0_162
export HADOOP_HOME=/usr/local/hadoop-2.7.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
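
Reload the profile and confirm the tools are on the PATH:

[root@master local]# source /etc/profile
[root@master local]# hadoop version   # should report Hadoop 2.7.5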

Install a few extra tools

[root@master local]# yum install git         # git, handy for deploying code to the cluster
[root@master local]# yum install net-tools   # network utilities such as ifconfig and netstat

Building the Hadoop cluster

So far, all of the configuration has happened inside a single container, the one that will serve as the namenode (master).
To avoid repeating that work, commit the current master container as an image and start three containers from it: master, slave1, and slave2. When starting the master container, use -p to map ports to the host so the cluster can be reached from the public network.

hahally@hahally:~$ sudo docker commit master hadoop:hadoop
hahally@hahally:~$ sudo docker images     # the freshly committed hadoop image should be listed
hahally@hahally:~$ sudo docker rm master  # remove (or rename) the old master container to avoid a name clash
hahally@hahally:~$ sudo docker run -it -p 9000:9000 -p 9001:9001 --name master -h master hadoop:hadoop /bin/bash
hahally@hahally:~$ sudo docker run -it --name slave1 -h slave1 hadoop:hadoop /bin/bash
hahally@hahally:~$ sudo docker run -it --name slave2 -h slave2 hadoop:hadoop /bin/bash
hahally@hahally:~$ sudo docker start master slave1 slave2
hahally@hahally:~$ sudo docker exec -i -t master /bin/bash 

Docker rewrites /etc/hosts every time a container starts, so the hosts entries above must be re-added in all three containers after each start, as shown below.
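
One way to do it, run identically in each container (appending a duplicate of the container's own hostname entry is harmless):

[root@master local]# cat >> /etc/hosts <<EOF
172.17.0.2    master
172.17.0.3    slave1
172.17.0.4    slave2
EOF
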
Then start the ssh service in all three containers; note that /root/.ssh/authorized_keys in each container must contain the public keys of all three.

[root@master local]# /usr/sbin/sshd
[root@slave1 local]# /usr/sbin/sshd
[root@slave2 local]# /usr/sbin/sshd
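
A minimal sketch for distributing the keys from the host, assuming ssh-keygen -t rsa has already been run in every container: collect the three public keys into one file, then copy it into each container with docker cp.

hahally@hahally:~$ for c in master slave1 slave2; do sudo docker exec $c cat /root/.ssh/id_rsa.pub; done > authorized_keys
hahally@hahally:~$ for c in master slave1 slave2; do sudo docker cp authorized_keys $c:/root/.ssh/authorized_keys; done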

Once the three containers can ssh into each other without a password, the cluster can be started:

[root@master local]# hdfs namenode -format    # format the namenode (first start only)
[root@master local]# /usr/local/hadoop-2.7.5/sbin/start-all.sh   # start HDFS and YARN
[root@master local]# jps      # check which daemons are running
192 NameNode
562 ResourceManager
824 Jps
392 SecondaryNameNode
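
jps on master shows only the master-side daemons; the DataNode and NodeManager processes run on the slaves. To confirm both datanodes registered with the namenode:

[root@master local]# hdfs dfsadmin -report | grep 'Live datanodes'   # expect: Live datanodes (2)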