Deploying a Hadoop + Spark Cluster with Docker Containers -- Walkthrough

Dockerfile

FROM centos:centos7.7.1908
LABEL maintainer="blue"
LABEL name="Hadoop"

RUN yum -y install openssh-server openssh-clients sudo vim net-tools expect

RUN groupadd -g 1124 hadoop && useradd -m -u 1124 -g hadoop -d /home/hadoop hadoop
RUN echo "hadoop:hadoop" | chpasswd
RUN echo "root:root" | chpasswd
RUN echo "hadoop    ALL=(ALL)   NOPASSWD:ALL" >> /etc/sudoers 
# Generate the sshd host key files (empty passphrase so the build stays non-interactive)
RUN ssh-keygen -t rsa -N "" -f /etc/ssh/ssh_host_rsa_key
RUN ssh-keygen -t ecdsa -N "" -f /etc/ssh/ssh_host_ecdsa_key
RUN ssh-keygen -t ed25519 -N "" -f /etc/ssh/ssh_host_ed25519_key
RUN ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N "" && \
        cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Create the software and module directories (ownership is adjusted after the archives are added)
RUN mkdir /opt/software && mkdir /opt/moudle

# Copy archives from the host into the image (ADD extracts tarballs automatically)
ADD hadoop-3.2.1.tar.gz /opt/software
ADD jdk-8u212-linux-x64.tar.gz /opt/moudle
RUN mv /opt/software/hadoop-3.2.1 /opt/software/hadoop
RUN mv /opt/moudle/jdk1.8.0_212 /opt/moudle/jdk
ADD spark-3.0.0-preview2-bin-hadoop3.2.tar.gz /opt/software
RUN mv /opt/software/spark-3.0.0-preview2-bin-hadoop3.2 /opt/software/spark
RUN chown -R hadoop:hadoop /opt/moudle && chown -R hadoop:hadoop /opt/software

COPY CopyID /opt/moudle
RUN chmod +x /opt/moudle/CopyID

# Set environment variables
ENV CENTOS_DEFAULT_HOME /opt/software/hadoop
ENV JAVA_HOME /opt/moudle/jdk
ENV HADOOP_HOME /opt/software/hadoop
ENV JRE_HOME ${JAVA_HOME}/jre
ENV CLASSPATH ${JAVA_HOME}/lib:${JRE_HOME}/lib
ENV HADOOP_CONF_DIR=/opt/software/hadoop/etc/hadoop
ENV SPARK_HOME=/opt/software/spark
ENV PATH ${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$SPARK_HOME/bin:$PATH
# Default working directory when a shell is opened in the container
WORKDIR $CENTOS_DEFAULT_HOME
# Expose port 22 and run sshd in the foreground
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

For convenience, the Hadoop and Spark distributions added to the image are already configured.
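Assuming the three archives (hadoop-3.2.1.tar.gz, jdk-8u212-linux-x64.tar.gz, spark-3.0.0-preview2-bin-hadoop3.2.tar.gz) and the CopyID script sit next to the Dockerfile, the image can be built roughly as below; the tag hadoop-spark:base is only an example, use whatever image you later point the IMAGES variable of the startup script at.

# Build from the directory that holds the Dockerfile, the archives and CopyID
docker build -t hadoop-spark:base .
# Look up the image ID to plug into the IMAGES variable of the startup script
docker images hadoop-spark:base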

The Hadoop configuration is as follows

1. Configure core-site.xml

$ vim /opt/software/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
  • fs.defaultFS: the default file system URI; HDFS clients need this parameter to reach the NameNode
  • hadoop.tmp.dir: the base directory for Hadoop's temporary data; several other directories default to paths under it, so point it at a location with enough space rather than the default /tmp
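As a rough sanity check (my own addition, to be run as the hadoop user inside the master container once HDFS is up), the effective value of fs.defaultFS can be queried directly:

# Should print hdfs://master:9000
hdfs getconf -confKey fs.defaultFS
# Lists the root of the default file system, i.e. HDFS rather than the local disk
hdfs dfs -ls /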

2. Configure hdfs-site.xml

$ vim /opt/software/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/software/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/software/hadoop/hdfs/data</value>
    </property>
</configuration>
  • dfs.replication: the number of block replicas (with only two workers in this cluster, a value of 2 avoids permanently under-replicated blocks)
  • dfs.namenode.name.dir: where the NameNode stores its metadata
  • dfs.datanode.data.dir: where each DataNode stores its blocks
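To see whether both DataNodes registered and how these directories are used, something like the following can be run on master after the cluster is up (a sketch, not from the original post):

# Shows live DataNodes (slaveOne and slaveTwo) and their storage under /opt/software/hadoop/hdfs/data
hdfs dfsadmin -report
# Reports under-replicated blocks if dfs.replication exceeds the number of DataNodes
hdfs fsck /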

3. Configure mapred-site.xml

<configuration>
  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
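A quick way to exercise this setting is the example job that ships with Hadoop; the jar path below assumes the stock 3.2.1 layout under $HADOOP_HOME and that YARN (next step) is already running:

# Estimate pi with 2 maps of 10 samples each; fails early if
# mapreduce.application.classpath does not resolve on the NodeManagers
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 2 10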

4. Configure yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME</value>
    </property>
</configuration>
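Once the daemons are started (see Usage below), the NodeManagers on the two workers should register with the ResourceManager on master; a minimal check, added here for convenience:

# slaveOne and slaveTwo should both show up in RUNNING state
yarn node -list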

5. Configure workers

slaveOne
slaveTwo

6. Configure hadoop-env.sh

# Add the following line
export JAVA_HOME=/opt/moudle/jdk

The Spark configuration is as follows
  1. spark-env.sh
export SPARK_DIST_CLASSPATH=$(/opt/software/hadoop/bin/hadoop classpath)
export JAVA_HOME=/opt/moudle/jdk
export SPARK_MASTER_HOST=172.20.0.2
export HADOOP_HOME=/opt/software/hadoop
export HADOOP_CONF_DIR=/opt/software/hadoop/etc/hadoop
  2. slaves
slaveOne
slaveTwo
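With this master address, a job can be submitted to the standalone cluster once it is running; a sketch, assuming the examples jar keeps its stock name in the 3.0.0-preview2 distribution:

# Run the SparkPi example against the standalone master started by sbin/start-all.sh
$SPARK_HOME/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://172.20.0.2:7077 \
    $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0-preview2.jar 100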

Startup script

start-hadoop-docker.sh
#!/bin/bash
MASTER=master
SLAVEONE=slaveOne
SLAVETWO=slaveTwo
IMAGES=4a3ac56328f3
arr=($MASTER $SLAVEONE $SLAVETWO)
NETWORK=hadoop
USER=hadoop
STARTNUMOFNODE=3
para=$1

startdocker() {
    for (( i=0;i<$STARTNUMOFNODE;i++ ))
        do
            if [ $i -eq 0 ];then
                docker run -it --name ${arr[$i]} -d --net $NETWORK --ip 172.20.0.2 -P -p 9000:9000 -p 9870:9870 -p 8080:8080 --hostname ${arr[$i]} --add-host slaveOne:172.20.0.3 --add-host slaveTwo:172.20.0.4 --privileged=true $IMAGES
            else
                docker run -it --name ${arr[$i]} -d --net $NETWORK --ip 172.20.0.$((i+2)) -P --hostname ${arr[$i]} --add-host master:172.20.0.2 --add-host slaveOne:172.20.0.3 --add-host slaveTwo:172.20.0.4 --privileged=true $IMAGES
            fi
        done
    echo "正在转发密钥"
    docker exec --user $USER $MASTER /bin/bash -c "/usr/bin/ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N ''"
    docker exec --user $USER $MASTER /bin/bash -c "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
    echo "正在转发密钥" 
    docker exec  --user $USER $MASTER  /bin/bash -c "/opt/moudle/CopyID hadoop hadoop $SLAVEONE"
    docker exec  --user $USER $MASTER  /bin/bash -c "/opt/moudle/CopyID hadoop hadoop $SLAVETWO"
    echo "容器启动完毕!"
    echo "容器信息:"
    docker ps -a
}

stopdocker() {
    docker stop $(docker ps -qa)
    docker rm $(docker ps -qa)
}

list() {
    echo "容器信息:" 
    docker ps -a
}

ready() {
    if [ "$para" = "start" ]
    then
        startdocker
    elif [ "$para" = "list" ]
    then
        list
    elif [ "$para" = "stop" ]
    then
        stopdocker
    else
        echo "$para is not a valid option (use start|stop|list)"
    fi
}
ready
  • Note: you must change the IMAGES variable to your own image ID.

  • Before starting, you need to create the hadoop Docker network; if you are not sure how, run the command below:

docker network create --subnet=172.20.0.0/16 hadoop
  • Use the start parameter to launch the three containers:
sh start-hadoop-docker.sh start
  • Use the stop parameter to stop and remove all containers:
sh start-hadoop-docker.sh stop
  • Use the list parameter to list container info:
sh start-hadoop-docker.sh list
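After the start parameter has run, the network and the three containers can be sanity-checked from the host (these commands are my addition, not part of the script):

# Confirm the hadoop network has the 172.20.0.0/16 subnet and lists the three containers
docker network inspect hadoop
# master, slaveOne and slaveTwo should all be Up
docker ps --filter network=hadoop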

Usage

After the containers are running:

Start the cluster
  • Enter the master container
docker exec -it --user hadoop master /bin/bash
  • Format the NameNode (only needed on the first run)
hdfs namenode -format
  • Start the Hadoop cluster
start-all.sh
  • Start the Spark cluster (a quick daemon check follows this list)
cd /opt/software/spark
sbin/start-all.sh
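If everything started cleanly, jps should show the expected daemons; a rough check (my addition), run from the master shell opened above:

# On master: NameNode, SecondaryNameNode, ResourceManager and the Spark Master
jps
# On a worker: DataNode, NodeManager and the Spark Worker (full jps path, since a
# non-interactive ssh session may not load the JDK into PATH)
ssh slaveOne /opt/moudle/jdk/bin/jps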
Stop the cluster
  • Stop the Hadoop cluster
stop-all.sh
  • Stop the Spark cluster
cd /opt/software/spark
sbin/stop-all.sh

About CopyID

Script contents:
#!/usr/bin/expect
set timeout 10
# positional arguments: user, password, target host
set username [lindex $argv 0]
set password [lindex $argv 1]
set hostname [lindex $argv 2]
# push the default public key and answer the interactive prompts
spawn ssh-copy-id -i $username@$hostname
expect "yes/no"
send "yes\r"
expect "password:"
send "$password\r"

expect eof

This little script just pushes the SSH public key so that master can log in to the workers without a password. I tried a lot of other approaches, and expect turned out to be the most convenient.

Tips:

An expect script cannot read its arguments as $0, $1, ...; use set username [lindex $argv 0] instead.

Web UIs and port mappings

By default, host ports 9000, 9870, and 8080 are mapped into the master container.

hadoop

Visit localhost:9870 to manage the Hadoop cluster (the HDFS NameNode web UI).

spark

Visit localhost:8080 for the Spark master web UI.

other

The remaining ports are not mapped to the host; you can reach them directly at 172.20.0.2:port.
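For example, the YARN ResourceManager web UI usually listens on 8088 and each Spark worker on 8081 (default ports, not changed by anything shown above), so from the host you can do:

# YARN ResourceManager web UI on the master container
curl http://172.20.0.2:8088/cluster
# Spark worker web UI on the first worker
curl http://172.20.0.3:8081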
