[Production-Grade Practice] Deploying a Truly Distributed Hadoop 3.x + HBase 2.x Cluster with Docker

Tutorials for standing up a truly distributed Hadoop 3.x and HBase 2.x cluster on Docker are scattered around the web and riddled with pitfalls. This article consolidates the experience so you can avoid them.

Part 1. Installing the Docker Hadoop 3.x Distributed Cluster

1. Machine Environment

Three machines are used to deploy the distributed cluster:

192.168.1.101 hadoop1 (Docker manager node)

192.168.1.102 hadoop2

192.168.1.103 hadoop3
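
For the hostnames above to resolve between machines, it helps to put them in /etc/hosts on all three hosts. A minimal sketch (not part of the original steps; adjust to your own network):

# /etc/hosts (all three machines)
192.168.1.101 hadoop1
192.168.1.102 hadoop2
192.168.1.103 hadoop3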

2. Download the Docker Hadoop Configuration Files

URL: https://github.com/big-data-europe/docker-hadoop/tree/2.0.0-hadoop3.1.3-java8

Switch branches as needed to pick a version; hadoop 3.1.3 is used here (fetched as shown below).
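
One way to fetch that branch is with git (a sketch; downloading the branch as a zip archive from GitHub works just as well):

git clone -b 2.0.0-hadoop3.1.3-java8 https://github.com/big-data-europe/docker-hadoop.git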

3. Install Docker

Refer to an earlier tutorial for the installation itself; the version used here is docker-ce-3:20.10.8-3.el7.x86_64 (a CentOS 7 install sketch follows).
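
For reference, a typical CentOS 7 installation of that version looks roughly like this (a sketch only, not part of the original steps; adapt to your distribution):

yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce-20.10.8-3.el7
systemctl enable --now docker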

4. System Configuration
# Stop the firewall
systemctl stop firewalld
# Disable it permanently
systemctl disable firewalld
# Restart docker (required after changing the network configuration)
systemctl restart docker
5. Install Docker Compose
# Download Docker Compose
curl -SL https://github.com/docker/compose/releases/download/1.29.0/docker-compose-Linux-x86_64 -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# Check the version
[root@localhost ~]# docker-compose  version
docker-compose version 1.29.0, build 07737305
docker-py version: 5.0.0
CPython version: 3.7.10
OpenSSL version: OpenSSL 1.1.0l  10 Sep 2019

6. Pull the Required Images

1) Pull the Hadoop images

# Extract docker-hadoop-2.0.0-hadoop3.1.3-java8 (the branch downloads from GitHub as a zip archive)
unzip docker-hadoop-2.0.0-hadoop3.1.3-java8.zip
# Run docker-compose to pull the Hadoop-related images
docker-compose up

Once it completes successfully, the related container instances are visible:

(screenshot: running Hadoop containers)

After the run finishes, remove the container instances:

docker rm $(docker ps -aq)

Remove the volumes:

docker volume rm $(docker volume ls -q)

Remove the network:

docker network rm docker-hadoop-200-hadoop313-java8_default

2) Pull the Traefik image

Traefik is a networking tool that provides reverse proxying and load balancing inside the container network.

docker pull traefik:2.9.10

3) Pull the ZooKeeper image

docker pull zookeeper:3.4.10

Run all of the steps above on each of the three machines, so that the Docker environment and images are ready everywhere.

7. Configure Docker Swarm

Run on the manager node:

[root@hadoop1 ~]# docker swarm init --advertise-addr 192.168.1.101
Swarm initialized: current node (swfdinosstcc5h9k1wkz1bp9l) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-1xlri07uvjsjscxalipmtcqrfzk6bh9rasrh1mnx0xt2trq20h-6h1szze1p8d7ag6in1ejxc6wi 192.168.1.101:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

The other two nodes run the generated join command to join the swarm (see the sketch below).
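
On hadoop2 and hadoop3 that means pasting the join command printed by the init above (the token here is a placeholder; use the one from your own output):

docker swarm join --token <token-from-the-init-output> 192.168.1.101:2377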

After they have joined, verify:

[root@hadoop1 ~]# docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
swfdinosstcc5h9k1wkz1bp9l *   hadoop1    Ready     Active         Leader           20.10.8
o3v4fekz682vl7whgzov9nybd     hadoop2    Ready     Active                          20.10.8
1cptz9kcz4d65llkq8j7kwrnn     hadoop3    Ready     Active                          20.10.8
8. Configure the Cluster Network
1) Create the internal overlay network for the cluster:
# docker network create --driver overlay --attachable --subnet 10.20.0.0/24 hbase
docker network create -d overlay --attachable hbase
2) Label the swarm worker nodes as data nodes (datanode):
# Two data nodes are configured here; adjust to your situation. The label is referenced in the docker-compose yml below:
docker node update --label-add hadoop-datanode=datanode hadoop2
docker node update --label-add hadoop-datanode=datanode hadoop3
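
The labels can be confirmed afterwards with a Go-template query against docker node inspect, for example:

docker node inspect hadoop2 --format '{{ .Spec.Labels }}'
docker node inspect hadoop3 --format '{{ .Spec.Labels }}'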
9. Configure the Hadoop docker-compose File

Here the manager node runs one namenode, one resourcemanager and one historyserver, and each of the two worker nodes runs one datanode and one nodemanager.

# Rename the configuration file
mv docker-compose-v3.yml docker-compose-hadoop.yml

Change into the directory:

cd /usr/local/hadoop-hbase/docker-hadoop-2.0.0-hadoop3.1.3-java8

Then edit docker-compose-hadoop.yml with the following content:

version: '3'

services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 19870:9870
      - 19000:9000
    volumes:
      - namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
        max_attempts: 3
      placement:
        constraints:
          - node.hostname == hadoop1
      labels:
        traefik.docker.network: hbase
        traefik.port: 9870

  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 19864:9864
    volumes:
      - datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop.env
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
        max_attempts: 3
      placement:
        constraints:
          - node.labels.hadoop-datanode == datanode
      labels:
        traefik.docker.network: hbase
        traefik.port: 9864
  resourcemanager:
    image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 18088:8088
    environment:
      SERVICE_PRECONDITION: "namenode:9000 datanode:9864"
    env_file:
      - ./hadoop.env
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
        max_attempts: 3
      placement:
        constraints:
          - node.hostname == hadoop1
      labels:
        traefik.docker.network: hbase
        traefik.port: 8088
    healthcheck:
      disable: true

  nodemanager:
    image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 18042:8042
    environment:
      SERVICE_PRECONDITION: "namenode:9000 datanode:9864 resourcemanager:8088"
    env_file:
      - ./hadoop.env
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
        max_attempts: 3
      placement:
        constraints:
          - node.labels.hadoop-datanode == datanode
      labels:
        traefik.docker.network: hbase
        traefik.port: 8042
  historyserver:
    image: bde2020/hadoop-historyserver:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 18188:8188
    volumes:
      - hadoophistoryserver:/hadoop/yarn/timeline
    environment:
      SERVICE_PRECONDITION: "namenode:9000 datanode:9864 nodemanager:8042 resourcemanager:8088"
    env_file:
      - ./hadoop.env
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == hadoop1
      labels:
        traefik.docker.network: hbase
        traefik.port: 8188

volumes:
  datanode:
  namenode:
  hadoophistoryserver:

networks:
  hbase:
    external:
      name: hbase        
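
The services above also read ./hadoop.env from the docker-hadoop repository, which carries the Hadoop site properties as environment variables. For orientation, it contains entries along these lines (verify against the file in the version you downloaded; this excerpt is not complete):

CORE_CONF_fs_defaultFS=hdfs://namenode:9000
CORE_CONF_hadoop_http_staticuser_user=root
HDFS_CONF_dfs_webhdfs_enabled=true
HDFS_CONF_dfs_permissions_enabled=false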
10. Deploy the Hadoop Cluster
docker stack deploy -c docker-compose-hadoop.yml hadoop

After a successful start, the corresponding instances are visible on the manager node:

(screenshot: services on the manager node)

The nodemanager and datanode instances are visible on the other two nodes:

(screenshot: services on the worker nodes)
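
The same can be checked from the command line on the manager node:

# One line per service, with desired/actual replica counts
docker stack services hadoop
# Where each namenode task was scheduled and its current state
docker service ps hadoop_namenode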

If the deployment fails, remove the stack and redeploy:

docker stack rm  hadoop
11. Access the Web UI

Once deployed, the NameNode UI can be reached directly through the host's mapped port:

http://192.168.1.101:19870/

(screenshot: NameNode web UI)

Part 2. Installing the Docker HBase 2.x Distributed Cluster

1. Build a Custom HBase Image

1) Create the Dockerfile:

Create and enter the working directory:

mkdir -p /usr/local/hadoop-hbase/docker-hbase-master/hbase_base
cd /usr/local/hadoop-hbase/docker-hbase-master/hbase_base

Dockerfile:

FROM debian:9

MAINTAINER Mirson <mirson.ho@gmail.com>

RUN echo > /etc/apt/sources.list
RUN echo  "deb http://mirrors.aliyun.com/debian/ stretch main non-free contrib \ndeb-src http://mirrors.aliyun.com/debian/ stretch main non-free contrib \ndeb http://mirrors.aliyun.com/debian-security stretch/updates main \ndeb-src http://mirrors.aliyun.com/debian-security stretch/updates main \ndeb http://mirrors.aliyun.com/debian/ stretch-updates main non-free contrib \ndeb-src http://mirrors.aliyun.com/debian/ stretch-updates main non-free contrib \ndeb http://mirrors.aliyun.com/debian/ stretch-backports main non-free contrib \ndeb-src http://mirrors.aliyun.com/debian/ stretch-backports main non-free contrib" > /etc/apt/sources.list

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      openjdk-8-jdk \
      net-tools \
      curl \
      netcat \
      gnupg \
      libtinfo5 \
      vim \
    && rm -rf /var/lib/apt/lists/*

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

ENV HBASE_VERSION 2.3.5
ENV HBASE_URL http://archive.apache.org/dist/hbase/$HBASE_VERSION/hbase-$HBASE_VERSION-bin.tar.gz
RUN set -x \
    && curl -fSL "$HBASE_URL" -o /tmp/hbase.tar.gz \
    && curl -fSL "$HBASE_URL.asc" -o /tmp/hbase.tar.gz.asc \
    && tar -xvf /tmp/hbase.tar.gz -C /opt/ \
    && rm /tmp/hbase.tar.gz*

RUN ln -s /opt/hbase-$HBASE_VERSION/conf /etc/hbase
RUN mkdir /opt/hbase-$HBASE_VERSION/logs
COPY core-site.xml /opt/hbase-$HBASE_VERSION/conf
COPY hdfs-site.xml /opt/hbase-$HBASE_VERSION/conf
RUN mkdir /hadoop-data

ENV HBASE_PREFIX=/opt/hbase-$HBASE_VERSION
ENV HBASE_CONF_DIR=/etc/hbase

ENV USER=root
ENV PATH $HBASE_PREFIX/bin/:$PATH

ADD entrypoint.sh /entrypoint.sh
RUN chmod a+x /entrypoint.sh

EXPOSE 16000 16010 16020 16030

ENTRYPOINT ["/entrypoint.sh"]

Copy the Hadoop configuration files from the namenode container into the current directory (the COPY instructions above expect them in the build context):

docker cp b1e6:/opt/hadoop-3.1.3/etc/hadoop/core-site.xml .
docker cp b1e6:/opt/hadoop-3.1.3/etc/hadoop/hdfs-site.xml .
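
Here b1e6 is the ID of the running namenode container; if you need to look it up first, something like this works:

docker ps --filter "name=namenode" --format "{{.ID}}  {{.Names}}"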

2) Create the entrypoint.sh script, which turns environment variables into HBase configuration

#!/bin/bash

function addProperty() {
  local path=$1
  local name=$2
  local value=$3

  local entry="<property><name>$name</name><value>${value}</value></property>"
  local escapedEntry=$(echo $entry | sed 's/\//\\\//g')
  sed -i "/<\/configuration>/ s/.*/${escapedEntry}\n&/" $path
}

function configure() {
    local path=$1
    local module=$2
    local envPrefix=$3

    local var
    local value

    echo "Configuring $module"
    for c in `printenv | perl -sne 'print "$1 " if m/^${envPrefix}_(.+?)=.*/' -- -envPrefix=$envPrefix`; do
        name=`echo ${c} | perl -pe 's/___/-/g; s/__/_/g; s/_/./g'`
        var="${envPrefix}_${c}"
        value=${!var}
        echo " - Setting $name=$value"
        addProperty /etc/hbase/$module-site.xml $name "$value"
    done
}

configure /etc/hbase/hbase-site.xml hbase HBASE_CONF

function wait_for_it()
{
    local serviceport=$1
    local service=${serviceport%%:*}
    local port=${serviceport#*:}
    local retry_seconds=5
    local max_try=100
    let i=1

    nc -z $service $port
    result=$?

    until [ $result -eq 0 ]; do
      echo "[$i/$max_try] check for ${service}:${port}..."
      echo "[$i/$max_try] ${service}:${port} is not available yet"
      if (( $i == $max_try )); then
        echo "[$i/$max_try] ${service}:${port} is still not available; giving up after ${max_try} tries. :/"
        exit 1
      fi

      echo "[$i/$max_try] try in ${retry_seconds}s once again ..."
      let "i++"
      sleep $retry_seconds

      nc -z $service $port
      result=$?
    done
    echo "[$i/$max_try] $service:${port} is available."
}

# Intentionally unquoted so a space-separated SERVICE_PRECONDITION string splits into individual host:port pairs
for i in ${SERVICE_PRECONDITION[@]}
do
    wait_for_it ${i}
done

exec $@

3) Build the image

In the directory containing the Dockerfile, run:

docker build -f ./Dockerfile -t  bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8 .

Note the trailing dot ".". If downloading the HBase tarball is too slow, download it in advance, place it next to the Dockerfile, and use ADD instead:

...

ENV HBASE_VERSION 2.3.5
ADD hbase-2.3.5-bin.tar.gz /opt/
RUN ln -s /opt/hbase-$HBASE_VERSION/conf /etc/hbase
...

Unlike COPY, ADD extracts the tarball automatically.

4) Distribute the image to the other nodes

# Export the image
docker save bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8 > hbase_image.tar
# Copy it to the other nodes
scp hbase_image.tar root@192.168.1.102:/root
scp hbase_image.tar root@192.168.1.103:/root
# Import the image (run on each of the other nodes)
docker load -i hbase_image.tar
2. Deploy ZooKeeper

A three-node ZooKeeper ensemble is set up here.

1) docker-compose-zookeeper.yml configuration:

version: '3'

services:
  zoo1:
    image: zookeeper:3.4.10
    networks:
      - hbase
    volumes:
      - zoo1_data:/data
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == hadoop1
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
  zoo2:
    image: zookeeper:3.4.10
    networks:
      - hbase
    volumes:
      - zoo2_data:/data
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == hadoop2
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
  zoo3:
    image: zookeeper:3.4.10
    networks:
      - hbase
    volumes:
      - zoo3_data:/data
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == hadoop3
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888

volumes:
  zoo1_data:
  zoo2_data:
  zoo3_data:

networks:
  hbase: 
    external: 
      name: hbase

2) Deploy the ZooKeeper cluster:

docker stack deploy -c docker-compose-zookeeper.yml zookeeper

3) Check the containers

[root@hadoop1 docker-hbase-master]# docker ps -a
CONTAINER ID   IMAGE                                                    COMMAND                   CREATED          STATUS                 PORTS                          NAMES
fe84630dce2d   zookeeper:3.4.10                                         "/docker-entrypoint.…"   48 seconds ago   Up 47 seconds          2181/tcp, 2888/tcp, 3888/tcp   zookeeper_zoo1.1.r98xzm9bklsdau1ydu4eug5d3
...

After a successful deployment, each node gains one ZooKeeper instance.
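
As an extra check, the role of each ZooKeeper node can be queried; the official zookeeper image keeps zkServer.sh on the PATH, so on each host something like this should work (sketch):

# Expect "Mode: leader" on one node and "Mode: follower" on the other two
docker exec -it $(docker ps -q --filter name=zookeeper_zoo) zkServer.sh status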

If it fails, remove the stack and redeploy:

docker stack rm zookeeper
3. Deploy Traefik

This component is required: it is responsible for name-based connections to the HBase nodes inside the cluster network.

docker service create --name traefik --constraint node.hostname==hadoop1 --publish 18880:80 --publish 18080:8080 --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock --network hbase traefik --api.insecure=true --providers.docker

After it starts, the corresponding service name is visible:

(screenshot: traefik service)
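
It can also be verified from the command line:

docker service ls --filter name=traefik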

Access the Traefik dashboard (port 18080 on the host, as published above):

(screenshot: Traefik dashboard)

4. Configure the HBase docker-compose File

Here one HMaster runs on the manager node, and a regionserver runs on each of the other two nodes.

1) docker-compose-hbase.yml configuration:

version: '3.2'

services:
  HMaster:
    image: bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8
    networks:
      - hbase
    ports:
      - target: 16000
        published: 16000
        protocol: tcp
        mode: host
      - target: 16010
        published: 16010
        protocol: tcp
        mode: host
    env_file:
      - ./hbase.env
    command:
      - /opt/hbase-2.3.5/bin/hbase master start          
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
      restart_policy:
        condition: none
      placement:
        constraints:
          - node.hostname == hadoop1
      labels:
        traefik.docker.network: hbase
        traefik.port: 16010
    

  RegionServer1:
    image: bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8
    networks:
      - hbase
    ports:
      - target: 16020
        published: 26020
        protocol: tcp
        mode: host
      - target: 16030
        published: 26030
        protocol: tcp
        mode: host
    env_file:
      - ./hbase.env    
    command:
      - /opt/hbase-2.3.5/bin/hbase regionserver start            
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
      restart_policy:
        condition: none
      placement:
        constraints:
          - node.hostname == hadoop2
    environment:
      HBASE_CONF_hbase_regionserver_hostname: RegionServer1

  RegionServer2:
    image: bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8
    networks:
      - hbase
    ports:
      - target: 16020
        published: 36020
        protocol: tcp
        mode: host
      - target: 16030
        published: 36030
        protocol: tcp
        mode: host
    env_file:
      - ./hbase.env    
    command:
      - /opt/hbase-2.3.5/bin/hbase regionserver start            
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
      restart_policy:
        condition: none
      placement:
        constraints:
          - node.hostname == hadoop3
    environment:
      HBASE_CONF_hbase_regionserver_hostname: RegionServer2

networks:
  hbase:
    external:
      name: hbase

2) The hbase.env file, which holds the HBase configuration:

HBASE_CONF_hbase_rootdir=hdfs://namenode:9000/hbase
HBASE_CONF_hbase_cluster_distributed=true
HBASE_CONF_hbase_zookeeper_quorum=zoo1,zoo2,zoo3

HBASE_CONF_hbase_master=HMaster:16000
HBASE_CONF_hbase_master_hostname=HMaster
HBASE_CONF_hbase_master_port=16000
HBASE_CONF_hbase_master_info_port=16010
HBASE_CONF_hbase_regionserver_port=16020
HBASE_CONF_hbase_regionserver_info_port=16030

HBASE_MANAGES_ZK=false
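
The entrypoint.sh shown earlier turns every HBASE_CONF_* variable into a property appended to hbase-site.xml, mapping ___ to -, __ to _, and a single _ to a dot in the property name. The first entry above, for example, ends up as:

<property><name>hbase.rootdir</name><value>hdfs://namenode:9000/hbase</value></property>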
5. Deploy the HBase Cluster
docker stack deploy -c docker-compose-hbase.yml hbase

Open the HBase web UI (port 16010, published on the manager node):

(screenshot: HBase Master web UI)

6. Verify the HBase Environment
# Enter the HMaster container (6a5b is the HMaster container ID here)
docker exec -it 6a5b bash
# Open the HBase shell
hbase shell
# Create a table
hbase(main):001:0> create 'mirson','country','address','email'
Created table mirson
Took 1.2737 seconds 
# List tables
hbase(main):002:0> list
TABLE                     
mirson                     
1 row(s)
Took 0.0209 seconds          
=> ["mirson"]
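
As an optional read/write smoke test, still inside the same hbase shell (the row key and values below are arbitrary examples):

put 'mirson', 'row1', 'country:name', 'China'
scan 'mirson'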

The web UI now shows the newly created table:

(screenshot: table listed in the HBase web UI)

At this point the distributed Hadoop 3.x + HBase 2.x cluster is fully set up.

All configuration files have been uploaded and can be downloaded for reference:
https://download.csdn.net/download/hxx688/87668984

7. FAQ

1) If starting the HBase cluster fails with: There are 2 datanode(s) running and 2 node(s) are excluded in this operation…

Copy Hadoop's core-site.xml and hdfs-site.xml into HBase's configuration directory, for example as shown below.
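
A sketch of that fix (the container IDs below are placeholders; look them up with docker ps). Either rebuild the image with the two files in the build context, as the Dockerfile above already does, or push them into the running containers and restart them so HBase re-reads its configuration:

docker cp <namenode-container>:/opt/hadoop-3.1.3/etc/hadoop/core-site.xml .
docker cp <namenode-container>:/opt/hadoop-3.1.3/etc/hadoop/hdfs-site.xml .
docker cp core-site.xml <hmaster-container>:/etc/hbase/
docker cp hdfs-site.xml <hmaster-container>:/etc/hbase/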

2) If you see: port published with ingress mode can't be used with dnsrr mode

In the docker-compose file, change the port mapping to this form:

  ports:
    - target: 3000
      published: 3000
      protocol: tcp
      mode: host