D006 Copy-Paste Big Data: Installing an HBase Cluster with a Dockerfile

Article link: https://blog.csdn.net/shaock2018/article/details/87170876

Contents

0x00 Tutorial Overview
0x01 Writing the Dockerfile
0x02 Preparation Before Verifying the HBase Cluster
    1. Environment and Resources
0x03 Verifying the HBase Installation
0xFF Summary

0x00 Tutorial Overview

Writing the Dockerfile
Preparation before verifying the HBase cluster
Verifying the HBase installation

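0x01 Writing the Dockerfile

1. Writing the Dockerfile

For convenience, I copied the files from the zk cluster setup into a new directory named hbase_sny_all.

a. HBase cluster installation steps
Reference: D005 Copy-Paste Big Data: Installing and Configuring an HBase Cluster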

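Regular installation	Dockerfile installation
1. Put the package in the container	1. Add the package and extract it
2. Extract and configure HBase	2. Add the environment variables
3. Add the environment variables	3. Add the config files (including environment variables)
4. Sync all nodes and start	4. Start HBase

The installation content is the same in both cases; the table only rearranges it to follow the steps as I wrote them.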

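2. Key points when writing the Dockerfile

Compared with "0x01 3. a. Dockerfile reference file" in D004 Copy-Paste Big Data: Installing a Zookeeper Cluster with a Dockerfile, the differences are as follows.
Steps:
a. Add the package and extract it (the ADD instruction extracts a local tar archive automatically)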

# Add HBase
ADD ./hbase-1.2.6-bin.tar.gz /usr/local/

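b. Add the environment variables (HBASE_HOME, PATH)

# HBase environment variable
ENV HBASE_HOME /usr/local/hbase-1.2.6

# Add this entry to the PATH value
$HBASE_HOME/bin:

c. Add the config files (append "&& \" to the previous statement to show that the command continues)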

&& \
mv /tmp/init_zk.sh ~/init_zk.sh && \
mv /tmp/hbase-env.sh $HBASE_HOME/conf/hbase-env.sh && \
mv /tmp/hbase-site.xml $HBASE_HOME/conf/hbase-site.xml  && \
mv /tmp/regionservers $HBASE_HOME/conf/regionservers

d. Add the statement that changes the permissions

# Set init_zk.sh permissions to 700
RUN chmod 700 init_zk.sh

3. Complete Dockerfile for reference

a. Installs hadoop, spark, zookeeper and hbase

FROM ubuntu
MAINTAINER shaonaiyi shaonaiyi@163.com

ENV BUILD_ON 2019-02-13

RUN apt-get update -qqy

RUN apt-get -qqy install vim wget net-tools  iputils-ping  openssh-server
# Add the JDK
ADD ./jdk-8u161-linux-x64.tar.gz /usr/local/
# Add hadoop
ADD ./hadoop-2.7.5.tar.gz  /usr/local/
# Add scala
ADD ./scala-2.11.8.tgz /usr/local/
# Add spark
ADD ./spark-2.2.0-bin-hadoop2.7.tgz /usr/local/
# Add zookeeper
ADD ./zookeeper-3.4.10.tar.gz /usr/local/
# Add HBase
ADD ./hbase-1.2.6-bin.tar.gz /usr/local/
ENV CHECKPOINT 2019-02-13
# JAVA_HOME environment variable
ENV JAVA_HOME /usr/local/jdk1.8.0_161
# hadoop environment variable
ENV HADOOP_HOME /usr/local/hadoop-2.7.5
# scala environment variable
ENV SCALA_HOME /usr/local/scala-2.11.8
# spark environment variable
ENV SPARK_HOME /usr/local/spark-2.2.0-bin-hadoop2.7
# zk environment variable
ENV ZK_HOME /usr/local/zookeeper-3.4.10
# HBase environment variable
ENV HBASE_HOME /usr/local/hbase-1.2.6
# Add the variables above to the system PATH
ENV PATH $HBASE_HOME/bin:$ZK_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$PATH

RUN ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
    chmod 600 ~/.ssh/authorized_keys
# Copy the config files to /tmp
COPY config /tmp
# Move the config files to their final locations
RUN mv /tmp/ssh_config    ~/.ssh/config && \
    mv /tmp/profile /etc/profile && \
    mv /tmp/masters $SPARK_HOME/conf/masters && \
    cp /tmp/slaves $SPARK_HOME/conf/ && \
    mv /tmp/spark-defaults.conf $SPARK_HOME/conf/spark-defaults.conf && \
    mv /tmp/spark-env.sh $SPARK_HOME/conf/spark-env.sh && \
    mv /tmp/hadoop-env.sh $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    mv /tmp/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml && \
    mv /tmp/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml && \
    mv /tmp/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml && \
    mv /tmp/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml && \
    mv /tmp/master $HADOOP_HOME/etc/hadoop/master && \
    mv /tmp/slaves $HADOOP_HOME/etc/hadoop/slaves && \
    mv /tmp/start-hadoop.sh ~/start-hadoop.sh && \
    mv /tmp/init_zk.sh ~/init_zk.sh && \
    mkdir -p /usr/local/hadoop2.7/dfs/data && \
    mkdir -p /usr/local/hadoop2.7/dfs/name && \
    mkdir -p /usr/local/zookeeper-3.4.10/datadir && \
    mkdir -p /usr/local/zookeeper-3.4.10/log && \
    mv /tmp/zoo.cfg $ZK_HOME/conf/zoo.cfg && \
    mv /tmp/hbase-env.sh $HBASE_HOME/conf/hbase-env.sh && \
    mv /tmp/hbase-site.xml $HBASE_HOME/conf/hbase-site.xml && \
    mv /tmp/regionservers $HBASE_HOME/conf/regionservers
	
RUN echo $JAVA_HOME
# Set the working directory
WORKDIR /root
# Start the sshd service
RUN /etc/init.d/ssh start
# Set start-hadoop.sh permissions to 700
RUN chmod 700 start-hadoop.sh
# Set init_zk.sh permissions to 700
RUN chmod 700 init_zk.sh
# Change the root password
RUN echo "root:shaonaiyi" | chpasswd
CMD ["/bin/bash"]
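
Because the Dockerfile above ADDs six archives and moves eighteen files out of config, a single missing file makes the build fail partway through. The following is a minimal sketch of a pre-build check, assuming the archives sit next to the Dockerfile and the config files sit in a config subdirectory as described in this article. The function name check_build_context is hypothetical, and the file lists are copied from the Dockerfile above.

```shell
#!/bin/bash
# Sketch: verify that every file the Dockerfile references exists in the
# build context before running "docker build".
# check_build_context is a hypothetical helper, not part of the original setup.

check_build_context() {
    local ctx="$1" missing=0 f

    # Archives referenced by the ADD instructions
    for f in jdk-8u161-linux-x64.tar.gz hadoop-2.7.5.tar.gz scala-2.11.8.tgz \
             spark-2.2.0-bin-hadoop2.7.tgz zookeeper-3.4.10.tar.gz \
             hbase-1.2.6-bin.tar.gz; do
        if [ ! -f "$ctx/$f" ]; then
            echo "missing archive: $f"
            missing=$((missing + 1))
        fi
    done

    # Files copied from config/ to /tmp and then moved by the RUN instruction
    for f in ssh_config profile masters slaves spark-defaults.conf spark-env.sh \
             hadoop-env.sh hdfs-site.xml core-site.xml yarn-site.xml \
             mapred-site.xml master start-hadoop.sh init_zk.sh zoo.cfg \
             hbase-env.sh hbase-site.xml regionservers; do
        if [ ! -f "$ctx/config/$f" ]; then
            echo "missing config: config/$f"
            missing=$((missing + 1))
        fi
    done

    # Succeed only when nothing is missing
    [ "$missing" -eq 0 ]
}

# Example (not executed here):
#   check_build_context /home/shaonaiyi/docker_bigdata/hbase_sny_all \
#       && docker build -t shaonaiyi/hbase /home/shaonaiyi/docker_bigdata/hbase_sny_all
```

The function prints one line per missing file and returns a non-zero status if anything is absent, so it can be chained in front of the build command.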

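0x02 Preparation Before Verifying the HBase Cluster

1. Environment and Resources

a. Install Docker
See the "0x01 Installing Docker" section of D001.5 Getting Started with Docker (Very Detailed Basics)

b. Prepare the resources
The files used when installing the ZK cluster: D004 Copy-Paste Big Data: Installing a Zookeeper Cluster with a Dockerfile

c. Prepare the HBase package (hbase-1.2.6-bin.tar.gz), the same way as the other packages

d. Prepare the three HBase config files (placed in the config directory)
cd /home/shaonaiyi/docker_bigdata/hbase_sny_all/config
Config file 1: vi hbase-env.sh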

#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.7+ required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/usr/local/jdk1.8.0_161/
export HBASE_CLASSPATH=/usr/local/hadoop-2.7.5/etc/hadoop
export HBASE_MANAGES_ZK=false
# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=

# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G

# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of 
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
#export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
#export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching. 

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
# export HBASE_MANAGES_ZK=true

# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.

Config file 2: vi hbase-site.xml

<configuration>
<property>
	<name>hbase.rootdir</name>
	<value>hdfs://hadoop-master:9000/hbase</value>
</property>
<property>
	<name>hbase.cluster.distributed</name>
	<value>true</value>
</property>
<property>
	<name>hbase.zookeeper.quorum</name>
	<value>hadoop-master,hadoop-slave1,hadoop-slave2</value>
</property>
</configuration>
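
Since the Dockerfile replaces the default hbase-site.xml with this file, a typo in a host name only shows up once the cluster is running. Below is a rough sketch for reading a property back from the file before building. It is not a real XML parser: it assumes the layout shown above, with each <name> and its <value> on separate lines. The function name get_hbase_property is hypothetical.

```shell
#!/bin/bash
# Sketch: print the <value> that follows a given <name> in hbase-site.xml.
# Assumes one <name> line followed by one <value> line, as in this article.
# get_hbase_property is a hypothetical helper.

get_hbase_property() {
    local file="$1" name="$2"
    awk -v want="<name>${name}</name>" '
        # Remember that the wanted <name> line has been seen
        index($0, want) { found = 1; next }
        # The next <value> line belongs to it: strip the tags and print
        found && /<value>/ {
            sub(/^.*<value>/, "")
            sub(/<\/value>.*$/, "")
            print
            exit
        }
    ' "$file"
}

# Example (not executed here):
#   get_hbase_property config/hbase-site.xml hbase.zookeeper.quorum
```

The output can be compared by eye with the host names in regionservers and with the HDFS address used by the Hadoop config files.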

Config file 3: vi regionservers

hadoop-slave1
hadoop-slave2

PS: add the following two lines to configure the environment variables:
vi profile

export HBASE_HOME=/usr/local/hbase-1.2.6
export PATH=$PATH:$HBASE_HOME/bin

The ZooKeeper initialization script (the last three start commands were cut from the earlier start-hadoop.sh and pasted here):
vi init_zk.sh

#!/bin/bash
ssh root@hadoop-master "echo '0' >> $ZK_HOME/datadir/myid"
ssh root@hadoop-slave1 "echo '1' >> $ZK_HOME/datadir/myid"
ssh root@hadoop-slave2 "echo '2' >> $ZK_HOME/datadir/myid"

# Load the environment and start ZooKeeper on each node
ssh root@hadoop-master "source /etc/profile;/usr/local/zookeeper-3.4.10/bin/zkServer.sh start"
ssh root@hadoop-slave1 "source /etc/profile;/usr/local/zookeeper-3.4.10/bin/zkServer.sh start"
ssh root@hadoop-slave2 "source /etc/profile;/usr/local/zookeeper-3.4.10/bin/zkServer.sh start"
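
The script above repeats the same two commands for each node. As an illustration only, here is a loop-based variant with a dry-run switch, so the generated ssh commands can be inspected before they touch any container. This is a sketch, not the author's script: the names run, init_zk, NODES and DRY_RUN are hypothetical, and it writes myid with '>' instead of '>>' so that running it twice does not append a second line.

```shell
#!/bin/bash
# Sketch: loop-based variant of init_zk.sh with a dry-run mode.
# run, init_zk, NODES and DRY_RUN are hypothetical names.

ZK_HOME="${ZK_HOME:-/usr/local/zookeeper-3.4.10}"
NODES="hadoop-master hadoop-slave1 hadoop-slave2"

# Print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

init_zk() {
    local id=0 node
    # Write ids 0, 1, 2 in node order, matching the script above.
    # '>' overwrites myid, so a second run leaves a single line.
    for node in $NODES; do
        run ssh "root@$node" "echo '$id' > $ZK_HOME/datadir/myid"
        id=$((id + 1))
    done
    # Load /etc/profile first so the remote shell has the environment
    for node in $NODES; do
        run ssh "root@$node" "source /etc/profile; $ZK_HOME/bin/zkServer.sh start"
    done
}

# Preview the six ssh commands without contacting any node
DRY_RUN=1 init_zk
```

Calling init_zk without DRY_RUN=1 runs the commands for real, which requires the passwordless ssh set up in the Dockerfile.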

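0x03 Verifying the HBase Installation

1. Modify the container creation script

a. Edit start_containers.sh (change the image name to shaonaiyi/hbase and update the IPs)
I changed the three occurrences of shaonaiyi/zk to shaonaiyi/hbase and shifted the last octet of each IP, for example:
172.21.0.12 became 172.21.0.22, and so on.
To expose HBase's port 16010 on the host, add:
-p 17010:16010
PS: you could of course create a new network with new IPs; I took the lazy route and reused the old network, changing only the IPs.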
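
The image rename and IP shift described above can also be done with sed instead of by hand. The sketch below assumes that every container IP in start_containers.sh has the form 172.21.0.1x, which is inferred only from the single example 172.21.0.12; check the real file before relying on it. The function name retarget_start_script is hypothetical, and the port mapping -p 17010:16010 still has to be added manually.

```shell
#!/bin/bash
# Sketch: rewrite the image name and container IPs of a start script.
# Reads the script on stdin and writes the modified version to stdout.
# retarget_start_script is a hypothetical helper; the 172.21.0.1x pattern
# is an assumption based on the one IP shown in the article.

retarget_start_script() {
    sed -e 's#shaonaiyi/zk#shaonaiyi/hbase#g' \
        -e 's/172\.21\.0\.1\([0-9]\)/172.21.0.2\1/g'
}

# Demo on a made-up fragment of a command line
printf '%s\n' '--ip 172.21.0.12 shaonaiyi/zk' | retarget_start_script
# prints: --ip 172.21.0.22 shaonaiyi/hbase
```

Writing the result to a new file and reviewing the differences with diff before replacing config/start_containers.sh keeps the original intact if the pattern matches something unexpected.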

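2. Build the image

a. Remove the earlier spark cluster containers (to save resources); skip this step if they are already removed
cd /home/shaonaiyi/docker_bigdata/zk_sny_all/config/
chmod 700 stop_containers.sh
./stop_containers.sh
b. Build the image with hadoop, spark, zookeeper and hbase installed (this build is much faster if the earlier shaonaiyi/spark image has not been deleted)
cd /home/shaonaiyi/docker_bigdata/hbase_sny_all
docker build -t shaonaiyi/hbase .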

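3. Create the containers

a. Create the containers (grant execute permission to start_containers.sh if it lacks it):
config/start_containers.sh
b. Enter the master container
sh ~/master.sh

4. Start the cluster and check the processes

a. Start the cluster and initialize the ZooKeeper configuration: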
./start-hadoop.sh
./init_zk.sh

I ran into one problem earlier (now fixed):
a script written on Windows failed with an error when run on Linux
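The fix:
vi init_zk.sh
The command :set ff shows fileformat=dos, which means the file has Windows (CRLF) line endings.
Change it with :set ff=unix, then save and quit with :wq!.
Run the script again: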
./init_zk.sh
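
The same line-ending fix can be applied without opening vi, which is handy when several scripts were edited on Windows. The following is a minimal sketch using only tr, grep and mktemp. The function names has_crlf and strip_crlf are hypothetical, and strip_crlf removes every carriage return in the file, which is fine for shell scripts like the ones in this article.

```shell
#!/bin/bash
# Sketch: detect and remove Windows (CRLF) line endings in a script.
# has_crlf and strip_crlf are hypothetical helpers.

# Succeeds if the file contains at least one carriage return
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Rewrite the file in place without carriage returns.
# Writing back with cat keeps the original file's permissions (e.g. 700).
strip_crlf() {
    local file="$1" tmp status
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (not executed here):
#   has_crlf init_zk.sh && strip_crlf init_zk.sh
```

After the conversion, :set ff in vi should report fileformat=unix for the file.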