# Building a Local Hadoop HA, HBase, Spark, Flink, and ZooKeeper Cluster with Docker


#Commit the prepared container (hadoopimages) as an image named hadoop
docker commit hadoopimages hadoop
#Check the image list
docker images


![在这里插入图片描述](https://img-blog.csdnimg.cn/20200402094845665.png)
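As a quick sanity check that the committed image is usable, a throwaway container can be started from it (a minimal sketch; it only assumes the image has a standard base OS layout):

```bash
# Run a temporary container from the new image and print its OS release
docker run --rm hadoop:latest cat /etc/os-release
```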


#### Create the bigdata network shared by all the big-data services


**The subnet specified here is 172.25.0.0/16. Make sure it does not clash with any of your other subnets, to avoid unnecessary trouble.**



docker network create --driver bridge --subnet 172.25.0.0/16 --gateway 172.25.0.1 bigdata
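To confirm the network was created with the expected subnet and gateway, it can be inspected with the standard Docker CLI:

```bash
# List networks and show the IPAM settings of the one just created
docker network ls
docker network inspect bigdata --format '{{json .IPAM.Config}}'
```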




---


## ZooKeeper Setup




---


###### Pull the ZooKeeper image



#Pick whichever image tag suits you
docker pull zookeeper:3.4.13


##### Create three ZooKeeper containers with docker-compose



version: '2'
services:
  zoo1:
    image: zookeeper:3.4.13   # image name
    restart: always           # restart automatically on failure
    hostname: zoo1
    container_name: zoo1
    privileged: true
    ports:                    # port mappings
      - 2181:2181
    volumes:                  # mounted data volumes
      - ./zoo1/data:/data
      - ./zoo1/datalog:/datalog
    environment:
      TZ: Asia/Shanghai
      ZOO_MY_ID: 1            # node ID
      ZOO_PORT: 2181          # ZooKeeper port
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888  # ZooKeeper server list
    networks:
      default:
        ipv4_address: 172.25.0.11

  zoo2:
    image: zookeeper:3.4.13
    restart: always
    hostname: zoo2
    container_name: zoo2
    privileged: true
    ports:
      - 2182:2181
    volumes:
      - ./zoo2/data:/data
      - ./zoo2/datalog:/datalog
    environment:
      TZ: Asia/Shanghai
      ZOO_MY_ID: 2
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    networks:
      default:
        ipv4_address: 172.25.0.12

  zoo3:
    image: zookeeper:3.4.13
    restart: always
    hostname: zoo3
    container_name: zoo3
    privileged: true
    ports:
      - 2183:2181
    volumes:
      - ./zoo3/data:/data
      - ./zoo3/datalog:/datalog
    environment:
      TZ: Asia/Shanghai
      ZOO_MY_ID: 3
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    networks:
      default:
        ipv4_address: 172.25.0.13

networks:
  default:
    external:
      name: bigdata



#Run the command
docker-compose up -d
➜ zookeeper docker-compose up -d
Recreating 44dad6cddccd_zoo1 … done
Recreating 9b0f2cfe666f_zoo3 … done
Creating zoo2 … done


![在这里插入图片描述](https://img-blog.csdnimg.cn/20200401232922755.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzI3MjYwNQ==,size_16,color_FFFFFF,t_70)


![在这里插入图片描述](https://img-blog.csdnimg.cn/20200401231813368.png)
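Once all three containers are up, the quorum can be checked from inside them. A minimal sketch, assuming `zkServer.sh` is on the image's PATH and picks up the generated config, as it does in the official zookeeper:3.4.13 image:

```bash
# One node should report "Mode: leader", the other two "Mode: follower"
docker exec zoo1 zkServer.sh status
docker exec zoo2 zkServer.sh status
docker exec zoo3 zkServer.sh status
```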




---


## Kafka Cluster Setup




---



#Pull the Kafka image and the kafka-manager image
docker pull wurstmeister/kafka:2.12-2.3.1
docker pull sheepkiller/kafka-manager


###### Edit the docker-compose.yml file



version: '2'

services:
  broker1:
    image: wurstmeister/kafka:2.12-2.3.1
    restart: always              # restart automatically on failure
    hostname: broker1            # node hostname
    container_name: broker1      # container name
    privileged: true             # allow extended privileges inside the container
    ports:
      - "9091:9092"              # map container port 9092 to host port 9091
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENERS: PLAINTEXT://broker1:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker1:9092
      KAFKA_ADVERTISED_HOST_NAME: broker1
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181/kafka1,zoo2:2181/kafka1,zoo3:2181/kafka1
      JMX_PORT: 9988             # JMX port used by kafka-manager
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./broker1:/kafka/kafka-logs-broker1
    external_links:              # link to containers outside this compose file
      - zoo1
      - zoo2
      - zoo3
    networks:
      default:
        ipv4_address: 172.25.0.14

  broker2:
    image: wurstmeister/kafka:2.12-2.3.1
    restart: always
    hostname: broker2
    container_name: broker2
    privileged: true
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_LISTENERS: PLAINTEXT://broker2:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker2:9092
      KAFKA_ADVERTISED_HOST_NAME: broker2
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181/kafka1,zoo2:2181/kafka1,zoo3:2181/kafka1
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./broker2:/kafka/kafka-logs-broker2
    external_links:              # link to containers outside this compose file
      - zoo1
      - zoo2
      - zoo3
    networks:
      default:
        ipv4_address: 172.25.0.15

  broker3:
    image: wurstmeister/kafka:2.12-2.3.1
    restart: always
    hostname: broker3
    container_name: broker3
    privileged: true
    ports:
      - "9093:9092"
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_LISTENERS: PLAINTEXT://broker3:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker3:9092
      KAFKA_ADVERTISED_HOST_NAME: broker3
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181/kafka1,zoo2:2181/kafka1,zoo3:2181/kafka1
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./broker3:/kafka/kafka-logs-broker3
    external_links:              # link to containers outside this compose file
      - zoo1
      - zoo2
      - zoo3
    networks:
      default:
        ipv4_address: 172.25.0.16

  kafka-manager:
    image: sheepkiller/kafka-manager:latest
    restart: always
    container_name: kafka-manager
    hostname: kafka-manager
    ports:
      - "9000:9000"
    links:                       # link to containers created by this compose file
      - broker1
      - broker2
      - broker3
    external_links:              # link to containers outside this compose file
      - zoo1
      - zoo2
      - zoo3
    environment:
      ZK_HOSTS: zoo1:2181/kafka1,zoo2:2181/kafka1,zoo3:2181/kafka1
      KAFKA_BROKERS: broker1:9092,broker2:9092,broker3:9092
      APPLICATION_SECRET: letmein
      KM_ARGS: -Djava.net.preferIPv4Stack=true
    networks:
      default:
        ipv4_address: 172.25.0.10

networks:
  default:
    external:                    # use the pre-created network
      name: bigdata



#Run the command
docker-compose up -d


![在这里插入图片描述](https://img-blog.csdnimg.cn/20200401235440792.png)  
**The local port 9000 is indeed up as well:**  
 ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200401235459198.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzI3MjYwNQ==,size_16,color_FFFFFF,t_70)  
 ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200401235732298.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzI3MjYwNQ==,size_16,color_FFFFFF,t_70)
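As a quick sanity check of the broker cluster, a replicated test topic can be created through one of the brokers. A minimal sketch, assuming the Kafka scripts are on the image's PATH; JMX_PORT is unset for the exec'd shell so the CLI tool does not try to bind the same JMX port as the running broker:

```bash
# Create a replicated test topic under the /kafka1 chroot, then describe it
docker exec broker1 bash -c 'unset JMX_PORT; kafka-topics.sh --create \
  --zookeeper zoo1:2181/kafka1,zoo2:2181/kafka1,zoo3:2181/kafka1 \
  --replication-factor 3 --partitions 3 --topic test'
docker exec broker1 bash -c 'unset JMX_PORT; kafka-topics.sh --describe \
  --zookeeper zoo1:2181/kafka1 --topic test'
```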




---


## Hadoop HA Cluster Setup




---


#### Create the cluster with docker-compose



version: '2'

services:
  master:
    image: hadoop:latest
    restart: always              # restart automatically on failure
    hostname: master             # node hostname
    container_name: master       # container name
    privileged: true             # allow extended privileges inside the container
    networks:
      default:
        ipv4_address: 172.25.0.3

  master_standby:
    image: hadoop:latest
    restart: always
    hostname: master_standby
    container_name: master_standby
    privileged: true
    networks:
      default:
        ipv4_address: 172.25.0.4

  slave01:
    image: hadoop:latest
    restart: always
    hostname: slave01
    container_name: slave01
    privileged: true
    networks:
      default:
        ipv4_address: 172.25.0.5

  slave02:
    image: hadoop:latest
    restart: always
    container_name: slave02
    hostname: slave02
    networks:
      default:
        ipv4_address: 172.25.0.6

  slave03:
    image: hadoop:latest
    restart: always
    container_name: slave03
    hostname: slave03
    networks:
      default:
        ipv4_address: 172.25.0.7

networks:
  default:
    external:                    # join the pre-created bigdata network, as in the previous compose files
      name: bigdata
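If the compose route is chosen (the services rely on the image's default command to keep the containers alive), the cluster is started the same way as in the previous sections:

```bash
docker-compose up -d
# The five containers should show up as running
docker ps --format 'table {{.Names}}\t{{.Status}}'
```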


#### Create the containers from the command line



#Create the master node
docker run -tid --name master --privileged=true hadoop:latest /usr/sbin/init
#Create the hot-standby master_standby node
docker run -tid --name master_standby --privileged=true hadoop:latest /usr/sbin/init
#Create three slave nodes
docker run -tid --name slave01 --privileged=true hadoop:latest /usr/sbin/init
docker run -tid --name slave02 --privileged=true hadoop:latest /usr/sbin/init
docker run -tid --name slave03 --privileged=true hadoop:latest /usr/sbin/init
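Containers started this way are not attached to the bigdata network by default, unlike the compose variant above. A minimal sketch of how to attach them (the fixed IPs are an assumption chosen to mirror the compose file):

```bash
# Option 1: attach at creation time
docker run -tid --name master --privileged=true \
  --network bigdata --ip 172.25.0.3 hadoop:latest /usr/sbin/init

# Option 2: attach an already-created container afterwards
docker network connect --ip 172.25.0.4 bigdata master_standby
```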


#### Configure passwordless SSH login for every node



ssh-keygen -t rsa
#Then keep pressing Enter until it finishes, as shown in the screenshot below
#Do the same on every node


![在这里插入图片描述](https://img-blog.csdnimg.cn/20200402095727572.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzI3MjYwNQ==,size_16,color_FFFFFF,t_70)
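To avoid pressing Enter through the prompts on all five containers, the key pair can also be generated non-interactively (`-N ""` means an empty passphrase):

```bash
# Generate an RSA key pair with no passphrase at the default location
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
```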


###### Copy each node's public key into authorized_keys on every node


**One small caveat: first check whether passwd is installed; if it is not, run the following commands:**



yum install passwd
#Then set the password
passwd


![在这里插入图片描述](https://img-blog.csdnimg.cn/20200402103504856.png)
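If you would rather not type the password interactively in every container, chpasswd can set it in one go (a sketch; it assumes chpasswd from shadow-utils is present in the image, and "yourpassword" is only a placeholder):

```bash
# Set root's password non-interactively (replace the placeholder password)
echo "root:yourpassword" | chpasswd
```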



#Every node's public key must be appended to authorized_keys on itself and on every other node,
#which saves a lot of trouble when installing other components later
cat id_rsa.pub >> .ssh/authorized_keys
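Instead of copying keys around by hand, the same result can be achieved with ssh-copy-id. A minimal sketch, run on every node, assuming root passwords are set and the hostnames are resolvable (they are added to /etc/hosts in the next step):

```bash
# Push this node's public key to authorized_keys on all nodes, including itself
for host in master master_standby slave01 slave02 slave03; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub root@"$host"
done
```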


###### Edit /etc/hosts


**Note: the underscore in master_standby may not be accepted here; on some machines the hostname is treated as invalid when formatting HDFS, so it is best to avoid special characters in these names.**  
 ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200402104555308.png)



#Copy /etc/hosts to every node
scp /etc/hosts master_standby:/etc/
scp /etc/hosts slave01:/etc/
scp /etc/hosts slave02:/etc/
scp /etc/hosts slave03:/etc/
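A quick loop verifies that hostname resolution and the passwordless login configured above both work (a minimal sketch):

```bash
# Each node should print its own hostname without asking for a password
for host in master_standby slave01 slave02 slave03; do
  ssh root@"$host" hostname
done
```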


#### Configure Hadoop



#Unpack the Hadoop tarball
tar -zxvf hadoop-2.8.5.tar.gz
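The environment variables in the next step assume the distribution ends up at /usr/local/hadoop-2.8.5 (that is the HADOOP_HOME value used below), so unpack or move it there if you extracted it elsewhere:

```bash
# Unpack straight into /usr/local and confirm the directory exists
tar -zxvf hadoop-2.8.5.tar.gz -C /usr/local/
ls -d /usr/local/hadoop-2.8.5
```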


###### Configure the environment variables



#Edit the environment variables
vim ~/.bashrc
#Add the following lines
export HADOOP_HOME=/usr/local/hadoop-2.8.5
export CLASSPATH=.:$HADOOP_HOME/lib:$CLASSPATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
#Copy this file to the home directory of every other node
scp ~/.bashrc <node-hostname>:~/
#Verify with the hadoop command
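After copying .bashrc to the other nodes, a quick check on each of them confirms the variables took effect (the screenshot below shows the kind of output to expect):

```bash
# Reload the environment and make sure the hadoop binary resolves
source ~/.bashrc
which hadoop
hadoop version
```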


![在这里插入图片描述](https://img-blog.csdnimg.cn/20200402105641627.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzI3MjYwNQ==,size_16,color_FFFFFF,t_70)


###### Configuration files


**hdfs-site.xml**



<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>master_standby:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>master_standby:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave01:8485;slave02:8485;slave03:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hadoop-2.8.5/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
</configuration>
