Installing from the pre-built package
The directory structure of the downloaded, pre-configured files is shown below. Download address:
Link: https://pan.baidu.com/s/1i8yO2X25TZ0ofSEXPmIq-g  Password: akfq
├── apt.conf
├── build_network.sh
├── build.sh
├── config
│ ├── apt.conf
│ ├── core-site.xml
│ ├── hadoop-env.sh
│ ├── hdfs-site.xml
│ ├── hive-site.xml
│ ├── init_hive.sh
│ ├── init_mysql.sh
│ ├── mapred-site.xml
│ ├── master
│ ├── masters
│ ├── nohup.out
│ ├── pip.conf
│ ├── profile
│ ├── restart_containers.sh
│ ├── restart-hadoop.sh
│ ├── slaves
│ ├── spark-defaults.conf
│ ├── spark-env.sh
│ ├── ssh_config
│ ├── start_containers.sh
│ ├── start-hadoop.sh
│ ├── stop_containers.sh
│ └── yarn-site.xml
├── Dockerfile
1. Pull the Docker image
docker search spark_cluster
INDEX NAME DESCRIPTION STARS OFFICIAL AUTOMATED
docker.io docker.io/reganzm/spark_cluster image contains hadoop spark cluster and h... 0
docker.io docker.io/reganzm/spark_cluster_python 0
docker.io docker.io/s914211/spark_cluster 0
docker.io docker.io/solumdev1/spark_cluster 0
docker pull docker.io/reganzm/spark_cluster  # pull the pre-built cluster image
2. Configure the network
Create a file named build_network.sh to set up a Docker network called spark with the subnet 172.16.0.0/16.
Its contents:
echo create network
docker network create --subnet=172.16.0.0/16 spark
echo create success
docker network ls
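After running the script, you can verify that the network was created with the expected subnet. A quick check, assuming a local Docker daemon is running:

```shell
# show only the spark network in the network list
docker network ls --filter name=spark
# print the subnet the spark network was created with
docker network inspect spark --format '{{ (index .IPAM.Config 0).Subnet }}'
```

If the second command prints a subnet other than 172.16.0.0/16, the fixed --ip addresses used by the startup scripts below will not be assignable.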
The cluster network is laid out as follows, with five nodes:
* hadoop-master: the Hadoop NameNode (master)
* hadoop-node1: Hadoop DataNode 1
* hadoop-node2: Hadoop DataNode 2
* hadoop-hive: the Hive node
* hadoop-mysql: the database node for Hive, which stores the metastore
The hostnames of all five containers begin with hadoop-* (note that the startup scripts spell the master's hostname hadoop-maste). This naming is deliberate: we want passwordless SSH login between the containers, and without generating an id_rsa.pub public key we can still enable inter-container communication by configuring SSH filter rules. The exact configuration is covered later.
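As a sketch of what such filter rules can look like, an ssh_config block that matches every container by its hadoop-* hostname might be (hypothetical contents for illustration; the actual config/ssh_config shipped in the package may differ):

```
# Match all cluster containers by their shared hadoop-* hostname prefix
Host hadoop-*
    # skip the interactive host-key confirmation prompt between containers
    StrictHostKeyChecking no
    # don't persist container host keys, which change on every rebuild
    UserKnownHostsFile /dev/null
```

Because every node's hostname shares the hadoop- prefix, one Host pattern covers the whole cluster, which is why the naming convention matters.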
3. Start the containers
The startup script is start_containers.sh:
echo start hadoop-hive container...
docker run -itd --restart=always --net spark --ip 172.16.0.5 --privileged --name hive --hostname hadoop-hive --add-host hadoop-node1:172.16.0.3 \
--add-host hadoop-node2:172.16.0.4 --add-host hadoop-mysql:172.16.0.6 --add-host hadoop-maste:172.16.0.2 docker.io/reganzm/spark_cluster /bin/bash
echo start hadoop-mysql container ...
docker run -itd --restart=always --net spark --ip 172.16.0.6 --privileged --name mysql --hostname hadoop-mysql \
    --add-host hadoop-node1:172.16.0.3 --add-host hadoop-node2:172.16.0.4 --add-host hadoop-hive:172.16.0.5 \
    --add-host hadoop-maste:172.16.0.2 docker.io/reganzm/spark_cluster /bin/bash
echo start hadoop-maste container ...
docker run -itd --restart=always --net spark --ip 172.16.0.2 --privileged \
    -p 18032:8032 -p 28080:18080 -p 29888:19888 -p