A Spark Cluster on Docker: Tuning Parameters to Squeeze the Hardware

```yaml
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8082
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
    volumes:
      - ./conf/worker2:/conf
      - ./data/worker2:/tmp/data
```

As shown above, note the volumes entries: everything is mapped into the conf and data directories that sit alongside docker-compose.yml. Only worker1 and worker2 are shown here; worker3 through worker6 are configured the same way.
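Docker will create missing bind-mount directories on the host automatically (owned by root), but pre-creating them keeps ownership in your hands. A minimal shell sketch, assuming the directory layout used throughout this article:

```bash
# Run next to docker-compose.yml: pre-create the host directories
# referenced by the volume mappings (conf and data per worker).
mkdir -p ./conf/master ./data ./jars
for i in 1 2 3 4 5 6; do
  mkdir -p "./conf/worker${i}" "./data/worker${i}"
done
```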

Running out of disk space because of the HDFS data directory

1. First, look at how the HDFS data directory is configured:

```yaml
    volumes:
      - hadoop_datanode1:/hadoop/dfs/data
```
2. The hadoop_datanode1 named volume above is declared, with defaults, at the very bottom of docker-compose.yml:

```yaml
volumes:
  hadoop_namenode:
  hadoop_datanode1:
  hadoop_datanode2:
  hadoop_datanode3:
  hadoop_historyserver:
```
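Compose prefixes named volumes with the project name when it creates them; you can list what actually exists with:

```bash
# Named volumes created by this compose project (prefix = project directory name)
docker volume ls | grep hadoop
```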

3. With the containers running, execute docker inspect datanode1 to view the container details; the volume-related part of the output looks like this:

```json
"Mounts": [
    {
        "Type": "volume",
        "Name": "temp_hadoop_datanode1",
        "Source": "/var/lib/docker/volumes/temp_hadoop_datanode1/_data",
        "Destination": "/hadoop/dfs/data",
        "Driver": "local",
        "Mode": "rw",
        "RW": true,
        "Propagation": ""
    }
]
```

So the HDFS container's data directory actually lives under /var/lib/docker/volumes on the host.
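Rather than scanning the whole inspect dump, the --format flag can pull out just the mounts; a small sketch:

```bash
# Print only the Mounts section of datanode1 as compact JSON
docker inspect --format '{{json .Mounts}}' datanode1
```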

4. Check the disk with df -m. As shown below, /dev/nvme0n1p3, the device holding /var/lib/docker/volumes, has only about 29 GB available (29561 MB). That is clearly too little for storing files in bulk, all the more so because HDFS keeps 3 replicas of every block by default:

```
root@willzhao-deepin:/data/work/spark/temp# df -m
Filesystem     1M-blocks   Used Available Use% Mounted on
udev                7893      0      7893   0% /dev
tmpfs               1584      4      1581   1% /run
/dev/nvme0n1p3     43927  12107     29561  30% /
tmpfs               7918      0      7918   0% /dev/shm
tmpfs                  5      1         5   1% /run/lock
tmpfs               7918      0      7918   0% /sys/fs/cgroup
/dev/nvme0n1p4     87854    181     83169   1% /home
/dev/nvme0n1p1       300      7       293   3% /boot/efi
/dev/sda1         468428 109152    335430  25% /data
tmpfs               1584      1      1584   1% /run/user/108
tmpfs               1584      0      1584   0% /run/user/0
```
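To see how much of that space the volumes are already eating, a quick check (the path assumes Docker's default local volume driver):

```bash
# Per-volume disk usage under Docker's volume root, largest last
du -sm /var/lib/docker/volumes/* | sort -n
```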

5. The listing above shows that /dev/sda1 (mounted at /data, where this project lives) still has over 300 GB free, so pointing the HDFS data directories at /dev/sda1 relieves the space pressure. Since docker-compose.yml sits under /data, relative bind mounts like ./hadoop/datanode1 land on that device. Modify the three HDFS datanode services in docker-compose.yml accordingly; after the change they read:

```yaml
  datanode1:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode1
    depends_on:
      - namenode
    volumes:
      - ./hadoop/datanode1:/hadoop/dfs/data
    env_file:
      - ./hadoop.env

  datanode2:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode2
    depends_on:
      - namenode
    volumes:
      - ./hadoop/datanode2:/hadoop/dfs/data
    env_file:
      - ./hadoop.env

  datanode3:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode3
    depends_on:
      - namenode
    volumes:
      - ./hadoop/datanode3:/hadoop/dfs/data
    env_file:
      - ./hadoop.env
```

Then delete this block, which is no longer referenced:

```yaml
volumes:
  hadoop_namenode:
  hadoop_datanode1:
  hadoop_datanode2:
  hadoop_datanode3:
  hadoop_historyserver:
```
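One caveat worth flagging: switching from named volumes to bind mounts means HDFS starts over with empty data directories. If the old volumes hold data you still need, a possible migration sketch (the temp_ volume names come from the docker inspect output above; confirm yours with docker volume ls first):

```bash
# Stop the cluster, then copy the old volume contents into the new bind-mount dirs
docker-compose down
for i in 1 2 3; do
  mkdir -p "./hadoop/datanode${i}"
  cp -a "/var/lib/docker/volumes/temp_hadoop_datanode${i}/_data/." "./hadoop/datanode${i}/"
done
```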

Opening the master's port 4040 and the workers' web UI ports

1. While a job runs, a UI that shows its details gives a much fuller and more intuitive picture of what is happening, so the configuration should open the relevant ports.

2. As shown below, 4040 is added to expose, which makes the port reachable, and 4040:4040 is added to ports, which maps the container's 4040 to port 4040 on the host:

```yaml
  master:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: master
    command: bin/spark-class org.apache.spark.deploy.master.Master -h master
    hostname: master
    environment:
      MASTER: spark://master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: localhost
    links:
      - namenode
    expose:
      - 4040
      - 7001
      - 7002
      - 7003
      - 7004
      - 7005
      - 7077
      - 6066
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8080:8080
    volumes:
      - ./conf/master:/conf
      - ./data:/tmp/data
      - ./jars:/root/jars
```
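Once the stack is up, a quick way to confirm the published ports answer (keep in mind that 4040 is only served while an application is actually running):

```bash
# Master web UI: available as soon as the master starts
curl -s http://localhost:8080 | head -n 5
# Driver/application UI: only responds while a job is running
curl -s http://localhost:4040 | head -n 5
```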

3. The workers' web ports need opening too: a worker's web page shows its status and, crucially, the task logs. Because there are several workers, each one must map to a different host port. In the configuration below, worker1 sets environment.SPARK_WORKER_WEBUI_PORT to 8081, exposes 8081, and maps the container's 8081 to host port 8081; worker2 does the same with 8082:

```yaml
  worker1:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: worker1
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker1
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
      - 8081
    ports:
      - 8081:8081
    volumes:
      - ./conf/worker1:/conf
      - ./data/worker1:/tmp/data

  worker2:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: worker2
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker2
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8082
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
      - 8082
    ports:
      - 8082:8082
    volumes:
      - ./conf/worker2:/conf
      - ./data/worker2:/tmp/data
```

worker3 through worker6 follow the same pattern; just give each its own port number.
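Any job will do for exercising these UIs; a sketch using the stock SparkPi example, run from inside the master container (this assumes the examples jar ships in the gettyimages/spark image at Spark's standard location, which is worth verifying first):

```bash
# While this runs, http://localhost:4040 shows the driver UI and
# http://localhost:8081 ... :8086 show the worker pages with executor logs.
docker exec -it master bin/spark-submit \
  --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.3.0.jar 100
```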

That completes the changes. The final docker-compose.yml is:

```yaml
version: "2.2"
services:
  namenode:
    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
    container_name: namenode
    volumes:
      - ./hadoop/namenode:/hadoop/dfs/name
      - ./input_files:/input_files
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    ports:
      - 50070:50070

  resourcemanager:
    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
    container_name: resourcemanager
    depends_on:
      - namenode
      - datanode1
      - datanode2
    env_file:
      - ./hadoop.env

  historyserver:
    image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
    container_name: historyserver
    depends_on:
      - namenode
      - datanode1
      - datanode2
    volumes:
      - ./hadoop/historyserver:/hadoop/yarn/timeline
    env_file:
      - ./hadoop.env

  nodemanager1:
    image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
    container_name: nodemanager1
    depends_on:
      - namenode
      - datanode1
      - datanode2
    env_file:
      - ./hadoop.env

  datanode1:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode1
    depends_on:
      - namenode
    volumes:
      - ./hadoop/datanode1:/hadoop/dfs/data
    env_file:
      - ./hadoop.env

  datanode2:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode2
    depends_on:
      - namenode
    volumes:
      - ./hadoop/datanode2:/hadoop/dfs/data
    env_file:
      - ./hadoop.env

  datanode3:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode3
    depends_on:
      - namenode
    volumes:
      - ./hadoop/datanode3:/hadoop/dfs/data
    env_file:
      - ./hadoop.env

  master:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: master
    command: bin/spark-class org.apache.spark.deploy.master.Master -h master
    hostname: master
    environment:
      MASTER: spark://master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: localhost
    links:
      - namenode
    expose:
      - 4040
      - 7001
      - 7002
      - 7003
      - 7004
      - 7005
      - 7077
      - 6066
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8080:8080
    volumes:
      - ./conf/master:/conf
      - ./data:/tmp/data
      - ./jars:/root/jars

  worker1:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: worker1
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker1
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
      - 8081
    ports:
      - 8081:8081
    volumes:
      - ./conf/worker1:/conf
      - ./data/worker1:/tmp/data

  worker2:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: worker2
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker2
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8082
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
      - 8082
    ports:
      - 8082:8082
    volumes:
      - ./conf/worker2:/conf
      - ./data/worker2:/tmp/data

  worker3:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: worker3
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker3
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8083
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
      - 8083
    ports:
      - 8083:8083
    volumes:
      - ./conf/worker3:/conf
      - ./data/worker3:/tmp/data

  worker4:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: worker4
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker4
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8084
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
      - 8084
    ports:
      - 8084:8084
    volumes:
      - ./conf/worker4:/conf
      - ./data/worker4:/tmp/data

  worker5:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: worker5
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker5
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8085
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
      - 8085
    ports:
      - 8085:8085
    volumes:
      - ./conf/worker5:/conf
      - ./data/worker5:/tmp/data

  worker6:
    image: gettyimages/spark:2.3.0-hadoop-2.8
    container_name: worker6
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker6
    environment:
      SPARK_CONF_DIR: /conf
      # The source listing breaks off here; worker6 is assumed to follow
      # the same pattern as workers 1-5, with web UI port 8086.
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8086
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
      - 8086
    ports:
      - 8086:8086
    volumes:
      - ./conf/worker6:/conf
      - ./data/worker6:/tmp/data
```
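With the final file saved, bring the cluster up and verify that everything registered:

```bash
# Start the whole stack in the background
docker-compose up -d
# All containers should show "Up"
docker-compose ps
# The master UI at http://localhost:8080 should list six ALIVE workers
```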


