Running Spark on a Docker-based cluster in an Ubuntu virtual machine

1. Install Docker

Follow https://docs.docker.com/engine/installation/linux/ubuntu/ to install Docker.

2. Choose a base image

This guide uses ubuntu:16.04:

docker pull ubuntu:16.04

Run the ubuntu image:

docker run --rm -it ubuntu:16.04
root@mark-virtual-machine:/home/mark/dockerspace# docker run -it ubuntu:16.04
root@33afc5817cf2:/# ls
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var

3. Create the Dockerfile

Download Spark:

root@mark-virtual-machine:/home/mark/dockerspace# mkdir sparkbuild
cd sparkbuild
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar xzvf spark-2.1.0-bin-hadoop2.7.tgz

Create the Dockerfile:

root@mark-virtual-machine:/home/mark/dockerspace# cd sparkbuild/
root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# vi Dockerfile

with the following contents:

FROM ubuntu:16.04
RUN apt-get update && apt-get install -y openjdk-8-jdk
EXPOSE 8080
COPY spark-2.1.0-bin-hadoop2.7 /spark
CMD /bin/bash
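If you would rather not open an editor, the same file can be written from the shell. The following is a minimal sketch; write_dockerfile is a hypothetical helper name introduced here for illustration, and it assumes the Spark tarball has already been unpacked next to the Dockerfile.

```shell
#!/bin/sh
# write_dockerfile is a hypothetical helper (not part of Docker).
# It writes the Dockerfile shown above into the given build directory.
write_dockerfile() {
    dir="${1:-.}"
    # The quoted 'EOF' keeps the shell from expanding anything in the body.
    cat > "$dir/Dockerfile" <<'EOF'
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y openjdk-8-jdk
EXPOSE 8080
COPY spark-2.1.0-bin-hadoop2.7 /spark
CMD /bin/bash
EOF
}

# Usage, from inside the sparkbuild directory:
#   write_dockerfile .
```

The helper only creates the file; the image still has to be built with docker build.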

Save the file and build. This produces a base image that contains Spark; the master and the workers can later be started from it with different startup commands:

root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# docker build -t myspark:0.1 .

Run the image for a quick test:

docker run --rm -it -P myspark:0.1

Inside the container, try starting the Spark master server:

root@0950cfbe650d:/# ./spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /spark/logs/spark--org.apache.spark.deploy.master.Master-1-96a43ba2b88a.out

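4. Start Spark

Start the Spark master as a background container:

docker run --rm -itdP -h spark-master --name spark-master myspark:0.1 ./spark/bin/spark-class org.apache.spark.deploy.master.Master -h spark-master

Note: inside Docker, Spark has to run in the foreground; otherwise the container exits as soon as the startup script finishes. The -h flag only sets the container's hostname, so --name spark-master is also needed for the worker to reference this container with --link later.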
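To make the foreground requirement concrete, here is a rough sketch of an entrypoint-style helper. It is an illustration, not something the image above contains: start_role and the SPARK_HOME default of /spark are assumptions made for this example. The point is the exec call, which replaces the shell with the Spark launcher so the container stays up for as long as that process runs.

```shell
#!/bin/sh
# start_role is a hypothetical helper; SPARK_HOME defaulting to /spark
# matches the COPY destination used in the Dockerfile of this guide.
start_role() {
    role="$1"
    target="$2"
    spark_class="${SPARK_HOME:-/spark}/bin/spark-class"
    case "$role" in
        # exec keeps Spark in the foreground instead of daemonizing it.
        master) exec "$spark_class" org.apache.spark.deploy.master.Master -h "$target" ;;
        worker) exec "$spark_class" org.apache.spark.deploy.worker.Worker "$target" ;;
        *) echo "usage: start_role master <hostname> | worker <master-url>" >&2
           return 2 ;;
    esac
}

# In an entrypoint script the last line would be:
#   start_role "$@"
```

By contrast, sbin/start-master.sh launches the master in the background and returns, which is why a container whose only command is that script stops right away.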

root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# docker run --rm -itdP -h spark-master --name spark-master myspark:0.1 ./spark/bin/spark-class org.apache.spark.deploy.master.Master -h spark-master
4cc5a62207336320ea2e12d93a4bfd3b245a089b31ab3c0eafbef3f2c20cdb19
root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# docker ps
CONTAINER ID   IMAGE         COMMAND                  CREATED         STATUS         PORTS                     NAMES
4cc5a6220733   myspark:0.1   "./spark/bin/spark..."   3 seconds ago   Up 2 seconds   0.0.0.0:32770->8080/tcp   hardcore_lewin

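The output shows that port 32770 on the virtual machine is mapped to port 8080 inside the container. Open a browser in the virtual machine and go to http://localhost:32770/ to reach the Spark master web UI.

Next, start a worker (slave):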
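Because -P picks a random host port, the number will differ on every run. Instead of reading it from the docker ps output, it can be looked up with docker port. A small sketch follows; host_port is a hypothetical helper, and the usage line assumes the container was started with --name spark-master.

```shell
#!/bin/sh
# host_port is a hypothetical helper: it strips everything up to the last
# colon from a mapping such as "0.0.0.0:32770", the format printed by
# `docker port <container> 8080`.
host_port() {
    mapping="$1"
    echo "${mapping##*:}"
}

# Usage (requires a running container named spark-master):
#   port=$(host_port "$(docker port spark-master 8080 | head -n 1)")
#   echo "http://localhost:$port/"
```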

root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# docker run --rm -itdP -h spark-worker-1 --link spark-master myspark:0.1 ./spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
634d9c4d02397b49231d0fdf54d99f4d95f7a7c48023305d5876048222cd8256

Refresh the web UI; the new worker should now appear as registered.
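Additional workers can be started the same way, changing only the hostname. The loop below is a sketch under that assumption; worker_cmd, start_workers and the DRY_RUN switch are names made up for this example. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# worker_cmd and start_workers are hypothetical helpers. The docker command
# is the one used above, with the worker number substituted into -h.
worker_cmd() {
    n="$1"
    echo "docker run --rm -itdP -h spark-worker-$n --link spark-master myspark:0.1 ./spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077"
}

start_workers() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            worker_cmd "$i"            # print only (the default)
        else
            sh -c "$(worker_cmd "$i")" # actually start the container
        fi
        i=$((i + 1))
    done
}

# Usage:
#   start_workers 3             # prints three docker run commands
#   DRY_RUN=0 start_workers 3   # runs them; the master must already be up
```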

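Fixing "could not resolve archive.ubuntu.com" during docker build

During apt-get update, archive.ubuntu.com could not be resolved and the step stayed stuck at 0%. For a fix, see: http://stackoverflow.com/questions/24991136/docker-build-could-not-resolve-archive-ubuntu-com-apt-get-fails-to-install-a
The first answer there did not work for me; the second one did. The right fix may differ from one setup to another. My environment is a Windows 10 host running an Ubuntu 16.04 virtual machine in VMware, with an ubuntu:16.04 Docker container on top of that. The second answer is quoted below.
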
First, let’s verify the problem:

$ docker run busybox nslookup google.com   # takes a long time
nslookup: can't resolve 'google.com'   # <--- appears after a long time
Server:    8.8.8.8
Address 1: 8.8.8.8

If the command appears to hang, but eventually spits out the error “can’t resolve ‘google.com’”, then you have the same problem as me.

The nslookup command queries the DNS server 8.8.8.8 in order to turn the text address of ‘google.com’ into an IP address. Ironically, 8.8.8.8 is Google’s public DNS server. If nslookup fails, public DNS servers like 8.8.8.8 might be blocked by your company (which I assume is for security reasons).

You’d think that adding your company’s DNS servers to DOCKER_OPTS in /etc/default/docker should do the trick, but for whatever reason, it didn’t work for me. I describe what worked for me below.

SOLUTION:

On the host (I’m using Ubuntu 16.04), find out the primary and secondary DNS server addresses:

$ nmcli dev show | grep 'IP4.DNS'
IP4.DNS[1]:              10.0.0.2
IP4.DNS[2]:              10.0.0.3

Using these addresses, create a file /etc/docker/daemon.json:

$ sudo su root
# cd /etc/docker
# touch daemon.json

Put this in /etc/docker/daemon.json:

{
    "dns": ["10.0.0.2", "10.0.0.3"]
}

Exit from root:

# exit

Now restart docker:

$ sudo service docker restart
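The manual steps above could be scripted roughly as follows. This is a sketch, not part of the quoted answer: dns_json is a hypothetical helper that turns the IP4.DNS lines printed by nmcli into the daemon.json content shown earlier.

```shell
#!/bin/sh
# dns_json is a hypothetical helper. It reads lines such as
# "IP4.DNS[1]:   10.0.0.2" on stdin and prints a daemon.json body.
dns_json() {
    awk '/IP4\.DNS/ { addrs = addrs (n++ ? ", " : "") "\"" $2 "\"" }
         END { printf "{\n    \"dns\": [%s]\n}\n", addrs }'
}

# Usage (overwrites any existing /etc/docker/daemon.json, so back it up
# first if it already holds other settings):
#   nmcli dev show | dns_json | sudo tee /etc/docker/daemon.json
#   sudo service docker restart
```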

VERIFICATION:

Now check that adding the /etc/docker/daemon.json file allows you to resolve ‘google.com’ into an IP address:

$ docker run busybox nslookup google.com
Server:    10.0.0.2
Address 1: 10.0.0.2
Name:      google.com
Address 1: 2a00:1450:4009:811::200e lhr26s02-in-x200e.1e100.net
Address 2: 216.58.198.174 lhr25s10-in-f14.1e100.net

REFERENCES:

I based my solution on an article by Robin Winslow, who deserves all of the credit for the solution. Thanks, Robin!

“Fix Docker’s networking DNS config.” Robin Winslow. Retrieved 2016-11-09. https://robinwinslow.uk/2016/06/23/fix-docker-networking-dns/
