This post continues the learning by setting up a Spark cluster.
1. Install Spark. For the details, see https://www.jianshu.com/p/ee210190224f. That author is very thorough, but the configuration there still gave me trouble: when starting HDFS, the cluster could not connect to slave1:8485, so the NameNode or DataNode would not start, along with various other odd bugs. I never pinned down the exact cause (port 8485 is the JournalNode port used by the HA setup in that guide), but modifying the NameNode-related configuration resolved it; step 2 below contains the changes I made.
2. Modify the configuration (the Hadoop version I installed is 2.7.7; adjust the paths if yours differs).
Go into /root/soft/apache/hadoop/hadoop-2.7.7/etc/hadoop.
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/soft/apache/hadoop/hadoop-2.7.7/tmp</value>
  </property>
</configuration>
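As a quick sanity check that core-site.xml is being picked up, you can query the effective value from inside a container (assuming the hadoop binaries are on the PATH, as in this image):
hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://master:9000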
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/mnt/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/mnt/hadoop/dfs/data</value>
  </property>
</configuration>
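The dfs.namenode.name.dir and dfs.datanode.data.dir directories above need to exist and be writable before HDFS is formatted; a minimal sketch, run as root on every node, using the paths configured above:
mkdir -p /mnt/hadoop/dfs/name /mnt/hadoop/dfs/data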
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
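Note that the two jobhistory addresses only matter if the JobHistory Server is actually running; neither start-dfs.sh nor start-yarn.sh starts it. If you want it, you can launch it on master with the stock Hadoop 2.x daemon script (assuming HADOOP_HOME points at /root/soft/apache/hadoop/hadoop-2.7.7):
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# the history web UI should then be reachable at http://master:19888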
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
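Once start-yarn.sh has been run (see the verification at the end of this post), a quick way to confirm the resourcemanager addresses above are correct is to list the registered NodeManagers from the master container:
yarn node -list
# every node running a NodeManager should show up with state RUNNING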
Add an initialization script /root/soft/shell/format-hostname.sh:
#!/bin/bash
# Rewrite /etc/hosts so the cluster nodes can resolve each other by name
echo > /etc/hosts
echo 172.17.0.1 host >> /etc/hosts
echo 172.17.0.2 master >> /etc/hosts
echo 172.17.0.3 slave1 >> /etc/hosts
echo 172.17.0.4 slave2 >> /etc/hosts
# Assign each node its ZooKeeper myid based on its hostname
if [ "$(hostname)" = master ]; then
  echo 1 > /root/soft/apache/zookeeper/zookeeper-3.4.9/tmp/myid
  echo "I am master"
fi
if [ "$(hostname)" = slave1 ]; then
  echo 2 > /root/soft/apache/zookeeper/zookeeper-3.4.9/tmp/myid
  echo "I am slave1"
fi
if [ "$(hostname)" = slave2 ]; then
  echo 3 > /root/soft/apache/zookeeper/zookeeper-3.4.9/tmp/myid
  echo "I am slave2"
fi
After making these changes, save the container as the image ubuntu:spark_2.
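For reference, committing a running container to an image looks like the following; the container name base here is just a placeholder for whichever container you made the changes in:
docker commit base ubuntu:spark_2
docker images | grep spark_2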
3. Run three containers, named master, slave1, and slave2.
Writing a startup script start.sh on the host saves time:
#!/bin/bash
# Stop all containers
docker stop $(docker ps -a -q)
# Remove all containers
docker rm $(docker ps -a -q)
img=ubuntu:spark_2
# Run the master container, publishing the Spark/Hadoop web UI and RPC ports to the host
docker run -itd -h master -v /Users/birenjianmo/Desktop/learn/dock/spark/share:/home --name master \
  -p 54040:4040 -p 56066:6066 -p 57077:7077 -p 50070:50070 -p 50030:50030 -p 58088:8088 -p 58080:8080 $img
# Run the slave1 and slave2 containers, publishing the DataNode web UI port
docker run -itd -p 50085:50075 -h slave1 --name slave1 $img
docker run -itd -p 50095:50075 -h slave2 --name slave2 $img
# Run the init script inside each container
docker exec -itd master bash "/root/soft/shell/format-hostname.sh"
docker exec -itd slave1 bash "/root/soft/shell/format-hostname.sh"
docker exec -itd slave2 bash "/root/soft/shell/format-hostname.sh"
sleep 2
docker ps -a
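Once the containers are up, docker port is a convenient way to check which host ports map to which container ports when opening the web UIs from the host, for example:
docker port master
# e.g. 50070/tcp -> 0.0.0.0:50070, 8088/tcp -> 0.0.0.0:58088, ...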
Verification:
Run: sh start.sh to start the containers.
Result:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c191213456cf ubuntu:spark_2 "bash" 3 seconds ago Up 2 seconds 4040/tcp, 6066/tcp, 7077/tcp, 8080/tcp, 8088/tcp, 50030/tcp, 50070/tcp, 0.0.0.0:50095->50075/tcp slave2
4b2af154c7e3 ubuntu:spark_2 "bash" 4 seconds ago Up 3 seconds 4040/tcp, 6066/tcp, 7077/tcp, 8080/tcp, 8088/tcp, 50030/tcp, 50070/tcp, 0.0.0.0:50085->50075/tcp slave1
bc0fe22a8749 ubuntu:spark_2 "bash" 6 seconds ago Up 3 seconds 0.0.0.0:50030->50030/tcp, 0.0.0.0:50070->50070/tcp, 0.0.0.0:54040->4040/tcp, 0.0.0.0:56066->6066/tcp, 0.0.0.0:57077->7077/tcp, 0.0.0.0:58080->8080/tcp, 0.0.0.0:58088->8088/tcp master
birenjianmodeMacBook-Pro:spark birenjianmo$
Run: docker exec -it master bash to enter the master container.
birenjianmodeMacBook-Pro:spark birenjianmo$ docker exec -it master bash
root@master:/#
Run: hdfs namenode -format; start-dfs.sh; jps (format the NameNode; start HDFS; list the Java processes).
Result:
577 Jps
462 SecondaryNameNode
303 DataNode
175 NameNode
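jps only shows the local processes on master; to confirm that the DataNodes on slave1 and slave2 actually registered with the NameNode, you can additionally run the following from the master container:
hdfs dfsadmin -report
# "Live datanodes" should list every DataNode that joined the cluster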
Run: start-yarn.sh; start-all.sh; jps (start YARN; start Spark; list the Java processes).
1104 Worker
737 NodeManager
1028 Master
1141 Jps
636 ResourceManager
462 SecondaryNameNode
303 DataNode
175 NameNode
At this point the installation is complete!
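As a final smoke test, you can submit the bundled SparkPi example to the standalone master. This is a minimal sketch, assuming the spark scripts are on the PATH; note that the examples jar lives under examples/jars/ in Spark 2.x but under lib/ in Spark 1.x, so adjust the path to your version:
spark-submit --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
# a successful run prints a line like "Pi is roughly 3.14..."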