Environment Setup
- VM: VMware Workstation
- OS: Ubuntu 14.04 LTS
- Hadoop: hadoop-2.5.2
Hadoop Cluster Plan
- 172.17.0.2 hadoop-master
- 172.17.0.3 hadoop-slave1
- 172.17.0.4 hadoop-slave2
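These addresses are what Docker's default bridge typically hands out when the containers start in the order master, slave1, slave2; they are not guaranteed, so verify the actual values with docker inspect. As a sketch, the plan can be kept in a reusable fragment that later steps append to each container's /etc/hosts:

```shell
# Capture the planned cluster addresses in a reusable hosts fragment.
cat > hosts.fragment <<'EOF'
172.17.0.2 hadoop-master
172.17.0.3 hadoop-slave1
172.17.0.4 hadoop-slave2
EOF
cat hosts.fragment
```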
Building the Hadoop Base Image from a Dockerfile
Create a Dockerfile with the following content:
FROM ubuntu:14.04
MAINTAINER Rain <>
ENV REFRESHED_AT 2016-09-15
RUN apt-get update
RUN apt-get install -y openssh-server openssh-client
ADD jdk-7u80-linux-x64.tar.gz /usr/local/
ENV JAVA_HOME /usr/local/jdk1.7.0_80
ENV CLASSPATH $JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
ENV PATH $PATH:$JAVA_HOME/bin
RUN addgroup hadoop
# useradd -p expects an already-hashed password, so hash the plaintext first
RUN useradd -m -g hadoop -p "$(openssl passwd -1 qazwsx)" hadoop
# RUN sudo usermod -aG sudo hadoop
ADD hadoop-2.5.2.tar.gz /usr/local/
RUN chown -R hadoop:hadoop /usr/local/hadoop-2.5.2
RUN cd /usr/local && ln -s ./hadoop-2.5.2 hadoop
ENV HADOOP_PREFIX /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_YARN_HOME /usr/local/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
RUN echo "hadoop ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/nopasswdsudo && chmod 0440 /etc/sudoers.d/nopasswdsudo
RUN mkdir /var/run/sshd
USER hadoop
RUN mkdir -p ~/.ssh && ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
EXPOSE 22
Build the base image:
$ sudo docker build -t="rain:hadoop-base" .
Start a container from the image:
$ sudo docker run -t -i rain:hadoop-base /bin/bash
Building the Hadoop Master Image from a Dockerfile
Create a Dockerfile with the following content:
FROM rain:hadoop-base
MAINTAINER Rain <>
ENV REFRESHED_AT 2016-09-14
ADD hadoop-env.sh $HADOOP_HOME/etc/hadoop/
ADD mapred-env.sh $HADOOP_HOME/etc/hadoop/
ADD yarn-env.sh $HADOOP_HOME/etc/hadoop/
ADD core-site.xml $HADOOP_HOME/etc/hadoop/
ADD hdfs-site.xml $HADOOP_HOME/etc/hadoop/
ADD mapred-site.xml $HADOOP_HOME/etc/hadoop/
ADD yarn-site.xml $HADOOP_HOME/etc/hadoop/
ADD slaves $HADOOP_HOME/etc/hadoop/
RUN sudo chown -R hadoop:hadoop $HADOOP_HOME/etc/hadoop
RUN sudo mkdir -p /opt/hadoop/data
RUN sudo chown -R hadoop:hadoop /opt/hadoop
WORKDIR /home/hadoop
COPY bootstrap.sh /home/hadoop/
RUN sudo chown -R hadoop:hadoop /home/hadoop
RUN sudo chmod 755 /home/hadoop/bootstrap.sh
ENTRYPOINT ["/home/hadoop/bootstrap.sh"]
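The COPY above expects a bootstrap.sh next to the Dockerfile; its content is only shown later, in the slave section, and the same script serves the master. A minimal sketch that creates it before building:

```shell
# Create the bootstrap script that the Dockerfile copies into the image.
# It only keeps sshd in the foreground so the container does not exit.
cat > bootstrap.sh <<'EOF'
#!/bin/bash
sudo /usr/sbin/sshd -D
EOF
chmod 755 bootstrap.sh
```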
Build the Hadoop master image:
$ sudo docker build -t="rain:hadoop-master" .
Start the container:
$ sudo docker run --name hadoop-master -h hadoop-master -d -P -p 50070:50070 -p 8088:8088 rain:hadoop-master
Building the Hadoop Slave Image from a Dockerfile
Create a Dockerfile with the same content as for the master image.
Edit bootstrap.sh, which starts sshd:
#!/bin/bash
sudo /usr/sbin/sshd -D
Build the Hadoop slave image:
$ sudo docker build -t="rain:hadoop-slave" .
Start the containers:
$ sudo docker run --name hadoop-slave1 -h hadoop-slave1 -d rain:hadoop-slave
$ sudo docker run --name hadoop-slave2 -h hadoop-slave2 -d rain:hadoop-slave
Interacting with the Hadoop Master and Slaves
Use the following commands:
docker exec -it hadoop-master /bin/bash
docker exec -it hadoop-slave1 /bin/bash
docker exec -it hadoop-slave2 /bin/bash
Write a script to configure /etc/hosts:
hadoop@hadoop-master:~$ vi run_hosts.sh
Contents:
#!/bin/bash
echo 172.17.0.2 hadoop-master >> /etc/hosts
echo 172.17.0.3 hadoop-slave1 >> /etc/hosts
echo 172.17.0.4 hadoop-slave2 >> /etc/hosts
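Appending unconditionally duplicates the entries if the script is ever run twice. An idempotent variant, sketched here against a scratch file so it can be dry-run (point HOSTS_FILE at /etc/hosts, running via sudo, on the actual nodes):

```shell
#!/bin/bash
# Idempotent hosts setup: append each entry only if its hostname is absent.
# HOSTS_FILE defaults to a scratch file for a dry run; set HOSTS_FILE=/etc/hosts
# on the cluster nodes.
HOSTS_FILE="${HOSTS_FILE:-./hosts.test}"
add_hosts() {
    while read -r ip name; do
        grep -qs " $name\$" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
    done <<'EOF'
172.17.0.2 hadoop-master
172.17.0.3 hadoop-slave1
172.17.0.4 hadoop-slave2
EOF
}
add_hosts
add_hosts   # second run adds nothing
wc -l < "$HOSTS_FILE"
```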
Run the script:
hadoop@hadoop-master:~$ chmod +x run_hosts.sh
hadoop@hadoop-master:~$ sudo ./run_hosts.sh
Copy the script to the two slave nodes and run it there as well:
hadoop@hadoop-master:~$ scp run_hosts.sh hadoop@hadoop-slave1:/home/hadoop
hadoop@hadoop-master:~$ scp run_hosts.sh hadoop@hadoop-slave2:/home/hadoop
Hadoop Cluster Operations:
Format the NameNode:
$ bin/hdfs namenode -format
Start the cluster (start-all.sh is deprecated in Hadoop 2.x; sbin/start-dfs.sh followed by sbin/start-yarn.sh is the preferred equivalent):
$ sbin/start-all.sh
Once the nodes are up, the following processes should be visible on each node:
hadoop@hadoop-master:/usr/local/hadoop$ jps
534 ResourceManager
888 Jps
400 SecondaryNameNode
181 NameNode
hadoop@hadoop-slave1:~$ jps
196 NodeManager
63 DataNode
318 Jps
hadoop@hadoop-slave2:~$ jps
156 NodeManager
63 DataNode
268 Jps
Access the web consoles from the host:
http://<host-IP>:50070
http://<host-IP>:8088
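For reference, the ports this cluster exposes are the Hadoop 2.x web UI defaults (50070 for the NameNode, 8088 for the ResourceManager), plus 19888 for the JobHistory UI configured in mapred-site.xml below. A small lookup sketch:

```shell
# Map each web UI to the port this cluster exposes (Hadoop 2.x defaults).
ui_port() {
    case "$1" in
        namenode)        echo 50070 ;;  # HDFS NameNode web UI
        resourcemanager) echo 8088  ;;  # YARN ResourceManager web UI
        jobhistory)      echo 19888 ;;  # MapReduce JobHistory web UI
        *)               return 1   ;;
    esac
}
ui_port namenode
```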
Hadoop Configuration Files:
1. Edit hadoop-env.sh:
export JAVA_HOME=/usr/local/jdk1.7.0_80
2. Edit core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master/</value>
</property>
</configuration>
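With no port in fs.defaultFS, HDFS clients fall back to the default NameNode RPC port, which is 8020 in Hadoop 2.x. An equivalent, explicit form of the property above, shown only as a clarifying sketch:

```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master:8020/</value>
</property>
```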
3. Edit hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.datanode.balance.bandwidthPerSec</name>
<!-- balancer bandwidth limit, in bytes per second (~12 MB/s) -->
<value>12000000</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<!-- disk space reserved for non-HDFS use, in bytes (~5 GB) -->
<value>5000000000</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>128m</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>60</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>10</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
</configuration>
4. Edit mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-master:19888</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<!--<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>-->
</configuration>
5. Edit yarn-site.xml:
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<!--
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
<description>The amount of physical memory (in MB) that may be allocated to containers being run by the node manager.</description>
</property>
-->
<!--<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>100000</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>60</value>
</property>
-->
</configuration>
6. Edit slaves:
hadoop-slave1
hadoop-slave2