Building a Hadoop Cluster from Scratch with Docker on Windows 10
Original article: Building a Hadoop Cluster from Scratch with Docker on Windows 10
1. Pull the ubuntu image
docker pull ubuntu
2. Start an ubuntu container
docker run -ti ubuntu /bin/bash
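3. Detach from the container while leaving it running (if you are comfortable with the commands, skip the detach and go straight to step 6)
Press Ctrl + P, then Ctrl + Q.
4. List the running containers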
> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
9e71371cab71 ubuntu "/bin/bash" 55 minutes ago Up 55 minutes lucid_lumiere
5. Re-attach to the ubuntu container
docker attach 9e7
6. Install the required tools
The ubuntu image ships with almost no tools, so the environment has to be installed and configured first:
- openjdk-8-jdk
- scala
- vim
- net-tools
- openssh-server
- openssh-client
apt-get update
apt-get install openjdk-8-jdk
apt-get install scala
apt-get install vim
apt-get install net-tools
apt-get install openssh-server
apt-get install openssh-client
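After the installs finish, it can help to confirm that the expected commands are actually on the PATH before moving on. The following is a minimal sketch; `missing_tools` is a hypothetical helper name, and the command names passed to it (for example `ifconfig` for net-tools and `sshd` for openssh-server) are assumptions about what each package provides.

```shell
# Hypothetical helper: print every command name from the argument list
# that cannot be found on the PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools installed in this step; any name printed is still missing.
missing_tools java scala vim ifconfig ssh sshd
```

An empty result suggests all six packages installed correctly; any printed name points at the package to reinstall.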
To connect remotely with Xshell, see the earlier article on that topic.
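7. Configure passwordless SSH login
# If the directory does not exist yet, run ssh localhost once first
cd ~/.ssh/
# Press Enter three times; id_rsa and id_rsa.pub will be created in this directory
ssh-keygen -t rsa
# Authorize the key
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# If no password prompt appears, passwordless SSH login is working (sshd must be running, see step 13)
ssh localhost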
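The same key setup can be done without the interactive prompts, which is convenient when the steps are repeated. This is only a sketch under assumptions: `setup_passwordless_ssh` and the `SSH_DIR` variable are hypothetical names introduced for illustration, and the key is created with an empty passphrase, exactly as pressing Enter three times would do.

```shell
# Directory holding the keys; SSH_DIR is an illustrative variable that
# defaults to the usual ~/.ssh location.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"

# Hypothetical helper: create a key pair (if missing) and authorize it.
setup_passwordless_ssh() {
    mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR" || return 1
    # Generate an RSA key pair with an empty passphrase only if none exists.
    if [ ! -f "$SSH_DIR/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa" || return 1
    fi
    # Append the public key once, so re-running does not add duplicates.
    touch "$SSH_DIR/authorized_keys"
    if ! grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys"; then
        cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
    fi
    # With the default StrictModes setting, sshd rejects key files that are
    # writable by group or others.
    chmod 600 "$SSH_DIR/authorized_keys"
}
```

Calling `setup_passwordless_ssh` and then `ssh localhost` should behave like the manual steps above.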
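8. Install hadoop
# wget is not included in the base ubuntu image
apt-get install wget
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
# Extract
tar -zxvf hadoop-3.2.2.tar.gz -C /usr/local
cd /usr/local
# Rename
mv hadoop-3.2.2 hadoop
If the download link no longer works, look up the current address on the Tsinghua hadoop mirror.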
9. Configure the Java and Hadoop environment
9.1 Edit /etc/profile
# java
# Check the installed JDK directory name first (ls /usr/lib/jvm)
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Run source /etc/profile to apply the changes:
source /etc/profile
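The JDK directory name under /usr/lib/jvm can differ between Ubuntu releases, so the JAVA_HOME value is worth checking rather than copying as-is. One possible way, assuming `java` is on the PATH and `readlink -f` is available, is to derive it from the resolved location of the `java` binary. `java_home_from_bin` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from the resolved path
# of the java binary by stripping a trailing /bin/java and then a trailing
# /jre (the JDK 8 layout keeps java under <jdk>/jre/bin/java).
java_home_from_bin() {
    path=$1
    path=${path%/bin/java}
    path=${path%/jre}
    printf '%s\n' "$path"
}

# Print the candidate for the installed JDK (skipped if java is not installed).
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

If the printed directory differs from the JAVA_HOME set in /etc/profile, check that both refer to the same JDK before continuing.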
9.2 Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
9.3 Edit /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/tmp</value>
</property>
</configuration>
Create the temporary directory (-p also creates the missing parent /root/hadoop):
mkdir -p /root/hadoop/tmp
9.4 Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<!-- Number of replicas kept for each block -->
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<!-- Where the namenode stores its metadata -->
<name>dfs.namenode.name.dir</name>
<value>/root/hadoop/hdfs/name</value>
</property>
<property>
<!-- Where the datanodes store block data -->
<name>dfs.datanode.data.dir</name>
<value>/root/hadoop/hdfs/data</value>
</property>
</configuration>
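A typo in a property name is easy to miss in these files, so it can be useful to read a value back after editing. The sketch below is a rough approximation only: `get_prop` is a hypothetical helper that assumes each <name> and <value> element sits on its own line, as in the snippets above, and it is not a general XML parser. Hadoop's own `hdfs getconf -confKey <name>` is the more reliable check once the environment is set up.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop *-site.xml file.
#   $1 = path to the XML file, $2 = property name
get_prop() {
    awk -v key="$2" '
        # Remember the most recent property name seen.
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        # On a value line, print it if it belongs to the requested name.
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == key) { print v; exit } }
    ' "$1"
}
```

For example, `get_prop /usr/local/hadoop/etc/hadoop/hdfs-site.xml dfs.replication` would print the configured replication value, and an empty result means the name was not found.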
9.5 Edit /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/usr/local/hadoop/etc/hadoop,
/usr/local/hadoop/share/hadoop/common/*,
/usr/local/hadoop/share/hadoop/common/lib/*,
/usr/local/hadoop/share/hadoop/hdfs/*,
/usr/local/hadoop/share/hadoop/hdfs/lib/*,
/usr/local/hadoop/share/hadoop/mapreduce/*,
/usr/local/hadoop/share/hadoop/mapreduce/lib/*,
/usr/local/hadoop/share/hadoop/yarn/*,
/usr/local/hadoop/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
9.6 Edit /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<!-- Use the MapReduce shuffle as the NodeManager auxiliary service -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
9.7 Edit /usr/local/hadoop/etc/hadoop/workers
Remove localhost and add the hostnames of the two worker machines:
slave1
slave2
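The workers file is just one hostname per line, so it can also be written from the command line instead of an editor. A minimal sketch, where `write_workers` is a hypothetical helper name:

```shell
# Hypothetical helper: overwrite a workers file with one hostname per line.
#   $1 = path to the workers file, remaining arguments = hostnames
write_workers() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}
```

For this cluster the call would be `write_workers /usr/local/hadoop/etc/hadoop/workers slave1 slave2`, which replaces the default localhost entry.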
9.8 Format the namenode
cd /usr/local/hadoop
./bin/hdfs namenode -format
10. Commit the container as an image
# Leave the container first, then run docker commit on the host
exit
docker commit -m "hadoop install" {CONTAINER ID} ubuntu:hadoop
11. Create a network so that all hadoop containers share it
docker network create --driver=bridge hadoop
docker network ls
12. Start the master, slave1 and slave2 containers
docker run -it --network hadoop -h "master" --name "master" -p 9870:9870 -p 8088:8088 -p 20022:22 -p 9000:9000 ubuntu:hadoop /bin/bash
docker run -it --network hadoop -h "slave1" --name "slave1" ubuntu:hadoop /bin/bash
docker run -it --network hadoop -h "slave2" --name "slave2" ubuntu:hadoop /bin/bash
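The worker containers differ only in their hostname and container name, so their commands can be generated in a loop, which avoids copy-and-paste slips in the names. The sketch below only prints the commands instead of running them. `worker_run_cmd` is a hypothetical helper, and using `-dit` (detached, with a TTY kept open) instead of `-it` is an assumption made so that several containers can be started from one terminal.

```shell
# Hypothetical helper: print (not run) the docker command for one worker.
# The same value is used for the hostname (-h) and the container name.
worker_run_cmd() {
    printf 'docker run -dit --network hadoop -h %s --name %s ubuntu:hadoop /bin/bash\n' "$1" "$1"
}

# Print one command per worker.
for name in slave1 slave2; do
    worker_run_cmd "$name"
done
```

After reviewing the printed commands, they can be pasted into the host shell, and a detached container can then be entered with `docker exec -it slave1 /bin/bash`.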
13. Log in to master and start all nodes
cd /usr/local/hadoop/sbin/
./start-all.sh
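If everything started correctly, the web UIs are reachable from the host at localhost:9870 and localhost:8088.
If port 22 cannot be reached, check whether ssh is running. Services are not started automatically when a container is created from the image, so this may be needed in each of the three containers:
# Check whether ssh is running
/etc/init.d/ssh status
# Start ssh
/etc/init.d/ssh start
# Verify with ssh localhost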
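Besides the web UIs, one way to confirm that both workers registered with the namenode is to look at the output of `hdfs dfsadmin -report`. The sketch below extracts the live datanode count from that output. `live_datanodes` is a hypothetical helper, and the "Live datanodes (N):" line format is an assumption based on Hadoop 3.x output.

```shell
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and
# print the number N from the "Live datanodes (N):" line, if present.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}
```

On master, `hdfs dfsadmin -report | live_datanodes` should print 2 for this cluster; a smaller number or no output suggests that a worker did not start or could not reach master.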
14. Small example: upload a file from the command line
cd /usr/local/hadoop/bin
./hadoop fs -mkdir /input
./hadoop fs -put ../README.txt /input
./hadoop fs -ls /input
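Note: although the hadoop cluster was built successfully with docker, some inconveniences showed up in later use. Port mapping is one example: once a container is running, adding a new port mapping is cumbersome. If you have a good way to handle this, leave a comment and let me know.
Reference: Building a hadoop cluster with docker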