The Linux distribution used in this article is OpenSUSE 15.6, the slave nodes are deployed with Docker, and the JDK is Oracle JDK 8. The content is intended only as a reference for learning deployment, not as a standard for a production environment.
1. Change the Hostname
Run the following as root or with sudo:
localhost:~ # hostnamectl set-hostname master
Check that the change took effect:
localhost:~ # hostname
master
2. Install Docker
Installing Docker on OpenSUSE 15.6 was covered in a previously published article, so it is not repeated here:
[Docker Study Notes 1] Installing Docker (OpenSUSE 15.6)
3. Add the hadoop User and Group
To allow fine-grained permission management and avoid running Hadoop directly as root, this article adds a dedicated hadoop user and group; choose a user and group that fit your own needs if they differ.
Add the hadoop group, as root or with sudo:
localhost:~ # groupadd hadoop
Add the hadoop user and automatically create its home directory:
localhost:~ # useradd -m hadoop -g hadoop
4. Install the JDK on the Master Node
Oracle JDK 8 is used as the JDK for both the master and the slave nodes in this article; it can be downloaded from the following address:
Java Archive Downloads - Java SE 8u211 and later
For a Linux host, downloading the .tar.gz archive is recommended.
Place the archive in a suitable location on the master node; this article puts it under /opt/software and extracts it there. Adjust this to your own situation and needs.
localhost:/opt/software # ls
jdk-8u441-linux-x64.tar.gz
localhost:/opt/software # tar -zxvf jdk-8u441-linux-x64.tar.gz
Here jdk-8u441-linux-x64.tar.gz is the JDK archive file name used in this article; change it to match your actual file name.
After extraction, create a symbolic link to the Java directory to simplify configuring environment variables: switching Java versions then only requires re-pointing the link, without touching the environment variables. This article creates the link under /opt/softln (create that directory first with mkdir -p if it does not exist). This step is optional and can be adapted to your own needs.
localhost:/opt/software # ls
jdk1.8.0_441
localhost:/opt/software # ln -sfn /opt/software/jdk1.8.0_441 /opt/softln/java
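A minimal sketch of the version switch this enables: re-pointing the link is enough, and the environment variables stay untouched. The /opt/software/jdk-11.0.2 path below is hypothetical and only stands in for a second installed JDK.
# Hypothetical example: switch the active JDK by re-pointing the symlink
ln -sfn /opt/software/jdk-11.0.2 /opt/softln/java
# Switch back to JDK 8
ln -sfn /opt/software/jdk1.8.0_441 /opt/softln/java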
Switch to the hadoop user to configure the environment variables; this article uses vim to edit the ~/.bashrc file.
localhost:/opt/software # su hadoop
hadoop@master:/opt/software> cd ~
hadoop@master:~> vim ~/.bashrc
Append the following to the end of the file:
export JAVA_HOME=/opt/softln/java
export PATH=$PATH:$JAVA_HOME/bin
Adjust the value of JAVA_HOME to match your actual setup.
Save and exit, then reload the environment variables and verify:
hadoop@master:~> source ~/.bashrc
hadoop@master:~> java -version
java version "1.8.0_441"
Java(TM) SE Runtime Environment (build 1.8.0_441-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.441-b07, mixed mode)
5. Configure Passwordless SSH Login on the Master Node
Switch to the hadoop user and change to its home directory:
localhost:~ # su hadoop
hadoop@master:/root> cd ~
Create the .ssh directory and change into it:
hadoop@master:~> mkdir -p .ssh && cd .ssh
Generate the private and public keys:
hadoop@master:~/.ssh> ssh-keygen -t ecdsa -N '' -f './id_ecdsa' -q
hadoop@master:~/.ssh> cat id_ecdsa.pub > authorized_keys
hadoop@master:~/.ssh> chmod 600 authorized_keys
hadoop@master:~/.ssh> chmod 700 ~/.ssh
At this point a passwordless ssh login still triggers a confirmation prompt, which blocks unattended execution:
hadoop@master:~/.ssh> ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ED25519 key fingerprint is SHA256:/iinzs1qbsR4p/Eq9PeJJ+AxgoiDi05UGLtUoKWBsco.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
Switch to root, edit /etc/ssh/ssh_config, and find
# StrictHostKeyChecking ask
Uncomment it and change it to
StrictHostKeyChecking no
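If you would rather not relax host key checking system-wide, a narrower alternative (a sketch, not what this article does) is to set the option only for the hadoop account in its own ~/.ssh/config:
# Per-user alternative: run as the hadoop user, affects only that account
cat >> ~/.ssh/config <<'EOF'
Host *
    StrictHostKeyChecking no
EOF
chmod 600 ~/.ssh/config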
Switch back to the hadoop user and test the passwordless login:
hadoop@master:~/.ssh> ssh localhost
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
Have a lot of fun...
hadoop@master:~>
6. Install Hadoop on the Master Node
Hadoop can be downloaded either from a mirror site in China or from the Apache Archive; this article installs Hadoop 2.10.2, available from the links below.
Apache Archive
Index of /dist/hadoop/common/hadoop-2.10.2
USTC (University of Science and Technology of China) mirror
Index of /apache/hadoop/common/hadoop-2.10.2/
For users in China, downloading from the mirror site is recommended.
Place the archive in a suitable location on the master node; this article again uses /opt/software and extracts it there.
localhost:/opt/software # ls
hadoop-2.10.2.tar.gz jdk1.8.0_441
localhost:/opt/software # tar -zxvf hadoop-2.10.2.tar.gz
Here hadoop-2.10.2.tar.gz is the archive file name used in this article; change it to match your actual file name.
After extraction, create a symbolic link to the Hadoop directory to simplify configuring environment variables; this article again creates the link under /opt/softln. This step is optional.
localhost:/opt/software # ls
hadoop-2.10.2 jdk1.8.0_441
localhost:/opt/software # ln -sfn /opt/software/hadoop-2.10.2 /opt/softln/hadoop
Change the ownership of the directory so that it belongs to the hadoop user and group:
localhost:/opt/software # chown -R hadoop:hadoop hadoop-2.10.2
Switch to the hadoop user and edit the ~/.bashrc file to configure the environment variables:
localhost:/opt/software # su hadoop
hadoop@master:/opt/software> cd ~
hadoop@master:~> vim ~/.bashrc
Change the environment variables configured in the file to:
export JAVA_HOME=/opt/softln/java
export HADOOP_HOME=/opt/softln/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Adjust the value of HADOOP_HOME to match your actual setup.
Save the file, then reload the environment variables:
hadoop@master:~> source ~/.bashrc
Create the tmp, name and data directories; this article places all three under $HADOOP_HOME, but the locations can be chosen freely.
hadoop@master:~> mkdir -p $HADOOP_HOME/tmp
hadoop@master:~> mkdir -p $HADOOP_HOME/hdfs/data
hadoop@master:~> mkdir -p $HADOOP_HOME/hdfs/name
Configure $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following inside the <configuration> tag:
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/softln/hadoop/tmp</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
Here /opt/softln/hadoop/tmp is the tmp directory created earlier, and master in hdfs://master:9000 is the hostname configured earlier; adjust both as needed.
Configure $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/opt/softln/java
Here /opt/softln/java is the symbolic link created earlier; adjust as needed.
Configure $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following inside the <configuration> tag:
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/softln/hadoop/hdfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/softln/hadoop/hdfs/data</value>
</property>
Here /opt/softln/hadoop/hdfs/name and /opt/softln/hadoop/hdfs/data are the name and data directories created above; adjust as needed.
Configure $HADOOP_HOME/etc/hadoop/mapred-site.xml
Run:
hadoop@master:~> cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
Then add the following inside the <configuration> tag:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapred.job.tracker</name>
    <value>http://master:9001</value>
</property>
Here master in http://master:9001 is the hostname configured above; adjust as needed.
Configure $HADOOP_HOME/etc/hadoop/yarn-env.sh
Change
JAVA_HOME=$JAVA_HOME
to
JAVA_HOME=/opt/softln/java
Here /opt/softln/java is the symbolic link created earlier; adjust as needed.
Configure $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following inside the <configuration> tag:
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
Here master is the hostname configured above; adjust as needed.
7. Build the Docker Slave Nodes
To make the automated build more efficient, this article serves the required files over HTTP with nginx so that the slave nodes can fetch them with simple web requests. First run the following script as root:
#!/bin/bash
# Gather everything the slave nodes need into ./html so nginx can serve it.
# HADOOP_HOME: value of the hadoop user's HADOOP_HOME environment variable.
# HADOOP_USER_HOME: the hadoop user's home directory.
HADOOP_HOME=/opt/softln/hadoop
HADOOP_USER_HOME=/home/hadoop
rm -rf ./html
mkdir -p ./html/software
mkdir -p ./html/.ssh
mkdir -p ./html/etc/hadoop
mkdir -p ./html/etc/ssh
# SSH key material of the hadoop user (for passwordless login from the master)
cp $HADOOP_USER_HOME/.ssh/authorized_keys ./html/.ssh
cp $HADOOP_USER_HOME/.ssh/id_ecdsa.pub ./html/.ssh
# Hadoop configuration files prepared on the master
cp $HADOOP_HOME/etc/hadoop/core-site.xml ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/hadoop-env.sh ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/yarn-env.sh ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/yarn-site.xml ./html/etc/hadoop
# SSH client/server config and host keys, reused by every slave container
cp /etc/ssh/ssh_config ./html/etc/ssh
cp /etc/ssh/ssh_host_ecdsa_key ./html/etc/ssh
cp /etc/ssh/ssh_host_ecdsa_key.pub ./html/etc/ssh
cp /etc/ssh/ssh_host_ed25519_key ./html/etc/ssh
cp /etc/ssh/ssh_host_ed25519_key.pub ./html/etc/ssh
cp /etc/ssh/ssh_host_rsa_key ./html/etc/ssh
cp /etc/ssh/ssh_host_rsa_key.pub ./html/etc/ssh
cp /etc/ssh/sshd_config ./html/etc/ssh
The HADOOP_HOME variable holds the HADOOP_HOME value configured for the hadoop user, and HADOOP_USER_HOME is the hadoop user's home directory. After the script finishes, manually copy jdk-8u441-linux-x64.tar.gz and hadoop-2.10.2.tar.gz into ./html/software, then deploy the resulting html directory with nginx. nginx does not need to run on the master node; any host the slave nodes can reach will do.
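A quick way to publish the html directory, as a sketch assuming Docker is available on the serving host and port 80 is free (the container name hadoop-repo is arbitrary), is to run nginx itself in a container:
# Serve ./html read-only with the official nginx image
docker run -d --name hadoop-repo -p 80:80 \
  -v "$(pwd)/html:/usr/share/nginx/html:ro" \
  nginx:stable
Any other web server that exposes the same directory layout works just as well.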
Write the automatic configuration script for the slave nodes and place it in the nginx html directory as well; here it is named autobuild.sh.
#!/bin/bash
# BASEURL: address of the nginx service that hosts the files prepared above.
BASEURL=http://192.168.171.1
# sshd runtime directory and installation directories
mkdir -p /run/sshd
chmod 0755 /run/sshd
mkdir -p /opt/software
mkdir -p /opt/softln
# Download and unpack the JDK and Hadoop
wget -O /opt/software/jdk-8u441-linux-x64.tar.gz $BASEURL/software/jdk-8u441-linux-x64.tar.gz
wget -O /opt/software/hadoop-2.10.2.tar.gz $BASEURL/software/hadoop-2.10.2.tar.gz
tar -zxvf /opt/software/jdk-8u441-linux-x64.tar.gz -C /opt/software/
tar -zxvf /opt/software/hadoop-2.10.2.tar.gz -C /opt/software/
rm -rf /opt/software/jdk-8u441-linux-x64.tar.gz
rm -rf /opt/software/hadoop-2.10.2.tar.gz
# hadoop user/group, symbolic links and HDFS directories (mirrors the master layout)
groupadd hadoop
useradd -m hadoop -g hadoop -s /bin/bash
ln -sfn /opt/software/jdk1.8.0_441 /opt/softln/java
ln -sfn /opt/software/hadoop-2.10.2 /opt/softln/hadoop
mkdir -p /opt/softln/hadoop/tmp
mkdir -p /opt/softln/hadoop/hdfs/data
mkdir -p /opt/softln/hadoop/hdfs/name
# Fetch the Hadoop configuration prepared on the master
wget -O /opt/softln/hadoop/etc/hadoop/core-site.xml $BASEURL/etc/hadoop/core-site.xml
wget -O /opt/softln/hadoop/etc/hadoop/hdfs-site.xml $BASEURL/etc/hadoop/hdfs-site.xml
wget -O /opt/softln/hadoop/etc/hadoop/yarn-site.xml $BASEURL/etc/hadoop/yarn-site.xml
wget -O /opt/softln/hadoop/etc/hadoop/mapred-site.xml $BASEURL/etc/hadoop/mapred-site.xml
wget -O /opt/softln/hadoop/etc/hadoop/hadoop-env.sh $BASEURL/etc/hadoop/hadoop-env.sh
wget -O /opt/softln/hadoop/etc/hadoop/yarn-env.sh $BASEURL/etc/hadoop/yarn-env.sh
chown -R hadoop:hadoop /opt/software/hadoop-2.10.2
# Environment variables and SSH keys for the hadoop user
echo "export JAVA_HOME=/opt/softln/java" >> /home/hadoop/.bashrc
echo "export HADOOP_HOME=/opt/softln/hadoop" >> /home/hadoop/.bashrc
echo 'export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /home/hadoop/.bashrc
mkdir -p /home/hadoop/.ssh
wget -O /home/hadoop/.ssh/authorized_keys $BASEURL/.ssh/authorized_keys
wget -O /home/hadoop/.ssh/id_ecdsa.pub $BASEURL/.ssh/id_ecdsa.pub
chown -R hadoop:hadoop /home/hadoop
chmod 700 /home/hadoop/.ssh
chmod 644 /home/hadoop/.ssh/id_ecdsa.pub
chmod 600 /home/hadoop/.ssh/authorized_keys
# Reuse the master's SSH client config and host keys so host-key checks stay consistent
wget -O /etc/ssh/ssh_config $BASEURL/etc/ssh/ssh_config
wget -O /etc/ssh/ssh_host_ecdsa_key $BASEURL/etc/ssh/ssh_host_ecdsa_key
wget -O /etc/ssh/ssh_host_ecdsa_key.pub $BASEURL/etc/ssh/ssh_host_ecdsa_key.pub
wget -O /etc/ssh/ssh_host_ed25519_key $BASEURL/etc/ssh/ssh_host_ed25519_key
wget -O /etc/ssh/ssh_host_ed25519_key.pub $BASEURL/etc/ssh/ssh_host_ed25519_key.pub
wget -O /etc/ssh/ssh_host_rsa_key $BASEURL/etc/ssh/ssh_host_rsa_key
wget -O /etc/ssh/ssh_host_rsa_key.pub $BASEURL/etc/ssh/ssh_host_rsa_key.pub
wget -O /etc/ssh/sshd_config $BASEURL/etc/ssh/sshd_config
chmod 600 /etc/ssh/*_key
chmod 600 /etc/ssh/*_key.pub
The BASEURL variable is the address of the nginx service; change it to match your environment.
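Before building the image, it can be worth confirming that the resources are reachable from the Docker build host; a quick check, assuming curl is installed:
# BASEURL must match the value used in autobuild.sh
BASEURL=http://192.168.171.1
curl -fsI "$BASEURL/autobuild.sh"
curl -fsI "$BASEURL/software/hadoop-2.10.2.tar.gz"
A 200 response for both requests means nginx is serving the prepared files.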
The slave-node image can then be built with a Dockerfile. My nginx service address is http://192.168.171.1; change it to the address of your own nginx service.
FROM m.daocloud.io/docker.io/library/debian:bookworm
RUN rm -rf /etc/apt/sources.list.d/* && \
echo "deb http://mirrors.ustc.edu.cn/debian bookworm main contrib non-free non-free-firmware" > /etc/apt/sources.list && \
echo "deb http://mirrors.ustc.edu.cn/debian bookworm-updates main contrib non-free non-free-firmware" >> /etc/apt/sources.list && \
echo "deb http://mirrors.ustc.edu.cn/debian-security/ bookworm-security main contrib non-free non-free-firmware" >> /etc/apt/sources.list && \
apt update && \
apt install -y openssh-server wget procps net-tools && \
apt clean all && \
wget -O /opt/autobuild.sh http://192.168.171.1/autobuild.sh && \
sh /opt/autobuild.sh && \
rm -rf /opt/autobuild.sh
CMD /usr/sbin/sshd -f /etc/ssh/sshd_config && tail -f /dev/null
Build the Docker image:
master:~ # docker build -t hadoop-base:v1 .
8. Configure the Hadoop Network
Because the pool of available IP addresses is limited, the Docker containers use a bridge network. The hostname-to-IP mapping is shown in the table below.
| Hostname | IP |
| --- | --- |
| master | 192.168.171.130 |
| slave1 | 172.18.0.2 |
| slave2 | 172.18.0.3 |
Create the Docker network:
master:~ # docker network create --driver bridge --subnet 172.18.0.0/24 --gateway 172.18.0.1 hadoop-net
Edit /etc/hosts on the master node and add the following:
192.168.171.130 master
172.18.0.2 slave1
172.18.0.3 slave2
Switch to the hadoop user and edit $HADOOP_HOME/etc/hadoop/slaves:
master:~ # su hadoop
hadoop@master:/root> vim $HADOOP_HOME/etc/hadoop/slaves
Replace the file contents with the slave hostnames:
slave1
slave2
Switch back to root and start the Docker containers:
master:~ # docker run -it -d \
> --network hadoop-net \
> --ip 172.18.0.2 \
> --add-host="master:192.168.171.130" --add-host="slave1:172.18.0.2" --add-host="slave2:172.18.0.3" \
> hadoop-base:v1
78a66f42523d9d67e7373cb4e0b499fe30f400898e3fee08b177001d67a09ac4
master:~ # docker run -it -d --network hadoop-net --ip 172.18.0.3 --add-host="master:192.168.171.130" --add-host="slave1:172.18.0.2" --add-host="slave2:172.18.0.3" hadoop-base:v1
07d217d1184cba2f823f15c75937376fb6af4ffde149945a744aafd08b2088a9
List the running containers:
master:~ # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
07d217d1184c hadoop-base:v1 "/bin/sh -c '/usr/sb…" About a minute ago Up About a minute vigilant_cerf
78a66f42523d hadoop-base:v1 "/bin/sh -c '/usr/sb…" 2 minutes ago Up 2 minutes angry_haibt
Configure the containers to restart automatically:
master:~ # docker update --restart=always 07d217d1184c
07d217d1184c
master:~ # docker update --restart=always 78a66f42523d
78a66f42523d
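As an optional variation on the two docker run commands above (not what was run here), the containers can be given explicit names and hostnames, which makes docker ps output and the DataNode/NodeManager log file names easier to read; --restart=always then replaces the separate docker update step:
# Sketch for slave1; slave2 is analogous with --name slave2 --hostname slave2,
# --ip 172.18.0.3 and an --add-host entry for slave1
docker run -itd --name slave1 --hostname slave1 \
  --network hadoop-net --ip 172.18.0.2 \
  --add-host="master:192.168.171.130" --add-host="slave2:172.18.0.3" \
  --restart=always \
  hadoop-base:v1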
9. Format the NameNode
Switch to the hadoop user and run:
hadoop@master:~> hdfs namenode -format
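If the format succeeds, the NameNode metadata directory configured in hdfs-site.xml now contains a current/ subdirectory with a VERSION file and an initial fsimage; a quick check:
# dfs.namenode.name.dir points at $HADOOP_HOME/hdfs/name in this setup
ls $HADOOP_HOME/hdfs/name/current/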
10. Start DFS and YARN
Switch to the hadoop user before performing the steps below.
Start DFS:
hadoop@master:~> start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/software/hadoop-2.10.2/logs/hadoop-hadoop-namenode-master.out
slave1: starting datanode, logging to /opt/software/hadoop-2.10.2/logs/hadoop-hadoop-datanode-78a66f42523d.out
slave2: starting datanode, logging to /opt/software/hadoop-2.10.2/logs/hadoop-hadoop-datanode-07d217d1184c.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/software/hadoop-2.10.2/logs/hadoop-hadoop-secondarynamenode-master.out
Start YARN:
hadoop@master:~> start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/software/hadoop-2.10.2/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
slave2: starting nodemanager, logging to /opt/software/hadoop-2.10.2/logs/yarn-hadoop-nodemanager-07d217d1184c.out
slave1: starting nodemanager, logging to /opt/software/hadoop-2.10.2/logs/yarn-hadoop-nodemanager-78a66f42523d.out
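Whether all the daemons came up can be checked with jps, which ships with the JDK. The process names in the comments are what this layout should show; the container ID is one of the two printed earlier.
# On the master, as the hadoop user: expect NameNode, SecondaryNameNode and ResourceManager
jps
# Inside a slave container, run as root on the master host: expect DataNode and NodeManager
docker exec -it 78a66f42523d su - hadoop -c /opt/softln/java/bin/jps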
11. Test HDFS
Create a test text file:
hadoop@master:~> echo "Hello Hadoop Hello World" > test.txt
hadoop@master:~> ls
bin test.txt
Create a test directory in HDFS:
hadoop@master:~> hdfs dfs -mkdir /test
hadoop@master:~> hdfs dfs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2025-05-24 20:32 /test
Upload the file to the test directory in HDFS:
hadoop@master:~> hdfs dfs -put ./test.txt /test
hadoop@master:~> hdfs dfs -ls /test
Found 1 items
-rw-r--r-- 3 hadoop supergroup 25 2025-05-24 20:33 /test/test.txt
View the contents of the test file:
hadoop@master:~> hdfs dfs -cat /test/test.txt
Hello Hadoop Hello World
12. Test YARN
Run the wordcount test:
hadoop@master:~> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /test/test.txt /testout
Check the results:
hadoop@master:~> hdfs dfs -ls /
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2025-05-24 20:33 /test
drwxr-xr-x - hadoop supergroup 0 2025-05-24 20:35 /testout
drwx------ - hadoop supergroup 0 2025-05-24 20:35 /tmp
hadoop@master:~> hdfs dfs -ls /testout/
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2025-05-24 20:35 /testout/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 25 2025-05-24 20:35 /testout/part-r-00000
hadoop@master:~> hdfs dfs -cat /testout/*
Hadoop 1
Hello 2
World 1
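One detail worth knowing when repeating the test: MapReduce refuses to start a job whose output directory already exists, so /testout has to be removed before a second run.
# Remove the previous output before rerunning wordcount
hdfs dfs -rm -r /testout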