[Hadoop Study Notes 2] Deploying a Fully Distributed Hadoop Environment (OpenSUSE 15.6 + Oracle JDK 8 + Docker)


The Linux distribution used in this article is OpenSUSE 15.6, the slave nodes are deployed with Docker, and the JDK is Oracle JDK 8. The content is intended only as a reference for learning deployment, not as a standard for production environments.

1. Change the Hostname

Run the command as root or with sudo

localhost:~ # hostnamectl set-hostname master

Verify that the change took effect

localhost:~ # hostname
master

2. Install Docker

For installing Docker on OpenSUSE 15.6, see the previously published article below; it is not covered in detail here

[Docker Study Notes 1] Installing Docker (OpenSUSE 15.6)

3. Add the hadoop User and Group

To enable fine-grained permission management and avoid running Hadoop directly as root, this article adds a dedicated hadoop user and group; choose a different user and group if that better suits your needs

Add the hadoop group, running as root or with sudo

localhost:~ # groupadd hadoop

Add the hadoop user and automatically create its home directory

localhost:~ # useradd -m hadoop -g hadoop

4. Install the JDK on the Master Node

This article uses Oracle JDK 8 as the JDK for both the master and slave nodes; it can be downloaded from the following address

Java Archive Downloads - Java SE 8u211 and later

For Linux hosts, downloading the .tar.gz archive is recommended

Place the archive in a suitable location on the master node. This article puts it in /opt/software and extracts it in that directory; choose a location that fits your own situation and needs

localhost:/opt/software # ls
jdk-8u441-linux-x64.tar.gz
localhost:/opt/software # tar -zxvf jdk-8u441-linux-x64.tar.gz

Here jdk-8u441-linux-x64.tar.gz is the JDK archive file name used in this article; change it to match your actual file name

After extraction, create a symbolic link to the Java directory to make configuring environment variables easier: switching Java versions then only requires changing the link target, with no change to the environment variables. This article creates the link under /opt/softln (create that directory first if it does not already exist). This step is optional; adjust it to your own needs

localhost:/opt/software # ls
jdk1.8.0_441
localhost:/opt/software # ln -sfn /opt/software/jdk1.8.0_441 /opt/softln/java
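
One benefit of the symlink: if a different JDK is installed later, only the link target needs to change while JAVA_HOME stays the same. A minimal sketch, assuming a hypothetical newer build jdk1.8.0_461 had been unpacked alongside the current one (that directory name is only an example):

# repoint the link at the hypothetical new JDK directory; ~/.bashrc needs no change
ln -sfn /opt/software/jdk1.8.0_461 /opt/softln/java
# the java on PATH now resolves through the updated link
java -version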

Switch to the hadoop user to configure the environment variables; this article edits the ~/.bashrc file with vim

localhost:/opt/software # su hadoop
hadoop@master:/opt/software> cd ~
hadoop@master:~> vim ~/.bashrc

Append the following at the end of the file

export JAVA_HOME=/opt/softln/java
export PATH=$PATH:$JAVA_HOME/bin

Adjust the value of the JAVA_HOME variable to match your actual path

After saving and exiting, reload the environment variables and verify

hadoop@master:~> source ~/.bashrc
hadoop@master:~> java -version
java version "1.8.0_441"
Java(TM) SE Runtime Environment (build 1.8.0_441-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.441-b07, mixed mode)

5. Configure Passwordless SSH Login on the Master Node

Switch to the hadoop user and enter its home directory

localhost:~ # su hadoop
hadoop@master:/root> cd ~

Create and enter the .ssh directory

hadoop@master:~> mkdir -p .ssh && cd .ssh

Generate the private and public keys

hadoop@master:~/.ssh> ssh-keygen -t ecdsa -N '' -f './id_ecdsa' -q
hadoop@master:~/.ssh> cat id_ecdsa.pub > authorized_keys
hadoop@master:~/.ssh> chmod 600 authorized_keys
hadoop@master:~/.ssh> chmod 700 ~/.ssh

At this point an SSH login still triggers a host-key confirmation prompt, which blocks unattended execution

hadoop@master:~/.ssh> ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ED25519 key fingerprint is SHA256:/iinzs1qbsR4p/Eq9PeJJ+AxgoiDi05UGLtUoKWBsco.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Switch to the root user and edit /etc/ssh/ssh_config; find the line

#   StrictHostKeyChecking ask

Uncomment it and change it to

StrictHostKeyChecking no
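
Globally disabling StrictHostKeyChecking is convenient in a learning environment but weakens host-key verification. A possible alternative, shown only as a sketch and not what this article does, is to pre-accept the host key for the hadoop user with ssh-keyscan:

# run as the hadoop user: record localhost's host key in advance
ssh-keyscan -H localhost >> ~/.ssh/known_hosts
chmod 600 ~/.ssh/known_hosts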

Switch back to the hadoop user and test the passwordless login

hadoop@master:~/.ssh> ssh localhost
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
Have a lot of fun...
hadoop@master:~>

6. Install Hadoop on the Master Node

Hadoop can be downloaded from mirror sites in China or from the Apache Archive. This article installs Hadoop 2.10.2, available at the following links

Apache Archive

Index of /dist/hadoop/common/hadoop-2.10.2

USTC (University of Science and Technology of China) mirror site

Index of /apache/hadoop/common/hadoop-2.10.2/

Downloading from the mirror site in China is recommended

Place the archive in a suitable location on the master node. As before, this article puts it in /opt/software and extracts it there

localhost:/opt/software # ls
hadoop-2.10.2.tar.gz  jdk1.8.0_441
localhost:/opt/software # tar -zxvf hadoop-2.10.2.tar.gz

Here hadoop-2.10.2.tar.gz is the archive file name used in this article; change it to match your actual file name

After extraction, create a symbolic link to the Hadoop directory to make configuring environment variables easier. This article creates the link under /opt/softln; this step is optional

localhost:/opt/software # ls
hadoop-2.10.2  jdk1.8.0_441
localhost:/opt/software # ln -sfn /opt/software/hadoop-2.10.2 /opt/softln/hadoop

Change the ownership of the directory so that it belongs to the hadoop user and group

localhost:/opt/software # chown -R hadoop:hadoop hadoop-2.10.2

Switch to the hadoop user and edit ~/.bashrc to configure the environment variables

localhost:/opt/software # su hadoop
hadoop@master:/opt/software> cd ~
hadoop@master:~> vim ~/.bashrc

Change the environment variables configured in the file to

export JAVA_HOME=/opt/softln/java
export HADOOP_HOME=/opt/softln/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Adjust the value of the HADOOP_HOME variable to match your actual path

After saving, reload the environment variables

hadoop@master:~> source ~/.bashrc
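
As an optional sanity check, the hadoop command should now resolve through the symlink, and the first line of its output should report version 2.10.2:

hadoop@master:~> hadoop version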

Create the tmp, name, and data directories. This article places all three under $HADOOP_HOME; choose other locations if you prefer

hadoop@master:~> mkdir -p $HADOOP_HOME/tmp
hadoop@master:~> mkdir -p $HADOOP_HOME/hdfs/data
hadoop@master:~> mkdir -p $HADOOP_HOME/hdfs/name

Configure $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following inside its <configuration> tag

  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/softln/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>

Here /opt/softln/hadoop/tmp is the tmp directory created earlier, and master in hdfs://master:9000 is the hostname configured earlier; adjust both to your environment

Configure $HADOOP_HOME/etc/hadoop/hadoop-env.sh: change

export JAVA_HOME=${JAVA_HOME}

to

export JAVA_HOME=/opt/softln/java

Here /opt/softln/java is the symbolic link created earlier; adjust it to your environment

Configure $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following inside its <configuration> tag

	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/opt/softln/hadoop/hdfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/opt/softln/hadoop/hdfs/data</value>
	</property>

Here /opt/softln/hadoop/hdfs/name and /opt/softln/hadoop/hdfs/data are the name and data directories created above; adjust them to your environment

Configure $HADOOP_HOME/etc/hadoop/mapred-site.xml
First run

hadoop@master:~> cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml

Then add the following inside the <configuration> tag

	<property>
	  <name>mapreduce.framework.name</name>
	  <value>yarn</value>
	</property>
	<property>
	  <name>mapred.job.tracker</name>
	  <value>http://master:9001</value>
	</property>

Here master in http://master:9001 is the hostname configured above; adjust it to your environment

Configure $HADOOP_HOME/etc/hadoop/yarn-env.sh: change

JAVA_HOME=$JAVA_HOME

to

JAVA_HOME=/opt/softln/java

Here /opt/softln/java is the symbolic link created earlier; adjust it to your environment

Configure $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following inside its <configuration> tag

	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>master</value>
	</property>

Here master is the hostname configured above; adjust it to your environment

7. Build the Docker Slave Nodes

To make the automated build more efficient, this article sets up an HTTP service with nginx so that the slave nodes can conveniently fetch the resources they need over the web. First, run the following script as root

#!/bin/bash

HADOOP_HOME=/opt/softln/hadoop
HADOOP_USER_HOME=/home/hadoop

# rebuild the staging directory that nginx will serve
rm -rf ./html
mkdir -p ./html/software
mkdir -p ./html/.ssh
mkdir -p ./html/etc/hadoop
mkdir -p ./html/etc/ssh

# the hadoop user's authorized_keys and public key
cp $HADOOP_USER_HOME/.ssh/authorized_keys ./html/.ssh
cp $HADOOP_USER_HOME/.ssh/id_ecdsa.pub ./html/.ssh

# Hadoop configuration files prepared on the master
cp $HADOOP_HOME/etc/hadoop/core-site.xml ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/hadoop-env.sh ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/yarn-env.sh ./html/etc/hadoop
cp $HADOOP_HOME/etc/hadoop/yarn-site.xml ./html/etc/hadoop

# the master's ssh client config and host keys, to be reused on the slave nodes
cp /etc/ssh/ssh_config ./html/etc/ssh
cp /etc/ssh/ssh_host_ecdsa_key ./html/etc/ssh
cp /etc/ssh/ssh_host_ecdsa_key.pub ./html/etc/ssh
cp /etc/ssh/ssh_host_ed25519_key ./html/etc/ssh
cp /etc/ssh/ssh_host_ed25519_key.pub ./html/etc/ssh
cp /etc/ssh/ssh_host_rsa_key ./html/etc/ssh
cp /etc/ssh/ssh_host_rsa_key.pub ./html/etc/ssh
cp /etc/ssh/sshd_config ./html/etc/ssh

Here the HADOOP_HOME variable holds the same value as the HADOOP_HOME environment variable configured for the hadoop user, and HADOOP_USER_HOME is the hadoop user's home directory. After the script finishes, manually copy jdk-8u441-linux-x64.tar.gz and hadoop-2.10.2.tar.gz into the ./html/software directory, then deploy the resulting html directory to nginx. The nginx instance does not have to run on the master node; any host reachable by the slave nodes will do
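
If no nginx instance is available yet, one quick option, sketched here assuming the official nginx image is used and port 80 is free on the chosen host, is to serve the html directory from a container (the container name hadoop-repo is arbitrary):

# serve ./html read-only from the official nginx image's default web root
docker run -d --name hadoop-repo -p 80:80 \
  -v "$(pwd)/html:/usr/share/nginx/html:ro" nginx:stable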

Next, write the automatic configuration script for the slave nodes and place it in nginx's html resource directory. I named this script autobuild.sh

#!/bin/bash

BASEURL=http://192.168.171.1

# runtime directory required by sshd
mkdir -p /run/sshd
chmod 0755 /run/sshd

mkdir -p /opt/software
mkdir -p /opt/softln

# download and unpack the JDK and Hadoop archives
wget -O /opt/software/jdk-8u441-linux-x64.tar.gz $BASEURL/software/jdk-8u441-linux-x64.tar.gz
wget -O /opt/software/hadoop-2.10.2.tar.gz $BASEURL/software/hadoop-2.10.2.tar.gz

tar -zxvf /opt/software/jdk-8u441-linux-x64.tar.gz -C /opt/software/
tar -zxvf /opt/software/hadoop-2.10.2.tar.gz -C /opt/software/

rm -rf /opt/software/jdk-8u441-linux-x64.tar.gz
rm -rf /opt/software/hadoop-2.10.2.tar.gz

# create the hadoop group and user, matching the master
groupadd hadoop
useradd -m hadoop -g hadoop -s /bin/bash

ln -sfn /opt/software/jdk1.8.0_441 /opt/softln/java
ln -sfn /opt/software/hadoop-2.10.2 /opt/softln/hadoop

mkdir -p /opt/softln/hadoop/tmp
mkdir -p /opt/softln/hadoop/hdfs/data
mkdir -p /opt/softln/hadoop/hdfs/name

# fetch the Hadoop configuration files prepared on the master
wget -O /opt/softln/hadoop/etc/hadoop/core-site.xml $BASEURL/etc/hadoop/core-site.xml
wget -O /opt/softln/hadoop/etc/hadoop/hdfs-site.xml $BASEURL/etc/hadoop/hdfs-site.xml
wget -O /opt/softln/hadoop/etc/hadoop/yarn-site.xml $BASEURL/etc/hadoop/yarn-site.xml
wget -O /opt/softln/hadoop/etc/hadoop/mapred-site.xml $BASEURL/etc/hadoop/mapred-site.xml
wget -O /opt/softln/hadoop/etc/hadoop/hadoop-env.sh $BASEURL/etc/hadoop/hadoop-env.sh
wget -O /opt/softln/hadoop/etc/hadoop/yarn-env.sh $BASEURL/etc/hadoop/yarn-env.sh

chown -R hadoop:hadoop /opt/software/hadoop-2.10.2

echo "export JAVA_HOME=/opt/softln/java" >> /home/hadoop/.bashrc
echo "export HADOOP_HOME=/opt/softln/hadoop" >> /home/hadoop/.bashrc
echo 'export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /home/hadoop/.bashrc

# authorized keys so the master's hadoop user can log in without a password
mkdir -p /home/hadoop/.ssh

wget -O /home/hadoop/.ssh/authorized_keys $BASEURL/.ssh/authorized_keys
wget -O /home/hadoop/.ssh/id_ecdsa.pub $BASEURL/.ssh/id_ecdsa.pub

chown -R hadoop:hadoop /home/hadoop
chmod 700 /home/hadoop/.ssh
chmod 644 /home/hadoop/.ssh/id_ecdsa.pub
chmod 600 /home/hadoop/.ssh/authorized_keys

# reuse the master's ssh client config and host keys
wget -O /etc/ssh/ssh_config $BASEURL/etc/ssh/ssh_config
wget -O /etc/ssh/ssh_host_ecdsa_key $BASEURL/etc/ssh/ssh_host_ecdsa_key
wget -O /etc/ssh/ssh_host_ecdsa_key.pub $BASEURL/etc/ssh/ssh_host_ecdsa_key.pub
wget -O /etc/ssh/ssh_host_ed25519_key $BASEURL/etc/ssh/ssh_host_ed25519_key
wget -O /etc/ssh/ssh_host_ed25519_key.pub $BASEURL/etc/ssh/ssh_host_ed25519_key.pub
wget -O /etc/ssh/ssh_host_rsa_key $BASEURL/etc/ssh/ssh_host_rsa_key
wget -O /etc/ssh/ssh_host_rsa_key.pub $BASEURL/etc/ssh/ssh_host_rsa_key.pub
wget -O /etc/ssh/sshd_config $BASEURL/etc/ssh/sshd_config

chmod 600 /etc/ssh/*_key
chmod 600 /etc/ssh/*_key.pub

Here the BASEURL variable is the address of the nginx service; change it to match your actual environment
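
Before building the image, it is worth confirming that the staged resources are reachable from the Docker host. A quick check, assuming curl is installed and using my BASEURL as an example:

# an HTTP 200 response means nginx is serving the build script
curl -I http://192.168.171.1/autobuild.sh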

A Dockerfile can then be used to build the slave-node image. My nginx service address is http://192.168.171.1; change it to the address of your own nginx service

FROM m.daocloud.io/docker.io/library/debian:bookworm

RUN rm -rf /etc/apt/sources.list.d/* && \
	echo "deb http://mirrors.ustc.edu.cn/debian bookworm main contrib non-free non-free-firmware" > /etc/apt/sources.list && \
	echo "deb http://mirrors.ustc.edu.cn/debian bookworm-updates main contrib non-free non-free-firmware" >> /etc/apt/sources.list && \
	echo "deb http://mirrors.ustc.edu.cn/debian-security/ bookworm-security main contrib non-free non-free-firmware" >> /etc/apt/sources.list && \
	apt update && \
	apt install -y openssh-server wget procps net-tools && \
	apt clean all && \
	wget -O /opt/autobuild.sh http://192.168.171.1/autobuild.sh && \
	sh /opt/autobuild.sh && \
	rm -rf /opt/autobuild.sh
	
# start sshd and keep the container in the foreground
CMD /usr/sbin/sshd -f /etc/ssh/sshd_config && tail -f /dev/null

Build the Docker image

master:~ # docker build -t hadoop-base:v1 .

8. Configure the Hadoop Network

Because the available IP addresses are limited, the Docker container network uses bridge mode. The hostname-to-IP mappings are shown in the table below

Hostname    IP
master      192.168.171.130
slave1      172.18.0.2
slave2      172.18.0.3

Create the Docker network

master:~ # docker network create --driver bridge --subnet 172.18.0.0/24 --gateway 172.18.0.1 hadoop-net

Edit /etc/hosts on the master node and add the following

192.168.171.130 master
172.18.0.2 slave1
172.18.0.3 slave2

Switch to the hadoop user and edit $HADOOP_HOME/etc/hadoop/slaves

master:~ # su hadoop
hadoop@master:/root> vim $HADOOP_HOME/etc/hadoop/slaves

Replace the contents of the file with the slave node hostnames

slave1
slave2

Switch back to the root user and start the Docker containers

master:~ # docker run -it -d \
> --network hadoop-net \
> --ip 172.18.0.2 \
> --add-host="master:192.168.171.130" --add-host="slave1:172.18.0.2" --add-host="slave2:172.18.0.3" \
> hadoop-base:v1
78a66f42523d9d67e7373cb4e0b499fe30f400898e3fee08b177001d67a09ac4
master:~ # docker run -it -d --network hadoop-net --ip 172.18.0.3 --add-host="master:192.168.171.130" --add-host="slave1:172.18.0.2" --add-host="slave2:172.18.0.3" hadoop-base:v1
07d217d1184cba2f823f15c75937376fb6af4ffde149945a744aafd08b2088a9

Check the running containers

master:~ # docker ps
CONTAINER ID   IMAGE            COMMAND                   CREATED              STATUS              PORTS     NAMES
07d217d1184c   hadoop-base:v1   "/bin/sh -c '/usr/sb…"   About a minute ago   Up About a minute             vigilant_cerf
78a66f42523d   hadoop-base:v1   "/bin/sh -c '/usr/sb…"   2 minutes ago        Up 2 minutes                  angry_haibt

Set the containers to restart automatically

master:~ # docker update --restart=always 07d217d1184c
07d217d1184c
master:~ # docker update --restart=always 78a66f42523d
78a66f42523d
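
At this point it is worth verifying that the master's hadoop user can reach both containers over SSH without a password, since start-dfs.sh and start-yarn.sh depend on it. A quick check (the containers report their container IDs as hostnames, which is why the log file names in the next section contain IDs):

master:~ # su hadoop
hadoop@master:/root> ssh slave1 hostname
hadoop@master:/root> ssh slave2 hostname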

9. Format the NameNode

Switch to the hadoop user and run

hadoop@master:~> hdfs namenode -format

10. Start DFS and YARN

Switch to the hadoop user before performing the following operations

Start dfs

hadoop@master:~> start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/software/hadoop-2.10.2/logs/hadoop-hadoop-namenode-master.out
slave1: starting datanode, logging to /opt/software/hadoop-2.10.2/logs/hadoop-hadoop-datanode-78a66f42523d.out
slave2: starting datanode, logging to /opt/software/hadoop-2.10.2/logs/hadoop-hadoop-datanode-07d217d1184c.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/software/hadoop-2.10.2/logs/hadoop-hadoop-secondarynamenode-master.out

Start yarn

hadoop@master:~> start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/software/hadoop-2.10.2/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
slave2: starting nodemanager, logging to /opt/software/hadoop-2.10.2/logs/yarn-hadoop-nodemanager-07d217d1184c.out
slave1: starting nodemanager, logging to /opt/software/hadoop-2.10.2/logs/yarn-hadoop-nodemanager-78a66f42523d.out
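
A quick way to confirm the daemons are up is jps from the JDK, which is on the PATH via $JAVA_HOME/bin as configured above. On the master it should list NameNode, SecondaryNameNode, and ResourceManager; the DataNode and NodeManager processes run inside the slave containers:

hadoop@master:~> jps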

11. Test HDFS

Generate a test text file

hadoop@master:~> echo "Hello Hadoop Hello World" > test.txt
hadoop@master:~> ls
bin  test.txt

Create a test directory in HDFS

hadoop@master:~> hdfs dfs -mkdir /test
hadoop@master:~> hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2025-05-24 20:32 /test

Upload the file to the test directory in HDFS

hadoop@master:~> hdfs dfs -put ./test.txt /test
hadoop@master:~> hdfs dfs -ls /test
Found 1 items
-rw-r--r--   3 hadoop supergroup         25 2025-05-24 20:33 /test/test.txt

View the contents of the test file

hadoop@master:~> hdfs dfs -cat /test/test.txt
Hello Hadoop Hello World

12. Test YARN

Run the wordcount test

hadoop@master:~> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /test/test.txt /testout

View the results

hadoop@master:~> hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2025-05-24 20:33 /test
drwxr-xr-x   - hadoop supergroup          0 2025-05-24 20:35 /testout
drwx------   - hadoop supergroup          0 2025-05-24 20:35 /tmp
hadoop@master:~> hdfs dfs -ls /testout/
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2025-05-24 20:35 /testout/_SUCCESS
-rw-r--r--   3 hadoop supergroup         25 2025-05-24 20:35 /testout/part-r-00000
hadoop@master:~> hdfs dfs -cat /testout/*
Hadoop  1
Hello   2
World   1