I previously set up HDFS with the singularities/hadoop:2.8 image, but that image has not been updated in a long time, so this post installs Hadoop by hand and packages the result as a Docker image for future reuse.
Reference: "docker构建hadoop镜像并运行", 静听枫语's blog on CSDN
1. Prepare the CentOS image
Pull the image:
docker pull centos:latest
Download the Hadoop and JDK tarballs and upload them to the server; the versions used here are hadoop-3.3.3.tar.gz and jdk-8u291-linux-x64.tar.gz.
Start a CentOS container with both tarballs mounted:
docker run -itd --name hadoop -v /hadoop-3.3.3.tar.gz:/hadoop-3.3.3.tar.gz -v /jdk-8u291-linux-x64.tar.gz:/jdk-8u291-linux-x64.tar.gz centos:latest /bin/bash
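The remaining steps run inside this container, so open a shell in it. To confirm both tarballs are visible first (the container name hadoop comes from the command above):
docker exec -it hadoop ls -lh /hadoop-3.3.3.tar.gz /jdk-8u291-linux-x64.tar.gz
docker exec -it hadoop /bin/bash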
2. Install sshd
Hadoop nodes talk to each other over SSH, but the base image does not ship the sshd service, so it has to be installed. First update the package index:
yum update
If the command above fails, the likely cause is that CentOS 8 has reached end of life, so its default yum mirrors no longer resolve. The following steps work around this; skip them if there was no error:
1) Enter the yum repos directory
cd /etc/yum.repos.d/
2) Point every CentOS repo file at the vault
sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*
3) Switch the yum source to the Aliyun vault mirror
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-vault-8.5.2111.repo
yum clean all
yum makecache
4) Test that yum can now install packages
yum install wget -y
5) Update yum
yum update
Install sshd:
yum install -y openssl openssh-server
yum install openssh*
Generate keys, pressing Enter through every prompt:
ssh-keygen -t rsa
ssh-keygen -t dsa
ssh-keygen -t ecdsa
ssh-keygen -t ed25519
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
Edit the sshd configuration file:
vi /etc/ssh/sshd_config
Change the HostKey section:
### Original
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_ecdsa_key
HostKey /etc/ssh/ssh_host_ed25519_key
### Change to
HostKey /root/.ssh/id_rsa
HostKey /root/.ssh/id_ecdsa
HostKey /root/.ssh/id_ed25519
HostKey /root/.ssh/id_dsa
Allow remote login:
vi /etc/pam.d/sshd
# Comment out this line with #
# account required pam_nologin.so
Start the sshd service and check that it is running:
/usr/sbin/sshd
ps -ef | grep sshd
Output on success:
root 311 1 0 06:43 ? 00:00:00 /usr/sbin/sshd
root 332 1 0 06:44 pts/0 00:00:00 grep --color=auto sshd
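With sshd running and the public key in authorized_keys, passwordless SSH to localhost should now work; a quick test (the -o option only suppresses the first-time host-key prompt):
ssh -o StrictHostKeyChecking=no localhost echo ok
# prints "ok" without asking for a password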
Install net-tools:
yum install net-tools
3. Extract the packages and edit the configuration files
Extract the Hadoop and JDK tarballs and move them to /usr/local/:
tar -zxvf hadoop-3.3.3.tar.gz
tar -zxvf jdk-8u291-linux-x64.tar.gz
mv /hadoop-3.3.3/ /usr/local/
mv /jdk1.8.0_291/ /usr/local/
Configure environment variables for the root user:
vi ~/.bashrc
Add the following:
# User specific environment
if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]
then
PATH="$HOME/.local/bin:$HOME/bin:$PATH"
fi
export JAVA_HOME=/usr/local/jdk1.8.0_291
export CLASSPATH=$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
# hadoop env
export HADOOP_HOME=/usr/local/hadoop-3.3.3
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin
PATH=$PATH:$HOME/bin
export PATH
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
Reload the environment variables (the file edited above is ~/.bashrc):
source ~/.bashrc
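Verify that the variables took effect:
java -version     # should report 1.8.0_291
hadoop version    # should report Hadoop 3.3.3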
Create the tmp and logs directories:
mkdir /usr/local/hadoop-3.3.3/tmp
mkdir /usr/local/hadoop-3.3.3/logs
Edit the Hadoop configuration file core-site.xml:
vi /usr/local/hadoop-3.3.3/etc/hadoop/core-site.xml
Add the following inside the <configuration> tag (fs.default.name is deprecated; fs.defaultFS is the current key in Hadoop 3):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-3.3.3/tmp</value>
  </property>
</configuration>
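To sanity-check that Hadoop picks up the setting, hdfs getconf prints the effective value:
hdfs getconf -confKey fs.defaultFS    # expect hdfs://localhost:8020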
Edit the Hadoop configuration file hdfs-site.xml:
vi /usr/local/hadoop-3.3.3/etc/hadoop/hdfs-site.xml
Add the following inside the <configuration> tag (replication is 1 because this is a single-node setup, and permission checks are disabled for convenience):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
Edit the Hadoop configuration file hadoop-env.sh:
vi /usr/local/hadoop-3.3.3/etc/hadoop/hadoop-env.sh
Add the following (note that HADOOP_LOG_DIR must be set before it is referenced by HADOOP_SECURE_LOG_DIR):
export JAVA_HOME=${JAVA_HOME}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
export HADOOP_HEAPSIZE=1024
export HADOOP_NAMENODE_INIT_HEAPSIZE=1024
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HDFS_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx1024m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"
export HADOOP_LOG_DIR=/usr/local/hadoop-3.3.3/logs
export HDFS_DATANODE_SECURE_USER=${HDFS_DATANODE_SECURE_USER}
export HADOOP_SECURE_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_IDENT_STRING=hadoop
Make it executable:
chmod +x /usr/local/hadoop-3.3.3/etc/hadoop/hadoop-env.sh
Add the startup users to start-dfs.sh and stop-dfs.sh:
vi /usr/local/hadoop-3.3.3/sbin/start-dfs.sh
Add the following near the top:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
Do the same for stop-dfs.sh; alternatively, the sed sketch below patches both files at once.
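A GNU sed sketch that appends the same variables right after the shebang line of each script (adjust the path if yours differs):
for f in /usr/local/hadoop-3.3.3/sbin/start-dfs.sh /usr/local/hadoop-3.3.3/sbin/stop-dfs.sh; do
  sed -i '1a HDFS_DATANODE_USER=root\nHDFS_DATANODE_SECURE_USER=hdfs\nHDFS_NAMENODE_USER=root\nHDFS_SECONDARYNAMENODE_USER=root\nHDFS_JOURNALNODE_USER=root\nHDFS_ZKFC_USER=root' "$f"
done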
4. Format the NameNode
hdfs namenode -format
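If formatting succeeds, the NameNode metadata appears under hadoop.tmp.dir (dfs/name by default); confirm with:
ls /usr/local/hadoop-3.3.3/tmp/dfs/name/current
# expect a VERSION file and an fsimage_* file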
5. Create the startup script
Edit the startup script:
vi /etc/bootstrap.sh
The script content is as follows (the shebang is required for Docker to execute the file directly, and hadoop-env.sh is sourced so its exports apply to this shell):
#!/bin/bash
source ~/.bashrc
source /etc/profile
: ${HADOOP_PREFIX:=/usr/local/hadoop-3.3.3}
. $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
/usr/sbin/sshd
$HADOOP_PREFIX/sbin/start-dfs.sh
/bin/bash
Make it executable:
chmod +x /etc/bootstrap.sh
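Before exporting the container, a quick smoke test inside it is worthwhile (jps ships with the JDK):
/usr/local/hadoop-3.3.3/sbin/start-dfs.sh
jps    # expect NameNode, DataNode and SecondaryNameNode
/usr/local/hadoop-3.3.3/sbin/stop-dfs.sh    # stop again before exporting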
Exit the container:
exit
6. Build the Hadoop image
Export the container filesystem to a tarball:
docker export hadoop > hadoop.tar
Import it as an image:
docker import hadoop.tar hadoop:3.3.3
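Note that docker export/import captures only the filesystem, not image metadata such as CMD, which is why the compose file below sets command: explicitly. Confirm the image exists:
docker images hadoop:3.3.3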
7. Start HDFS
Configure docker-compose.yml with the following (add volume mappings for the config files, tmp directory, and logs directory if you need them on the host):
hadoop3:
  container_name: hadoop3
  image: hadoop:3.3.3
  command: /etc/bootstrap.sh
  ports:
    - "8020:8020"     # NameNode RPC
    - "9870:9870"     # NameNode web UI
    - "8088:8088"
    - "8040:8040"
    - "8042:8042"
    - "49707:49707"
    - "9866:9866"     # DataNode data transfer (50010 in Hadoop 2)
    - "9864:9864"     # DataNode web UI (50075 in Hadoop 2)
    - "9868:9868"     # SecondaryNameNode web UI (50090 in Hadoop 2)
  tty: true
Create the Hadoop container:
docker-compose -f /work/docker-compose.yml up -d
Check whether HDFS started successfully:
ps aux | grep hdfs
If all three HDFS daemons (NameNode, DataNode, SecondaryNameNode) show up, the startup succeeded.
Web UI: http://<host-ip>:9870
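Finally, a quick functional check from the host (a sketch, assuming the container name hadoop3 from the compose file; full paths are used because docker exec does not source ~/.bashrc):
docker exec hadoop3 /usr/local/hadoop-3.3.3/bin/hdfs dfsadmin -report    # should list one live DataNode
docker exec hadoop3 /usr/local/hadoop-3.3.3/bin/hdfs dfs -mkdir -p /smoke
docker exec hadoop3 /usr/local/hadoop-3.3.3/bin/hdfs dfs -ls /           # /smoke should appear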