Manually Installing a Hadoop Cluster with Docker

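Installing a Hadoop cluster is generally considered difficult, so people often fall back on integrated distributions such as CDH. Those distributions are bloated, however, and are not easy to install either. Here we try building the cluster with Docker, starting from zero and customizing it to our business requirements.

As long as you are careful and thorough, installing a Hadoop cluster is actually not that hard.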

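Preparing the Docker environment

In this Dockerfile we install JDK 1.8 up front, so that it does not have to be installed again later.

At the same time, we generate the SSH host keys that sshd needs, in preparation for passwordless access between the machines later on.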

# The new image is based on the centos image
FROM centos
# Author information
MAINTAINER by Rudolfyan
# Install openssh-server
RUN yum -y install openssh-server

RUN mkdir /var/run/sshd
# -N '' sets an empty passphrase explicitly, so the build never waits for input
RUN ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsa_key
RUN ssh-keygen -t dsa -N '' -f /etc/ssh/ssh_host_dsa_key

# Set the root password
RUN /bin/echo 'root:123456'|chpasswd
RUN /bin/sed -i 's/.*session.*required.*pam_loginuid.so.*/session optional pam_loginuid.so/g' /etc/pam.d/sshd
# CentOS reads the system locale from /etc/locale.conf
RUN /bin/echo -e "LANG=\"en_US.UTF-8\"" > /etc/locale.conf

RUN yum -y install java-1.8.0-openjdk.x86_64

EXPOSE 22
CMD /usr/sbin/sshd -D
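
The second image later in this article is built FROM rudolfyan/centosssh:1.0, so the Dockerfile above presumably has to be built and tagged under that name first. A minimal sketch of that step, assuming the Dockerfile sits in the current directory; build_base_image is a hypothetical helper name, not something from the original setup:

```shell
#!/bin/sh
# Assumed tag, taken from the FROM line of the second Dockerfile.
BASE_IMAGE="rudolfyan/centosssh:1.0"

# Build the sshd + JDK base image from the Dockerfile in a directory
# (defaults to the current directory).
build_base_image() {
    docker build -t "$BASE_IMAGE" "${1:-.}"
}
```

Calling build_base_image with no argument runs docker build -t rudolfyan/centosssh:1.0 . in the current directory.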

Preparing the Hadoop environment

We download Hadoop 3.2.1 directly by URL and place it in the same directory as the Dockerfile.


[root@ora-mssql hadoop]# curl https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz -o hadoop-3.2.1.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  342M  100  342M    0     0  2583k      0  0:02:15  0:02:15 --:--:-- 4076k
[root@ora-mssql hadoop]# ls
Dockerfile  hadoop-3.2.1.tar.gz

Build the second image, using the image produced in the first section as its base.

mkdir hadoopqun
# Quote EOF so the shell does not expand $HADOOP_HOME and $PATH while writing the file
cat <<'EOF' >hadoopqun/Dockerfile
FROM rudolfyan/centosssh:1.0
MAINTAINER will

ENV REFRESHED_AT 2021


# Extract under /usr/local so the path matches the commands used in the rest of this article
ADD hadoop-3.2.1.tar.gz /usr/local/
ENV HADOOP_HOME /usr/local/hadoop-3.2.1
ENV PATH $HADOOP_HOME/bin:$PATH

RUN yum install -y which sudo

EOF
docker build -t rudolfyan/hadoopqun:1.0 . -f hadoopqun/Dockerfile

Start three containers from the hadoopqun image, mapping each container's SSH port 22 to its own host port:

docker run --name dkhmaster -p 10022:22 -d rudolfyan/hadoopqun:1.0
docker run --name dkhslave1 -p 10023:22 -d rudolfyan/hadoopqun:1.0
docker run --name dkhslave2 -p 10024:22 -d rudolfyan/hadoopqun:1.0
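
Every container needs its own host port in front of the same container port 22, which is easy to get wrong when the lines are typed by hand. A small sketch that derives the ports in a loop; start_cluster and BASE_PORT are hypothetical names, and the container names and image tag are the ones used in this article:

```shell
#!/bin/sh
IMAGE="rudolfyan/hadoopqun:1.0"
BASE_PORT=10022   # host port for the master; slaves get the following ports

# start_cluster [SLAVES]: start dkhmaster plus SLAVES slave containers
# (default 2), publishing each container's sshd on BASE_PORT + index.
start_cluster() {
    slaves="${1:-2}"
    i=0
    while [ "$i" -le "$slaves" ]; do
        if [ "$i" -eq 0 ]; then name="dkhmaster"; else name="dkhslave$i"; fi
        docker run --name "$name" -p "$((BASE_PORT + i)):22" -d "$IMAGE"
        i=$((i + 1))
    done
}
```

Running start_cluster 2 issues the same three docker run commands, with host ports 10022, 10023 and 10024.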

[root@ora-mssql ~]# docker ps -a
CONTAINER ID        IMAGE                             COMMAND                  CREATED             STATUS                    PORTS                                                    NAMES
9f92ba0ee1a4        rudolfyan/hadoopqun:1.0           "/bin/sh -c '/usr/..."   About an hour ago   Up About an hour          0.0.0.0:10024->22/tcp                                    dkhslave2
bfd5a858efb1        rudolfyan/hadoopqun:1.0           "/bin/sh -c '/usr/..."   About an hour ago   Up About an hour          0.0.0.0:10023->22/tcp                                    dkhslave1
5e35e92b76f0        rudolfyan/hadoopqun:1.0           "/bin/sh -c '/usr/..."   About an hour ago   Up About an hour          0.0.0.0:10022->22/tcp  

Generate a key pair on the master so that it can log in to the slave machines without a password and control them directly. This step is essential.

[root@5e35e92b76f0 /]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:EBkfY93CuuzdHLl6T50vFPTkS0SReirUWg/f+S0kpUA root@5e35e92b76f0
The key's randomart image is:
+---[RSA 3072]----+
|      oo+o .  .oo|
|      .+ oE . .o.|
|      . .o ...o+ |
|       .. .. =ooo|
|       .S...o+B.+|
|        o  o=.o*o|
|       . . o.*..+|
|        . . =.o +|
|          .o ..o.|
+----[SHA256]-----+

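Do the same on the other two machines, then merge the three public keys into a single file and copy that file to all three machines to get passwordless access.

The exact steps are shown below: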

[root@ora-mssql ~]# docker cp dkhmaster:/root/.ssh/id_rsa.pub  master1.key
[root@ora-mssql ~]# docker cp dkhslave1:/root/.ssh/id_rsa.pub  slave1.key
[root@ora-mssql ~]# docker cp dkhslave2:/root/.ssh/id_rsa.pub  slave2.key
[root@ora-mssql ~]# cat master1.key slave1.key slave2.key > authorized_keys

[root@ora-mssql ~]# docker cp authorized_keys dkhmaster:/root/.ssh/authorized_keys
[root@ora-mssql ~]# docker cp authorized_keys dkhslave1:/root/.ssh/authorized_keys
[root@ora-mssql ~]# docker cp authorized_keys dkhslave2:/root/.ssh/authorized_keys
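
The collect, merge and distribute steps above can also be wrapped in one function. This is only a sketch: share_pubkeys is a hypothetical helper, and it assumes ssh-keygen has already been run in every container so that /root/.ssh/id_rsa.pub exists.

```shell
#!/bin/sh
# share_pubkeys CONTAINER...
# Collect /root/.ssh/id_rsa.pub from every named container, merge the keys
# into one file, and install that file as authorized_keys in each of them.
share_pubkeys() {
    merged=$(mktemp) || return 1
    for c in "$@"; do
        docker exec "$c" cat /root/.ssh/id_rsa.pub >> "$merged" || return 1
    done
    for c in "$@"; do
        docker cp "$merged" "$c:/root/.ssh/authorized_keys" || return 1
    done
    rm -f "$merged"
}
```

With the containers from this article the call would be share_pubkeys dkhmaster dkhslave1 dkhslave2.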

Look up the IP addresses of the three containers and write them into the /etc/hosts file of all three containers.

[root@ora-mssql ~]# docker inspect dkhmaster|grep IPA
            "SecondaryIPAddresses": null,
            "IPAddress": "172.17.0.3",
                    "IPAMConfig": null,
                    "IPAddress": "172.17.0.3",
[root@ora-mssql ~]# docker inspect dkhslave1|grep IPA
            "SecondaryIPAddresses": null,
            "IPAddress": "172.17.0.4",
                    "IPAMConfig": null,
                    "IPAddress": "172.17.0.4",
[root@ora-mssql ~]# docker inspect dkhslave2|grep IPA
            "SecondaryIPAddresses": null,
            "IPAddress": "172.17.0.5",
                    "IPAMConfig": null,
                    "IPAddress": "172.17.0.5",

Write the entries to /etc/hosts inside each container. The file then looks like this:

127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3      5e35e92b76f0
172.17.0.3 master
172.17.0.4 slave1
172.17.0.5 slave2
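
Instead of reading the addresses out of the grep output by eye, the host entries can be generated from docker inspect. A minimal sketch, assuming the containers are attached to a single Docker network; container_ip and hosts_entries are hypothetical helper names:

```shell
#!/bin/sh
# container_ip NAME: print the IP address of a container on its Docker network.
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# hosts_entries CONTAINER=ALIAS...: print one "IP ALIAS" line per pair,
# in the format used by /etc/hosts.
hosts_entries() {
    for pair in "$@"; do
        printf '%s %s\n' "$(container_ip "${pair%%=*}")" "${pair#*=}"
    done
}
```

For example, hosts_entries dkhmaster=master dkhslave1=slave1 dkhslave2=slave2 prints the three name lines, which can then be appended to /etc/hosts in each container.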

Install openssh-clients in every container, since the ssh client is needed to log in to the other nodes.

[root@9f92ba0ee1a4 /]# yum -y install openssh-clients
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 0:03:45 ago on Mon Nov 15 05:54:45 2021.
Dependencies resolved.
===================================================================================================================================================================
 Package                                  Architecture                    Version                                            Repository                       Size
===================================================================================================================================================================
Installing:
 openssh-clients                          x86_64                          8.0p1-6.el8_4.2                                    baseos                          667 k
Installing dependencies:
 libedit                                  x86_64                          3.1-23.20170329cvs.el8                             baseos                          102 k

Transaction Summary
===================================================================================================================================================================
Install  2 Packages

Total download size: 769 k
Installed size: 2.7 M
Downloading Packages:
[MIRROR] libedit-3.1-23.20170329cvs.el8.x86_64.rpm: Status code: 403 for http://mirrors.tuna.tsinghua.edu.cn/centos/8.4.2105/BaseOS/x86_64/os/Packages/libedit-3.1-23.20170329cvs.el8.x86_64.rpm (IP: 101.6.15.130)
(1/2): libedit-3.1-23.20170329cvs.el8.x86_64.rpm                                                                                   253 kB/s | 102 kB     00:00
[MIRROR] openssh-clients-8.0p1-6.el8_4.2.x86_64.rpm: Status code: 403 for http://mirrors.tuna.tsinghua.edu.cn/centos/8.4.2105/BaseOS/x86_64/os/Packages/openssh-clients-8.0p1-6.el8_4.2.x86_64.rpm (IP: 101.6.15.130)
(2/2): openssh-clients-8.0p1-6.el8_4.2.x86_64.rpm                                                                                  537 kB/s | 667 kB     00:01
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                              448 kB/s | 769 kB     00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                           1/1
  Installing       : libedit-3.1-23.20170329cvs.el8.x86_64                                                                                                     1/2
  Installing       : openssh-clients-8.0p1-6.el8_4.2.x86_64                                                                                                    2/2
  Running scriptlet: openssh-clients-8.0p1-6.el8_4.2.x86_64                                                                                                    2/2
  Verifying        : libedit-3.1-23.20170329cvs.el8.x86_64                                                                                                     1/2
  Verifying        : openssh-clients-8.0p1-6.el8_4.2.x86_64                                                                                                    2/2

Installed:
  libedit-3.1-23.20170329cvs.el8.x86_64                                           openssh-clients-8.0p1-6.el8_4.2.x86_64

Complete!

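For convenience later on, install the Ansible deployment tool on the master; it can take care of further synchronization work. Ansible comes from the EPEL repository, so epel-release has to be installed first.

# yum -y install epel-release
# yum -y install ansible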
#  cat<<EOF >> /etc/ansible/hosts
[hadoop]
172.17.0.3
172.17.0.4
172.17.0.5
[slave]
172.17.0.4
172.17.0.5
EOF

Log in to the master and edit core-site.xml, located in /usr/local/hadoop-3.2.1/etc/hadoop, adding the following section:

<configuration>
                <property>
                        <name>fs.defaultFS</name>
                        <value>hdfs://master:9000</value>
                </property>
                <property>
                        <name>hadoop.tmp.dir</name>
                        <value>/usr/local/hadoop-3.2.1/data/tmp</value>
                </property>
</configuration>

Edit yarn-site.xml:

<configuration>

<!-- Site specific YARN configuration properties -->
                <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>slave1</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
         <property>
                 <name>yarn.resourcemanager.webapp.address</name>
                 <value>master:8088</value>
         </property>
</configuration>

Edit hdfs-site.xml as follows:

[root@5e35e92b76f0 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->

<configuration>
                <property>
                        <name>dfs.replication</name>
                        <value>3</value>
                </property>
                <property>
                        <name>dfs.namenode.secondary.http-address</name>
                        <value>slave2:50090</value>
                </property>
</configuration>
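
Once core-site.xml and hdfs-site.xml are edited, it may be worth confirming that Hadoop parses them and sees the intended values. The sketch below uses hdfs getconf -confKey for that; check_conf is a hypothetical helper, and it assumes $HADOOP_HOME/bin is on the PATH as set in the Dockerfile.

```shell
#!/bin/sh
# check_conf KEY EXPECTED: compare the value Hadoop resolves for KEY
# with the value we meant to configure.
check_conf() {
    key="$1"
    want="$2"
    got=$(hdfs getconf -confKey "$key") || return 1
    if [ "$got" = "$want" ]; then
        echo "OK $key=$got"
    else
        echo "MISMATCH $key=$got (expected $want)"
        return 1
    fi
}
```

On the master this could be called as check_conf fs.defaultFS hdfs://master:9000 and check_conf dfs.replication 3.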

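Add the slaves file, which lists the worker nodes. Note that Hadoop 3.x reads this list from a file named workers in the same directory (slaves was the Hadoop 2.x name), so with 3.2.1 the same three lines should go into workers.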

[root@5e35e92b76f0 hadoop]# cat slaves
master
slave1
slave2
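
Hadoop 3.x start scripts read the node list from etc/hadoop/workers, one hostname per line. A minimal sketch for writing that file; write_workers is a hypothetical helper and the directory argument is whatever configuration directory is in use:

```shell
#!/bin/sh
# write_workers CONF_DIR HOST...: write one hostname per line to CONF_DIR/workers,
# replacing any previous content.
write_workers() {
    conf_dir="$1"
    shift
    printf '%s\n' "$@" > "$conf_dir/workers"
}
```

For this cluster the call would be write_workers /usr/local/hadoop-3.2.1/etc/hadoop master slave1 slave2.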

Next, edit mapred-site.xml, the MapReduce configuration file:


<configuration>
    <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Set the JAVA_HOME environment variable and the daemon users for HDFS and MapReduce. The two files are hadoop-env.sh and mapred-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el8_4.x86_64/jre
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
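
The JAVA_HOME path above is tied to one exact OpenJDK build, so it changes whenever yum installs a newer package. A small sketch for deriving it from the java binary instead; java_home_of is a hypothetical helper name:

```shell
#!/bin/sh
# java_home_of JAVA_BINARY: strip the trailing /bin/java from a fully
# resolved path to the java executable, leaving the JAVA_HOME directory.
java_home_of() {
    printf '%s\n' "${1%/bin/java}"
}
```

Inside a container it could be used as export JAVA_HOME="$(java_home_of "$(readlink -f "$(command -v java)")")", where readlink -f follows the alternatives symlinks to the real binary.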

Copy the configuration files to each slave machine:

[root@5e35e92b76f0 hadoop]# ansible slave -m copy -a "src=/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml dest=/usr/local/hadoop-3.2.1/etc/hadoop"
172.17.0.4 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/libexec/platform-python"
    },
    "changed": false,
    "checksum": "e241df63cbf84b8384a7f6fc7e9162bee80b5422",
    "dest": "/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml",
    "gid": 1001,
    "group": "1001",
    "mode": "0644",
    "owner": "1001",
    "path": "/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml",
    "size": 1116,
    "state": "file",
    "uid": 1001
}
172.17.0.5 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/libexec/platform-python"
    },
    "changed": false,
    "checksum": "e241df63cbf84b8384a7f6fc7e9162bee80b5422",
    "dest": "/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml",
    "gid": 1001,
    "group": "1001",
    "mode": "0644",
    "owner": "1001",
    "path": "/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml",
    "size": 1116,
    "state": "file",
    "uid": 1001
}
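
The command above copies a single file. Since several files were edited, one option is to copy the whole configuration directory in one call: when the src of Ansible's copy module is a directory ending in a slash, its contents are copied recursively. A sketch under the assumption that the install path is the same on every node; sync_conf is a hypothetical helper:

```shell
#!/bin/sh
# Assumed install location, matching the paths used in the commands above.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/local/hadoop-3.2.1/etc/hadoop}"

# sync_conf GROUP: copy the contents of the local configuration directory
# to the same path on every host in the given Ansible inventory group.
sync_conf() {
    ansible "$1" -m copy -a "src=$HADOOP_CONF_DIR/ dest=$HADOOP_CONF_DIR/"
}
```

Running sync_conf slave on the master pushes all configuration files to both slaves.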

Run ansible again to force the copy and overwrite the file on the slaves:

[root@5e35e92b76f0 hadoop]# ansible slave -m copy -a "src=/usr/local/hadoop-3.2.1/etc/hadoop/core-site.xml dest=/usr/local/hadoop-3.2.1/etc/hadoop"

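HDFS must be initialized before first use by formatting the NameNode on the master. Only the NameNode is formatted: the hdfs datanode command has no -format option, and the DataNodes initialize their storage directories when they first start.
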
[root@5e35e92b76f0 hadoop]# hdfs namenode -format
2021-11-15 08:24:14,758 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = 5e35e92b76f0/172.17.0.3
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.2.1
......
399 bytes saved in 0 seconds .
2021-11-15 08:24:15,900 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-11-15 08:24:15,905 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2021-11-15 08:24:15,905 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 5e35e92b76f0/172.17.0.3
************************************************************/

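Granting the correct permissions

The files extracted from the tarball are owned by user and group 1001 by default; change the owner to root:root on all nodes. recurse=yes applies the change to everything under the directory, and no mode is set because 0644 would strip the execute bit that directories and scripts need.

ansible hadoop -m file -a "path=/usr/local/hadoop-3.2.1 state=directory recurse=yes owner=root group=root"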

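Because the containers were started without publishing the master's port 9000, the port mapping has to be changed afterwards. It would have been simpler to add the 9000 mapping when the container was first started. For now we change it by editing the container's configuration files (the container and the Docker service should typically be stopped first, otherwise the edits may be overwritten):

[root@ora-mssql ~]# vi  /var/lib/docker/containers/$containerid/config.v2.json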
[root@ora-mssql ~]# vi  /var/lib/docker/containers/$containerid/hostconfig.json
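
For comparison, the simpler route is to publish port 9000 when the master container is created. A sketch, assuming no container named dkhmaster exists yet (an existing one would have to be removed first, which discards changes made inside it unless they were committed to an image); run_master is a hypothetical helper:

```shell
#!/bin/sh
# Create the master container with sshd (22) and the NameNode RPC port
# (9000, as configured in fs.defaultFS) both published on the host.
run_master() {
    docker run --name dkhmaster -p 10022:22 -p 9000:9000 -d rudolfyan/hadoopqun:1.0
}
```

Any other port that should be reachable from outside the host can be added with a further -p option in the same way.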

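Note: from the host, the containers can be reached directly on the bridge network, for example at http://172.17.0.5:50090.

Write two scripts to start and stop the cluster, plus a helper script that re-adds the hostname entries automatically. Docker regenerates /etc/hosts every time a container starts, so the entries have to be appended again after each start, and the addresses in the script must match the ones the containers actually receive (check with docker inspect).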

# cat /opt/addhosts
cat <<EOF >>/etc/hosts
172.17.0.2 master
172.17.0.3 slave1
172.17.0.4 slave2
EOF
[root@ora-mssql ~]# cat start-hadoop
docker start dkhmaster
docker start dkhslave1
docker start dkhslave2
sleep 2

docker exec -d dkhmaster sh /opt/addhosts
docker exec -d dkhslave1 sh /opt/addhosts
docker exec -d dkhslave2 sh /opt/addhosts
[root@ora-mssql ~]# cat stop-hadoop
docker stop dkhmaster
docker stop dkhslave1
docker stop dkhslave2
