Building a Hadoop distributed cluster with Docker in a Linux virtual machine, and operating HDFS with Java (Part 1)


Installing Docker

Windows imposes too many limitations: Docker's Linux mode conflicts with the VM software's virtualization services, so every session would require toggling services and rebooting. For that reason everything here is done inside a virtual machine (VM) running on Windows. This tutorial assumes you have used a Linux system before.

Docker installation guide: https://www.runoob.com/docker/centos-docker-install.html

No matter which system you use, once Docker is installed its commands are the same everywhere.

Basic Docker usage

Start the Docker service

service docker start
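On hosts that use systemd (most recent CentOS and Ubuntu installs), the equivalent systemctl commands work as well; enabling the service is optional but saves a manual start after every reboot:

systemctl start docker    # same effect as "service docker start" on systemd hosts
systemctl enable docker   # optional: start Docker automatically at boot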

A fresh Docker installation has no images. I use Ubuntu to build the Hadoop cluster, but other Linux distributions work too.

docker pull ubuntu   # pull (download) the ubuntu image into Docker; the latest tag is used by default, wait for the download to finish

docker images   # list images; the ubuntu image just pulled shows up here

root@linux:~# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ubuntu              latest              72300a873c2c        8 weeks ago         64.2MB

OK. The Hadoop cluster nodes need to communicate over a local network, so we create a dedicated network for Hadoop using bridge mode.

docker network create --driver=bridge hadoop

After creating it, list the networks to confirm

docker network ls

root@liunx:~# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
a520acd0f5eb        bridge              bridge              local
62cb2d841382        hadoop              bridge              local
b7fef15ea068        host                host                local
ba108fc8779a        none                null                local
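Optionally, inspect the new network to see the subnet Docker assigned to it (the exact subnet varies from machine to machine):

docker network inspect hadoop   # shows the subnet and, later on, which containers are attached to it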

Then run a container from the ubuntu image

docker run -it ubuntu /bin/bash   # -i and -t run the container in interactive mode with a terminal attached

root@linux:~# docker run -it ubuntu /bin/bash
root@b434aef5bc5f:/# 

Type exit to leave the container; ctrl+p followed by ctrl+q switches the shell back to the host without exiting the container.

docker ps   # list running containers

docker start <container-name|ID>   # start an existing container; docker stop stops a running one

docker attach <container-name|ID>   # attach to a running container

The commands above will be used frequently later on.
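As a quick illustration, here is a full round trip with a throwaway container (the name u1 is just an example):

docker run -it --name u1 ubuntu /bin/bash   # create and enter a new container named u1
# press ctrl+p then ctrl+q to detach without stopping it
docker ps          # u1 is listed as running
docker stop u1     # stop it
docker start u1    # start it again in the background
docker attach u1   # reattach to its shell; typing exit here stops it again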

Configuring the ubuntu image

With the two parts above done, the next step is configuring Ubuntu. The freshly pulled ubuntu image contains only a minimal kernel and file system; it lacks network tools, a JDK, SSH, and vim, all of which are indispensable for setting up the cluster.

root@b434aef5bc5f:/#

1. Switch the apt sources: add the following to /etc/apt/sources.list (these entries use the xenial codename; if your image turns out to be Ubuntu 18.04, as the SSH banner below suggests, replace xenial with bionic)
deb http://mirrors.aliyun.com/ubuntu/ xenial main
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main

deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main

deb http://mirrors.aliyun.com/ubuntu/ xenial universe
deb-src http://mirrors.aliyun.com/ubuntu/ xenial universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates universe

deb http://mirrors.aliyun.com/ubuntu/ xenial-security main
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-security main
deb http://mirrors.aliyun.com/ubuntu/ xenial-security universe
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-security universe
2. Run apt update to refresh the package index, then install the tools:
apt install net-tools
apt install inetutils-ping   # installs ping; "apt install ping" does not work because ping lives in the inetutils package
apt install openjdk-8-jdk
apt install vim
3. Install SSH

SSH is an application-layer protocol for secure remote login. It generates a key pair, a private key and a public key; see the official SSH documentation for a detailed explanation.

apt-get install openssh-server

apt-get install openssh-client

Use ssh-keygen -t rsa -P '' to set up passwordless login

root@b434aef5bc5f:/# ssh-keygen -t rsa -P '' 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:p0UBShQiq/AxMBv+HxEku3a6WuwPkVu01E1HNck4kA8 root@cdbc600b933b
The key's randomart image is:
+---[RSA 2048]----+
|+ ..o++..+=o+o.  |
|.= oooo.oE.+ o.  |
|o.+. +.. .+ .    |
|.o.o= o  . .     |
|. .* =  S o      |
|  o B .  +       |
|   * .  .        |
|  o o            |
| ..o..           |
+----[SHA256]-----+

Use cat to read id_rsa.pub and append it to authorized_keys

cat .ssh/id_rsa.pub >> .ssh/authorized_keys
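If passwordless login still fails later on, sshd is usually rejecting key files with overly loose permissions; tightening them is a common safeguard (not part of the original steps):

chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys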

Start the SSH service

root@b434aef5bc5f:~# service ssh start

Starting OpenBSD Secure Shell server sshd           [ OK ]                                         

172.17.0.2 is the IP address of container b434aef5bc5f

root@b434aef5bc5f:~# ssh 172.17.0.2
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-kali2-amd64 x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
   This system has been minimized by removing packages and content that are
   not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

If the connection is blocked by host-key checking (for example a prompt about an unknown host), use ssh -o StrictHostKeyChecking=no root@172.17.0.2

then run the command again.

Success!

To make later use more convenient, append service ssh start to the /root/.bashrc file so SSH starts automatically every time you enter the container

vim /root/.bashrc
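Equivalently, the line can be appended without opening an editor (the output redirection is my addition, just to keep shell startup quiet):

echo 'service ssh start > /dev/null 2>&1' >> /root/.bashrc
source /root/.bashrc   # apply it to the current shell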

Installing Hadoop

Download

wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

Extract it to /usr/local, then rename the directory so it matches the HADOOP_HOME path used below

tar -zxvf hadoop-3.2.1.tar.gz -C /usr/local
mv /usr/local/hadoop-3.2.1 /usr/local/hadoop

Edit the /etc/profile file and add the environment variables

vim /etc/profile

Append the following

#java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=${JAVA_HOME}/jre    
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib    
export PATH=${JAVA_HOME}/bin:$PATH
#hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME 
export HADOOP_HDFS_HOME=$HADOOP_HOME 
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME 
export HADOOP_INSTALL=$HADOOP_HOME 
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_CONF_DIR=$HADOOP_HOME 
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec 
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

Note that export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop needs to be changed to

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop, otherwise running hadoop commands means locating the directory yourself and prefixing them with ./

You can skip the change if you don't mind the extra typing. The last few lines declare which user runs the Hadoop daemons; I use root.

Run source /etc/profile to make the changes take effect
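A quick sanity check that the variables took effect; both commands should work from any directory once the file has been sourced:

java -version      # should report openjdk version "1.8.x"
hadoop version     # should report Hadoop 3.2.1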

In the directory /usr/local/hadoop/etc/hadoop, edit the following files.

The hadoop-env.sh file

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

The core-site.xml file

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://h01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop3/hadoop/tmp</value>
    </property>

</configuration>
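A side note: fs.default.name is the legacy key, and Hadoop 2+/3 prefers fs.defaultFS (the old name still works but is deprecated). Once the cluster is up you can confirm which value was actually loaded with the standard getconf tool:

hdfs getconf -confKey fs.defaultFS   # should print hdfs://h01:9000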

The hdfs-site.xml file

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop3/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop3/hadoop/hdfs/data</value>
    </property>
</configuration>

The mapred-site.xml file

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /usr/local/hadoop/etc/hadoop,
            /usr/local/hadoop/share/hadoop/common/*,
            /usr/local/hadoop/share/hadoop/common/lib/*,
            /usr/local/hadoop/share/hadoop/hdfs/*,
            /usr/local/hadoop/share/hadoop/hdfs/lib/*,
            /usr/local/hadoop/share/hadoop/mapreduce/*,
            /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
            /usr/local/hadoop/share/hadoop/yarn/*,
            /usr/local/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
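If you would rather not hand-maintain that long list, the hadoop classpath command prints the full classpath of the local installation; its output can typically be used as the value of mapreduce.application.classpath instead (an optional shortcut, not part of the original post):

hadoop classpath   # prints the classpath entries this installation actually uses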

The yarn-site.xml file

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>h01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

The workers file (named workers in Hadoop 3.x)

h01
h02
h03

Hadoop configuration is finally done.

Building the cluster

The steps above amount to setting up a single distributed-capable host, but what we want is a Hadoop cluster; one machine alone will not do.

So we scale out horizontally. That is exactly what Hadoop clusters are built for: easy expansion, in other words just keep adding machines.

Docker makes this very convenient: the container we just finished configuring Hadoop in is effectively a running machine, so we can clone it into as many worker nodes as we like. Two slave nodes are enough here, matching the workers file configured above: h01 is the master node, h02 and h03 are the slaves. Enough talk, let's get to it.

First commit the container to an image, because cloning it requires an image

docker commit -m "hadoop" -a "hadoop" b434aef5bc5f myhadoop

Check with docker images; myhadoop now appears in the list

Scaling out

Start with the master node. The earlier container would also work, but we need to configure port mappings, otherwise the Hadoop cluster cannot be operated from outside Docker.

docker run -it --network hadoop -h h01 --name "h00" -p 9870:9870 -p 8088:8088 -p 9000:9000 myhadoop /bin/bash //前两个端口是web端,9000是hdfs文件系统的端口映射

Good, the master node is configured and our Hadoop cluster has a brain. Now add the two slave nodes

docker run -it --network hadoop -h h01 --name "h02" myhadoop /bin/bash

docker run -it --network hadoop -h h02 --name "h02" myhadoop /bin/bash

On the host machine, create a startup script docker.sh with the content below and make it executable with chmod +x docker.sh

service docker start
docker start h01
docker start h02
docker start h03
docker attach h01
root@linux:~# ./docker.sh 
h01
h02
h03
root@h01:/# cat hadoop.sh 
/usr/local/hadoop/sbin/start-all.sh
root@h01:/# 

Starting the cluster

On the h01 node, format HDFS first. This only needs to be done once; formatting again on later startups will cause errors

root@h01:/usr/local/hadoop/bin# ./hadoop namenode -format

In the / directory, create the cluster startup script hadoop.sh with the following content

/usr/local/hadoop/sbin/start-all.sh

and make it executable, just like the docker startup script.

Each time you start h01, run source /etc/profile first so that Hadoop commands can be executed without the ./ prefix.
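Once start-all.sh has finished, a few sanity checks on h01 confirm the cluster is alive (these checks are my own addition); the NameNode web UI is also reachable from the VM at http://localhost:9870 thanks to the port mapping above:

jps                        # expect NameNode, SecondaryNameNode and ResourceManager (plus DataNode/NodeManager, since h01 is also listed in workers)
hdfs dfsadmin -report      # lists the live DataNodes
hdfs dfs -mkdir -p /test   # a trivial HDFS write to prove the filesystem works
hdfs dfs -ls /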

That's it: the fully distributed Hadoop cluster is complete. A follow-up post will cover writing Java programs in Eclipse to operate HDFS.
