Docker Hadoop Cluster Setup


Environment

1. Operating system: CentOS, 64-bit

Network settings

 

hostname          ip
cluster-master    172.18.0.2
cluster-slave1    172.18.0.3
cluster-slave2    172.18.0.4
cluster-slave3    172.18.0.5

1. Install Docker

2. Pull the latest CentOS image

docker pull centos 

2.1 Following the cluster layout above, each container needs a fixed IP, so first create a Docker subnet with a fixed address range using the following command:

docker network create --subnet=172.18.0.0/16 netgroup
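To confirm the new network and its address range, you can inspect it:

docker network inspect netgroup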

Once the Docker subnet has been created, you can create containers with fixed IPs.

#cluster-master
# -p maps host ports to the container; used later to access the web UIs
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-master -h cluster-master -p 18088:18088 -p 9870:9870 --net netgroup --ip 172.18.0.2 daocloud.io/library/centos /usr/sbin/init

#cluster-slaves
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave1 -h cluster-slave1 --net netgroup --ip 172.18.0.3 daocloud.io/library/centos /usr/sbin/init

docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave2 -h cluster-slave2 --net netgroup --ip 172.18.0.4 daocloud.io/library/centos /usr/sbin/init

docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave3 -h cluster-slave3 --net netgroup --ip 172.18.0.5 daocloud.io/library/centos /usr/sbin/init

Attach a console and enter the Docker container:

docker exec -it cluster-master /bin/bash

2.2 Install OpenSSH and set up passwordless login

2.2.1 Install on cluster-master:

# cluster-master additionally needs a config file change (unlike the slaves)

# install openssh
[root@cluster-master /]# yum -y install openssh openssh-server openssh-clients

[root@cluster-master /]# systemctl start sshd
#### make ssh accept new host keys automatically
#### so that logins from the master are added to known_hosts without prompting
[root@cluster-master /]# vi /etc/ssh/ssh_config
# change the original StrictHostKeyChecking ask
# to StrictHostKeyChecking no
# save the file
[root@cluster-master /]# systemctl restart sshd
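If you would rather script that edit than open vi, a sed one-liner along these lines should work; this assumes the stock CentOS ssh_config, where the option ships commented out as "#   StrictHostKeyChecking ask":

# uncomment the option and set it to "no"; restart sshd afterwards as above
sed -i 's/^#[[:space:]]*StrictHostKeyChecking ask/StrictHostKeyChecking no/' /etc/ssh/ssh_config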

2.2.2 Install OpenSSH on each slave

# enter the container
docker exec -it cluster-slave1 /bin/bash

# install openssh
[root@cluster-slave1 /]# yum -y install openssh openssh-server openssh-clients

[root@cluster-slave1 /]# systemctl start sshd

2.2.3 Distribute the cluster-master public key

On the master, run ssh-keygen -t rsa and press Enter at every prompt. When it finishes, a ~/.ssh directory is created containing id_rsa (the private key) and id_rsa.pub (the public key); then redirect id_rsa.pub into the authorized_keys file.

ssh-keygen -t rsa
# just press Enter whenever you are asked for a passphrase

[root@cluster-master /]# cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

Once the file has been generated, use scp to distribute the public key to the slave hosts:

[root@cluster-master /]# ssh root@cluster-slave1 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave1:~/.ssh
[root@cluster-master /]# ssh root@cluster-slave2 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave2:~/.ssh
[root@cluster-master /]# ssh root@cluster-slave3 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave3:~/.ssh

If you are prompted for a password, you can instead create the ~/.ssh directory on the corresponding slave by hand and copy the file over from the master.
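sshd is also strict about key file permissions; if passwordless login still prompts for a password after the keys are in place, tightening the permissions on each slave may help (shown for cluster-slave1; repeat for the others):

# sshd ignores authorized_keys when ~/.ssh or the file itself is group/world writable
ssh root@cluster-slave1 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'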

After distribution, test (ssh root@cluster-slave1) whether you can now log in without entering a password.
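A quick loop checks all three slaves at once; each line should print the slave's hostname without any password prompt:

for h in cluster-slave1 cluster-slave2 cluster-slave3; do ssh root@$h hostname; done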

Installing Ansible

[root@cluster-master /]# yum -y install epel-release
[root@cluster-master /]# yum -y install ansible
# Ansible's configuration (including the hosts inventory) is placed under /etc/ansible

Now edit Ansible's hosts (inventory) file:

vi /etc/ansible/hosts
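A minimal inventory sketch that matches the "hosts: cluster" group used by the playbook further down, assuming the group only needs to cover the three slaves:

[cluster]
cluster-slave1
cluster-slave2
cluster-slave3

With the inventory in place, an ad-hoc ping confirms Ansible can reach every host over the SSH keys set up earlier: ansible cluster -m ping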

 

Software environment setup:

Download the JDK and Hadoop 3 into the /opt directory, extract the archives, and create a symlink:

wget https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
tar -xzvf hadoop-3.3.0.tar.gz
ln -s hadoop-3.3.0 hadoop
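For the JDK, assuming a JDK 8 tarball has already been placed in /opt by some other means, the layout implied by JAVA_HOME=/opt/jdk8 below can be created along these lines (the archive and directory names here are placeholders):

cd /opt
tar -xzvf jdk-8u*-linux-x64.tar.gz   # placeholder archive name
ln -s jdk1.8.0_* jdk8                # so that JAVA_HOME=/opt/jdk8 resolves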

Configure the Java and Hadoop environment variables

Edit the ~/.bashrc file:

# hadoop
export HADOOP_HOME=/opt/hadoop-3.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# java
export JAVA_HOME=/opt/jdk8
export PATH=$JAVA_HOME/bin:$PATH

Run the following to make the .bashrc changes take effect:

source .bashrc
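As a quick sanity check, both commands should now resolve on the PATH:

java -version
hadoop version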

Prepare the configuration files Hadoop needs to run

# go to the Hadoop configuration directory
cd $HADOOP_HOME/etc/hadoop/

1. Edit core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <!-- file system properties -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://cluster-master:9000</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>4320</value>
    </property>
</configuration>

2. Edit hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>staff</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

3. Edit mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>cluster-master:9001</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>cluster-master:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>cluster-master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>cluster-master:19888</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/jobhistory/done</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/jobhistory/done_intermediate</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
</configuration>

4. Edit yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>cluster-master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>cluster-master:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>cluster-master:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>cluster-master:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>cluster-master:18141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>cluster-master:18088</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
</configuration>

Package Hadoop and distribute it to the slaves

tar -cvf hadoop-dis.tar hadoop hadoop-3.3.0

Use ansible-playbook to push .bashrc and hadoop-dis.tar to the slave hosts:

---
- hosts: cluster
  tasks:
    - name: copy .bashrc to slaves
      copy: src=~/.bashrc dest=~/
      notify:
        - exec source
    - name: copy hadoop-dis.tar to slaves
      unarchive: src=/opt/hadoop-dis.tar dest=/opt

  handlers:
    - name: exec source
      shell: source ~/.bashrc

Save the YAML above as playbook01.yml and run it (see the Ansible playbook documentation for details):

ansible-playbook playbook01.yml

hadoop-dis.tar is automatically extracted into /opt on each slave host.

 

Starting Hadoop

Format the namenode:

hadoop namenode -format

If the output contains wording like "successfully formatted", the namenode has been formatted successfully.

Then start the cluster services:

start-all.sh

If start-all.sh fails with errors such as:

        ERROR: Attempting to operate on hdfs namenode as root
        ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.

See: https://www.cnblogs.com/Mr-nie/p/11133416.html
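The usual fix when running everything as root is to declare the run-as users for the start/stop scripts, for example in $HADOOP_HOME/etc/hadoop/hadoop-env.sh (a sketch; adjust the user if you are not running the daemons as root):

# tell start-all.sh / stop-all.sh which user runs each daemon
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root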

 

Verify the services

Visit:

http://host:18088
http://host:9870
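Besides the web UIs, running jps inside each container shows whether the daemons are up; assuming $HADOOP_HOME/etc/hadoop/workers lists the slave hostnames, the master should show NameNode, SecondaryNameNode and ResourceManager, and each slave DataNode and NodeManager:

# run inside each container
jps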

 

 

List the HDFS root directory:
./hadoop fs -ls /

Create a directory:
./hadoop fs -mkdir /test

Upload a file:
./hadoop fs -put /Users/wangyun/Desktop/lala.txt /test/

List the new directory:
./hadoop fs -ls /test

Read the file:
./hadoop fs -text /test/lala.txt

 
