Cloudera Hadoop Environment Setup, Part 1: Environment Preparation and Installing Cloudera Manager

This series of notes is based on the Hadoop distribution CDH 5.12.2. The CDH components are installed offline from manually downloaded packages; I did not test the fully online installation or the yum-repository installation (many write-ups report that both are slow and unreliable from within China, so I abandoned them).

0. Cluster Structure and Server Information

The goal of this series is to build a fully distributed Hadoop cluster on 5 KVM virtual machines (running on an OpenStack platform) with CentOS 7.3.1611 (core) as the operating system. Nodes and roles are as follows:

Hostname        Alias   IP address      Roles
hadoop-node-1   nn1     10.120.67.9     CM Server, CM Agent, MySQL, NameNode, DFSZKFailoverController, ResourceManager, JobHistory Server, Hive Metastore Server, HiveServer2, WebHCat Server, Gateway, Impala StateStore, Impala Catalog Server, HBase Master, HBase Thrift Server, HBase REST Server, Hue Server, Hue Load Balancer, Oozie Server
hadoop-node-2   nn2     10.120.67.10    essentially the same as hadoop-node-1; the two form the HA pair
hadoop-node-3   slave1  10.120.67.11    CM Agent, ZooKeeper, JournalNode, DataNode, NodeManager, Hive Gateway, Impala Daemon, HBase RegionServer
hadoop-node-4   slave2  10.120.67.12    same roles as hadoop-node-3
hadoop-node-5   slave3  10.120.67.13    same roles as hadoop-node-3

1. Server Preparation

1. Disable SELinux and firewalld (iptables)

Disable SELinux on all nodes; firewalld only needs to allow traffic within the 10.120.67.0/24 subnet, but since this is a test setup I simply disabled the firewall entirely.

~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
~]# setenforce 0

~]# systemctl stop firewalld.service
~]# systemctl disable firewalld.service
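
To confirm the change stuck on every node, the configured mode can be read back out of the config file. A minimal sketch — the helper name `selinux_mode` is my own, and the demo runs against a scratch copy rather than the real /etc/selinux/config:

```shell
# Read the configured SELinux mode from an /etc/selinux/config-style file.
# The real check would loop over the nodes, e.g.:
#   for var in {1..5}; do ssh hadoop-node-$var 'grep ^SELINUX= /etc/selinux/config'; done
selinux_mode() {
    awk -F= '$1 == "SELINUX" { print $2 }' "$1"
}

# Demo on a scratch copy standing in for /etc/selinux/config.
tmp=$(mktemp)
printf 'SELINUX=disabled\nSELINUXTYPE=targeted\n' > "$tmp"
selinux_mode "$tmp"    # prints: disabled
```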

2. Configure hostnames and /etc/hosts

# The 5 hosts are numbered 1 through 5; set each host's name as follows (substitute the node number for ${N})
~]# hostnamectl set-hostname hadoop-node-${N}

# On all 5 hosts, add the following entries to /etc/hosts
~]# cat /etc/hosts
...
10.120.67.9     hadoop-node-1   nn1
10.120.67.10    hadoop-node-2   nn2
10.120.67.11    hadoop-node-3   slave1
10.120.67.12    hadoop-node-4   slave2
10.120.67.13    hadoop-node-5   slave3
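
Since the five entries follow one pattern (IPs 10.120.67.9–13 mapping to hadoop-node-1..5), they can be generated instead of typed by hand. A small sketch that only prints the block (appending to /etc/hosts is left to you):

```shell
# Generate the five /etc/hosts lines from the node table above;
# append the output to /etc/hosts on every node (here it is only printed).
aliases=(nn1 nn2 slave1 slave2 slave3)
hosts_block=$(
    for i in 1 2 3 4 5; do
        printf '10.120.67.%s\thadoop-node-%s\t%s\n' "$((i + 8))" "$i" "${aliases[i-1]}"
    done
)
printf '%s\n' "$hosts_block"
```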

3. Configure passwordless SSH login

hadoop-node-1 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Copy the public key to all 5 nodes (including node 1 itself)
~]# ssh-copy-id hadoop-node-1
~]# ssh-copy-id hadoop-node-2
~]# ssh-copy-id hadoop-node-3
~]# ssh-copy-id hadoop-node-4
~]# ssh-copy-id hadoop-node-5

# I am not sure whether the other nodes also need passwordless access to each other, so I simply push the whole .ssh directory to every node
~]# for var in {1..5};do scp -r .ssh hadoop-node-$var:/root/;done

4. Linux kernel parameter tuning

Below is my sysctl configuration. Not all of it is necessarily needed or well tuned; it is simply my starting point.

~]# cat /etc/sysctl.conf
kernel.sysrq = 0
kernel.core_uses_pid = 1
kernel.msgmax = 1048560
kernel.msgmnb = 1073741824
kernel.shmall = 4294967296
kernel.shmmax = 68719476736
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.core.somaxconn = 256
net.core.optmem_max = 20480
net.core.somaxconn = 32768
net.core.netdev_max_backlog = 32768
net.ipv4.tcp_wmem = 8192  436600  16777216
net.ipv4.tcp_rmem = 32768 436600  16777216
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.ip_local_port_range = 4096  65000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_announce = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
vm.swappiness = 0
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv4.ip_local_port_range = 1024 65000

# Copy this configuration file to every node and apply it
~]# for var in {1..5};do scp sysctl.conf hadoop-node-$var:/etc/;done
~]# for var in {1..5};do ssh hadoop-node-$var 'sysctl -p ';done
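
One thing worth checking in a hand-grown sysctl.conf is accidental duplicate keys — the file above sets net.core.somaxconn and net.ipv4.ip_local_port_range twice, and the last occurrence wins when the file is loaded. A quick awk helper to flag duplicates (the function name is my own; the demo runs on a scratch file):

```shell
# List keys that appear more than once in a sysctl-style "key = value" file.
# On the configuration above this flags net.core.somaxconn and
# net.ipv4.ip_local_port_range.
find_dup_keys() {
    awk -F' *= *' '/^[^#]/ && NF >= 2 { count[$1]++ }
                   END { for (k in count) if (count[k] > 1) print k }' "$1"
}

# Demo on a scratch file with one duplicated key.
tmp=$(mktemp)
printf 'net.core.somaxconn = 256\nvm.swappiness = 0\nnet.core.somaxconn = 32768\n' > "$tmp"
find_dup_keys "$tmp"    # prints: net.core.somaxconn
```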

# Run the following adjustments on every node
echo never > /sys/kernel/mm/transparent_hugepage/defrag  
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# Write these commands into /etc/rc.local so they take effect at boot
~]# vi /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/defrag  
echo never > /sys/kernel/mm/transparent_hugepage/enabled
~]# chmod +x /etc/rc.d/rc.local
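
The transparent_hugepage files report their state as e.g. `always madvise [never]`, with the active value in brackets. A small helper for verifying the setting across nodes (the `thp_active` name is my own; it just parses a status string, so it can be tried without touching /sys):

```shell
# Extract the bracketed (active) value from a transparent_hugepage status
# string such as "always madvise [never]". On a real node you would feed it
# $(cat /sys/kernel/mm/transparent_hugepage/enabled).
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' <<< "$1"
}

thp_active 'always madvise [never]'    # prints: never
```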

5. Install the JDK on all nodes

The JDK is installed by downloading the tarball from the official website and unpacking it. File: jdk-8u172-linux-x64.tar.gz; download link: JDK

# JDK installation steps (run on node 1 first; nodes 2-5 are handled in batch below)
~]# mkdir /usr/java
~]# tar zxvf jdk-8u172-linux-x64.tar.gz
~]# mv jdk1.8.0_172 /usr/java/
~]# vi /etc/profile #[append the Java environment variables at the end]
    export JAVA_HOME=/usr/java/jdk1.8.0_172
    export JRE_HOME=$JAVA_HOME/jre
    export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
~]# source /etc/profile
~]# java -version
    java version "1.8.0_172"
    Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)

# [Batch JDK installation on the remaining nodes]
~]# for var in {2..5};do ssh hadoop-node-$var 'mkdir /usr/java';done
~]# for var in {2..5};do scp jdk-8u172-linux-x64.tar.gz hadoop-node-$var:/usr/java/;done
~]# for var in {2..5};do ssh hadoop-node-$var '(cd /usr/java/; tar zxvf jdk-8u172-linux-x64.tar.gz)';done
~]# for var in {2..5};do scp profile hadoop-node-$var:/etc/;done #this profile contains the java environment variables
~]# for var in {2..5};do ssh hadoop-node-$var 'source /etc/profile';done #only affects that ssh session; new login shells pick up /etc/profile automatically

6. NTP time synchronization

Use node 1 (hadoop-node-1) as the cluster's NTP server; the other nodes synchronize their clocks to it.

hadoop-node-1 ~]# vi /etc/chrony.conf
# Add/adjust the following line
allow 10.120.67/24

# Restart the chronyd service
hadoop-node-1 ~]# systemctl restart chronyd.service

# On the other 4 nodes, point chronyd at node 1
~]# vi /etc/chrony.conf
# Remove the other server lines and point at node 1's address or hostname (as defined in /etc/hosts)
server 10.120.67.9 iburst
# Restart chronyd
~]# systemctl restart chronyd.service
~]# timedatectl 
      Local time: Fri 2018-04-27 01:40:50 CST
  Universal time: Thu 2018-04-26 17:40:50 UTC
        RTC time: Thu 2018-04-26 17:40:50
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a
~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* hadoop-node-1              4   6    17     1   -267us[-1130us] +/-  101ms
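
In the `chronyc sources` listing, the `^*` marker flags the source chrony is currently synchronized to. A sketch that pulls that source out of the output, handy for checking every node in a loop (the `synced_to` helper is my own; the sample reuses the capture above):

```shell
# Print the name of the selected ("^*") source from a `chronyc sources`
# listing; exit non-zero if no source is selected.
synced_to() {
    awk '$1 == "^*" { print $2; found = 1 } END { exit !found }'
}

sample='210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* hadoop-node-1              4   6    17     1   -267us[-1130us] +/-  101ms'

printf '%s\n' "$sample" | synced_to    # prints: hadoop-node-1
```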

2. Database Installation

Cloudera ships with an embedded database, but the official recommendation is to use a standalone relational database, so here I use MySQL 5.7 as the database for Cloudera and the related Hadoop components. In production you could build a MySQL HA or primary/standby setup; since this is a test environment a single node is sufficient. The MySQL installation and configuration below are tailored to this setup and differ somewhat from other installation methods. The database is a system quite independent of the Hadoop components, so feel free to build it your own way.

1. Configure the database on node 1 (hadoop-node-1)

Download the MySQL yum repository file:
~]# wget http://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
~]# yum localinstall mysql57-community-release-el7-11.noarch.rpm
~]# yum install mysql-community-server
~]# systemctl start mysqld
~]# systemctl enable mysqld
~]# systemctl daemon-reload
# Get the initial root password
~]# grep 'temporary password' /var/log/mysqld.log 
~]# mysql -uroot -p
mysql> set GLOBAL validate_password_policy=0;
mysql> set GLOBAL validate_password_length=4;
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'hadoop';

~]# vi /etc/my.cnf
[mysqld]  
user=mysql  
server-id=20180427
port=3306
bind-address=0.0.0.0
datadir=/var/lib/mysql  
socket=/var/lib/mysql/mysql.sock  
#default_password_lifetime=0
#validate_password=off
max_connections=8096
max_connect_errors=10000
max_user_connections=8000
symbolic-links=0  
transaction-isolation = READ-COMMITTED  
character_set_server = utf8mb4
key_buffer_size = 32M  
max_allowed_packet = 32M  
thread_stack = 256K  
thread_cache_size = 64  
query_cache_limit = 8M  
query_cache_size = 64M  
query_cache_type = 1  
#binlog_format = mixed  
read_buffer_size = 2M  
read_rnd_buffer_size = 16M  
sort_buffer_size = 8M  
join_buffer_size = 8M  
innodb_file_per_table = 1  
innodb_flush_log_at_trx_commit  = 2  
innodb_log_buffer_size = 64M  
innodb_buffer_pool_size = 2G  
innodb_thread_concurrency = 8  
innodb_flush_method = O_DIRECT  
innodb_log_file_size = 512M  
[mysqld_safe]  
log-error=/var/log/mysqld.log  
pid-file=/var/run/mysqld/mysqld.pid  
sql_mode=STRICT_ALL_TABLES  

~]# systemctl restart mysqld
~]# mysql -uroot -p
mysql> create database amon DEFAULT CHARACTER SET utf8mb4;
mysql> create database metastore DEFAULT CHARACTER SET latin1;
mysql> create database sentry DEFAULT CHARACTER SET utf8mb4;
mysql> create database oozie DEFAULT CHARACTER SET latin1;
mysql> create database hue DEFAULT CHARACTER SET utf8mb4;
mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=1;
mysql> grant all on amon.* TO 'amon'@'%' IDENTIFIED BY 'amon';
mysql> grant all on metastore.* TO 'hive'@'%' IDENTIFIED BY 'hive';
mysql> grant all on sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry';
mysql> grant all on oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie';
mysql> grant all on hue.* TO 'hue'@'%' IDENTIFIED BY 'hue';

mysql> GRANT ALL PRIVILEGES ON *.* TO root@"%" IDENTIFIED BY 'hadoop'  WITH GRANT OPTION;
mysql> GRANT ALL PRIVILEGES ON *.* TO root@"localhost" IDENTIFIED BY 'hadoop'  WITH GRANT OPTION;
mysql> GRANT ALL PRIVILEGES ON *.* TO root@"hadoop-node-1" IDENTIFIED BY 'hadoop'  WITH GRANT OPTION;
mysql> GRANT ALL PRIVILEGES ON *.* TO root@"10.120.67.9" IDENTIFIED BY 'hadoop'  WITH GRANT OPTION;

The components in this test Hadoop cluster that use MySQL are Cloudera Manager, Activity Monitor, Hive Metastore Server, and Hue Server; the statements above create their databases and account grants in advance.
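
The CREATE DATABASE / GRANT statements above all follow one pattern, so they can be generated from a small db:user:charset list and piped into `mysql -uroot -p`. A sketch (the `emit_db_sql` helper is my own; the list mirrors the statements above):

```shell
# Emit the per-component CREATE DATABASE and GRANT statements used above.
# Pipe the output into `mysql -uroot -p` (here it is only printed).
emit_db_sql() {
    local spec db user charset
    for spec in amon:amon:utf8mb4 metastore:hive:latin1 sentry:sentry:utf8mb4 \
                oozie:oozie:latin1 hue:hue:utf8mb4; do
        IFS=: read -r db user charset <<< "$spec"
        printf 'create database %s DEFAULT CHARACTER SET %s;\n' "$db" "$charset"
        printf "grant all on %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$db" "$user" "$user"
    done
}

emit_db_sql
```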

2. Download and configure the MySQL JDBC driver

Not every node actually needs this driver [to be confirmed after further reading]; for simplicity I install it on all nodes.

# Download the driver
~]# wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.11.tar.gz
~]# tar zxvf mysql-connector-java-8.0.11.tar.gz
~]# mv mysql-connector-java-8.0.11/mysql-connector-java-8.0.11.jar mysql-connector-java.jar
~]# for var in {1..5};do ssh hadoop-node-$var 'mkdir -p /usr/share/java/';done
~]# for var in {1..5};do scp mysql-connector-java.jar hadoop-node-$var:/usr/share/java/;done

3. Download and Install Cloudera Manager Server

1. This test environment uses version 5.12.2; download the CM Server tarball and parcel files for that version

CM download: https://archive.cloudera.com/cm5/cm/5/
Parcels download: http://archive.cloudera.com/cdh5/parcels/
I have already downloaded the relevant files, listed below:

CDH-5.12.2-1.cdh5.12.2.p0.4-el7.parcel
CDH-5.12.2-1.cdh5.12.2.p0.4-el7.parcel.sha  [the downloaded .sha1 file is renamed to .sha here]
cloudera-manager-centos7-cm5.12.2_x86_64.tar.gz
manifest.json
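
The `.sha` file contains the parcel's SHA-1 digest, which Cloudera Manager uses to validate the parcel, so it is worth checking the download the same way before copying it into parcel-repo. A sketch (demonstrated on a scratch file, since the real parcel is several GB; `verify_parcel` is my own helper):

```shell
# Verify a file against a Cloudera-style .sha file (a bare SHA-1 hex digest).
verify_parcel() {
    local parcel=$1 shafile=$2
    [ "$(sha1sum "$parcel" | awk '{print $1}')" = "$(awk '{print $1}' "$shafile")" ]
}

# Demo with a scratch file standing in for the parcel.
tmp=$(mktemp -d)
printf 'parcel-bytes' > "$tmp/demo.parcel"
sha1sum "$tmp/demo.parcel" | awk '{print $1}' > "$tmp/demo.parcel.sha"
verify_parcel "$tmp/demo.parcel" "$tmp/demo.parcel.sha" && echo OK
```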

2. Configure the CM Server

# (1) Create the installation directory on every node and copy the tarball to all of them
~]# for var in {1..5};do ssh hadoop-node-$var 'mkdir -p /opt/cloudera-manager';done
~]# for var in {1..5};do scp cloudera-manager-centos7-cm5.12.2_x86_64.tar.gz hadoop-node-$var:/opt/cloudera-manager;done
~]# for var in {1..5};do ssh hadoop-node-$var '(cd /opt/cloudera-manager/;tar zxvf cloudera-manager-centos7-cm5.12.2_x86_64.tar.gz )';done

# Create the cloudera-scm service user
useradd --system --home=/opt/cloudera-manager/cm-5.12.2/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "SCM service" cloudera-scm

# (2) Create the CM Server's local data storage directory
# Run on node 1, the CM Server node
mkdir -p /var/lib/cloudera-scm-server
# chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server

# (3) Configure the Cloudera Manager Agent on all nodes
sed -i 's/server_host=localhost/server_host=hadoop-node-1/' /opt/cloudera-manager/cm-5.12.2/etc/cloudera-scm-agent/config.ini
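
Before running that substitution on every node, it can be rehearsed against a scratch copy of config.ini (the [General] snippet below is a minimal stand-in for the real file, not its full contents):

```shell
# Rehearse the agent-config substitution on a scratch config.ini before
# applying it to /opt/cloudera-manager/cm-5.12.2/etc/cloudera-scm-agent/config.ini.
tmp=$(mktemp)
printf '[General]\nserver_host=localhost\nserver_port=7182\n' > "$tmp"
sed -i 's/server_host=localhost/server_host=hadoop-node-1/' "$tmp"
grep '^server_host=' "$tmp"    # prints: server_host=hadoop-node-1
```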

# (4) Create and initialize the CM Server database
# The database lives on node 1; put the JDBC driver in place on hadoop-node-1:
cp  mysql-connector-java.jar  /opt/cloudera-manager/cm-5.12.2/share/cmf/lib

# Run the initialization script
/opt/cloudera-manager/cm-5.12.2/share/cmf/schema/scm_prepare_database.sh mysql -hhadoop-node-1 -uroot -phadoop scm scm scm --force
# Excerpt of the output:
JAVA_HOME=/usr/java/jdk1.8.0_172
Verifying that we can write to /opt/cloudera-manager/cm-5.12.2/etc/cloudera-scm-server
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
...
java.sql.SQLException: Access denied for user 'scm'@'hadoop-node-1'
... 
All done, your SCM database is configured correctly!

Log in to MySQL, grant the scm user access from the CM Server host, and re-run the script:
grant all on scm.* TO 'scm'@'hadoop-node-1' IDENTIFIED BY 'scm';
grant all on scm.* TO 'scm'@'10.120.67.9' IDENTIFIED BY 'scm';
~]# cat /opt/cloudera-manager/cm-5.12.2/etc/cloudera-scm-server/db.properties
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=hadoop-node-1
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.setupType=EXTERNAL
com.cloudera.cmf.db.password=scm
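
A quick sanity check that db.properties defines every key the server needs (the `check_db_props` helper is my own; the key list matches the file above):

```shell
# Check that a db.properties file defines every com.cloudera.cmf.db.* key
# listed above; print any that are missing and return non-zero.
check_db_props() {
    local f=$1 key missing=0
    for key in db.type db.host db.name db.user db.password; do
        grep -q "^com.cloudera.cmf.${key}=" "$f" || { echo "missing: $key"; missing=1; }
    done
    return $missing
}

# Demo on a scratch copy of the file shown above.
tmp=$(mktemp)
printf 'com.cloudera.cmf.db.type=mysql\ncom.cloudera.cmf.db.host=hadoop-node-1\ncom.cloudera.cmf.db.name=scm\ncom.cloudera.cmf.db.user=scm\ncom.cloudera.cmf.db.password=scm\n' > "$tmp"
check_db_props "$tmp" && echo OK
```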

# (5) Create the CDH parcel repository directory
# On node 1
mkdir -p /opt/cloudera
mkdir /opt/cloudera/parcel-repo
cp CDH-5.12.2-1.cdh5.12.2.p0.4-el7.parcel.sha CDH-5.12.2-1.cdh5.12.2.p0.4-el7.parcel manifest.json /opt/cloudera/parcel-repo
#chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo

On all nodes:
for var in {1..5};do ssh hadoop-node-$var 'mkdir -p /opt/cloudera';done
for var in {1..5};do ssh hadoop-node-$var 'mkdir -p /opt/cloudera/parcels';done
#for var in {1..5};do ssh hadoop-node-$var 'chown cloudera-scm:cloudera-scm /opt/cloudera/parcels';done


# (6) Start the Cloudera Manager Server and Agents
# Running the startup scripts as root is fine; by default they switch to the cloudera-scm user to run the process [per the reference docs — in practice there were issues]
#for var in {1..5};do ssh hadoop-node-$var 'chown -R cloudera-scm:cloudera-scm /opt/cloudera-manager';done
# On the CM Server node (node 1)
/opt/cloudera-manager/cm-5.12.2/etc/init.d/cloudera-scm-server  start
[Error] /opt/app/cloudera-manager/cm-5.12.2/etc/init.d/cloudera-scm-server: line 109: pstree: command not found
pstree is missing because a CentOS 7 minimal install does not include it; install it with yum:
yum install psmisc -y

# On the CM Agent nodes (all nodes)
/opt/cloudera-manager/cm-5.12.2/etc/init.d/cloudera-scm-agent  start

4. Completing the CM Server and Agent Installation

The CM Server and Agents are now installed; open http://10.120.67.9:7180/cmf in a browser.
The initial account and password are: admin/admin
[Screenshot: logging in after CM Server and Agent installation]

This completes this stage of the notes. The next stage covers installing the Hadoop ecosystem components through the web UI and configuring HA.

Packages required later when installing HUE:
yum install -y cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain httpd mod_ssl

References:
1. Official documentation
2. https://blog.csdn.net/watermelonbig/article/details/77102187
