In a production environment, the cluster administrator often will not hand you root access to install CDH; instead only a limited set of privileges is granted, and anything touching those privileges must be run through sudo. This article starts from the root side: it grants a subset of privileges to an ordinary user named huaxin, and huaxin then installs CDH.
- Creating the user and granting privileges
Create the ordinary user (as root, on every node):
[root@localhost ~]# useradd huaxin
[root@localhost ~]# passwd huaxin
Changing password for user huaxin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Grant huaxin the required privileges by adding the following to /etc/sudoers (edit it with visudo so syntax errors are caught):
User_Alias CDH_INSTALLER = huaxin
Cmnd_Alias CDH_CMD = /usr/bin/chown, /usr/sbin/service, /usr/bin/systemctl, /usr/bin/rm, /usr/bin/id, /usr/bin/install, /usr/sbin/chkconfig, /usr/bin/yum, /usr/bin/sed, /usr/bin/mv, /usr/sbin/ntpdate
CDH_INSTALLER ALL=(ALL) NOPASSWD: CDH_CMD
huaxin ALL=(ALL) NOPASSWD: CDH_CMD
%huaxin ALL=(ALL) NOPASSWD: CDH_CMD
cloudera-scm ALL=(ALL) NOPASSWD: ALL
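On the live system, running sudo -l as huaxin lists the rules that actually apply. As a rough illustration of what the allow-list above does and does not cover, here is a hypothetical check (a plain string comparison, not a sudoers parser; the command list mirrors CDH_CMD above):

```shell
#!/bin/sh
# The CDH_CMD alias from /etc/sudoers, flattened into a space-separated list.
CDH_CMD="/usr/bin/chown /usr/sbin/service /usr/bin/systemctl /usr/bin/rm /usr/bin/id /usr/bin/install /usr/sbin/chkconfig /usr/bin/yum /usr/bin/sed /usr/bin/mv /usr/sbin/ntpdate"

# allowed CMD -> prints yes if CMD is covered by the alias, no otherwise.
allowed() {
    case " $CDH_CMD " in
        *" $1 "*) echo yes ;;
        *)        echo no ;;
    esac
}

allowed /usr/bin/yum      # yes: huaxin may run sudo yum without a password
allowed /usr/bin/passwd   # no: not in the alias, sudo will refuse it
```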
- Network configuration
Set the hostname and networking options (as root, on every node):
[root@localhost ~]# vi /etc/sysconfig/network
Add the following on the master node (on the other nodes, set HOSTNAME to the matching hostname, e.g. hxslave1, hxslave2):
NETWORKING=yes
HOSTNAME=hxmaster
Configure the IP address (as root, on every node):
[root@localhost ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens3
Edit the contents as follows (comment out the IPv6 entries; IPADDR and GATEWAY vary per node):
TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
#IPV6INIT="yes"
#IPV6_AUTOCONF="yes"
#IPV6_DEFROUTE="yes"
#IPV6_FAILURE_FATAL="no"
NAME="ens3"
UUID="b8cc6007-9cc6-46da-8bff-b0a06a693996"
DEVICE="ens3"
ONBOOT="yes"
IPADDR="10.10.1.7"
PREFIX="24"
GATEWAY="10.10.1.254"
DNS1="202.101.172.35"
#IPV6_PEERDNS="yes"
#IPV6_PEERROUTES="yes"
#IPV6_PRIVACY="no"
Map the IP addresses to hostnames (as root):
[root@localhost ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.1.7 hxmaster
10.10.1.8 hxslave1
10.10.1.9 hxslave2
10.10.1.10 hxslave3
scp the file to the other nodes:
scp /etc/hosts 10.10.1.8:/etc/
scp /etc/hosts 10.10.1.9:/etc/
scp /etc/hosts 10.10.1.10:/etc/
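With /etc/hosts distributed, a short loop confirms that every cluster hostname resolves; getent consults the same name sources that ssh and scp use (hostnames taken from the mapping above):

```shell
#!/bin/sh
# Check that every cluster hostname resolves on this node.
for h in hxmaster hxslave1 hxslave2 hxslave3; do
    if getent hosts "$h" >/dev/null; then
        echo "$h: ok"
    else
        echo "$h: NOT RESOLVABLE"
    fi
done
```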
Restart the network service so the changes take effect (as root, on every node):
[root@localhost ~]# systemctl restart network.service
- Disabling the firewall (as root)
The firewall must be disabled on every node in the cluster.
(1) Stop firewalld
[root@localhost ~]# systemctl stop firewalld
Check firewalld's state after stopping it:
[root@localhost ~]# firewall-cmd --state
"not running" means it was stopped successfully.
Alternatively, check the firewall status with:
[root@localhost ~]# systemctl status firewalld.service
(2) Keep the firewall from starting at boot
[root@localhost ~]# systemctl disable firewalld.service
- Disabling SELinux (as root)
Check whether SELinux is on (Enforcing means it is):
[root@hxmaster ~]# getenforce
Enforcing
Edit /etc/selinux/config and change SELINUX=enforcing to SELINUX=disabled. The file change only takes effect after a reboot; to stop enforcement immediately as well, run setenforce 0.
[root@hxmaster ~]# vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
- Passwordless SSH login (as the ordinary user)
Since SSH has to be configured across several hosts, logging in to each one by hand is tedious, so a shell script does the setup in bulk. It relies on expect, which only needs to be installed on the master node.
Install expect:
sudo yum install -y expect
Save the following as batch_ssh.sh (adjust the passwords for your hosts). The script reads ip.txt from the current directory, which lists each cluster host and its password:
#!/bin/bash
# Generate the local RSA key pair, answering ssh-keygen's prompts automatically
expect -c "
spawn ssh-keygen -t rsa
expect {
    \"*key*\" {send \"\n\"; exp_continue}
    \"*Overwrite*\" {send \"y\n\"; exp_continue}
    \"*passphrase*\" {send \"\n\"; exp_continue}
    \"*again*\" {send \"\n\"}
}
"
# Copy the public key to every host listed in ip.txt (format host:password)
while IFS=: read -r host password
do
    expect -c "
    spawn ssh-copy-id -i /home/huaxin/.ssh/id_rsa.pub $host
    expect {
        \"*yes/no*\" {send \"yes\r\"; exp_continue}
        \"*password*\" {send \"$password\r\"; exp_continue}
        eof {}
    }
    "
done < ip.txt
Save the following as ip.txt (format host:password):
10.10.1.7:123456
10.10.1.8:123456
10.10.1.9:123456
10.10.1.10:123456
Running batch_ssh.sh completes the passwordless SSH setup for the whole cluster (as the ordinary user):
[huaxin@hxmaster ~]$ bash batch_ssh.sh
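Once the batch script has run, every host should accept the key without prompting for a password. A loop like the following (reusing ip.txt) verifies this; BatchMode=yes makes ssh fail outright instead of falling back to a password prompt:

```shell
#!/bin/sh
# Verify passwordless SSH to every host listed in ip.txt (lines are host:password).
for host in $(cut -d: -f1 ip.txt); do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname >/dev/null 2>&1; then
        echo "$host: passwordless ok"
    else
        echo "$host: FAILED"
    fi
done
```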
- Installing the JDK
Install the JDK under /usr/java on every node (the default location):
[root@hxmaster java]# tar -zxf jdk-8u121-linux-x64.tar.gz
[root@hxmaster java]# ls
jdk1.8.0_121 jdk-8u121-linux-x64.tar.gz
[root@hxmaster java]# pwd
/usr/java
[root@hxmaster java]# ll
total 178956
drwxr-xr-x. 8 10 143 4096 Dec 13 2016 jdk1.8.0_121
-rw-r--r--. 1 root root 183246769 Jul 11 10:18 jdk-8u121-linux-x64.tar.gz
Edit the huaxin user's .bashrc:
[huaxin@hxmaster ~]$ vi .bashrc
Append the following:
export JAVA_HOME=/usr/java/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
scp the .bashrc to the other cluster nodes:
[huaxin@hxmaster ~]$ scp .bashrc hxslave1:/home/huaxin/
[huaxin@hxmaster ~]$ scp .bashrc hxslave2:/home/huaxin/
[huaxin@hxmaster ~]$ scp .bashrc hxslave3:/home/huaxin/
Source it on every node so the settings take effect:
[huaxin@hxmaster ~]$ source .bashrc
Create an install/cloudera_manager directory under huaxin's home and copy the CM packages into it (on every node):
[root@localhost ~]# su - huaxin
[huaxin@localhost ~]$ mkdir -p /home/huaxin/install/cloudera_manager
The files to be copied:
[huaxin@localhost cloudera_manager]$ ll
total 771828
-rw-r--r--. 1 huaxin huaxin 9128 Jul 10 20:19 cloudera-cdh-5-0.x86_64.rpm
-rw-r--r--. 1 huaxin huaxin 9813720 Jul 10 20:19 cloudera-manager-agent-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
-rw-r--r--. 1 huaxin huaxin 780499508 Jul 10 20:20 cloudera-manager-daemons-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
-rw-r--r--. 1 huaxin huaxin 8692 Jul 10 20:19 cloudera-manager-server-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
-rw-r--r--. 1 huaxin huaxin 10608 Jul 10 20:19 cloudera-manager-server-db-2-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
- NTP time synchronization
Install ntp (on every node; yum is in the sudo allow-list):
[huaxin@hxmaster ~]$ sudo yum install -y ntp
In /etc/ntp.conf on the master node, add the line below; 127.127.1.0 is the NTP local-clock address, so the master serves time from its own clock:
server 127.127.1.0
In /etc/ntp.conf on every slave node, add the line below, where 10.10.1.7 is the master's IP address; each slave synchronizes its clock against the master:
server 10.10.1.7
Note: the order of starting ntpd and syncing the clock differs between master and slaves: on the master, start ntpd first and then sync; on each slave, sync first and then start ntpd.
Master node (start first, then sync):
Start the ntpd service and enable it at boot:
[huaxin@hxmaster ~]$ sudo systemctl start ntpd.service
[huaxin@hxmaster ~]$ sudo systemctl enable ntpd.service
[huaxin@hxmaster ~]$ sudo ntpdate -u hxmaster
11 Jul 14:27:16 ntpdate[6107]: adjust time server 10.10.1.7 offset 0.000005 sec
Sometimes ntpdate hxmaster fails with "11 Jul 14:27:03 ntpdate[6105]: the NTP socket is in use, exiting"; this happens when ntpd already holds the NTP port, and ntpdate -u hxmaster (which uses an unprivileged port) gets around it.
Slave nodes (sync first, then start):
[huaxin@hxslave1 ~]$ sudo ntpdate hxmaster
11 Jul 14:31:03 ntpdate[3427]: adjust time server 10.10.1.7 offset -0.015980 sec
[huaxin@hxslave1 ~]$ sudo systemctl start ntpd.service
[huaxin@hxslave1 ~]$ sudo systemctl enable ntpd.service
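Whether a node is in sync can be read off the offset field of ntpdate's output (a small fraction of a second, as in the transcripts above); on a running node, ntpq -p also shows the peer status. The helper below is a hypothetical convenience for pulling the offset out of such a log line:

```shell
#!/bin/sh
# offset_of LINE -> prints the offset in seconds from an ntpdate log line.
offset_of() {
    echo "$1" | sed -n 's/.*offset \(-\{0,1\}[0-9.]*\) sec.*/\1/p'
}

line='11 Jul 14:31:03 ntpdate[3427]: adjust time server 10.10.1.7 offset -0.015980 sec'
offset_of "$line"   # -0.015980
```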
- Installing MySQL
A prebuilt MySQL binary package is used here:
[huaxin@hxmaster ~]$ ll mysql-5.7.21-linux-x86_64.zip
-rw-r--r--. 1 root root 629769438 Jul 11 14:54 mysql-5.7.21-linux-x86_64.zip
[huaxin@hxmaster ~]$ unzip mysql-5.7.21-linux-x86_64.zip
(1) Install the mysql service (run inside the unpacked mysql directory)
huaxin has no execute permission on the files under mysql/bin, so root needs to grant it (777 here, per this setup):
[root@hxmaster ~]# chmod 777 -R /home/huaxin/mysql/bin/
Initialize mysqld:
[huaxin@hxmaster mysql]$ bin/mysqld --user=huaxin --basedir=/home/huaxin/mysql/ --datadir=/home/huaxin/data --initialize
Note down the generated random password; it is needed for the first login, e.g. mzx=EE6_=orW
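If the console line with the random password has scrolled away, it can usually be recovered from mysqld's error log; the path below assumes the log-error setting from the my.cnf in step (2), so adjust it if yours differs:

```shell
#!/bin/sh
# Pull the generated temporary root password back out of the error log.
log=/home/huaxin/data/error.log
grep 'temporary password' "$log" 2>/dev/null | awk '{print $NF}'
```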
(2) Create a my.cnf file with the following contents:
[client]
[mysqld]
max_connections=2000
innodb_file_per_table=1
innodb_thread_concurrency=32
innodb_buffer_pool_size=8G
innodb_buffer_pool_instances=2
innodb_open_files=5000
innodb_flush_method=O_DIRECT
innodb_log_file_size=256M
innodb_log_buffer_size=64M
innodb_flush_log_at_trx_commit=2
skip-name-resolve
join_buffer_size=1M
sort_buffer_size=2M
read_rnd_buffer_size=1M
basedir=/home/huaxin/mysql
datadir=/home/huaxin/data
socket=/tmp/mysql.sock
log-error=/home/huaxin/data/error.log
pid-file=/home/huaxin/data/mysql.pid
user=root
port=3306
character_set_server=utf8
#character_set_client=utf8
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
(3) From the mysql directory, start the server (the defaults file path matches where my.cnf was created):
bin/mysqld --defaults-file=/home/huaxin/mysql/my.cnf
Some setups need the user specified as root; judge case by case:
bin/mysqld --defaults-file=/home/huaxin/mysql/my.cnf --user=root
(4) From the mysql directory, open a client:
bin/mysql --defaults-file=./my.cnf -uroot -p
Enter the random password generated in step (1).
The first login immediately requires changing the password:
alter user 'root'@'localhost' identified by '123456';
Create the databases CDH may need:
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database sentry DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
To allow remote logins, add a 'root'@'%' account:
grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;
Refresh the privileges:
flush privileges;
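At this point a quick check that all five databases exist costs nothing (credentials as set above):

```shell
#!/bin/sh
# List the databases and flag any of the CDH ones that are missing.
dbs=$(mysql -uroot -p123456 -e "SHOW DATABASES;" 2>/dev/null)
for db in hive amon hue oozie sentry; do
    if echo "$dbs" | grep -qx "$db"; then
        echo "$db: ok"
    else
        echo "$db: MISSING"
    fi
done
```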
Edit .bashrc and append the following:
export MYSQL_HOME=/home/huaxin/mysql
export PATH=$MYSQL_HOME/bin:$PATH
Apply the change:
[huaxin@hxmaster ~]$ source .bashrc
Copy the MySQL JDBC driver to /usr/share/java.
Change the owner and group of the JDBC jar:
[root@hxmaster java]# chown huaxin:huaxin mysql-connector-java-5.1.42-bin.jar
Rename the JDBC jar to mysql-connector-java.jar:
[root@hxmaster java]# ll
total 976
-rw-r--r--. 1 huaxin huaxin 996444 Jul 10 21:06 mysql-connector-java-5.1.42-bin.jar
[root@hxmaster java]# mv mysql-connector-java-5.1.42-bin.jar mysql-connector-java.jar
[root@hxmaster java]# ll
total 976
-rw-r--r--. 1 huaxin huaxin 996444 Jul 10 21:06 mysql-connector-java.jar
- Installing Cloudera Manager
Install the rpm packages; this must be done on every node in the cluster.
Create the cloudera_manager directory:
[huaxin@hxmaster ~]$ mkdir -p install/cloudera_manager
Copy the following rpm packages into cloudera_manager:
cloudera-cdh-5-0.x86_64.rpm
cloudera-manager-server-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
cloudera-manager-agent-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
cloudera-manager-server-db-2-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
cloudera-manager-daemons-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
Install the rpm packages:
[huaxin@hxmaster cloudera_manager]$ sudo yum -y localinstall *.rpm
Cloudera Manager configuration and local repository setup:
[root@hxmaster ~]# vi /etc/cloudera-scm-agent/config.ini
Change server_host=localhost to server_host=hxmaster.
Loosen file permissions (as root):
Save the following as chmod_file.sh:
for i in {7..10};
do
ssh 10.10.1.$i "chmod 777 -R /opt; chmod 777 -R /var/lib/cloudera-scm-server; chmod 777 -R /etc/default/cloudera-scm-server; chmod 777 -R /etc/default/cloudera-scm-agent; chmod 777 -R /var/log/cloudera-scm-server; chmod 777 -R /var/log/cloudera-scm-agent; chmod 777 -R /etc/cloudera-scm-agent; chmod 777 -R /etc/cloudera-scm-server; chmod 777 -R /usr/sbin/cmf-server; chmod 777 -R /usr/share/cmf; chmod 777 -R /usr/share/cmf/bin/cmf-server"
done
Initialize the CM5 database on the master node:
/usr/share/cmf/schema/scm_prepare_database.sh mysql cm -h127.0.0.1 -uroot -p123456 --scm-host 127.0.0.1 scm scm scm
Create the cloudera-scm user on every node in the cluster:
[huaxin@hxmaster ~]$ sudo useradd --system --home=/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Start the Cloudera Manager server and agent on the master node:
[huaxin@hxmaster ~]$ sudo systemctl start cloudera-scm-server
[huaxin@hxmaster ~]$ sudo systemctl start cloudera-scm-agent
Start the Cloudera Manager agent on every slave node:
[huaxin@hxslave1 ~]$ sudo systemctl start cloudera-scm-agent
If the server and agents start without errors, the CM web UI can be reached (by default on port 7180 of the master node):
Create the parcel-repo directory:
[huaxin@hxmaster ~]$ mkdir -p /opt/cloudera/parcel-repo
Copy the following files into it:
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1
manifest.json
Rename CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 to CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha:
[huaxin@hxmaster parcel-repo]$ sudo mv CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha
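The .sha file holds the parcel's expected SHA-1, so comparing it against a freshly computed checksum catches a truncated download before CM tries to distribute the parcel; a sketch (run in the parcel-repo directory):

```shell
#!/bin/sh
# Compare the parcel's recorded SHA-1 against a freshly computed one.
parcel=CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel
expected=$(awk '{print $1}' "$parcel.sha" 2>/dev/null)
actual=$(sha1sum "$parcel" 2>/dev/null | awk '{print $1}')
if [ -n "$actual" ] && [ "$expected" = "$actual" ]; then
    echo "parcel checksum OK"
else
    echo "parcel checksum MISMATCH (or file missing)"
fi
```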
During cluster setup, CM raises warnings about the kernel's swappiness and transparent hugepage settings (shown in a screenshot omitted here); fix them by running the following on every node (as root):
echo 0 > /proc/sys/vm/swappiness
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
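Note that these echo commands only last until the next reboot. A sketch of how they are commonly persisted on CentOS 7 (as root, on every node; the paths are the standard ones, but verify them on your system):

```shell
# vm.swappiness can be made permanent through sysctl:
echo 'vm.swappiness = 0' >> /etc/sysctl.conf

# The THP switches have no sysctl equivalent; re-apply them at boot,
# e.g. from rc.local (which must be executable to run at boot on CentOS 7):
cat >> /etc/rc.d/rc.local <<'EOF'
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
EOF
chmod +x /etc/rc.d/rc.local
```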
Then re-run the cluster setup.
- Mounting a data directory
After CDH is installed, the disk behind its data directories can be small, only a few dozen GB, which stores up trouble for later development. The figure referenced here (not included) showed the state after adding the /data directory to the NameNode, SecondaryNameNode, and DataNode: over 300 GB of disk in total, versus only about 40 GB right after the CDH install.
For formatting and mounting the new disk, see my earlier article; this section only covers adding the newly created 300-plus-GB /data directory to HDFS's data directories.
Under the NameNode, open the NameNode data directory setting and, following the format of the existing /dfs/nn entry, add /data/nn.
The SecondaryNameNode and DataNode are handled the same way:
SecondaryNameNode: add /data/snn
DataNode: add /data/dn
Problems encountered during the installation:
1.
PersistenceException: org.hibernate.exception.GenericJDBCException: Could not open connection
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactoryBean': FactoryBean threw exception on object creation; nested exception is javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: Could not open connection
Caused by: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: Could not open connection
Caused by: org.hibernate.exception.GenericJDBCException: Could not open connection
Caused by: java.sql.SQLException: Connections could not be acquired from the underlying database!
Caused by: com.mchange.v2.resourcepool.CannotAcquireResourceException: A ResourcePool could not acquire a resource from its primary factory or source.
Solution:
This is a database connection error: the CM5 database was never initialized on the master node. Run:
/usr/share/cmf/schema/scm_prepare_database.sh mysql cm -h127.0.0.1 -uroot -p123456 --scm-host 127.0.0.1 scm scm scm
2.
ERROR Failed fetching torrent: HTTP Error 404: Not Found
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/https.py", line 221, in http_error_default
raise e
HTTPError: HTTP Error 404: Not Found
[12/Jul/2018 08:49:54 +0000] 38906 Thread-13 downloader ERROR Failed fetching torrent: HTTP Error 404: Not Found
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/downloader.py", line 263, in download
cmf.https.ssl_url_opener.fetch_to_file(torrent_url, torrent_file)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/https.py", line 191, in fetch_to_file
resp = self.open(req_url)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/https.py", line 186, in open
return self.opener(url, *pargs, **kwargs)
File "/usr/lib64/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/usr/lib64/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib64/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/usr/lib64/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/https.py", line 221, in http_error_default
Solution:
The first suspect was a firewall left running, but systemctl status firewalld.service showed it already stopped on every node, so the firewall was ruled out.
Next, the permissions under /opt/cloudera/parcel-repo/ turned out to have root as owner and group, because the files had been uploaded as root. That pinpointed the cause: with the files owned by root, CM had no permission to distribute the parcel. Changing their owner and group to huaxin and granting 777 permissions solved the problem.
3.
Solution:
rm -f /opt/cloudera-manager/cm-5.11.1/lib/cloudera-scm-agent/cm_guid
4
first time: '/opt/cm-5.14.2/lib/cloudera-scm-agent/uuid'
[13/Jul/2018 16:55:19 +0000] 41501 MainThread __init__ INFO Agent UUID file was last modified at 2018-07-13 16:55:19.631289
[13/Jul/2018 16:55:19 +0000] 41501 MainThread agent INFO ================================================================================
[13/Jul/2018 16:55:19 +0000] 41501 MainThread agent INFO SCM Agent Version: 5.14.2
Solution:
rm -rf /opt/cm-5.14.2/lib/cloudera-scm-agent/uuid
cloudera-scm-agent reports that a supervisor process already exists.
Solution: locate the leftover supervisor process and kill it:
ps aux | grep supervisor
kill -9 <pid>
Oozie installation error:
Completed only 1/2 steps. First failure: Failed to execute command Create Oozie Database Tables on service Oozie
Solution:
It is a write-permission problem; fix it as root with chmod 755 -R /var/lib/oozie/