Installing CDH as a Regular User with sudo

In production environments, the cluster administrator often does not hand out root access for installing CDH; instead, only a subset of privileges is opened up, and whenever one of those privileged operations is involved you must run it through sudo. This article starts from root: it grants a set of privileges to an ordinary user, huaxin, who then installs CDH.

  1. Create the user and grant privileges

Create the regular user (as root, on every node):

[root@localhost ~]# useradd huaxin
[root@localhost ~]# passwd huaxin
Changing password for user huaxin.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.

Grant privileges to huaxin by adding the following to /etc/sudoers (edit it with visudo; the alias, user, and %group entries below are three equivalent ways to grant the same command set):

User_Alias CDH_INSTALLER = huaxin
Cmnd_Alias CDH_CMD = /usr/bin/chown, /usr/sbin/service, /usr/bin/systemctl, /usr/bin/rm, /usr/bin/id, /usr/bin/install, /usr/sbin/chkconfig, /usr/bin/yum, /usr/bin/sed, /usr/bin/mv, /usr/sbin/ntpdate
CDH_INSTALLER   ALL=(ALL)   NOPASSWD: CDH_CMD
huaxin          ALL=(ALL)   NOPASSWD: CDH_CMD
%huaxin         ALL=(ALL)   NOPASSWD: CDH_CMD
cloudera-scm    ALL=(ALL)   NOPASSWD: ALL
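
To check that the rules are active, log in as huaxin and list the commands sudo will allow (sudo -l is a standard sudo option; it may still ask for huaxin's own password, since NOPASSWD only applies to running the listed commands):

[huaxin@localhost ~]$ sudo -l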

  2. Network configuration

Configure the hostname (as root, on every node):

[root@localhost ~]# vi /etc/sysconfig/network

Add the following on the master node (on the other nodes, set HOSTNAME to the corresponding hostname, e.g. hxslave1, hxslave2):

NETWORKING=yes
HOSTNAME=hxmaster

Configure the IP address (as root, on every node):

[root@localhost ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens3

Edit the contents as follows (comment out the IPv6 entries; adjust the IP address and gateway for each node):

TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
#IPV6INIT="yes"
#IPV6_AUTOCONF="yes"
#IPV6_DEFROUTE="yes"
#IPV6_FAILURE_FATAL="no"
NAME="ens3"
UUID="b8cc6007-9cc6-46da-8bff-b0a06a693996"
DEVICE="ens3"
ONBOOT="yes"
IPADDR="10.10.1.7"
PREFIX="24"
GATEWAY="10.10.1.254"
DNS1="202.101.172.35"
#IPV6_PEERDNS="yes"
#IPV6_PEERROUTES="yes"
#IPV6_PRIVACY="no"

Map IP addresses to hostnames (as root):

[root@localhost ~]# vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.1.7 hxmaster
10.10.1.8 hxslave1
10.10.1.9 hxslave2
10.10.1.10 hxslave3

scp the file to the other nodes:

scp /etc/hosts 10.10.1.8:/etc/
scp /etc/hosts 10.10.1.9:/etc/
scp /etc/hosts 10.10.1.10:/etc/

Restart the network to make the changes take effect (as root, on every node):

[root@localhost ~]# systemctl restart network.service

  3. Disable the firewall (as root)

The firewall must be disabled on every node in the cluster.

(1) Stop firewalld

[root@localhost ~]# systemctl stop firewalld

After stopping it, check firewalld's state:

[root@localhost ~]# firewall-cmd --state

If it prints not running, the firewall was stopped successfully.
Alternatively, check the firewall status with:

[root@localhost ~]# systemctl status firewalld.service

(2) Prevent firewalld from starting at boot

[root@localhost ~]# systemctl disable firewalld.service
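
If you would rather not log in to every node by hand, both steps can be run over SSH in one loop (a convenience sketch; it assumes root SSH access to the node IPs used in this article):

for ip in 10.10.1.{7..10}; do
  ssh root@$ip "systemctl stop firewalld && systemctl disable firewalld"
done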

  4. Disable SELinux (as root)

Check whether SELinux is enabled (Enforcing means it is on):

[root@hxmaster ~]# getenforce
Enforcing

Edit /etc/selinux/config and change SELINUX=enforcing to SELINUX=disabled. The change takes effect after a system reboot; to stop enforcement immediately without rebooting, additionally run setenforce 0:

[root@hxmaster ~]# vi /etc/selinux/config


# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
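
The same edit can be scripted across all nodes with sed (a sketch assuming root SSH access and the hostnames mapped in /etc/hosts above; setenforce 0 stops enforcement immediately, while the config change covers later reboots):

for h in hxmaster hxslave1 hxslave2 hxslave3; do
  ssh root@$h "sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config && setenforce 0"
done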

  5. Passwordless SSH login (as the regular user)

Since SSH later has to be configured across several hosts, and logging in to each machine by hand is tedious, a shell script is used to configure everything in batch. It relies on expect, which only needs to be installed on the master node.

Install expect:

sudo yum install -y expect

Save the following as batch_ssh.sh (adjust the passwords to match your hosts). The script reads ip.txt from the current directory, which holds each cluster node's IP and password:

#!/bin/bash

# Generate a local RSA key pair non-interactively
expect -c '
  spawn ssh-keygen -t rsa
  expect {
    "*key*"        {send "\n"; exp_continue}
    "*Overwrite*"  {send "y\n"; exp_continue}
    "*passphrase*" {send "\n"; exp_continue}
    "*again*"      {send "\n"}
  }
'

# Load the key into the agent (only needed if an ssh-agent is running)
ssh-add ~/.ssh/id_rsa

# Copy the public key to every host listed in ip.txt (one ip:password per line)
for line in $(cat ip.txt)
do
  ip=$(echo "$line" | cut -f1 -d ":")
  password=$(echo "$line" | cut -f2 -d ":")

  expect -c "
    spawn ssh-copy-id -i /home/huaxin/.ssh/id_rsa.pub $ip
    expect {
      \"*yes/no*\"   {send \"yes\r\"; exp_continue}
      \"*password*\" {send \"$password\r\"; exp_continue}
    }
  "
done

Save the following as ip.txt:

10.10.1.7:123456
10.10.1.8:123456
10.10.1.9:123456
10.10.1.10:123456

Running the batch_ssh.sh script completes the cluster's passwordless SSH configuration (run as the regular user):

[huaxin@hxmaster ~]$ bash batch_ssh.sh
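
To confirm the setup worked, run a command on every node over SSH as huaxin; none of these should ask for a password (the first connection to each hostname may still ask you to confirm the host key):

[huaxin@hxmaster ~]$ for h in hxmaster hxslave1 hxslave2 hxslave3; do ssh $h hostname; done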

  6. Install the JDK

Install the JDK under /usr/java on every node (this is the default install path):

[root@hxmaster java]# tar -zxf jdk-8u121-linux-x64.tar.gz 
[root@hxmaster java]# ls
jdk1.8.0_121  jdk-8u121-linux-x64.tar.gz
[root@hxmaster java]# pwd
/usr/java
[root@hxmaster java]# ll
total 178956
drwxr-xr-x. 8   10  143      4096 Dec 13  2016 jdk1.8.0_121
-rw-r--r--. 1 root root 183246769 Jul 11 10:18 jdk-8u121-linux-x64.tar.gz

Edit the huaxin user's .bashrc file:

[huaxin@hxmaster ~]$ vi .bashrc

Add the following:

export JAVA_HOME=/usr/java/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH

scp the .bashrc file to the other cluster nodes:

[huaxin@hxmaster ~]$ scp .bashrc hxslave1:/home/huaxin/
[huaxin@hxmaster ~]$ scp .bashrc hxslave2:/home/huaxin/
[huaxin@hxmaster ~]$ scp .bashrc hxslave3:/home/huaxin/

Run source on every node so the settings take effect:

[huaxin@hxmaster ~]$ source .bashrc 
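
If the configuration took effect, java now resolves to the new JDK; jdk1.8.0_121 reports itself as 1.8.0_121:

[huaxin@hxmaster ~]$ java -version
java version "1.8.0_121"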

Create the install/cloudera_manager directory under the huaxin user's home directory and copy the CM packages into cloudera_manager (on every node):

[root@localhost ~]# su - huaxin
[huaxin@localhost ~]$ mkdir -p /home/huaxin/install/cloudera_manager

Check the files to be copied:

[huaxin@localhost cloudera_manager]$ ll
total 771828
-rw-r--r--. 1 huaxin huaxin      9128 Jul 10 20:19 cloudera-cdh-5-0.x86_64.rpm
-rw-r--r--. 1 huaxin huaxin   9813720 Jul 10 20:19 cloudera-manager-agent-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
-rw-r--r--. 1 huaxin huaxin 780499508 Jul 10 20:20 cloudera-manager-daemons-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
-rw-r--r--. 1 huaxin huaxin      8692 Jul 10 20:19 cloudera-manager-server-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
-rw-r--r--. 1 huaxin huaxin     10608 Jul 10 20:19 cloudera-manager-server-db-2-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm

  7. NTP time synchronization

Install ntp:

[huaxin@hxmaster ~]$ sudo yum install -y ntp

On the master node, add the line below to /etc/ntp.conf; 127.127.1.0 is the pseudo address that points ntpd at the node's own local clock:

server 127.127.1.0

On every slave node, add the line below to /etc/ntp.conf, where 10.10.1.7 is the master node's IP address; this makes each slave synchronize its clock with the master:

server 10.10.1.7

Note: the order of the ntp-start and time-sync steps differs between the master and the slaves. On the master, start ntpd first and then sync the time; on each slave, sync the time first and then start ntpd.

On the master node (start first, then sync):
Start the ntpd service and enable it at boot:

[huaxin@hxmaster ~]$ sudo systemctl start ntpd.service
[huaxin@hxmaster ~]$ sudo systemctl enable ntpd.service
[huaxin@hxmaster ~]$ sudo ntpdate -u hxmaster
11 Jul 14:27:16 ntpdate[6107]: adjust time server 10.10.1.7 offset 0.000005 sec

Sometimes ntpdate hxmaster fails with "11 Jul 14:27:03 ntpdate[6105]: the NTP socket is in use, exiting"; this happens because the running ntpd already occupies the NTP port, and ntpdate -u hxmaster, which uses an unprivileged port, works around it.

On the slave nodes (sync first, then start):

[huaxin@hxslave1 ~]$ sudo ntpdate hxmaster
11 Jul 14:31:03 ntpdate[3427]: adjust time server 10.10.1.7 offset -0.015980 sec
[huaxin@hxslave1 ~]$ sudo systemctl start ntpd.service
[huaxin@hxslave1 ~]$ sudo systemctl enable ntpd.service
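
Once ntpd has been running for a few minutes, ntpq -p (shipped with the ntp package) shows which peers a node is tracking; an asterisk marks the currently selected time source, which on a slave should be hxmaster:

[huaxin@hxslave1 ~]$ ntpq -p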

  8. Install MySQL

A pre-built MySQL zip package is used here:

[huaxin@hxmaster ~]$ ll mysql-5.7.21-linux-x86_64.zip 
-rw-r--r--. 1 root root 629769438 Jul 11 14:54 mysql-5.7.21-linux-x86_64.zip

[huaxin@hxmaster ~]$ unzip mysql-5.7.21-linux-x86_64.zip

(1) Install the mysql service (run from the MySQL package directory).
Since huaxin has no execute permission on the files under mysql/bin, root needs to grant 777 permissions first:

[root@hxmaster ~]# chmod 777 -R /home/huaxin/mysql/bin/

Initialize mysqld:

[huaxin@hxmaster mysql]$  bin/mysqld  --user=huaxin --basedir=/home/huaxin/mysql/ --datadir=/home/huaxin/data --initialize

Note the generated random password; it is needed for the first login. For example, the generated password might be mzx=EE6_=orW.
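
If the password scrolls past unnoticed, with MySQL 5.7 it can usually be recovered from the error log (the path configured in the my.cnf below) once the server has been started with that config:

grep 'temporary password' /home/huaxin/data/error.log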

(2) Create a my.cnf file with the following configuration:

[client]

[mysqld]
max_connections=2000
innodb_file_per_table=1
innodb_thread_concurrency=32
innodb_buffer_pool_size=8G
innodb_buffer_pool_instances=2
innodb_open_files=5000

innodb_flush_method=O_DIRECT
innodb_log_file_size=256M
innodb_log_buffer_size=64M
innodb_flush_log_at_trx_commit=2
skip-name-resolve

join_buffer_size=1M
sort_buffer_size=2M
read_rnd_buffer_size=1M

basedir=/home/huaxin/mysql
datadir=/home/huaxin/data
socket=/tmp/mysql.sock
log-error=/home/huaxin/data/error.log
pid-file=/home/huaxin/data/mysql.pid
user=root

port=3306
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

# set default server charset
character-set-server=utf8
#character_set_client=utf8

(3) From the bin directory, start the mysql service:

bin/mysqld --defaults-file=/home/huaxin/mysql/my.cnf &

Sometimes the server must be started with user specified as root; judge each case on its own:

bin/mysqld --defaults-file=/home/huaxin/mysql/my.cnf --user=root &

(4) From the bin directory, start the client:

bin/mysql --defaults-file=./my.cnf -uroot -p

Enter the random password generated in step (1).

The first login immediately requires you to change the password:

alter user 'root'@'localhost' identified by '123456';

Create the databases CDH may need:

create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database sentry DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

To allow remote logins, add a 'root'@'%' account:

grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;

Refresh the privileges:

flush privileges;
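
You can verify the remote account from any machine that has a mysql client installed (using the password set above):

mysql -h hxmaster -uroot -p123456 -e 'show databases;'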

Edit the .bashrc file and add the following:

export MYSQL_HOME=/home/huaxin/mysql
export PATH=$MYSQL_HOME/bin:$PATH

Apply the change:

[huaxin@hxmaster ~]$ source .bashrc 

Copy the MySQL JDBC driver to the /usr/share/java directory.

Change the owner and group of the JDBC jar:

[root@hxmaster java]# chown huaxin:huaxin mysql-connector-java-5.1.42-bin.jar

Rename the JDBC jar to mysql-connector-java.jar:

[root@hxmaster java]# ll
total 976
-rw-r--r--. 1 huaxin huaxin 996444 Jul 10 21:06 mysql-connector-java-5.1.42-bin.jar
[root@hxmaster java]# mv mysql-connector-java-5.1.42-bin.jar mysql-connector-java.jar 
[root@hxmaster java]# ll
total 976
-rw-r--r--. 1 huaxin huaxin 996444 Jul 10 21:06 mysql-connector-java.jar

  9. Install Cloudera Manager

The rpm packages must be installed on every node of the cluster.

Create the cloudera_manager directory:

[huaxin@hxmaster ~]$ mkdir -p install/cloudera_manager

Copy the following rpm packages into the cloudera_manager directory:

cloudera-cdh-5-0.x86_64.rpm
cloudera-manager-server-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
cloudera-manager-agent-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
cloudera-manager-server-db-2-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm
cloudera-manager-daemons-5.14.2-1.cm5142.p0.8.el7.x86_64.rpm

Install the rpm packages:

[huaxin@hxmaster cloudera_manager]$ sudo yum -y localinstall *.rpm

Cloudera Manager configuration and local repository setup:

[root@hxmaster ~]# vi /etc/cloudera-scm-agent/config.ini

Change server_host=localhost to server_host=hxmaster.
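
Since /usr/bin/sed is in the sudoers command list from section 1 and passwordless SSH is already set up, the edit can be applied to every node in one loop (a sketch; adjust the hostnames to your cluster, and use ssh -t if sudo complains about a missing tty):

for h in hxmaster hxslave1 hxslave2 hxslave3; do
  ssh $h "sudo sed -i 's/^server_host=localhost/server_host=hxmaster/' /etc/cloudera-scm-agent/config.ini"
done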

Adjust file permissions (as root).
Save the following as chmod_file.sh:

for i in {7..10};
do
  ssh 10.10.1.$i "chmod 777 -R /opt; chmod 777 -R /var/lib/cloudera-scm-server; chmod 777 -R /etc/default/cloudera-scm-server; chmod 777 -R /etc/default/cloudera-scm-agent; chmod 777 -R /var/log/cloudera-scm-server; chmod 777 -R /var/log/cloudera-scm-agent; chmod 777 -R /etc/cloudera-scm-agent; chmod 777 -R /etc/cloudera-scm-server; chmod 777 -R /usr/sbin/cmf-server; chmod 777 -R /usr/share/cmf"
done

Initialize the CM5 database on the master node:

/usr/share/cmf/schema/scm_prepare_database.sh mysql cm -h127.0.0.1 -uroot -p123456 --scm-host 127.0.0.1 scm scm scm

Create the cloudera-scm user on every node of the cluster:

[huaxin@hxmaster ~]$ sudo useradd --system --home=/run/cloudera-scm-server  --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm

Start the Cloudera Manager server and agent on the master node:

[huaxin@hxmaster ~]$ sudo systemctl start cloudera-scm-server
[huaxin@hxmaster ~]$ sudo systemctl start cloudera-scm-agent

Start the Cloudera Manager agent on every slave node:

[huaxin@hxslave1 ~]$ sudo systemctl start cloudera-scm-agent

If the server and agents start without errors, we can log in to the CM web UI (served on port 7180 by default):
(screenshot)

Create the parcel-repo directory:

[huaxin@hxmaster ~]$ mkdir -p /opt/cloudera/parcel-repo

Copy the following files into that directory:
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1
manifest.json

Rename CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 to CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha:

[huaxin@hxmaster parcel-repo]$ sudo mv CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha
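
Before CM distributes the parcel, it is worth confirming the download is intact: the .sha file holds the expected SHA-1 hash of the parcel, so the hash printed by sha1sum should match the contents of the .sha file:

[huaxin@hxmaster parcel-repo]$ sha1sum CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel
[huaxin@hxmaster parcel-repo]$ cat CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha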

The following may come up during installation:
(screenshot)
If it does, run the following on every node (as root):

echo 0 > /proc/sys/vm/swappiness
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
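
These echo settings are lost on reboot. One common way to persist them (a sketch; adapt to your environment) is to put the swappiness value in /etc/sysctl.conf and append the hugepage lines to /etc/rc.local, which must be executable on CentOS 7:

echo 'vm.swappiness = 0' >> /etc/sysctl.conf
cat >> /etc/rc.local <<'EOF'
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
EOF
chmod +x /etc/rc.d/rc.local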

Then re-run the cluster setup:
(screenshot)

  10. Mounting data directories

Right after CDH is installed, the disk space available to it is sometimes small, only a few dozen GB, which stores up trouble for later development. The screenshot below shows the situation after I added the /data directory to the NameNode, SecondaryNameNode, and DataNode: over 300 GB of disk in total, versus only about 40 GB right after the CDH install.
(screenshot)
For formatting and mounting the disk, see my earlier article; here I only describe how to add the newly created 300+ GB /data directory to HDFS's data directories.

Under the NameNode, select the NameNode data directories and, following the layout of the existing /dfs/nn directory, add /data/nn.
(screenshot)
The SecondaryNameNode and DataNode are handled similarly:

SecondaryNameNode: add /data/snn
(screenshot)

DataNode: add /data/dn
(screenshot)

Problems encountered during installation:
1.

PersistenceException: org.hibernate.exception.GenericJDBCException: Could not open connection

Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactoryBean': FactoryBean threw exception on object creation; nested exception is javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: Could not open connection

Caused by: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: Could not open connection

Caused by: org.hibernate.exception.GenericJDBCException: Could not open connection

Caused by: java.sql.SQLException: Connections could not be acquired from the underlying database!

Caused by: com.mchange.v2.resourcepool.CannotAcquireResourceException: A ResourcePool could not acquire a resource from its primary factory or source.

Solution:
A database connection error: the CM5 database had not been initialized on the master node:

/usr/share/cmf/schema/scm_prepare_database.sh mysql cm -h127.0.0.1 -uroot -p123456 --scm-host 127.0.0.1 scm scm scm
2.

[12/Jul/2018 08:49:54 +0000] 38906 Thread-13 downloader   ERROR    Failed fetching torrent: HTTP Error 404: Not Found
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/downloader.py", line 263, in download
    cmf.https.ssl_url_opener.fetch_to_file(torrent_url, torrent_file)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/https.py", line 191, in fetch_to_file
    resp = self.open(req_url)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/https.py", line 186, in open
    return self.opener(url, *pargs, **kwargs)
  File "/usr/lib64/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.2-py2.7.egg/cmf/https.py", line 221, in http_error_default
    raise e
HTTPError: HTTP Error 404: Not Found

Solution:
My first thought was that the firewall had been left on, but systemctl status firewalld.service showed it was already off on every node, so that was not the cause.
Next I checked the permissions of the files under /opt/cloudera/parcel-repo/ and found that their owner and group were root, because they had been uploaded as the root user. That pinpointed the cause: since the files belonged to root, CM had no permission to distribute the parcel. Changing the owner and group of these files to huaxin and granting 777 permissions solved the problem.
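
In command form, the fix (as root) was:

chown -R huaxin:huaxin /opt/cloudera/parcel-repo
chmod -R 777 /opt/cloudera/parcel-repo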
(screenshot)

3.
(screenshot)
Solution:

rm -f /opt/cloudera-manager/cm-5.11.1/lib/cloudera-scm-agent/cm_guid

4.

first time: '/opt/cm-5.14.2/lib/cloudera-scm-agent/uuid'
[13/Jul/2018 16:55:19 +0000] 41501 MainThread __init__     INFO     Agent UUID file was last modified at 2018-07-13 16:55:19.631289
[13/Jul/2018 16:55:19 +0000] 41501 MainThread agent        INFO     ================================================================================
[13/Jul/2018 16:55:19 +0000] 41501 MainThread agent        INFO     SCM Agent Version: 5.14.2

Solution:

rm -rf /opt/cm-5.14.2/lib/cloudera-scm-agent/uuid

5.

cloudera-scm-agent fails to start because a supervisord process already exists.

Solution:
Find the leftover supervisord process and kill it:

ps aux | grep supervisord
kill -9 <pid>

Oozie installation error:

Completed only 1/2 steps. First failure: Failed to execute command Create Oozie Database Tables on service Oozie

Solution:
A write permission problem; as root, run chmod 755 -R /var/lib/oozie/
