Installing and Deploying CDH 6.X
Before installing, you should be aware of some CDH 6.X limitations, such as which Hadoop features it supports. This article uses 6.3.2 as an example; the official site gives detailed installation steps, and the procedure below follows the official method.
Configuring the Base Environment
First, set up the base environment. If you have already done this, you can skip this section.
Passwordless SSH Login
Configure passwordless login for the root user
Configuring one-way passwordless login on the master node is purely for convenience later on (it is optional).
Go to the root user's home directory and do the following.
-
vim workers
, and write every node's IP and hostname into the workers file (if you have not changed the hostnames yet, the node IPs alone are enough):
192.168.7.XXX dev-master01
192.168.7.XXX dev-master02
192.168.7.XXX dev-master03
192.168.7.XXX dev-worker01
192.168.7.XXX dev-worker02
192.168.7.XXX dev-worker03
192.168.7.XXX dev-worker04
192.168.7.XXX dev-worker05
192.168.7.XXX dev-worker06
192.168.7.XXX dev-worker07
192.168.7.XXX dev-worker08
192.168.7.XXX dev-worker09
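All of the scripts in this article parse workers lines of the form "ip hostname" with bash parameter expansion. A minimal sketch of how that split works (the IP shown is made up):

```shell
# each workers line is "ip hostname"; the scripts split it like this:
pair="192.168.7.21 dev-master01"   # example line
host=${pair#* }   # strip the shortest prefix ending in a space -> hostname
ip=${pair% *}     # strip the shortest suffix starting at a space -> IP
echo "$ip / $host"   # -> 192.168.7.21 / dev-master01
```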
-
vim keygen.sh
, edit the passwordless-login script, make it executable, and run it:
#!/bin/bash
CURRENT_DIR=$(cd $(dirname $0);pwd)
UNAME=root
port=22
passwd=123456
# install the required tool
yum install expect -y
# only generate a key pair if one does not exist yet
if [ ! -f ~/.ssh/id_rsa ];then
    echo "======================> creating key <===================="
    # -t rsa: key type; -P "": empty passphrase; -f: key file path and name
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
else
    echo "=======> id_rsa has already been created ..."
fi
while read pair
do
    host=${pair#* }
    ip=${pair% *}
    echo "======================> distribute to $ip/$host <===================="
    expect <<EOF
spawn ssh-copy-id -i ~/.ssh/id_rsa.pub $UNAME@$ip -p$port
expect {
    "*yes/no" { send "yes\r"; exp_continue }
    "*password:" { send "$passwd\r" }
}
expect eof
EOF
done < $CURRENT_DIR/workers
Configure two-way passwordless login for the hadoop user
Do this as root on the master node, since one-way passwordless root login is already set up.
Create the hadoop user
1. Create the user
vim user.sh
, edit the script, make it executable, and run it
#!/bin/bash
CURRENT_DIR=$(cd $(dirname $0);pwd)
UNAME=root
port=22
while read pair
do
host=${pair#* }
ip=${pair% *}
#create the hadoop user
echo "=================> create user <================="
#-p expects an already-encrypted password, so the password is reset again below
ssh -n -p$port $UNAME@$ip adduser -m hadoop -p 123456 -g wheel
#if /etc/sudoers does not already contain the line "%wheel ALL=(ALL) ALL", append a NOPASSWD rule
ssh -n -p$port $UNAME@$ip "echo '%wheel ALL=(ALL) NOPASSWD: ALL' | tee -a /etc/sudoers"
done < /root/workers
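As the comment in user.sh notes, adduser -p expects an already-hashed password, so passing a plain 123456 leaves the account unusable until passwd.sh resets it. A hedged alternative is to pre-hash the password; this sketch assumes openssl's MD5-crypt mode (openssl passwd -1):

```shell
# generate an MD5-crypt hash that adduser -p / useradd -p can accept;
# the salt is random, so the output differs on every run
hash=$(openssl passwd -1 123456)
echo "$hash"   # something like $1$<salt>$<digest>
# such a hash could then be passed as: adduser -m hadoop -p "$hash" -g wheel
```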
2. Change the password
vim passwd.sh
, edit this script (which resets the password), make it executable, and run it
#!/usr/bin/expect
# passwd prompts twice ("New password" and "Retype new password");
# both answers must be the same, here "hadoop" to match the hadoop-user keygen.sh below
spawn passwd hadoop
expect "*password"
send "hadoop\n"
expect "*password"
send "hadoop\n"
expect eof
vim tmp.sh
, distribute passwd.sh to each node and run it by executing ./tmp.sh
#!/bin/bash
CURRENT_DIR=$(cd $(dirname $0);pwd)
UNAME=root
port=22
while read pair
do
host=${pair#* }
ip=${pair% *}
echo "=================>distribute passwd.sh to $ip/$host <==============="
scp -rp -P $port /root/passwd.sh $UNAME@$ip:/root/
ssh -n -p$port $UNAME@$ip yum install expect -y
echo "=================>passwd $ip/$host <==============="
ssh -n -p$port $UNAME@$ip /root/passwd.sh
done < /root/workers
Configure two-way passwordless login for the hadoop user on every node
Go to the /home/hadoop/ directory and run the following commands as the hadoop user.
-
Edit the workers file, as above
-
vim keygen.sh, same logic as above:
#!/bin/bash
CURRENT_DIR=$(cd $(dirname $0);pwd)
UNAME=hadoop
port=22
passwd=hadoop
# only generate a key pair if one does not exist yet
if [ ! -f ~/.ssh/id_rsa ];then
    echo "======================> creating key <===================="
    # -t rsa: key type; -P "": empty passphrase; -f: key file path and name
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
else
    echo "=======> id_rsa has already been created ..."
fi
while read pair
do
    host=${pair#* }
    ip=${pair% *}
    echo "======================> distribute to $ip/$host <===================="
    expect <<EOF
spawn ssh-copy-id -i ~/.ssh/id_rsa.pub $UNAME@$ip -p$port
expect {
    "*yes/no" { send "yes\r"; exp_continue }
    "*password:" { send "$passwd\r" }
}
expect eof
EOF
done < $CURRENT_DIR/workers
-
Distribute keygen.sh to every node and run it; here this step is folded into the base-environment installation script and executed together with it.
Installing Base Services and Dependencies
Run everything below as the hadoop user, from the hadoop user's home directory.
Install the base services
MySQL and the JDK need to be installed, NTP time synchronization configured, and some base dependencies pulled in.
-
Download the JDK package from the Oracle site, prepare ntp.conf (for clock synchronization), and prepare a hosts file (which will replace /etc/hosts).
vim ntp.conf
driftfile /var/lib/ntp/drift
pidfile /var/run/ntpd.pid
logfile /var/log/ntp.log
# Access Control Support
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict 192.168.0.0 mask 255.255.0.0 nomodify notrap nopeer noquery
restrict 172.16.0.0 mask 255.240.0.0 nomodify notrap nopeer noquery
restrict 100.64.0.0 mask 255.192.0.0 nomodify notrap nopeer noquery
restrict 10.0.0.0 mask 255.0.0.0 nomodify notrap nopeer noquery
# local clock
server 127.127.1.0
fudge 127.127.1.0 stratum 10
restrict ntp1.aliyun.com nomodify notrap nopeer noquery
restrict ntp1.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp10.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp11.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp12.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp2.aliyun.com nomodify notrap nopeer noquery
restrict ntp2.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp3.aliyun.com nomodify notrap nopeer noquery
restrict ntp3.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp4.aliyun.com nomodify notrap nopeer noquery
restrict ntp4.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp5.aliyun.com nomodify notrap nopeer noquery
restrict ntp5.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp6.aliyun.com nomodify notrap nopeer noquery
restrict ntp6.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp7.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp8.cloud.aliyuncs.com nomodify notrap nopeer noquery
restrict ntp9.cloud.aliyuncs.com nomodify notrap nopeer noquery
server ntp1.aliyun.com iburst minpoll 4 maxpoll 10
server ntp1.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp10.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp11.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp12.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp2.aliyun.com iburst minpoll 4 maxpoll 10
server ntp2.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp3.aliyun.com iburst minpoll 4 maxpoll 10
server ntp3.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp4.aliyun.com iburst minpoll 4 maxpoll 10
server ntp4.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp5.aliyun.com iburst minpoll 4 maxpoll 10
server ntp5.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp6.aliyun.com iburst minpoll 4 maxpoll 10
server ntp6.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp7.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp8.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
server ntp9.cloud.aliyuncs.com iburst minpoll 4 maxpoll 10
vim hosts
#ip hostname
192.168.76.XXX dev-master01
192.168.76.XXX dev-master02
192.168.76.XXX dev-master03
192.168.76.XXX dev-worker01
192.168.76.XXX dev-worker02
192.168.76.XXX dev-worker03
192.168.76.XXX dev-worker04
192.168.76.XXX dev-worker05
192.168.76.XXX dev-worker06
192.168.76.XXX dev-worker07
192.168.76.XXX dev-worker08
192.168.76.XXX dev-worker09
-
Edit the base-services installation script
vim env.sh
, make it executable, and run it:
#!/bin/bash
#Author: chezhao
#Description: install base dependencies
CURRENT_DIR=$(cd $(dirname $0);pwd)
UNAME=hadoop
port=22
JDK_NAME=jdk-8u191-linux-x64.rpm
JDK_NAME_install=jdk1.8.0_191-amd64
keygen=keygen.sh
parm1="echo never \> /sys/kernel/mm/transparent_hugepage/defrag"
parm2="echo never \> /sys/kernel/mm/transparent_hugepage/enabled"
echo "=====================> download the mysql jdbc driver <============================"
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz \
    && tar zxvf mysql-connector-java-5.1.46.tar.gz \
    && sudo mkdir -p /usr/share/java/ \
    && sudo cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar /usr/share/java/mysql-connector-java.jar \
    && rm -rf mysql-connector-java-5.1.46
while read pair
do
    host=${pair#* }
    ip=${pair% *}
    echo "=====================> install base packages <============================"
    ssh -n -p$port $UNAME@$ip sudo yum update -y
    ssh -n -p$port $UNAME@$ip sudo yum install -y ntp vim lsof psmisc krb5-devel cyrus-sasl-gssapi cyrus-sasl-devel libxml2-devel libxslt-devel mysql mysql-devel openldap-devel python-devel python-simplejson sqlite-devel httpd mod_ssl net-tools lrzsz nfs-utils rpcbind gcc cyrus-sasl* unzip zip expect wget --skip-broken
    echo "=====================> set up two-way passwordless login for the hadoop user <============================"
    scp -P $port $CURRENT_DIR/$keygen $UNAME@$ip:/home/$UNAME
    scp -P $port $CURRENT_DIR/workers $UNAME@$ip:/home/$UNAME
    ssh -n -p$port $UNAME@$ip /home/$UNAME/$keygen
    echo "=====================> replace /etc/hosts <============================"
    scp -P $port $CURRENT_DIR/hosts $UNAME@$ip:/home/$UNAME/
    ssh -n -p$port $UNAME@$ip sudo cp -r /home/$UNAME/hosts /etc/hosts
    echo "=====================> set the hostname <============================"
    ssh -n -p$port $UNAME@$ip sudo hostname $host
    ssh -n -p$port $UNAME@$ip "echo $host | sudo tee /etc/sysconfig/network"
    ssh -n -p$port $UNAME@$ip "echo NOZEROCONF=yes | sudo tee -a /etc/sysconfig/network"
    ssh -n -p$port $UNAME@$ip sudo hostnamectl set-hostname $host
    echo "=====================> start base services <============================"
    ssh -n -p$port $UNAME@$ip sudo systemctl start httpd
    ssh -n -p$port $UNAME@$ip sudo systemctl status httpd
    echo "=====================> reduce swappiness <============================"
    ssh -n -p$port $UNAME@$ip "echo 0 | sudo tee /proc/sys/vm/swappiness"
    ssh -n -p$port $UNAME@$ip "echo vm.swappiness=0 | sudo tee -a /etc/sysctl.conf"
    # not needed on Alibaba Cloud hosts, which come with time sync configured
    echo "=====================> configure ntp <============================"
    scp -P $port $CURRENT_DIR/ntp.conf $UNAME@$ip:/home/$UNAME/
    ssh -n -p$port $UNAME@$ip sudo cp -r /home/$UNAME/ntp.conf /etc/ntp.conf
    ssh -n -p$port $UNAME@$ip sudo systemctl restart ntpd
    ssh -n -p$port $UNAME@$ip sudo systemctl enable ntpd
    echo "=====================> install the jdk <============================"
    ssh -n -p$port $UNAME@$ip sudo mkdir /usr/java/
    scp -P $port $CURRENT_DIR/$JDK_NAME $UNAME@$ip:/home/$UNAME/
    ssh -n -p$port $UNAME@$ip sudo cp /home/$UNAME/$JDK_NAME /usr/java/
    ssh -n -p$port $UNAME@$ip sudo rpm -i /usr/java/$JDK_NAME
    ssh -n -p$port $UNAME@$ip sudo ln -s /usr/java/$JDK_NAME_install/ /usr/java/jdk
    ssh -n -p$port $UNAME@$ip "echo 'export JAVA_HOME=/usr/java/jdk' | sudo tee -a /etc/profile"
    ssh -n -p$port $UNAME@$ip "echo 'export PATH=\$JAVA_HOME/bin:\$PATH' | sudo tee -a /etc/profile"
    echo "=====================> disable transparent hugepages <============================"
    # "sudo echo never > file" does not work (the redirection runs unprivileged),
    # so write through sudo tee instead
    ssh -n -p$port $UNAME@$ip "echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag"
    ssh -n -p$port $UNAME@$ip "echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled"
    ssh -n -p$port $UNAME@$ip "echo $parm1 | sudo tee -a /etc/rc.local"
    ssh -n -p$port $UNAME@$ip "echo $parm2 | sudo tee -a /etc/rc.local"
    echo "===========> disable the firewall and SELinux <============================"
    # usually already disabled; run getenforce to check your environment
done < $CURRENT_DIR/workers
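One pitfall worth spelling out from the transparent-hugepage step: `sudo echo never > /some/file` fails because the redirection is performed by the calling, unprivileged shell, not by sudo; piping into `sudo tee` moves the write into the privileged process. A no-sudo sketch of the tee pattern (the /tmp path is only for the demo):

```shell
# tee writes its stdin to the named file (and to stdout, silenced here),
# which is why "echo value | sudo tee /sys/..." works where "sudo echo > ..." fails
echo never | tee /tmp/thp_demo >/dev/null
cat /tmp/thp_demo   # -> never
```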
Installing MySQL
-
Open the MySQL client and create the databases CDH needs:
create database scm DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database rman DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database sentry DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database nav DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database navms DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Installing CM and CDH
This example uses CM 6.3.1 and CDH 6.3.2.
- vim install_cdh.sh, then run it (nohup /home/hadoop/install_cdh.sh >install_cdh.log &)
#!/bin/bash
#Author: chezhao
#Description: install CM and CDH
CURRENT_DIR=$(cd $(dirname $0);pwd)
master_hostname=dev-master
UNAME=hadoop
port=22
mysql_user=root
mysql_pwd=root
cdh_soft_dir=cdh
cloudera_manager_daemons=cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
cloudera_manager_agent=cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
echo "==================> download CM <============================"
sudo mkdir -p /var/www/html/cloudera-repos/cm6
wget https://archive.cloudera.com/cm6/6.3.1/repo-as-tarball/cm6.3.1-redhat7.tar.gz \
&& sudo tar xvfz cm6.3.1-redhat7.tar.gz -C /var/www/html/cloudera-repos/cm6 --strip-components=1
echo "==================> install the CM server on the master node <============================"
sudo yum -y install /var/www/html/cloudera-repos/cm6/RPMS/x86_64/cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
sudo yum -y install /var/www/html/cloudera-repos/cm6/RPMS/x86_64/cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
echo "==================> create the database for CM <============================"
sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm $mysql_user $mysql_pwd
echo "==================> start the CM server <============================"
sudo systemctl start cloudera-scm-server
echo "==================> check the CM server status <============================"
sudo systemctl status cloudera-scm-server
echo "==================> copy the agent packages into the hadoop user's home directory <===================="
mkdir /home/$UNAME/$cdh_soft_dir
cp -r /var/www/html/cloudera-repos/cm6/RPMS/x86_64/cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm /home/$UNAME/$cdh_soft_dir
cp -r /var/www/html/cloudera-repos/cm6/RPMS/x86_64/cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm /home/$UNAME/$cdh_soft_dir
echo "==================> download the CDH parcels <============================"
sudo wget https://archive.cloudera.com/cdh6/6.3.2/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
sudo wget https://archive.cloudera.com/cdh6/6.3.2/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1
sudo wget https://archive.cloudera.com/cdh6/6.3.2/parcels/manifest.json
echo "===========> copy the parcels into /opt/cloudera/parcel-repo <==========="
sudo mkdir -p /opt/cloudera/parcel-repo
sudo chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
sudo cp $CURRENT_DIR/manifest.json /opt/cloudera/parcel-repo
sudo cp $CURRENT_DIR/*.parcel /opt/cloudera/parcel-repo
#rename *.sha1 to *.sha here so that CM does not re-download the parcel
sudo cp $CURRENT_DIR/*.parcel.sha1 /opt/cloudera/parcel-repo/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha
while read pair
do
host=${pair#* }
ip=${pair% *}
echo "==================> distributing the agent packages to $host <============================"
ssh -n -p$port $UNAME@$ip mkdir /home/$UNAME/$cdh_soft_dir
scp -r -P $port /home/$UNAME/$cdh_soft_dir/$cloudera_manager_daemons $UNAME@$ip:/home/$UNAME/$cdh_soft_dir && \
scp -r -P $port /home/$UNAME/$cdh_soft_dir/$cloudera_manager_agent $UNAME@$ip:/home/$UNAME/$cdh_soft_dir
echo "==========> agent packages distributed to $host, installing the agent services <==========="
ssh -n -p$port $UNAME@$ip sudo yum -y install /home/$UNAME/$cdh_soft_dir/$cloudera_manager_daemons && \
ssh -n -p$port $UNAME@$ip sudo yum -y install /home/$UNAME/$cdh_soft_dir/$cloudera_manager_agent && \
echo "==================> agent services installed on $host <======================"
echo "==================> point server_host in the agent config at the CM host <============================"
ssh -n -p$port $UNAME@$ip sudo sed -i "s/server_host=localhost/server_host=$master_hostname/" /etc/cloudera-scm-agent/config.ini && \
echo "==================> start the agent <============================"
ssh -n -p$port $UNAME@$ip sudo systemctl start cloudera-scm-agent
echo "==================> agent status on $host <============================"
ssh -n -p$port $UNAME@$ip sudo systemctl status cloudera-scm-agent
done < $CURRENT_DIR/workers
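The rename of *.parcel.sha1 to *.parcel.sha in the script matters because Cloudera Manager compares the parcel's SHA-1 against the contents of the .sha file and re-downloads the parcel on a mismatch. A minimal sketch of that check (the file names are made up for the demo):

```shell
# fake a tiny "parcel" and its .sha file, then verify them the way CM does
echo -n "parcel-bytes" > demo.parcel
sha1sum demo.parcel | awk '{print $1}' > demo.parcel.sha
stored=$(cat demo.parcel.sha)
fresh=$(sha1sum demo.parcel | awk '{print $1}')
[ "$stored" = "$fresh" ] && echo "hash ok"   # -> hash ok
```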
- Open the web UI at master:7180 and follow the setup wizard.
Problems Encountered After Installation
HDFS EC Deployment
HDFS Erasure Coding
HDFS Erasure Coding (EC) is a new storage feature in Hadoop 3.0 that can save about 50% of HDFS storage space. It is recommended only for large volumes of cold data; for ordinary warm and hot data, the traditional three-replica storage is still the better choice, and there is no need to enable EC. The CDH documentation discusses this: https://docs.cloudera.com/documentation/enterprise/6/latest/topics/admin_hdfs_deployec.html
HDFS Permission Problems
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
Solution:
# 1. add a supergroup group on Linux
groupadd supergroup
# 2. add the root user (for example) to supergroup
usermod -a -G supergroup root
# 3. sync the system's group information to HDFS
sudo -u hdfs hdfs dfsadmin -refreshUserToGroupsMappings
# 4. list the users that belong to supergroup
grep 'supergroup:' /etc/group
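For reference, the grep in step 4 works because /etc/group lines have the form name:password:gid:members; after the usermod above, root should appear in the members field. A small parsing sketch (the gid shown is made up):

```shell
# the members field is everything after the last colon
line="supergroup:x:1001:root"
members=${line##*:}
echo "$members"   # -> root
```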
Hive Problems
Chinese comments in Hive show up garbled
show create database hive ;
#1. the database was created as utf8; change the hive metastore database's encoding to latin1
alter database hive default character set latin1 ;
use hive;
#COLUMNS_V2 was latin1; change the comment/param columns to utf8
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
Hive does not show column names in query results
- Workaround 1 (quick)
Hive does not print column headers by default; they have to be enabled explicitly.
In the hive CLI:
hive> set hive.cli.print.header=true;
Query results now prefix each column name with the table name; one more setting removes the prefix:
hive> set hive.resultset.use.unique.column.names=false;
-
Workaround 2 (edit hive-site.xml)
<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
</property>
<property>
    <name>hive.resultset.use.unique.column.names</name>
    <value>false</value>
</property>
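A middle ground between setting these on every session and editing hive-site.xml, assuming the classic Hive CLI is in use (it sources ~/.hiverc at startup), is a per-user .hiverc — a sketch:

```shell
# append the two settings to the current user's .hiverc; the Hive CLI
# executes this file before the interactive session starts
cat >> "$HOME/.hiverc" <<'EOF'
set hive.cli.print.header=true;
set hive.resultset.use.unique.column.names=false;
EOF
grep 'hive.cli.print.header' "$HOME/.hiverc"
```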
The same two properties can also be set from the Hive configuration page in the CDH web UI.
Creating a JSON-format table in Hive fails with:
Cannot validate serde: org.openx.data.jsonserde.JsonSerDe
#copy this jar into hive's lib directory (on every node)
scp -rp -P $port /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-core-2.1.1-cdh6.3.2.jar $UNAME@$ip:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive/lib
Impala Timezone Problem
By default, Impala treats every timestamp as UTC and performs no timezone conversion at all. In other words, when you write a timestamp, Impala stores it as UTC time rather than local time.
In CDH, go to Impala -> Configuration -> Impala Daemon -> "Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)" and add:
-use_local_tz_for_unix_timestamp_conversions=true
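What "stored as UTC" means in practice can be sketched with plain GNU date: the same epoch second renders eight hours apart under UTC and Asia/Shanghai, which is exactly the shift people notice in Impala results:

```shell
# epoch second 0, rendered in two timezones
utc=$(TZ=UTC date -d @0 +"%F %T")
cst=$(TZ=Asia/Shanghai date -d @0 +"%F %T")
echo "$utc"   # -> 1970-01-01 00:00:00
echo "$cst"   # -> 1970-01-01 08:00:00
```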
Configure the Oozie timezone
Configure the Hue timezone:
Asia/Shanghai
For mobile, see my WeChat official account: 大数据理论与实战
My personal blog: 个人博客