Kerberos in Practice V1.0


1 Kerberos Overview

1.1 What is Kerberos

Kerberos is a computer network authentication protocol that provides a secure means of verifying the identities of communicating parties over an insecure network. The name also refers to the software suite that MIT developed for this protocol. The software uses a client/server architecture and supports mutual authentication, i.e. both the client and the server can verify the other side's identity. It can be used to prevent eavesdropping and replay attacks and to protect data integrity, and it manages keys with a symmetric-key cryptosystem.
Kerberos is not k8s. Kubernetes, abbreviated k8s, is an open-source system for managing containerized applications across multiple hosts on a cloud platform. It has nothing to do with the Kerberos covered here, so take care not to confuse the two.

1.2 Kerberos Concepts

The following Kerberos concepts need to be understood:
1) KDC: the Key Distribution Center, responsible for managing and issuing tickets and recording authorizations.
2) Realm: the identifier of a Kerberos administrative domain.
3) Principal: every user or service added to the KDC requires a principal entry, written as primary/instance@REALM (see the examples after this list).
4) Primary: the primary part can be a user name or a service name, i.e. the subject that provides a network service such as hdfs, yarn or hive.
5) Instance: for our purposes, simply the hostname.
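For example, the service and user principals that appear later in this document follow exactly this pattern:

hdfs/hadoop102@HADOOP.COM    (primary = hdfs, the service; instance = hadoop102, the host; realm = HADOOP.COM)
admin/admin@HADOOP.COM       (primary = admin, a user; instance = admin; realm = HADOOP.COM)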
For a deeper explanation of how Kerberos works, see:
https://cloud.tencent.com/developer/article/1496451

1.3 How Kerberos Authentication Works

(Figure: Kerberos authentication flow)

2 Installing Kerberos

2.1 Install the Kerberos packages on the server node

[root@hadoop102 ~]# yum install -y krb5-server krb5-workstation krb5-libs
#Verify the installation
[root@hadoop102 ~]# rpm -qa | grep krb5
krb5-workstation-1.10.3-65.el6.x86_64
krb5-libs-1.10.3-65.el6.x86_64
krb5-server-1.10.3-65.el6.x86_64

2.2 Install on the client nodes

[root@hadoop103 ~]# yum install -y krb5-workstation krb5-libs
[root@hadoop104 ~]# yum install -y krb5-workstation krb5-libs
#Verify the installation
[root@hadoop103 ~]# rpm -qa | grep krb5
krb5-workstation-1.10.3-65.el6.x86_64
krb5-libs-1.10.3-65.el6.x86_64

2.3 Configure Kerberos

Two files need to be configured, kdc.conf and krb5.conf. The kdc.conf file only needs to be configured on the server node, i.e. hadoop102.
1) kdc.conf configuration

[root@hadoop102 ~]# vim /var/kerberos/krb5kdc/kdc.conf
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 HADOOP.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  max_life = 1d
  max_renewable_life = 7d
  supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

Notes:
HADOOP.COM: the realm name. Kerberos supports multiple realms; realm names are conventionally written in upper case.
acl_file: the ACL file that defines admin privileges.
admin_keytab: the keytab the KDC uses for verification.
supported_enctypes: the supported encryption types. Note that aes256-cts has been removed: using aes256-cts from Java requires installing extra JARs (the JCE unlimited-strength policy files), so it is not used here.
2) krb5.conf configuration

[root@hadoop102 ~]# vim /etc/krb5.conf
includedir /etc/krb5.conf.d/

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 dns_lookup_realm = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
 pkinit_anchors = /etc/pki/tls/certs/ca-bundle.crt
 default_realm = HADOOP.COM
 #default_ccache_name = KEYRING:persistent:%{uid}
 udp_preference_limit = 1
[realms]
 HADOOP.COM = {
  kdc = hadoop102
  admin_server = hadoop102
}

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM

Notes:
default_realm: the default realm for Kerberos applications; it must match the name of the realm being configured.
ticket_lifetime: how long a ticket remains valid, typically 24 hours.
renew_lifetime: the maximum period over which a ticket may be renewed, typically one week; once a ticket has expired, subsequent access to Kerberos-secured services fails (see the kinit -R example below).
udp_preference_limit = 1: disables UDP, which avoids a known Hadoop issue.
realms: the realms in use; if there are several, just add more entries to the [realms] section.
domain_realm: the mapping from cluster domain names to Kerberos realms; with a single realm it can be left empty.
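renew_lifetime only matters for renewable tickets. Once a ticket has been obtained (see section 2.8), it can be renewed before it expires with the standard kinit -R option:

#Renew the ticket in the current credential cache, then check the new expiry time
[root@hadoop102 ~]# kinit -R
[root@hadoop102 ~]# klist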
3) Sync krb5.conf to the client nodes

[root@hadoop102 ~]# xsync /etc/krb5.conf
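xsync is a custom cluster-sync script used throughout this tutorial, not a standard command. If it is not available, a plain scp loop over the client nodes (hadoop103 and hadoop104 in this cluster) does the same job:

#Minimal alternative to xsync for this one file
[root@hadoop102 ~]# for host in hadoop103 hadoop104; do scp /etc/krb5.conf root@$host:/etc/krb5.conf; done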

2.4 Create the Kerberos database

Run on the server node:

[root@hadoop102 ~]# kdb5_util create -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'HADOOP.COM',
master key name 'K/M@HADOOP.COM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key: (enter password)
Re-enter KDC database master key to verify: (confirm password)

After creation, the corresponding files appear in the /var/kerberos/krb5kdc directory:

[root@hadoop102 ~]# ls /var/kerberos/krb5kdc/
kadm5.acl  kdc.conf  principal  principal.kadm5  principal.kadm5.lock  principal.ok

2.5 Grant the Kerberos administrator full privileges

[root@hadoop102 ~]# vim /var/kerberos/krb5kdc/kadm5.acl
#Change the content to:
*/admin@HADOOP.COM      *

Notes:
*/admin: every principal whose instance is admin
@HADOOP.COM: the realm
*: all privileges
This ACL therefore grants every principal with the admin instance full privileges over the HADOOP.COM realm. Any Kerberos principal created with the instance admin, such as user1/admin, automatically has full privileges over HADOOP.COM.
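A quick way to confirm the ACL works (user1/admin is only a hypothetical example name): create a principal with the admin instance and check that it can run administrative commands remotely.

#Create a principal whose instance is admin (password prompts omitted)
[root@hadoop102 ~]# kadmin.local -q "addprinc user1/admin"
#Authenticate as it against kadmind and run an admin command
[root@hadoop103 ~]# kadmin -p user1/admin -q "list_principals"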

2.6 Start the Kerberos services

#Start krb5kdc
[root@hadoop102 ~]# systemctl start krb5kdc
Starting Kerberos 5 KDC:                                  [ OK ]

#Start kadmin
[root@hadoop102 ~]# systemctl start kadmin
Starting Kerberos 5 Admin Server:                         [ OK ]

#Enable start at boot
[root@hadoop102 ~]# systemctl enable krb5kdc
#Check whether start at boot is enabled
[root@hadoop102 ~]# systemctl is-enabled krb5kdc

[root@hadoop102 ~]# systemctl enable kadmin
#Check whether start at boot is enabled
[root@hadoop102 ~]# systemctl is-enabled kadmin

Note: if startup fails, check /var/log/krb5kdc.log and /var/log/kadmind.log for details.
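Besides the log files, the unit status and recent journal output can be checked directly with systemd:

[root@hadoop102 ~]# systemctl status krb5kdc kadmin
#Recent log lines for a failing unit
[root@hadoop102 ~]# journalctl -u krb5kdc --since "10 minutes ago"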

2.7 Create the administrator principal/instance

[root@hadoop102 ~]# kadmin.local -q "addprinc admin/admin"
Authenticating as principal root/admin@HADOOP.COM with password.
WARNING: no policy specified for admin/admin@HADOOP.COM; defaulting to no policy
Enter password for principal "admin/admin@HADOOP.COM": (enter password)
Re-enter password for principal "admin/admin@HADOOP.COM": (confirm password)
Principal "admin/admin@HADOOP.COM" created.

2.8 Verify the administrator with kinit

[root@hadoop102 ~]# kinit admin/admin
Password for admin/admin@HADOOP.COM: (enter password)
[root@hadoop102 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: admin/admin@HADOOP.COM

Valid starting     Expires            Service principal
08/27/19 14:41:39  08/28/19 14:41:39  krbtgt/HADOOP.COM@HADOOP.COM
        renew until 08/27/19 14:41:39

If the default principal admin/admin@HADOOP.COM is shown as above, everything is working.
Try the same on another machine:

[root@hadoop103 ~]# kinit admin/admin
Password for admin/admin@HADOOP.COM: (enter password)

[root@hadoop103 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: admin/admin@HADOOP.COM

Valid starting     Expires            Service principal
08/27/19 14:41:39  08/28/19 14:41:39  krbtgt/HADOOP.COM@HADOOP.COM
        renew until 08/27/19 14:41:39

If you get "kadmin: GSS-API (or Kerberos) error while initializing kadmin interface", restart the ntp service:

[root@hadoop103 ~]# service ntpd restart
Shutting down ntpd:                                        [ OK ]
Starting ntpd:

3 Kerberos Database Operations

3.1 Log in to the Kerberos database

1) Local login (no authentication required)

[root@hadoop102 ~]# kadmin.local 
Authenticating as principal root/admin@HADOOP.COM with password.
kadmin.local:  

2) Remote login (requires principal authentication; authenticate as the administrator principal created above first)

[root@hadoop103 ~]# kadmin
Authenticating as principal admin/admin@HADOOP.COM with password.
Password for admin/admin@HADOOP.COM: 
kadmin:  

Type exit to quit.

3.2 Create a Kerberos principal

[root@hadoop102 ~]# kadmin.local -q "addprinc zhouchen/zhouchen"
Authenticating as principal root/admin@HADOOP.COM with password.
WARNING: no policy specified for zhouchen/zhouchen@HADOOP.COM; defaulting to no policy
Enter password for principal "zhouchen/zhouchen@HADOOP.COM": (输入密码)
Re-enter password for principal "zhouchen/zhouchen@HADOOP.COM": (输入密码)
Principal "admin/admin@HADOOP.COM" created.

3.3 Change a principal's password

[root@hadoop102 ~]# kadmin.local -q "cpw zhouchen/zhouchen"
Authenticating as principal root/admin@HADOOP.COM with password.
Enter password for principal "zhouchen/zhouchen@HADOOP.COM": (输入密码)
Re-enter password for principal "zhouchen/zhouchen@HADOOP.COM": (输入密码)
Password for "zhouchen/zhouchen@HADOOP.COM" changed.

3.4 List all principals

[root@hadoop102 ~]# kadmin.local -q "list_principals"
Authenticating as principal root/admin@HADOOP.COM with password.
K/M@HADOOP.COM
admin/admin@HADOOP.COM
zhouchen/zhouchen@HADOOP.COM
kadmin/admin@HADOOP.COM
kadmin/changepw@HADOOP.COM
kadmin/hadoop105@HADOOP.COM
kiprop/hadoop105@HADOOP.COM
krbtgt/HADOOP.COM@HADOOP.COM

4 Kerberos Principal Authentication

Kerberos offers two authentication methods: entering a password, and using a keytab key file. The two cannot be used at the same time.

4.1 Password authentication

1) Authenticate the principal with kinit

[root@hadoop102 ~]# kinit zhouchen/zhouchen
Password for zhouchen/zhouchen@HADOOP.COM: (enter password)

2) View the credentials

[root@hadoop102 ~]# klist 
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: zhouchen/zhouchen@HADOOP.COM

Valid starting       Expires              Service principal
10/27/2019 18:23:57  10/28/2019 18:23:57  krbtgt/HADOOP.COM@HADOOP.COM
	renew until 11/03/2019 18:23:57

4.2 Keytab file authentication

1) Generate a keytab file for the principal zhouchen/zhouchen at /root/zhouchen.keytab

[root@hadoop102 ~]# kadmin.local -q "xst -k /root/zhouchen.keytab zhouchen/zhouchen@HADOOP.COM"

2) Authenticate with the keytab

[root@hadoop102 ~]# kinit -kt /root/zhouchen.keytab zhouchen/zhouchen

3) View the credentials

[root@hadoop102 ~]# klist 
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: zhouchen/zhouchen@HADOOP.COM

Valid starting     Expires            Service principal
08/27/19 15:41:28  08/28/19 15:41:28  krbtgt/HADOOP.COM@HADOOP.COM
        renew until 08/27/19 15:41:28

4.3 Destroy the credentials

[root@hadoop102 ~]# kdestroy
[root@hadoop102 ~]# klist   
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_0)

5 Enabling Kerberos Security in CDH

5.1 Create an administrator principal/instance for Cloudera Manager

[root@hadoop102 ~]# kadmin.local -q "addprinc cloudera-scm/admin"
Authenticating as principal root/admin@HADOOP.COM with password.
WARNING: no policy specified for cloudera-scm/admin@HADOOP.COM; defaulting to no policy
Enter password for principal "cloudera-scm/admin@HADOOP.COM": (enter password)
Re-enter password for principal "cloudera-scm/admin@HADOOP.COM": (confirm password)
Principal "cloudera-scm/admin@HADOOP.COM" created.

5.2 Enable Kerberos

(screenshot)

5.3 Confirm the environment (check all items)

(screenshot)

5.4 Fill in the configuration

Kerberos encryption types: aes128-cts, des3-hmac-sha1, arcfour-hmac
(screenshot)

5.5 Continue

(screenshot)

5.6 Enter the principal name and password

(screenshot)

5.7 Wait for the import into the KDC

(screenshot)

5.8 Prepare to restart the cluster

(screenshot)

5.9 Wait for completion

(screenshots)

5.10 View the principals

[root@hadoop102 ~]# kadmin.local -q "list_principals"
Authenticating as principal cloudera-scm/admin@HADOOP.COM with password.
HTTP/hadoop102@HADOOP.COM
HTTP/hadoop103@HADOOP.COM
HTTP/hadoop104@HADOOP.COM
K/M@HADOOP.COM
admin/admin@HADOOP.COM
zhouchen@HADOOP.COM
cloudera-scm/admin@HADOOP.COM
hdfs/hadoop102@HADOOP.COM
hdfs/hadoop103@HADOOP.COM
hdfs/hadoop104@HADOOP.COM
hive/hadoop102@HADOOP.COM
hue/hadoop102@HADOOP.COM
kadmin/admin@HADOOP.COM
kadmin/changepw@HADOOP.COM
kadmin/hadoop102@HADOOP.COM
krbtgt/HADOOP.COM@HADOOP.COM
mapred/hadoop102@HADOOP.COM
oozie/hadoop102@HADOOP.COM
sentry/hadoop102@HADOOP.COM
yarn/hadoop102@HADOOP.COM
yarn/hadoop103@HADOOP.COM
yarn/hadoop104@HADOOP.COM
zookeeper/hadoop102@HADOOP.COM
zookeeper/hadoop103@HADOOP.COM
zookeeper/hadoop104@HADOOP.COM

6 Hands-on in the Kerberos-Secured Environment

Once Kerberos is enabled, system-to-system communication (e.g. Flume to Kafka) and user-to-system communication (e.g. a user accessing HDFS) must pass security authentication before any traffic is allowed.
Consequently, the scripts used in the data warehouse all need an additional authentication step in order to keep working after Kerberos is enabled.

6.1 Authenticating user access to services

After Kerberos is enabled, everyday access to services (e.g. accessing HDFS or consuming a Kafka topic) requires security authentication first.
1) Create a user principal/instance in the Kerberos database

[root@hadoop102 ~]# kadmin.local -q "addprinc hive/hive@HADOOP.COM"

2) Authenticate the user

[root@hadoop102 ~]# kinit hive/hive@HADOOP.COM

3) Access HDFS

[root@hadoop102 ~]# hadoop fs -ls /
Found 4 items
drwxr-xr-x   - hive hive                0 2019-10-02 01:29 /origin_data
drwxrwxrwt   - hdfs supergroup          0 2019-10-03 00:20 /tmp
drwxr-xr-x   - hdfs supergroup          0 2019-10-02 01:35 /user
drwxr-xr-x   - hive hive                0 2019-10-02 01:38 /warehouse

4) Query Hive

[root@hadoop102 ~]# hive 
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/jars/hive-common-2.1.1-cdh6.2.1.jar!/hive-log4j2.properties Async: false

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive>
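Since the Hive CLI is deprecated, the same authenticated session can also be opened with Beeline; with Kerberos enabled, the JDBC URL must carry the HiveServer2 principal, as in the scripts of section 6.2:

[root@hadoop102 ~]# beeline -u "jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM"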

5) Consume a Kafka topic
(1) Modify the Kafka configuration
a. In the Kafka configuration, search for "security.inter.broker.protocol" and set it to SASL_PLAINTEXT.
(screenshot)

b. In the Kafka configuration, search for "ssl.client.auth" and set it to none.
(screenshot)

(2) Create the jaas.conf file

[root@hadoop102 hive]# vim /var/lib/hive/jaas.conf
#File contents:
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=true;
};
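The configuration above reuses the ticket cache created by kinit, so a valid ticket must exist before the consumer is started. If the client should authenticate on its own instead, a keytab-based variant of jaas.conf can be used (a sketch, assuming the hive/hive keytab generated later in section 6.2):

KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/var/lib/hive/hive.keytab"
principal="hive/hive@HADOOP.COM"
useTicketCache=false;
};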

(3) Create the consumer.properties file

[root@hadoop102 conf]# vim /etc/kafka/conf/consumer.properties
#File contents:
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka

(4) Declare the path to the jaas.conf file

[root@hadoop102 conf]# export KAFKA_OPTS="-Djava.security.auth.login.config=/var/lib/hive/jaas.conf"

(5) Consume the Kafka topic with kafka-console-consumer

[root@hadoop102 ~]# kafka-console-consumer --bootstrap-server hadoop102:9092 --topic topic_start --from-beginning --consumer.config /etc/kafka/conf/consumer.properties
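To verify the path end to end, a test message can be produced from another authenticated shell; kafka-console-producer accepts a client config file in the same way, and the two security properties above work for the producer as well (a sketch, reusing the same properties file):

[root@hadoop102 ~]# export KAFKA_OPTS="-Djava.security.auth.login.config=/var/lib/hive/jaas.conf"
[root@hadoop102 ~]# kafka-console-producer --broker-list hadoop102:9092 --topic topic_start --producer.config /etc/kafka/conf/consumer.properties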

6) HDFS WebUI browser authentication
After enabling Kerberos support in CDH, the situation shown below appears:
(screenshot)

We can reach the web UI on port 9870, but cannot browse directories or files, because our local environment has not passed authentication.
Next we set up local authentication.
Note: because of browser limitations we use Firefox here; others such as Chrome or IE will run into problems.
(1) Download Firefox
(2) Configure the browser
a. Open Firefox and enter about:config in the address bar to open the settings page.
(screenshot)
b. Search for "network.negotiate-auth.trusted-uris" and set the value to your own server hostnames.
(screenshot)

c. Search for "network.auth.use-sspi" and double-click to change the value to false.
(screenshot)

(3) Install KfW
a. Install the provided kfw-4.1-amd64.msi.
b. Copy the contents of the cluster's /etc/krb5.conf into C:\ProgramData\MIT\Kerberos5\krb5.ini, removing the path-related settings (the result looks like the following).

[logging]

 [libdefaults]
  default_realm = HADOOP.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  forwardable = true
  udp_preference_limit = 1

[realms]
 HADOOP.COM = {
  kdc = hadoop102  
  admin_server = hadoop102
 }

[domain_realm]

c. Open the MIT Kerberos client and enter the principal name and password:
(screenshot)

(4) Test
(screenshot)

6.2 User-behavior data warehouse

1) Log-collection Flume configuration
The log-collection Flume agent sends data to Kafka, so it acts as a Kafka producer and therefore needs the Kafka client authentication described above. No manual work is required here: once Kerberos is enabled, Cloudera Manager configures it automatically.
2) Kafka-consuming Flume configuration
The Kafka-consuming Flume agent moves data from Kafka to HDFS, so it acts as a Kafka consumer and likewise needs the Kafka client authentication (again configured automatically by CM, no manual step). In addition it needs HDFS client authentication, which we do have to configure by hand.
(1) Generate a keytab file for the hive user
Users can authenticate either by entering a password or with a keytab key file; here the keytab file must be used.

[root@hadoop102 hive]# kadmin.local -q "xst -k /var/lib/hive/hive.keytab hive/hive@HADOOP.COM"

(2) Add read permission

chmod +r /var/lib/hive/hive.keytab

(3) Distribute the keytab file

xsync /var/lib/hive/hive.keytab
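After distribution, it is worth checking that the keytab authenticates on every node; a quick loop over the hosts (assuming passwordless ssh between them) is enough:

[root@hadoop102 ~]# for host in hadoop102 hadoop103 hadoop104; do ssh $host "kinit -kt /var/lib/hive/hive.keytab hive/hive && klist"; done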

(4) Modify the Flume agent configuration file

## Components
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2

## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = hadoop102:9092,hadoop103:9092,hadoop104:9092
a1.sources.r1.kafka.topics=topic_start

## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers = hadoop102:9092,hadoop103:9092,hadoop104:9092
a1.sources.r2.kafka.topics=topic_event

## channel1
a1.channels.c1.type=memory
a1.channels.c1.capacity=100000
a1.channels.c1.transactionCapacity=10000

## channel2
a1.channels.c2.type=memory
a1.channels.c2.capacity=100000
a1.channels.c2.transactionCapacity=10000

## sink1
a1.sinks.k1.type = hdfs
#a1.sinks.k1.hdfs.proxyUser=hive
a1.sinks.k1.hdfs.kerberosPrincipal=hive/hive@HADOOP.COM
a1.sinks.k1.hdfs.kerberosKeytab=/var/lib/hive/hive.keytab
a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = logstart-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second

##sink2
a1.sinks.k2.type = hdfs
#a1.sinks.k2.hdfs.proxyUser=hive
a1.sinks.k2.hdfs.kerberosPrincipal=hive/hive@HADOOP.COM
a1.sinks.k2.hdfs.kerberosKeytab=/var/lib/hive/hive.keytab
a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = logevent-
a1.sinks.k2.hdfs.round = true
a1.sinks.k2.hdfs.roundValue = 10
a1.sinks.k2.hdfs.roundUnit = second

## Avoid generating lots of small files
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0

a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0

## Output file type: compressed stream (lzop)
a1.sinks.k1.hdfs.fileType = CompressedStream 
a1.sinks.k2.hdfs.fileType = CompressedStream 

a1.sinks.k1.hdfs.codeC = lzop
a1.sinks.k2.hdfs.codeC = lzop

## Wiring: bind sources and sinks to channels
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1

a1.sources.r2.channels = c2
a1.sinks.k2.channel= c2

3) ODS layer
Edit the ODS-layer data load script

[root@hadoop102 bin]# vim ods_db.sh
#Contents:
#!/bin/bash

kinit -kt /var/lib/hive/hive.keytab hive/hive
# Define a variable for easy modification
APP=gmall

# Use the date passed as an argument; if no date is given, use yesterday
if [ -n "$1" ] ;then
   do_date=$1
else 
   do_date=`date -d "-1 day" +%F`
fi 

echo "===日志日期为 $do_date==="
sql="
load data inpath '/origin_data/gmall/log/topic_start/$do_date' into table "$APP".ods_start_log partition(dt='$do_date');

"

beeline -u "jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM" -n hive -e "$sql"

4) DWD layer
Edit the DWD-layer data load script

[root@hadoop102 bin]# vim dwd_start_log.sh
#Contents:
#!/bin/bash

kinit -kt /var/lib/hive/hive.keytab hive/hive

# Define a variable for easy modification
APP=gmall

# Use the date passed as an argument; if no date is given, use yesterday
if [ -n "$1" ] ;then
	do_date=$1
else 
	do_date=`date -d "-1 day" +%F`  
fi 

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dwd_start_log
PARTITION (dt='$do_date')
select 
    get_json_object(line,'$.mid') mid_id,
    get_json_object(line,'$.uid') user_id,
    get_json_object(line,'$.vc') version_code,
    get_json_object(line,'$.vn') version_name,
    get_json_object(line,'$.l') lang,
    get_json_object(line,'$.sr') source,
    get_json_object(line,'$.os') os,
    get_json_object(line,'$.ar') area,
    get_json_object(line,'$.md') model,
    get_json_object(line,'$.ba') brand,
    get_json_object(line,'$.sv') sdk_version,
    get_json_object(line,'$.g') gmail,
    get_json_object(line,'$.hw') height_width,
    get_json_object(line,'$.t') app_time,
    get_json_object(line,'$.nw') network,
    get_json_object(line,'$.ln') lng,
    get_json_object(line,'$.la') lat,
    get_json_object(line,'$.entry') entry,
    get_json_object(line,'$.open_ad_type') open_ad_type,
    get_json_object(line,'$.action') action,
    get_json_object(line,'$.loading_time') loading_time,
    get_json_object(line,'$.detail') detail,
    get_json_object(line,'$.extend1') extend1
from "$APP".ods_start_log 
where dt='$do_date';
"

beeline -u "jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM" -n hive -e "$sql"

5) DWS layer
Edit the DWS-layer data load script

[root@hadoop102 bin]$ vim dws_log.sh
#Contents:
#!/bin/bash

kinit -kt /var/lib/hive/hive.keytab hive/hive

# Define a variable for easy modification
APP=gmall

# Use the date passed as an argument; if no date is given, use yesterday
if [ -n "$1" ] ;then
	do_date=$1
else 
	do_date=`date -d "-1 day" +%F`  
fi 


sql="
  set hive.exec.dynamic.partition.mode=nonstrict;

  insert overwrite table "$APP".dws_uv_detail_day partition(dt='$do_date')
  select  
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws('|', collect_set(version_code)) version_code,
    concat_ws('|', collect_set(version_name)) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area, 
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws('|', collect_set(sdk_version)) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws('|', collect_set(height_width)) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat
  from "$APP".dwd_start_log
  where dt='$do_date'  
  group by mid_id;
"

beeline -u "jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM" -n hive -e "$sql"

6) ADS layer
Edit the ADS-layer data load script

[root@hadoop102 bin]# vim ads_uv_log.sh
#Contents:
#!/bin/bash

kinit -kt /var/lib/hive/hive.keytab hive/hive

# Define a variable for easy modification
APP=gmall

# Use the date passed as an argument; if no date is given, use yesterday
if [ -n "$1" ] ;then
	do_date=$1
else 
	do_date=`date -d "-1 day" +%F`  
fi 

sql="
  set hive.exec.dynamic.partition.mode=nonstrict;

insert into table "$APP".ads_uv_count 
select  
  '$do_date' dt,
   daycount.ct
from 
(
   select  
      '$do_date' dt,
       count(*) ct
   from "$APP".dws_uv_detail_day
   where dt='$do_date'  
)daycount;
"

beeline -u "jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM" -n hive -e "$sql"

6.3 Business data warehouse

1) Sqoop import
Edit the Sqoop import script

[root@hadoop102 bin]# vim sqoop_import.sh
#Contents:
#!/bin/bash
kinit -kt /var/lib/hive/hive.keytab hive/hive
#export HADOOP_USER_NAME=hive
db_date=$2
echo $db_date
db_name=gmall

import_data() {
sqoop import \
--connect jdbc:mysql://hadoop102:3306/$db_name \
--username root \
--password Zhouchen.123456 \
--target-dir  /origin_data/$db_name/db/$1/$db_date \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "$2"' and  $CONDITIONS;'
}

import_sku_info(){
  import_data  "sku_info"  "select 
id, spu_id, price, sku_name, sku_desc, weight, tm_id,
category3_id, create_time 
  from sku_info  where 1=1"
}

import_user_info(){
  import_data "user_info" "select 
id, name, birthday, gender, email, user_level, 
create_time 
from user_info where 1=1"
}

import_base_category1(){
  import_data "base_category1" "select 
id, name from base_category1 where 1=1"
}

import_base_category2(){
  import_data "base_category2" "select 
id, name, category1_id from base_category2 where 1=1"
}

import_base_category3(){
  import_data "base_category3" "select id, name, category2_id from base_category3 where 1=1"
}

import_order_detail(){
  import_data   "order_detail"   "select 
    od.id, 
    order_id, 
    user_id, 
    sku_id, 
    sku_name, 
    order_price, 
    sku_num, 
    o.create_time  
  from order_info o , order_detail od 
  where o.id=od.order_id 
  and DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date'"
}

import_payment_info(){
  import_data "payment_info"   "select 
    id,  
    out_trade_no, 
    order_id, 
    user_id, 
    alipay_trade_no, 
    total_amount,  
    subject, 
    payment_type, 
    payment_time 
  from payment_info 
  where DATE_FORMAT(payment_time,'%Y-%m-%d')='$db_date'"
}

import_order_info(){
  import_data   "order_info"   "select 
    id, 
    total_amount, 
    order_status, 
    user_id, 
    payment_way, 
    out_trade_no, 
    create_time, 
    operate_time  
  from order_info 
  where  (DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date' or DATE_FORMAT(operate_time,'%Y-%m-%d')='$db_date')"
}

case $1 in
  "base_category1")
     import_base_category1
;;
  "base_category2")
     import_base_category2
;;
  "base_category3")
     import_base_category3
;;
  "order_info")
     import_order_info
;;
  "order_detail")
     import_order_detail
;;
  "sku_info")
     import_sku_info
;;
  "user_info")
     import_user_info
;;
  "payment_info")
     import_payment_info
;;
   "all")
   import_base_category1
   import_base_category2
   import_base_category3
   import_order_info
   import_order_detail
   import_sku_info
   import_user_info
   import_payment_info
;;
esac
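The script expects the table name (or all) as the first argument and the date as the second; a typical run (example date) looks like:

[root@hadoop102 bin]# chmod +x sqoop_import.sh
[root@hadoop102 bin]# ./sqoop_import.sh all 2019-10-02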

2) ODS layer

Edit the ODS-layer data load script
[root@hadoop102 bin]# vim ods_db.sh
#Contents:
#!/bin/bash
kinit -kt /var/lib/hive/hive.keytab hive/hive
APP=gmall

# Use the date passed as an argument; if no date is given, use yesterday
if [ -n "$1" ] ;then
	do_date=$1
else 
	do_date=`date -d "-1 day" +%F`  
fi

sql=" 
load data inpath '/origin_data/$APP/db/order_info/$do_date'  OVERWRITE into table "$APP".ods_order_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/order_detail/$do_date'  OVERWRITE into table "$APP".ods_order_detail partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/sku_info/$do_date'  OVERWRITE into table "$APP".ods_sku_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table "$APP".ods_user_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table "$APP".ods_payment_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table "$APP".ods_base_category1 partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table "$APP".ods_base_category2 partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table "$APP".ods_base_category3 partition(dt='$do_date'); 
"
beeline -u " jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM" -n hive -e "$sql"

3) DWD layer
Edit the DWD-layer data load script

[root@hadoop102 bin]# vim dwd_db.sh
#Contents:
#!/bin/bash

kinit -kt /var/lib/hive/hive.keytab hive/hive

# Define a variable for easy modification
APP=gmall
# Use the date passed as an argument; if no date is given, use yesterday
if [ -n "$1" ] ;then
	do_date=$1
else 
	do_date=`date -d "-1 day" +%F`  
fi 

sql="

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table   "$APP".dwd_order_info partition(dt)
select * from "$APP".ods_order_info 
where dt='$do_date'  and id is not null;
 
insert overwrite table   "$APP".dwd_order_detail partition(dt)
select * from "$APP".ods_order_detail 
where dt='$do_date'   and id is not null;

insert overwrite table   "$APP".dwd_user_info partition(dt)
select * from "$APP".ods_user_info
where dt='$do_date'   and id is not null;
 
insert overwrite table   "$APP".dwd_payment_info partition(dt)
select * from "$APP".ods_payment_info
where dt='$do_date'  and id is not null;

insert overwrite table   "$APP".dwd_sku_info partition(dt)
select  
    sku.id,
    sku.spu_id, 
    sku.price,
    sku.sku_name,  
    sku.sku_desc,  
    sku.weight,  
    sku.tm_id,  
    sku.category3_id,  
    c2.id category2_id ,  
    c1.id category1_id,  
    c3.name category3_name,  
    c2.name category2_name,  
    c1.name category1_name,  
    sku.create_time,
    sku.dt
from
    "$APP".ods_sku_info sku 
join "$APP".ods_base_category3 c3 on sku.category3_id=c3.id 
    join "$APP".ods_base_category2 c2 on c3.category2_id=c2.id 
    join "$APP".ods_base_category1 c1 on c2.category1_id=c1.id 
where sku.dt='$do_date'  and c2.dt='$do_date'  
and  c3.dt='$do_date' and  c1.dt='$do_date' 
and sku.id is not null;
"
beeline -u " jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM" -n hive -e "$sql"

4) DWS layer
Edit the DWS-layer data load script

[root@hadoop102 bin]# vim dws_db_wide.sh
#Contents:
#!/bin/bash
kinit -kt /var/lib/hive/hive.keytab hive/hive
# Define a variable for easy modification
APP=gmall

# Use the date passed as an argument; if no date is given, use yesterday
if [ -n "$1" ] ;then
	do_date=$1
else 
	do_date=`date -d "-1 day" +%F`  
fi 

sql="
with  
tmp_order as
(
    select 
        user_id, 
        count(*)  order_count,
        sum(oi.total_amount) order_amount
    from "$APP".dwd_order_info oi
    where date_format(oi.create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
) ,
tmp_payment as
(
    select
        user_id, 
        sum(pi.total_amount) payment_amount, 
        count(*) payment_count 
    from "$APP".dwd_payment_info pi 
    where date_format(pi.payment_time,'yyyy-MM-dd')='$do_date'
    group by user_id
)
insert overwrite table "$APP".dws_user_action partition(dt='$do_date')
select
    user_actions.user_id,
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount)
from 
(
    select
        user_id,
        order_count,
        order_amount,
        0 payment_count,
        0 payment_amount
    from tmp_order

    union all
    select
        user_id,
        0 order_count,
        0 order_amount,
        payment_count,
        payment_amount
    from tmp_payment
 ) user_actions
group by user_id;
"

beeline -u " jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM" -n hive -e "$sql"

5) ADS layer
Edit the ADS-layer data load script

[root@hadoop102 bin]# vim ads_db_gmv.sh
#Contents:
#!/bin/bash
kinit -kt /var/lib/hive/hive.keytab hive/hive

# Define a variable for easy modification
APP=gmall

# Use the date passed as an argument; if no date is given, use yesterday
if [ -n "$1" ] ;then
	do_date=$1
else 
	do_date=`date -d "-1 day" +%F`
fi 

sql="
insert into table "$APP".ads_gmv_sum_day 
select 
    '$do_date' dt,
    sum(order_count)  gmv_count,
    sum(order_amount) gmv_amount,
    sum(payment_amount) payment_amount 
from "$APP".dws_user_action 
where dt ='$do_date'
group by dt;
"

beeline -u " jdbc:hive2://hadoop102:10000/;principal=hive/hadoop102@HADOOP.COM" -n hive -e "$sql"

6) Sqoop export
Edit the Sqoop export script

[root@hadoop102 bin]# vim sqoop_export.sh
#Contents:
#!/bin/bash
kinit -kt /var/lib/hive/hive.keytab hive/hive

db_name=gmall

export_data() {
sqoop export \
--connect "jdbc:mysql://hadoop102:3306/${db_name}?useUnicode=true&characterEncoding=utf-8"  \
--username root \
--password Zhouchen.123456 \
--table $1 \
--num-mappers 1 \
--export-dir /warehouse/$db_name/ads/$1 \
--input-fields-terminated-by "\t" \
--update-mode allowinsert \
--update-key $2 \
--input-null-string '\\N'    \
--input-null-non-string '\\N'
}

case $1 in
  "ads_uv_count")
     export_data "ads_uv_count" "dt"
;;
  "ads_user_action_convert_day") 
     export_data "ads_user_action_convert_day" "dt"
;;
  "ads_gmv_sum_day")
     export_data "ads_gmv_sum_day" "dt"
;;
   "all")
     export_data "ads_uv_count" "dt"
     export_data "ads_user_action_convert_day" "dt"
     export_data "ads_gmv_sum_day" "dt"
;;
esac
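The export script takes only the table name (or all) as its argument:

[root@hadoop102 bin]# chmod +x sqoop_export.sh
[root@hadoop102 bin]# ./sqoop_export.sh all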

7 Common Kerberos Issues

7.1 After starting Kerberos, the KDC log reports: No such file or directory - while initializing database for realm HADOOP.COM

"No such file or directory - while initializing database for realm HADOOP.COM" appears in /var/log/krb5kdc.log.
Solution:
1) Check that kdc.conf and krb5.conf are configured correctly and fix them. Note: the [kdcdefaults], [logging], [libdefaults] and similar section headers in the configuration files must not contain spaces.
2) Stop the services

service krb5kdc stop
service kadmin stop

3) Delete the Kerberos database and recreate it

rm -rf /var/kerberos/krb5kdc/*principal*

kdb5_util create -s -r HADOOP.COM

4) Create the administrator

kadmin.local -q "addprinc admin/admin"

5) Start the services

service krb5kdc start
service kadmin start

7.2 kinit with a keytab fails

Running kinit -kt /root/zhouchen.keytab zhouchen as the Linux user zhouchen produces:

kinit: ???? while getting initial credentials
or
kinit: Permission denied while getting initial credentials

Solution:
1) As root, change the owner of zhouchen.keytab:

chown zhouchen /root/zhouchen.keytab

2) Change the permissions of zhouchen.keytab to 660:

chmod 660 /root/zhouchen.keytab
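After fixing the owner and permissions, the keytab can be inspected and the authentication retried with the standard klist/kinit options:

ls -l /root/zhouchen.keytab
klist -kt /root/zhouchen.keytab
kinit -kt /root/zhouchen.keytab zhouchen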

7.3 kinit reports an incorrect password even though it was entered correctly

[root@hadoop102 ~]# kinit zhouchen
Password for zhouchen@HADOOP.COM: 
kinit: Password incorrect while getting initial credentials

This happens because a keytab has already been generated for zhouchen, so password authentication no longer works. Either authenticate with the keytab file, or change the password and authenticate again (changing the password invalidates the previously generated keytab).

[root@hadoop102 ~]# kinit -kt /root/zhouchen.keytab zhouchen
[root@hadoop102 ~]# klist 
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: zhouchen@HADOOP.COM

Valid starting     Expires            Service principal
08/27/19 16:00:39  08/28/19 16:00:39  krbtgt/HADOOP.COM@HADOOP.COM
        renew until 08/27/19 16:00:39

7.4 Database creation fails

Bad krb5 admin server hostname while initializing kadmin interface
Cause: an invalid hostname is configured for admin_server in the krb5.conf file.
Fix: edit krb5.conf and check that admin_server is set to the correct hostname; after the change, rerun the database creation command.

7.5 ZooKeeper cluster fails to start

Error message: Could not configure server because SASL configuration did not allow the ZooKeeper server to authenticate itself properly: javax.security.auth.login.LoginException: No password provided
Problem: the authentication credentials are stale and need to be regenerated.
Fix: with ZooKeeper's security mechanism enabled, regenerate the Kerberos (Krb5) credentials.
To regenerate the credentials on every host: stop all cluster services and the Cloudera Manager service, add an arbitrary extra entry to the "Kerberos encryption types" setting in the Kerberos configuration so that the hosts regenerate their Krb5 credentials, then change the setting back and regenerate once more.

7.6 After Hue starts, the Kerberos Ticket Renewer does not come up

[root@hadoop102 ~]# kinit hue
Password for hue@HADOOP.COM: 

[root@hadoop102 ~]# kadmin.local 
Authenticating as principal hue/admin@HADOOP.COM with password.

kadmin.local:  modprinc -maxrenewlife 90day krbtgt/HADOOP.COM@HADOOP.COM
Principal "krbtgt/HADOOP.COM@HADOOP.COM" modified.

kadmin.local: modprinc -maxrenewlife 90day +allow_renewable krbtgt/HADOOP.COM@HADOOP.COM
Principal "krbtgt/HADOOP.COM@HADOOP.COM" modified.