Impala 4.0: Integrating Kerberos and Ranger and Configuring Resource Management
I. Integrating Kerberos
1. Generate the keytab
On the KDC server node, run the following commands as root:
kadmin.local -q "addprinc -randkey impala/host1@EXAMPLE.COM "
kadmin.local -q "addprinc -randkey impala/host2@EXAMPLE.COM "
kadmin.local -q "addprinc -randkey impala/host3@EXAMPLE.COM "
kadmin.local -q "xst -k /var/kerberos/impala-service.keytab impala/host1@EXAMPLE.COM "
kadmin.local -q "xst -k /var/kerberos/impala-service.keytab impala/host2@EXAMPLE.COM "
kadmin.local -q "xst -k /var/kerberos/impala-service.keytab impala/host3@EXAMPLE.COM "
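With more Impala hosts, the commands above follow a simple pattern: one addprinc per host, then one xst per host into the same keytab. The Python sketch below is a hypothetical helper (not part of Impala or MIT Kerberos) that only prints the commands for a given host list, so they can be reviewed before being run on the KDC. The host names, realm and keytab path default to the example values used in this article.

```python
def kadmin_commands(hosts, realm="EXAMPLE.COM",
                    keytab="/var/kerberos/impala-service.keytab",
                    service="impala"):
    """Return the addprinc commands for every host, followed by the xst (export) commands."""
    principals = [f"{service}/{host}@{realm}" for host in hosts]
    add = [f'kadmin.local -q "addprinc -randkey {p}"' for p in principals]
    # xst adds each principal's keys to the same keytab file
    export = [f'kadmin.local -q "xst -k {keytab} {p}"' for p in principals]
    return add + export


if __name__ == "__main__":
    # Print the commands for review; nothing is executed here
    for cmd in kadmin_commands(["host1", "host2", "host3"]):
        print(cmd)
```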
2. Modify the Impala configuration file
On host1, edit the Impala configuration file and add the following parameters to IMPALA_CATALOG_ARGS, IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS:
-kerberos_reinit_interval=60 \
-principal=impala/_HOST@EXAMPLE.COM \
-keytab_file=/var/kerberos/impala-service.keytab \
3. Modify core-site.xml
Add the following line to the hadoop.security.auth_to_local property:
RULE:[2:$1@$0](impala@EXAMPLE.COM)s/.*/hadoop/
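This maps the impala principals to the administrative hadoop user. Without it, the services have to be started as the impala user, and the Ranger integration may run into problems.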
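To see why the rule above maps the Impala service principals to hadoop, here is a simplified Python model of how a single auth_to_local RULE is evaluated: the principal's components are substituted into the `[2:$1@$0]` format ($0 is the realm, $1 the first component), the result must fully match the regular expression in parentheses, and the sed-style substitution is then applied once. This is an illustrative sketch under those assumptions, not Hadoop's own implementation; the function name is hypothetical, and it ignores the `g` and `/L` modifiers and the ordering of multiple rules.

```python
import re


def apply_auth_to_local_rule(principal, num_components, fmt, match_regex, sed_from, sed_to):
    """Simplified model of one Hadoop RULE:[n:fmt](regex)s/from/to/ (no 'g' or '/L')."""
    name, realm = principal.split("@", 1)
    components = name.split("/")
    if len(components) != num_components:
        return None  # the rule only applies to principals with exactly n components
    values = [realm] + components  # $0 is the realm, $1..$n are the components
    short = re.sub(r"\$(\d)", lambda m: values[int(m.group(1))], fmt)
    if not re.fullmatch(match_regex, short):
        return None  # the formatted name must match the whole regex
    return re.sub(sed_from, sed_to, short, count=1)  # no 'g' flag: first match only
```

For example, impala/host1@EXAMPLE.COM is formatted as impala@EXAMPLE.COM, matches, and becomes hadoop, while test@EXAMPLE.COM has only one component, so this rule does not apply and the remaining rules decide.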
4. Using HAProxy (optional)
If HAProxy is used, the principal of the HAProxy host must also be exported into impala-service.keytab:
# proxy is the machine running HAProxy; in this article it is host4
kadmin.local -q "addprinc -randkey impala/host4@EXAMPLE.COM "
kadmin.local -q "xst -k /var/kerberos/impala-service.keytab impala/host4@EXAMPLE.COM "
IMPALA_SERVER_ARGS must then be changed as follows (proxy is the name of the HAProxy host; here HAProxy is installed on host4):
-kerberos_reinit_interval=60 \
-be_principal=impala/_HOST@EXAMPLE.COM \
-principal=impala/host4@EXAMPLE.COM \
-keytab_file=/var/kerberos/impala-service.keytab \
5. Copy files
Copy impala-service.keytab to the /var/kerberos/ directory on the other nodes:
for i in host{1..3};
do scp /var/kerberos/impala-service.keytab $i:/var/kerberos/;
ssh $i "chown hadoop:hadoop /var/kerberos/impala-service.keytab;chmod 400 /var/kerberos/impala-service.keytab";
done;
Sync the Impala configuration files to the other nodes:
cp /etc/hadoop/conf/core-site.xml /etc/impala/conf/
cp /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
cp /etc/hive/conf/hive-site.xml /etc/impala/conf/
for i in host{2..3};
do scp -r /etc/impala/conf/ $i:/etc/impala/;
done;
6. Start the services
Start impala-state-store:
$ kinit -k -t /var/kerberos/impala-service.keytab impala/host1@EXAMPLE.COM
$ impala-state-store start
Then check the logs to confirm that it started successfully.
Start impala-catalog:
$ kinit -k -t /var/kerberos/impala-service.keytab impala/host1@EXAMPLE.COM
$ impala-catalog start
Then check the logs to confirm that it started successfully.
Start impala-server:
$ kinit -k -t /var/kerberos/impala-service.keytab impala/host1@EXAMPLE.COM
$ impala-server start
Then check the logs to confirm that it started successfully.
7. Test impala-shell
With Kerberos enabled, run impala-shell with -k to use Kerberos authentication and -s to specify the service name:
$ impala-shell -i host2:21000 -k -s impala --protocol=beeswax
Starting Impala Shell using Kerberos authentication
Starting Impala Shell with Kerberos authentication using Python 2.7.5
Using service name 'impala'
Opened TCP connection to host2:21000
Connected to host2:21000
Server version: impalad version 4.0.0-SNAPSHOT RELEASE (build c0503e6b29cb165ec0e7fa44cba4025c63a08200)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v4.0.0-RELEASE (a702d2d) built on Thu Nov 18 10:55:28 CST 2021)
You can change the Impala daemon that you're connected to by using the CONNECT
command.To see how Impala will plan to run your query without actually executing
it, use the EXPLAIN command. You can change the level of detail in the EXPLAIN
output by setting the EXPLAIN_LEVEL query option.
***********************************************************************************
WARNING: The beeswax protocol is deprecated and will be removed in a future version of Impala.
[host1:21000] >
[host1:21000] > show tables;
Query: show tables
+--------+
| name |
+--------+
| aaa |
| bbb |
| ccc |
| ddd |
+--------+
Returned 4 row(s) in 0.08s
8. JDBC connection test (optional)
(1) Download the driver jars
Link: https://pan.baidu.com/s/11DQielaYD_pZyrMYAWLVbA?pwd=jdbc
Extraction code: jdbc
(2) Copy ImpalaJDBC41.jar and TCLIServiceClient.jar from the archive into hive/lib
(3) Test with the Beeline command line
[1] Without Kerberos
$ beeline -d "com.cloudera.impala.jdbc41.Driver" -u "jdbc:impala://host1:21050/"
Connecting to jdbc:impala://host1:21050/
Connected to: Impala (version 4.1.0-SNAPSHOT)
Driver: ImpalaJDBC (version 02.05.41.1061)
Error: [Simba][JDBC](11975) Unsupported transaction isolation level: 4. (state=HY000,code=11975)
Beeline version 3.1.2 by Apache Hive
0: jdbc:impala://host1:21050/> show tables;
+--------+
| name |
+--------+
| aaa |
| bbb |
| ccc |
| ddd |
+--------+
[2] With Kerberos
$ beeline -d "com.cloudera.impala.jdbc41.Driver" -u "jdbc:impala://host1:21050/;AuthMech=1;KrbServiceName=impala;KrbRealm=EXAMPLE.COM;KrbHostFQDN=host1"
Connecting to jdbc:impala://host1:21050/;AuthMech=1;KrbServiceName=impala;KrbRealm=EXAMPLE.COM;KrbHostFQDN=host1
Connected to: Impala (version 4.1.0-SNAPSHOT)
Driver: ImpalaJDBC (version 02.05.41.1061)
Error: [Simba][JDBC](11975) Unsupported transaction isolation level: 4. (state=HY000,code=11975)
Beeline version 3.1.2 by Apache Hive
0: jdbc:impala://host1:21050/> show tables;
+--------+
| name |
+--------+
| aaa |
| bbb |
| ccc |
| ddd |
+--------+
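The two Beeline URLs differ only in the Kerberos properties appended after the path. A small Python helper like the one below (hypothetical; the function name is an assumption and the defaults are the example values from this article) builds both forms. KrbServiceName must match the service part of the Impala principal (impala in impala/_HOST@EXAMPLE.COM), and AuthMech=1 selects Kerberos in the Cloudera Impala JDBC driver.

```python
def impala_jdbc_url(host, port=21050, kerberos=False, service="impala",
                    realm="EXAMPLE.COM", host_fqdn=None):
    """Build a Cloudera Impala JDBC URL like the ones passed to beeline above."""
    url = f"jdbc:impala://{host}:{port}/"
    if kerberos:
        props = {
            "AuthMech": "1",              # 1 = Kerberos
            "KrbServiceName": service,    # service part of the impala/_HOST principal
            "KrbRealm": realm,
            "KrbHostFQDN": host_fqdn or host,
        }
        url += "".join(f";{key}={value}" for key, value in props.items())
    return url


if __name__ == "__main__":
    print(impala_jdbc_url("host1"))
    print(impala_jdbc_url("host1", kerberos=True))
```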
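II. Integrating Ranger
By default, when authorization is not enabled, every Impala user can perform all read and write operations. This is acceptable in development and test environments but not in a secure production environment. With authorization enabled, Impala checks the privileges of users who connect through impala-shell or other client programs and associates a set of privileges with each user.
1. Enable authorization in the Impala services
(1) In the /etc/impala/conf/impala configuration file, add the following options to IMPALA_SERVER_ARGS and IMPALA_CATALOG_ARGS: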
-server_name=server1 \
-ranger_service_type=hive \
-ranger_app_id=hiveServer2 \
-authorization_provider=ranger"
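Parameter notes:
-server_name: use the same name for every impalad and the catalogd in the cluster.
-ranger_service_type: set to hive; Impala uses the Hive service type (shown as HADOOP SQL in Ranger Admin).
-ranger_app_id: set to the Ranger application ID.
-authorization_provider: set to ranger to use Ranger for authorization.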
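Because impalad and catalogd must agree on these settings, it can help to compare both argument strings before restarting. The Python sketch below is a hypothetical pre-flight check, not an Impala tool; it assumes the flags are written as -name=value tokens as in the snippet above. It would, for example, report a misspelled -server-name as a missing server_name.

```python
import re

# Flags that this section adds to both IMPALA_SERVER_ARGS and IMPALA_CATALOG_ARGS
REQUIRED = {"server_name", "ranger_service_type", "ranger_app_id", "authorization_provider"}


def parse_flags(args):
    """Turn '-a=1 -b=2' into {'a': '1', 'b': '2'}; other tokens are ignored."""
    flags = {}
    for token in args.split():
        # strip a closing quote such as the one after -authorization_provider=ranger"
        m = re.fullmatch(r"--?([A-Za-z0-9_-]+)=(.*)", token.strip('"'))
        if m:
            flags[m.group(1)] = m.group(2)
    return flags


def check_ranger_flags(server_args, catalog_args):
    """Return a list of problems; an empty list means both daemons agree."""
    server, catalog = parse_flags(server_args), parse_flags(catalog_args)
    problems = []
    for label, flags in (("IMPALA_SERVER_ARGS", server), ("IMPALA_CATALOG_ARGS", catalog)):
        missing = REQUIRED - set(flags)
        if missing:
            problems.append(f"{label} missing: {sorted(missing)}")
    for key in sorted(REQUIRED & set(server) & set(catalog)):
        if server[key] != catalog[key]:
            problems.append(f"{key} differs: {server[key]} vs {catalog[key]}")
    return problems
```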
(2) Create the file ranger-hive-security.xml under /etc/impala/conf/:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>ranger.plugin.hive.service.name</name>
<value>impaladev</value>
</property>
<property>
<name>ranger.plugin.hive.policy.source.impl</name>
<value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
</property>
<property>
<name>ranger.plugin.hive.policy.rest.url</name>
<value>http://ranger-admin:6080</value>
</property>
<property>
<name>ranger.plugin.hive.policy.rest.ssl.config.file</name>
<value>/etc/hive/conf/ranger-policymgr-ssl.xml</value>
</property>
<property>
<name>ranger.plugin.hive.policy.pollIntervalMs</name>
<value>30000</value>
</property>
<property>
<name>ranger.plugin.hive.policy.cache.dir</name>
<value>/etc/ranger/hivedev/policycache</value>
</property>
<property>
<name>xasecure.hive.update.xapolicies.on.grant.revoke</name>
<value>True</value>
</property>
<property>
<name>ranger.plugin.hive.policy.rest.client.connection.timeoutMs</name>
<value>120000</value>
</property>
<property>
<name>ranger.plugin.hive.policy.rest.client.read.timeoutMs</name>
<value>30000</value>
</property>
</configuration>
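The value of ranger.plugin.hive.service.name has to be identical to the Service Name created in Ranger Admin in step 3 (impaladev in this article); otherwise the plugin cannot fetch the intended policies. The Python sketch below reads this Hadoop-style XML file with the standard xml.etree.ElementTree module and checks that value plus two related settings; the function names and the choice of checks are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def hadoop_conf(xml_text):
    """Parse a Hadoop-style <configuration> file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        (p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
        for p in root.iter("property")
    }


def check_ranger_plugin(conf, ranger_service_name):
    """Return a list of problems found in the ranger-hive-security.xml settings."""
    problems = []
    if conf.get("ranger.plugin.hive.service.name") != ranger_service_name:
        problems.append("ranger.plugin.hive.service.name does not match the Ranger service")
    if not conf.get("ranger.plugin.hive.policy.rest.url", "").startswith(("http://", "https://")):
        problems.append("ranger.plugin.hive.policy.rest.url is not an http(s) URL")
    if not conf.get("ranger.plugin.hive.policy.cache.dir"):
        problems.append("ranger.plugin.hive.policy.cache.dir is not set")
    return problems
```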
2. Restart catalogd and all impalad services
3. Configure the Impala service in Ranger
(1) Open the Ranger Admin web UI
[Figure: Ranger Admin home page]
(2) Click the + icon at the top right of the HADOOP SQL section
(3) Configuration (without Kerberos)
Service Name: impaladev
Display Name: impaladev
Username: hadoop
Password: ###
jdbc.driverClassName: org.apache.hive.jdbc.HiveDriver
jdbc.url: jdbc:impala://host1:21050/
(4) Configuration (with Kerberos)
Service Name: impaladev
Display Name: impaladev
Username: hadoop
Password: ###
jdbc.driverClassName: org.apache.hive.jdbc.HiveDriver
jdbc.url: jdbc:impala://host1:21050/;AuthMech=1;KrbServiceName=impala;KrbRealm=EXAMPLE.COM;KrbHostFQDN=host1
# Also add the following setting: the user allowed to download the policies and cache them locally
policy.download.auth.users: hadoop
4. Restart catalogd and all impalad services
5. Test with impala-shell
(1) Log in to impala-shell as the test user and access a database that the user has no privileges on:
ERROR: AuthorizationException: User ‘test@EXAMPLE.COM’ does not have privileges to access: test_imapla_hive..
(2) After granting privileges in the Ranger UI:
[host1:21000] test_imapla_hive> select current_user();
Query: select current_user()
Query submitted at: 2022-06-17 14:20:40 (Coordinator: http://host1:21000)
Query progress can be monitored at: http://host1:21000/query_plan?query_id=43493faa62a9e3a6:6221306b00000000
+------------------+
| current_user() |
+------------------+
| test@EXAMPLE.COM |
+------------------+
Fetched 1 row(s) in 0.13s
[host1:21000] test_imapla_hive> show tables;
Query: show tables
+---------+
| name |
+---------+
| events5 |
| stu |
| stu1 |
| stu_yes |
+---------+
Fetched 12 row(s) in 0.01s
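III. Admission Control
Admission control is an Impala feature that limits concurrent SQL queries to avoid resource usage spikes and out-of-memory conditions on busy clusters. It lets you set upper limits on the number of concurrent Impala queries and on the memory those queries use. Queries beyond these limits are queued until earlier queries finish, instead of being cancelled or running slowly and causing contention. As other queries complete, the queued queries are allowed to proceed.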
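The queuing behaviour described above can be illustrated with a toy model of a single pool: up to max_running queries run at once, up to max_queued further queries wait in FIFO order, and anything beyond that is rejected. This Python sketch is a simplification under those assumptions, not Impala's admission controller; the class name is hypothetical, and it ignores memory-based admission and queue timeouts.

```python
from collections import deque


class ToyAdmissionPool:
    """Toy model of one pool: limited running slots plus a bounded FIFO queue."""

    def __init__(self, max_running, max_queued):
        self.max_running = max_running
        self.max_queued = max_queued
        self.running = set()
        self.queue = deque()

    def submit(self, query_id):
        # Run immediately only if there is a free slot and nobody is waiting ahead
        if len(self.running) < self.max_running and not self.queue:
            self.running.add(query_id)
            return "admitted"
        if len(self.queue) < self.max_queued:
            self.queue.append(query_id)
            return "queued"
        return "rejected"

    def finish(self, query_id):
        """Mark a query as finished and admit queued queries in FIFO order."""
        self.running.discard(query_id)
        admitted = []
        while self.queue and len(self.running) < self.max_running:
            next_id = self.queue.popleft()
            self.running.add(next_id)
            admitted.append(next_id)
        return admitted
```

With max_running=2 and max_queued=2 (the values used for both pools later in this section), five simultaneous submissions give two admitted, two queued and one rejected query.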
1. Enable admission control in the Impala service
(1) Edit ${IMPALA_HOME}/conf/impala and add the following variables:
DISABLE_ADMISSION_CONTROL=false
FAIR_SCHEDULER_ALLOCATION_PATH=/etc/impala/conf/fair-scheduler.xml
LLAMA_SITE_PATH=/etc/impala/conf/llama-site.xml
(2) Reference these variables in IMPALA_SERVER_ARGS:
-fair_scheduler_allocation_path=${FAIR_SCHEDULER_ALLOCATION_PATH} \
-llama_site_path=${LLAMA_SITE_PATH} \
2. Configure the resource pool file
(1) Edit ${IMPALA_HOME}/conf/fair-scheduler.xml and add the following:
<allocations>
<queue name="root">
<aclSubmitApps> </aclSubmitApps>
<queue name="default">
<maxResources>50000 mb, 0 vcores</maxResources>
<aclSubmitApps>hadoop hadoop</aclSubmitApps>
</queue>
<queue name="test">
<maxResources>100 mb, 2 vcores</maxResources>
<aclSubmitApps>test test</aclSubmitApps>
</queue>
</queue>
<queuePlacementPolicy>
<rule name="specified" create="false"/>
<rule name="default" />
</queuePlacementPolicy>
</allocations>
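(2) Parameter notes
maxResources: the memory and CPU available to the pool; 0 vcores means CPU usage is not limited.
aclSubmitApps: the users and groups allowed to submit to the pool. Separate users with ",", separate groups with ",", and separate the user list from the group list with a single space.
queuePlacementPolicy: decides which pool a query is placed in; with the rules above, a query goes to the pool it explicitly requests (specified), otherwise to the default pool.
This configuration defines two pools:
default: 50000 MB of memory and no CPU limit; usable by the hadoop user (and the hadoop group)
test: 100 MB of memory and at most two CPU cores; usable by the test user (and the test group)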
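To double-check how the allocation file is interpreted, the Python sketch below parses it with the standard library, extracts each pool's memory from maxResources, and splits aclSubmitApps into users and groups as described above. The helper names are hypothetical, and the ACL check assumes Hadoop fair-scheduler semantics, in which access to a parent pool also grants access to its child pools.

```python
import re
import xml.etree.ElementTree as ET


def parse_acl(text):
    """Split 'user1,user2 group1,group2' into (users, groups); ' ' means nobody."""
    users_part, _, groups_part = (text or "").partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.split(",") if g}
    return users, groups


def parse_pools(xml_text):
    """Return {'root.test': {'mem_mb': 100, 'users': {...}, 'groups': {...}}, ...}."""
    pools = {}

    def walk(queue, parent):
        name = f"{parent}.{queue.get('name')}" if parent else queue.get("name")
        match = re.search(r"(\d+)\s*mb", queue.findtext("maxResources") or "")
        users, groups = parse_acl(queue.findtext("aclSubmitApps"))
        pools[name] = {
            "mem_mb": int(match.group(1)) if match else None,
            "users": users,
            "groups": groups,
        }
        for child in queue.findall("queue"):  # nested pools
            walk(child, name)

    for top in ET.fromstring(xml_text).findall("queue"):
        walk(top, "")
    return pools


def can_submit(pools, pool, user, user_groups):
    """Assumed Hadoop semantics: access to a pool or any of its parents is enough."""
    parts = pool.split(".")
    for i in range(len(parts), 0, -1):
        acl = pools.get(".".join(parts[:i]))
        if acl and (user in acl["users"] or acl["groups"] & set(user_groups)):
            return True
    return False
```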
3. Configure admission limits
(1) Edit ${IMPALA_HOME}/conf/llama-site.xml and add the following:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>llama.am.throttling.maximum.placed.reservations.root.default</name>
<value>2</value>
</property>
<property>
<name>llama.am.throttling.maximum.queued.reservations.root.default</name>
<value>2</value>
</property>
<property>
<name>impala.admission-control.pool-default-query-options.root.default</name>
<value>mem_limit=128m,query_timeout_s=20,max_io_buffers=10</value>
</property>
<property>
<name>impala.admission-control.pool-queue-timeout-ms.root.default</name>
<value>30000</value>
</property>
<property>
<name>llama.am.throttling.maximum.placed.reservations.root.test</name>
<value>2</value>
</property>
<property>
<name>llama.am.throttling.maximum.queued.reservations.root.test</name>
<value>2</value>
</property>
<property>
<name>impala.admission-control.pool-default-query-options.root.test</name>
<value>mem_limit=128m,query_timeout_s=20,max_io_buffers=10</value>
</property>
<property>
<name>impala.admission-control.pool-queue-timeout-ms.root.test</name>
<value>30000</value>
</property>
</configuration>
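(2) Configuration notes
llama.am.throttling.maximum.placed.reservations.root.<pool name>
Maximum number of queries that can run concurrently in the pool.
llama.am.throttling.maximum.queued.reservations.root.<pool name>
Maximum number of queries that can wait in the pool's queue.
impala.admission-control.pool-default-query-options.root.<pool name>
Default query options for queries submitted to the pool; separate multiple options with ",":
mem_limit: default memory limit for each query in the pool (per node). This sets the MEM_LIMIT query option and is different from the daemon-wide memory limit in the impala configuration file.
query_timeout_s: idle timeout for a query, in seconds; a query that stays idle longer than this is cancelled.
impala.admission-control.pool-queue-timeout-ms.root.<pool name>
Queue timeout in milliseconds; a query that has been waiting longer than this is rejected instead of waiting further.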
After the configuration is complete, restart the Impala services.
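Each llama-site.xml property name above is a fixed prefix followed by the full pool name (root.default, root.test), so the file can be regrouped per pool for review. The Python sketch below does that with the standard xml.etree.ElementTree module and also splits the default query options; the short labels such as max_running are hypothetical names introduced here, not Impala property names.

```python
import xml.etree.ElementTree as ET

# Property-name prefixes from llama-site.xml -> short labels (labels are hypothetical)
PREFIXES = {
    "llama.am.throttling.maximum.placed.reservations.": "max_running",
    "llama.am.throttling.maximum.queued.reservations.": "max_queued",
    "impala.admission-control.pool-default-query-options.": "default_query_options",
    "impala.admission-control.pool-queue-timeout-ms.": "queue_timeout_ms",
}


def per_pool_settings(xml_text):
    """Group llama-site.xml properties by pool name, e.g. {'root.test': {...}}."""
    pools = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        for prefix, label in PREFIXES.items():
            if name.startswith(prefix):
                pools.setdefault(name[len(prefix):], {})[label] = value
    return pools


def parse_query_options(value):
    """'mem_limit=128m,query_timeout_s=20' -> {'mem_limit': '128m', 'query_timeout_s': '20'}."""
    return dict(item.split("=", 1) for item in value.split(",") if item)
```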
4. Connect with impala-shell
Add -Q REQUEST_POOL=<pool name> when connecting:
$ impala-shell -i host2:21000 -k -s impala --protocol=beeswax -Q REQUEST_POOL=root.test
Starting Impala Shell using Kerberos authentication
Starting Impala Shell with Kerberos authentication using Python 2.7.5
Using service name 'impala'
Opened TCP connection to host2:21000
Connected to host2:21000
Server version: impalad version 4.0.0-SNAPSHOT RELEASE (build c0503e6b29cb165ec0e7fa44cba4025c63a08200)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v4.0.0-RELEASE (a702d2d) built on Thu Nov 18 10:55:28 CST 2021)
You can change the Impala daemon that you're connected to by using the CONNECT
command.To see how Impala will plan to run your query without actually executing
it, use the EXPLAIN command. You can change the level of detail in the EXPLAIN
output by setting the EXPLAIN_LEVEL query option.
***********************************************************************************
[host1:21000] > show tables;
Query: show tables
+--------+
| name |
+--------+
| aaa |
| bbb |
| ccc |
| ddd |
+--------+
Returned 4 row(s) in 0.08s
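IV. Common Problems
1. Impala fails to start with the error: can not find /var/run/hdfs-sockets/dn
Solution: set dfs.client.read.shortcircuit=false in the hdfs-site.xml under impala/conf.
2. Error: failed to save roles to cache file '/etc/ranger/hivedev/policycache/hiveServer2_hivedev_roles.json'
Solution: create /etc/ranger/hivedev in advance and change the owner of /etc/ranger to hadoop.
3. catalogd fails at startup, reporting that hive.metastore.dml.events must be set to true
Solution: this is a server-side Hive setting that is normally left unchanged, so instead add -hms_event_polling_interval_s=0 to the catalogd arguments, which disables HMS event polling.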