1. Deployment Flow Diagram
Omitted.
2. Installation Inventory
No. | VM Name | Installed Services |
---|---|---|
1 | VM01 | Master/Worker/Hive Meta Store/Thrift Server/Kerberos Client |
2 | VM02 | Worker/Zeppelin/Kerberos Client |
3 | VM03 | Worker/Ranger/Kerberos Client |
3. Hive Metastore Installation
3.1. Required Packages
hadoop-2.7.7.tar.gz
apache-hive-2.3.7.tar.gz
3.2. Installation Steps
The default installation path used in this document is /apps/delta-lake.
3.2.1. Install Hadoop
tar -xvf hadoop-2.7.7.tar.gz -C /apps/delta-lake
3.2.2. Install Hive
tar -xvf apache-hive-2.3.7.tar.gz -C /apps/delta-lake
Edit hive-env.sh
cd /apps/delta-lake/apache-hive-2.3.7-bin/conf
# edit the configuration file
vi hive-env.sh
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/apps/delta-lake/hadoop-2.7.7 # adjust to the actual Hadoop installation path
export HIVE_CONF_DIR=/apps/delta-lake/apache-hive-2.3.7-bin/conf
Edit hive-site.xml (adjust the parameters and paths below to the actual environment)
vi hive-site.xml
<!-- Adjust the file paths below -->
<configuration>
<property>
<name>system:java.io.tmpdir</name>
<value>/data/hive-data/tmp</value>
</property>
<property>
<name>system:user.name</name>
<value>hive</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/data/hive-data/hive</value>
<description></description>
</property>
<property>
<name>hive.repl.rootdir</name>
<value>/data/hive-data/repl</value>
<description>HDFS root dir for all replication dumps.</description>
</property>
<property>
<name>hive.repl.cmrootdir</name>
<value>/data/hive-data/cmroot</value>
<description>Root dir for ChangeManager, used for deleted files.</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/data/hive-data/hive/operation_logs</value>
<description></description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/data/hive-data/scratchdir</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/data/hive-data/download/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/data/hive-data/querylog</value>
<description>Location of Hive run time structured log file</description>
</property>
<!-- Set the HDFS path that Hive will use as the warehouse root -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://***/usr/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<!-- Change the Hive Metastore port if needed -->
<property>
<name>hive.metastore.port</name>
<value>9083</value>
<description>Hive metastore listener port</description>
</property>
<!-- Set this parameter to true -->
<property>
<name>hive.metastore.metrics.enabled</name>
<value>true</value>
<description>Enable metrics on the metastore.</description>
</property>
<!-- Adjust the number of parallel queries -->
<property>
<name>hive.exec.parallel.thread.number</name>
<value>50</value>
<description>How many jobs at most can be executed in parallel</description>
</property>
<!-- Change the database connection string to the actual address -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://*.*.*.*/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<!-- Database username and password -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>****</value>
<description>password to use against metastore database</description>
</property>
</configuration>
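Before the metastore is started for the first time, its schema normally has to be initialized in the MySQL database configured above. A minimal sketch using Hive's bundled schematool, assuming the MySQL connector jar has already been placed under the Hive lib directory (the connector path shown is illustrative):
$ cp mysql-connector-java-5.1.38.jar /apps/delta-lake/apache-hive-2.3.7-bin/lib/ # assumed connector location, adjust as needed
$ cd /apps/delta-lake/apache-hive-2.3.7-bin
$ ./bin/schematool -dbType mysql -initSchema # creates the metastore tables using the JDBC settings in hive-site.xml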
Configure Kerberos for Hive
1) Add the Kerberos properties to hive-site.xml
vi hive-site.xml
<property>
<name>hive.metastore.sasl.enabled</name>
<value>true</value>
</property>
<!-- set the actual keytab file path -->
<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>/data/kerberos/xxx_user1.service.keytab</value>
</property>
<!-- set the keytab principal -->
<property>
<name>hive.metastore.kerberos.principal</name>
<value>XXX_user1/XXX_user1@KES.COM</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>****</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>****</value>
</property>
2) Copy core-site.xml into the /apps/delta-lake/apache-hive-2.3.7-bin/conf directory
3.2.3. Service Start and Stop
$ cd /apps/delta-lake/apache-hive-2.3.7-bin/bin
$ ./start-hive.sh # start the Hive Metastore service
$ ./stop-hive.sh # stop the Hive Metastore service
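A quick way to confirm the metastore is up is to check that it is listening on the configured port (9083 above); a sketch, assuming the ss utility is available:
$ ss -lnt | grep 9083 # the metastore should be listening on this port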
4. Spark Installation
4.1 Required Packages
jdk-8u171-linux-x64.tar.gz
hadoop-2.7.7.tar.gz
scala-2.12.2.tgz
spark-3.0.0-bin-hadoop2.7.tar.gz
4.2 Deployment Steps
The default installation path used in this document is /apps/delta-lake.
No. | VM Name | Role |
---|---|---|
1 | VM01 | Master/Worker |
2 | VM02 | Worker |
3 | VM03 | Worker |
4.2.1 Set Up Mutual Trust (Passwordless SSH)
Skip this step if mutual trust has already been established between the VMs.
$ ssh-keygen -t rsa # press Enter at each prompt to accept the defaults
# distribute the key to each node; you will be prompted for its password
$ ssh-copy-id VM01
$ ssh-copy-id VM02
$ ssh-copy-id VM03
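To verify that mutual trust works, the following loop should print each hostname without prompting for a password:
$ for h in VM01 VM02 VM03; do ssh $h hostname; done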
4.2.2 JDK Installation
Check and install Java on each VM in the Spark cluster in turn.
Check whether Java is already installed; if it is, skip this step.
# check whether Java is installed
$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
Install Java:
$ tar -xvf jdk-8u171-linux-x64.tar.gz -C /apps/delta-lake/
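If JAVA_HOME is not already set system-wide, it can be exported in the shell profile so that later scripts can find the JDK; a sketch, assuming ~/.bashrc is the profile in use:
$ echo 'export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171' >> ~/.bashrc
$ echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc
$ java -version # should now report 1.8.0_171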
4.2.3 Hadoop Installation
$ tar -xvf hadoop-2.7.7.tar.gz -C /apps/delta-lake
4.2.4 Scala Installation
$ tar -xvf scala-2.12.2.tgz -C /apps/delta-lake/
4.2.5 Spark Installation
1) Install Spark
$ tar -xvf spark-3.0.0-bin-hadoop2.7.tar.gz -C /apps/delta-lake/
2) Configure slaves (the worker list)
$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/conf
$ vi slaves
# list every VM that should run a worker
# A Spark Worker will be started on each of the machines listed below.
VM01
VM02
VM03
3) Configure spark-env.sh
$ vi spark-env.sh
export HADOOP_HOME=/apps/delta-lake/hadoop-2.7.7
export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
export SCALA_HOME=/apps/delta-lake/scala-2.12.2
SPARK_MASTER_HOST=VM01
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=9090
SPARK_MASTER_OPTS="-Djava.io.tmpdir=/data/spark-data/tmp/" # adjust to the actual path as needed
SPARK_WORKER_WEBUI_PORT=9091
SPARK_WORKER_CORES=8 # set to the actual number of cores
SPARK_WORKER_MEMORY=5g # set according to the memory actually available
SPARK_WORKER_DIR=/nfsmount/spark-data/$HOSTNAME/work # use the NAS path
SPARK_LOCAL_DIRS=/nfsmount/spark-data/$HOSTNAME/data # use the NAS path
SPARK_WORKER_OPTS="-Djava.io.tmpdir=/nfsmount/spark-data/$HOSTNAME/tmp/"
SPARK_LOG_DIR=/nfsmount/spark-data/$HOSTNAME/logs # use the NAS path
SPARK_PID_DIR=/nfsmount/spark-data/$HOSTNAME/tmp # use the NAS path
# HADOOP_USER_NAME=hdfs # not needed when Kerberos is configured
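The NAS directories referenced above must exist on every node before the workers are started; a sketch that creates them, assuming /nfsmount is already mounted:
$ mkdir -p /nfsmount/spark-data/$HOSTNAME/{work,data,tmp,logs}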
4) Sync the installation files to the other nodes
$ scp -r /apps/delta-lake/jdk1.8.0_171/ VM02:/apps/delta-lake/
$ scp -r /apps/delta-lake/scala-2.12.2/ VM02:/apps/delta-lake/
$ scp -r /apps/delta-lake/hadoop-2.7.7/ VM02:/apps/delta-lake/
$ scp -r /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/ VM02:/apps/delta-lake/
$ scp -r /apps/delta-lake/jdk1.8.0_171/ VM03:/apps/delta-lake/
$ scp -r /apps/delta-lake/scala-2.12.2/ VM03:/apps/delta-lake/
$ scp -r /apps/delta-lake/hadoop-2.7.7/ VM03:/apps/delta-lake/
$ scp -r /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/ VM03:/apps/delta-lake/
4.2.6 Cluster Start and Stop
$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/sbin/
$ ./start-all.sh # start the whole cluster
$ ./stop-all.sh # stop the whole cluster
4.2.7 Service Test
Open the web console at http://<VM01>:9090/ and check that the services have started.
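The running daemons can also be checked from the command line; a sketch, assuming jps from the JDK is on the PATH of every node:
$ for h in VM01 VM02 VM03; do echo "== $h =="; ssh $h jps | grep -E 'Master|Worker'; done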
5. Thrift Server Installation
5.1 Required Packages
jdk-8u171-linux-x64.tar.gz
hadoop-2.7.7.tar.gz
scala-2.12.2.tgz
spark-3.0.0-bin-hadoop2.7.tar.gz
ivy2.tar.gz
5.2 Thrift Server Installation
The installation steps are the same as in [4. Spark Installation]. If Spark is already installed on the current VM and the Thrift Server service has not yet been started, the existing installation can be reused directly.
5.2.1 Install the Delta Lake dependency jars
# check whether the ivy cache is already present
$ ll -a /home/adaas/ | grep ivy
drwxrwxr-x. 4 adaas adaas 54 Jul 8 16:00 .ivy2
If it does not exist, install it with the following command:
$ tar -xvf ivy2.tar.gz -C $HOME
5.2.2 Thrift Server Configuration
If startThriftServer.sh already exists under the Spark sbin directory, adjust its settings; if it does not exist, create the file.
Note: the hive.server2.thrift.port value and the SPARK_MASTER_URL address in the script must be adjusted.
$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/sbin
$ vi startThriftServer.sh
#!/bin/bash
BASE_HOME="$(cd "`dirname "$0"`"/..; pwd)"
SPARK_MASTER_URL="spark://VM01:7077" # replace with the actual address
EXECUTOR_CORES=12 # maximum number of cores this service may use
$BASE_HOME/sbin/start-thriftserver.sh --master ${SPARK_MASTER_URL} \
--packages io.delta:delta-core_2.12:0.7.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf "spark.delta.logStore.class=org.apache.spark.sql.delta.storage.HDFSLogStore" \
--hiveconf "hive.server2.thrift.port=10005" \
--driver-memory 3g \
--conf "spark.scheduler.mode=FAIR" \
--executor-memory 1g \
--total-executor-cores ${EXECUTOR_CORES}
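If the script was newly created, it may also need to be made executable:
$ chmod +x startThriftServer.sh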
5.2.3 Service Start and Stop
$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/sbin
$ ./startThriftServer.sh # start the service
$ ./stop-thriftserver.sh # stop the service
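Once the service is up, the connection can be tested with the beeline client shipped with Spark; a sketch, assuming the port 10005 configured above and that the Thrift endpoint does not require Kerberos authentication:
$ /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/bin/beeline -u jdbc:hive2://VM01:10005 -e "show databases;"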
6. Zeppelin Installation
6.1 Required Packages
zeppelin-0.9.0.tgz
Python3
ivy2.tar.gz
Spark packages (for Spark deployment see [4. Spark Installation])
6.2 Install Zeppelin
6.2.1 Installation Steps
1) Extract the package
$ tar -xvf zeppelin-0.9.0.tgz -C /apps/delta-lake/
2) Edit zeppelin-env.sh
$ cd /apps/delta-lake/zeppelin-0.9.0/conf
$ vi zeppelin-env.sh
export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
# replace with the actual Spark master address
export MASTER=spark://<VM01>:7077
export ZEPPELIN_ADDR=${HOSTNAME}
# use the planned port
export ZEPPELIN_PORT=9093
# number of CPU cores Zeppelin may use
export ZEPPELIN_JAVA_OPTS="-Dspark.cores.max=9"
# memory available to Zeppelin
export ZEPPELIN_MEM="-Xms2048m -Xmx2048m"
# memory available to the Zeppelin interpreters
export ZEPPELIN_INTP_MEM="-Xms512m -Xmx512m"
# data, log and notebook directories
export ZEPPELIN_LOG_DIR=/data/zeppelin-data/logs
export ZEPPELIN_PID_DIR=/data/zeppelin-data/tmp
export ZEPPELIN_WAR_TEMPDIR=/data/zeppelin-data/webserver
export ZEPPELIN_NOTEBOOK_DIR=/data/zeppelin-data/notebook
# Spark installation path
export SPARK_HOME=/apps/delta-lake/spark-3.0.0-bin-hadoop2.7
export SPARK_SUBMIT_OPTIONS="--driver-memory 1024m --executor-memory 1g"
export SPARK_APP_NAME="zeppelin"
# Hadoop configuration directory
export HADOOP_CONF_DIR=/apps/delta-lake/hadoop-2.7.7/etc/hadoop
export ZEPPELIN_SPARK_CONCURRENTSQL=true
# Python3 paths
export PYTHONPATH=/apps/delta-lake/anaconda3
export PYSPARK_PYTHON=/apps/delta-lake/anaconda3/bin/python
3) Install Python3
$ tar -xvf anaconda3.tar.gz -C /apps/delta-lake/
4) Install the Delta Lake dependency jars
# check whether the ivy cache is already present
$ ll -a /home/adaas/ | grep ivy
drwxrwxr-x. 4 adaas adaas 54 Jul 8 16:00 .ivy2
If it does not exist, install it with the following command:
$ tar -xvf ivy2.tar.gz -C $HOME
5) Login configuration
$ cd /apps/delta-lake/zeppelin-0.9.0/conf/
$ vi shiro.ini
# configure the users that will be allowed to log in
[users]
admin = 1qaz@WSX, admin
sappydataadmin = Tkjy2019, admin
6.2.2 Service Start and Stop
$ cd /apps/delta-lake/zeppelin-0.9.0/bin/
$ ./zeppelin-daemon.sh start # start the service
$ ./zeppelin-daemon.sh stop # stop the service
$ ./zeppelin-daemon.sh status # check the status
6.2.3 Service Test
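Open the Zeppelin web UI at http://<VM02>:9093/ (the host and ZEPPELIN_PORT configured above) and confirm that the login page loads; a quick command-line check, assuming curl is installed:
$ curl -sI http://$HOSTNAME:9093/ | head -1 # expect an HTTP 2xx/3xx response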
6.3 Using Zeppelin
1) Log in to Zeppelin with a username and password configured in shiro.ini
2) Check and configure the Python path in the interpreter settings
3) Create a note and connect to the Spark cluster
Write the following script and run it:
from pyspark import SparkContext, SparkConf, SQLContext
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql import functions as F
from os.path import join, abspath
import os
spark = (
    SparkSession.builder.master("spark://VM01:7077")  # set the actual Spark master address
    .appName("zeppelin-python")
    .config("spark.cores.max", "3")  # set an appropriate number of CPU cores
    .enableHiveSupport()
    .config("spark.sql.warehouse.dir", "hdfs://***/usr/hive/warehouse")  # HDFS warehouse location to use
    .config("principal", "****")  # Kerberos principal to use
    .config("keytab", "***")  # keytab file path to use
    .config("hive.metastore.uris", "thrift://VM01:9083")  # Hive Metastore to use
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.7.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.HDFSLogStore")
    .getOrCreate()
)
from delta.tables import *
4) Example scripts
a) Create a Delta Lake table from an existing Parquet file
df=spark.read.format("parquet").load("hdfs://***/data/snappydata/test/SYS_CUSTOMER/TOTAL_MERGED")
df.write.format("delta").saveAsTable("hr.SYS_CUSTOMER")
b) Compact Delta Lake partition files to improve query efficiency
from delta.tables import *
tablepath = "hdfs://***/usr/hive/warehouse/hr.db/sys_customer"
(spark.read.format("delta")
    .load(tablepath)
    .repartition(1)  # set an appropriate number of partitions
    .write
    .option("dataChange", "true")
    .format("delta")
    .mode("overwrite")
    .save(tablepath))
7. Ranger Installation
7.1 Required Packages
ranger-1.2.0-admin.tar.gz
ranger-1.2.0-usersync.tar.gz
ranger-1.2.0-metastore-plugin.tar.gz
7.2 Ranger Admin Installation
7.2.1 Installation
1) Extract the package
$ tar -xvf ranger-1.2.0-admin.tar.gz -C /apps/delta-lake
2) Edit the install.properties configuration file
Note: adjust install.properties to the actual parameter values.
Create the ranger database in MySQL first (see the sketch below), then run the installation script.
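A minimal sketch of creating the database, assuming the MySQL root account shown in the properties below:
$ mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS ranger;"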
DB_FLAVOR=MYSQL
#
#
# Location of DB client library (please check the location of the jar file)
#
SQL_CONNECTOR_JAR=/apps/delta-lake/ranger-1.2.0-admin/jdbc/mysql-connector-java-5.1.38.jar
#
# DB password for the DB admin user-id
# **************************************************************************
# ** If the password is left empty or not-defined here,
# ** it will try with blank password during installation process
# **************************************************************************
#
#db_root_user=root|SYS|postgres|sa|dba
#db_host=host:port # for DB_FLAVOR=MYSQL|POSTGRES|SQLA|MSSQL #for example: db_host=localhost:3306
#db_host=host:port:SID # for DB_FLAVOR=ORACLE #for SID example: db_host=localhost:1521:ORCL
#db_host=host:port/ServiceName # for DB_FLAVOR=ORACLE #for Service example: db_host=localhost:1521/XE
db_root_user=root
db_root_password=****
db_host=<host name>:<port>
#SSL config
db_ssl_enabled=false
db_ssl_required=false
db_ssl_verifyServerCertificate=false
#db_ssl_auth_type=1-way|2-way, where 1-way represents standard one way ssl authentication and 2-way represents mutual ssl authentication
db_ssl_auth_type=2-way
javax_net_ssl_keyStore=
javax_net_ssl_keyStorePassword=
javax_net_ssl_trustStore=
javax_net_ssl_trustStorePassword=
#
# DB UserId used for the Ranger schema
#
db_name=ranger
db_user=rangeradmin
db_password=*** #MySQL database password
#Source for Audit Store. Currently only solr is supported.
# * audit_store is solr
#audit_store=solr
audit_store=
#
# ------- PolicyManager CONFIG ----------------
#
policymgr_external_url=http://<VM03>:6080
policymgr_http_enabled=true
policymgr_https_keystore_file=
policymgr_https_keystore_keyalias=rangeradmin
policymgr_https_keystore_password=
#Add Supported Components list below separated by semi-colon, default value is empty string to support all components
#Example : policymgr_supportedcomponents=hive,hbase,hdfs
policymgr_supportedcomponents=
#
# ------- PolicyManager CONFIG - END ---------------
#
#
# ------- UNIX User CONFIG ----------------
#
unix_user=adaas
#Linux VM user name and password
unix_user_pwd=adaas
unix_group=adaas
# PID file path
RANGER_PID_DIR_PATH=/apps/public/ranger-1.2.0-admin/tmp
7.2.2 Service Start and Stop
$ cd /apps/delta-lake/ranger-1.2.0-admin/ews
$ ./start-ranger-admin.sh # start Ranger Admin
$ ./stop-ranger-admin.sh # stop Ranger Admin
7.2.3 Service Test
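Open the Ranger Admin UI at the policymgr_external_url configured above (http://<VM03>:6080) and log in; a quick reachability check, assuming curl is installed:
$ curl -sI http://VM03:6080/ | head -1 # expect an HTTP 2xx/3xx response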
7.3 Ranger Usersync Plug-in Installation
The Ranger usersync plug-in synchronizes users and groups from LDAP into Ranger.
7.3.1 Installation
1) Extract the plug-in
$ tar -xvf ranger-1.2.0-usersync.tar.gz -C /apps/delta-lake
2) Edit the install.properties configuration file
Review the settings below and adjust them to the actual values.
$ cd /apps/delta-lake/ranger-1.2.0-usersync
$ vi install.properties
#
# The following URL should be the base URL for connecting to the policy manager web application
# For example:
#
# POLICY_MGR_URL = http://policymanager.xasecure.net:6080
#
POLICY_MGR_URL = http://VM03:6080
# sync source, only unix and ldap are supported at present
# defaults to unix
SYNC_SOURCE = ldap
#User and group for the usersync process
unix_user=adaas
unix_group=adaas
# URL of source ldap
# a sample value would be: ldap://ldap.example.com:389
# Must specify a value if SYNC_SOURCE is ldap
SYNC_LDAP_URL = ldap://<LDAP Host>:1389/
# ldap bind dn used to connect to ldap and query for users and groups
# a sample value would be cn=admin,ou=users,dc=hadoop,dc=apache,dc=org
# Must specify a value if SYNC_SOURCE is ldap
SYNC_LDAP_BIND_DN = cn=Directory manager
# ldap bind password for the bind dn specified above
# please ensure read access to this file is limited to root, to protect the password
# Must specify a value if SYNC_SOURCE is ldap
# unless anonymous search is allowed by the directory on users and group
SYNC_LDAP_BIND_PASSWORD = qwert12345
# ldap delta sync flag used to periodically sync users and groups based on the updates in the server
# please customize the value to suit your deployment
# default value is set to true when is SYNC_SOURCE is ldap
SYNC_LDAP_DELTASYNC =true
# search base for users and groups
# sample value would be dc=hadoop,dc=apache,dc=org
SYNC_LDAP_SEARCH_BASE = dc=nrap,dc=net
# search base for users
# sample value would be ou=users,dc=hadoop,dc=apache,dc=org
# overrides value specified in SYNC_LDAP_SEARCH_BASE
SYNC_LDAP_USER_SEARCH_BASE = ou=people,dc=nrap,dc=net
# attribute from user entry that would be treated as user name
# please customize the value to suit your deployment
# default value: cn
SYNC_LDAP_USER_NAME_ATTRIBUTE = uid
# attribute from user entry whose values would be treated as
# group values to be pushed into Policy Manager database
# You could provide multiple attribute names separated by comma
# default value: memberof, ismemberof
SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE = ismemberof
#
# UserSync - Case Conversion Flags
# possible values: none, lower, upper
SYNC_LDAP_USERNAME_CASE_CONVERSION=none
SYNC_LDAP_GROUPNAME_CASE_CONVERSION=none
#user sync log path
logdir=logs
#/var/log/ranger/usersync
# PID DIR PATH
USERSYNC_PID_DIR_PATH=/apps/delta-lake/ranger-1.2.0-usersync/temp
# do we want to do ldapsearch to find groups instead of relying on user entry attributes
# valid values: true, false
# any value other than true would be treated as false
# default value: false
SYNC_GROUP_SEARCH_ENABLED=true
# do we want to do ldapsearch to find groups instead of relying on user entry attributes and
# sync memberships of those groups
# valid values: true, false
# any value other than true would be treated as false
# default value: false
SYNC_GROUP_USER_MAP_SYNC_ENABLED=true
# search base for groups
# sample value would be ou=groups,dc=hadoop,dc=apache,dc=org
# overrides value specified in SYNC_LDAP_SEARCH_BASE, SYNC_LDAP_USER_SEARCH_BASE
# if a value is not specified, takes the value of SYNC_LDAP_SEARCH_BASE
# if SYNC_LDAP_SEARCH_BASE is also not specified, takes the value of SYNC_LDAP_USER_SEARCH_BASE
SYNC_GROUP_SEARCH_BASE=ou=groups,dc=nrap,dc=net
3) Run the plug-in setup
Run the setup script as root:
$ cd /apps/delta-lake/ranger-1.2.0-usersync
$ sudo ./setup.sh
Direct Key not found:hadoop_conf
Direct Key not found:ranger_base_dir
Direct Key not found:USERSYNC_PID_DIR_PATH
Direct Key not found:rangerUsersync_password
[I] ranger.usersync.ldap.ldapbindpassword property is verified.
...
usersync.ssl.key.password has been successfully created.
usersync.ssl.truststore.password has been successfully created.
Creating ranger-usersync-env-hadoopconfdir.sh file
Creating ranger-usersync-env-piddir.sh file
Creating ranger-usersync-env-confdir.sh file
WARN: core-site.xml file not found in provided hadoop conf path...
Troubleshooting: if the script reports that JAVA_HOME cannot be found, edit setup.sh and add the following line before the existing JAVA_HOME check:
export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
The relevant section of setup.sh then reads:
export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
if [ "${JAVA_HOME}" == "" ]
then
echo "JAVA_HOME environment property not defined, aborting installation."
exit 1
elif [ ! -d "${JAVA_HOME}" ]
then
echo "JAVA_HOME environment property was set incorrectly, aborting installation."
exit 1
else
export JAVA_HOME
PATH="${JAVA_HOME}/bin:${PATH}"
export PATH
fi
./setup.py
7.3.2 Service Start and Stop
$ ./ranger-usersync-services.sh start
$ ./ranger-usersync-services.sh stop
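To confirm the daemon is running after start, the process and its logs can be checked; a sketch run from the usersync installation directory:
$ ps -ef | grep -i [u]sersync # the usersync JVM should be listed
$ ls logs/ # log files are written under the logdir configured above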
7.4 Ranger Metastore Plug-in Installation
The Ranger metastore plug-in performs data authorization checks on the Hive Metastore side; it runs inside the Hive Metastore process.
7.4.1 Installation
1) Extract the package
$ tar -xvf ranger-1.2.0-metastore.tar.gz -C /apps/delta-lake/
2) Edit install.properties
$ cd /apps/delta-lake/ranger-1.2.0-metastore/
$ vi install.properties
Adjust the following settings to the actual values:
POLICY_MGR_URL=http://VM03:6080/
# JAVA_HOME
JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
#
# This is the repository name created within policy manager
#
# Example:
# REPOSITORY_NAME=hivedev
#
REPOSITORY_NAME=thrift111-10002
#
# Hive installation directory
#
# Example:
# COMPONENT_INSTALL_DIR_NAME=/var/local/apache-hive-2.1.0-bin
#
COMPONENT_INSTALL_DIR_NAME=/apps/delta-lake/apache-hive-2.3.7-bin
XAAUDIT.SOLR.ENABLE=false
XAAUDIT.SOLR.URL=NONE
XAAUDIT.SOLR.USER=NONE
XAAUDIT.SOLR.PASSWORD=NONE
XAAUDIT.SOLR.ZOOKEEPER=NONE
XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/hive/audit/solr/spool
#Solr Audit Provider
XAAUDIT.SOLR.IS_ENABLED=false
XAAUDIT.SOLR.MAX_QUEUE_SIZE=1
XAAUDIT.SOLR.MAX_FLUSH_INTERVAL_MS=1000
XAAUDIT.SOLR.SOLR_URL=http://localhost:6083/solr/ranger_audits
CUSTOM_USER=adaas
CUSTOM_GROUP=adaas
3) Deploy the plug-in into Hive
$ sudo ./enable-metastore-plugin.sh
4) Restart the Hive Metastore for the configuration to take effect, as sketched below.
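A sketch of the restart, using the start/stop scripts from section 3.2:
$ cd /apps/delta-lake/apache-hive-2.3.7-bin/bin
$ ./stop-hive.sh
$ ./start-hive.sh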
7.5 Ranger User Permission Configuration
To be added.
8. Kerberos Configuration
The Kerberos client packages must be installed on every Spark VM.
1) Check whether the krb5-libs package is installed
$ rpm -qa | grep krb5-libs
krb5-libs-1.15.1-8.el7.x86_64
2) If the Kerberos client packages are not installed, install them with:
$ sudo yum install krb5-libs krb5-workstation # krb5-workstation provides the kinit and klist tools
3) Log in with kinit
$ kinit -kt /data/kerberos/xxx_user1.service.keytab XXX_user1/XXX_user1@KES.COM
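After kinit succeeds, the obtained ticket can be verified with klist:
$ klist # should list a ticket for XXX_user1/XXX_user1@KES.COM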