2020-12-08

 

 

1. Deployment Flow Diagram

 

2. Installation Inventory

No.  VM Name  Installed Services
1    VM01     Master / Worker / Hive Metastore / Thrift Server / Kerberos Client
2    VM02     Worker / Zeppelin / Kerberos Client
3    VM03     Worker / Ranger / Kerberos Client

3. Hive Metastore Installation

3.1. Required Packages

hadoop-2.7.7.tar.gz

apache-hive-2.3.7.tar.gz

3.2. Installation Steps

The default installation path used in this document is /apps/delta-lake.

3.2.1. Install Hadoop

tar -xvf hadoop-2.7.7.tar.gz -C /apps/delta-lake

3.2.2. Install Hive

tar -xvf apache-hive-2.3.7.tar.gz -C /apps/delta-lake

Edit hive-env.sh

cd /apps/delta-lake/apache-hive-2.3.7-bin/conf
# edit the configuration file
vi hive-env.sh
​
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/apps/delta-lake/hadoop-2.7.7  # adjust to the actual Hadoop path
export HIVE_CONF_DIR=/apps/delta-lake/apache-hive-2.3.7-bin/conf

Edit hive-site.xml (adjust the parameters and paths below to your environment)

vi hive-site.xml

<!-- adjust the file paths below -->
<configuration> 
 <property>
    <name>system:java.io.tmpdir</name>
    <value>/data/hive-data/tmp</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>hive</value>
  </property>

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/data/hive-data/hive</value>
    <description></description>
  </property>
  <property>
    <name>hive.repl.rootdir</name>
    <value>/data/hive-data/repl</value>
    <description>HDFS root dir for all replication dumps.</description>
  </property>
  <property>
    <name>hive.repl.cmrootdir</name>
    <value>/data/hive-data/cmroot</value>
    <description>Root dir for ChangeManager, used for deleted files.</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/data/hive-data/hive/operation_logs</value>
    <description></description>
  </property>
   <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/data/hive-data/scratchdir</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/data/hive-data/download/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/data/hive-data/querylog</value>
    <description>Location of Hive run time structured log file</description>
  </property>
    
  <!-- HDFS path Hive will use as the warehouse root; replace with the actual NameNode address -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://***/usr/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  
  <!-- change the Hive metastore port if needed -->
  <property>
    <name>hive.metastore.port</name>
    <value>9083</value>
    <description>Hive metastore listener port</description>
  </property>
 <!-- set this parameter to true -->
  <property>
    <name>hive.metastore.metrics.enabled</name>
    <value>true</value>
    <description>Enable metrics on the metastore.</description>
  </property>

 <!-- adjust the maximum number of concurrent jobs -->
  <property>
    <name>hive.exec.parallel.thread.number</name>
    <value>50</value>
    <description>How many jobs at most can be executed in parallel</description>
  </property>

  <!-- change the JDBC connection string to the actual database address -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://*.*.*.*/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>
    </description>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <!-- database username and password -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
   <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>****</value>
    <description>password to use against metastore database</description>
  </property>

</configuration>
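Before starting the metastore for the first time, the MySQL schema normally has to be initialized with Hive's schematool; a minimal sketch, assuming the MySQL JDBC driver jar has been placed under the Hive lib directory and the hive_metastore database is still empty:

$ cd /apps/delta-lake/apache-hive-2.3.7-bin/bin
$ ./schematool -dbType mysql -initSchema   # creates the metastore tables using the JDBC settings in hive-site.xml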

Configure Kerberos for Hive

1) Add the Kerberos properties to hive-site.xml

vi hive-site.xml
​
  <property>
   <name>hive.metastore.sasl.enabled</name>
   <value>true</value>
  </property>
​
<!-- change to the actual keytab file path -->
  <property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/data/kerberos/xxx_user1.service.keytab</value>
  </property>
  <!-- change to the actual principal of the keytab -->
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>XXX_user1/XXX_user1@KES.COM</value>
  </property>
​
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
​
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>****</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>****</value>
  </property>

2) Copy core-site.xml into the /apps/delta-lake/apache-hive-2.3.7-bin/conf directory

 

Service start and stop

$ cd /apps/delta-lake/apache-hive-2.3.7-bin/bin
$ ./start-hive.sh
$ ./stop-hive.sh
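To confirm the metastore is up after start-hive.sh, a quick check of the listener port (assuming the default 9083 configured above):

$ ss -ltn | grep 9083                      # or: netstat -ltn | grep 9083
$ nc -z VM01 9083 && echo "metastore reachable"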
​

 

4. Spark Installation

4.1 Required Packages

jdk-8u171-linux-x64.tar.gz

hadoop-2.7.7.tar.gz

scala-2.12.2.tgz

spark-3.0.0-bin-hadoop2.7.tar.gz

4.2 Deployment Steps

The default installation path used in this document is /apps/delta-lake.

No.  VM Name  Role
1    VM01     Master / Worker
2    VM02     Worker
3    VM03     Worker

4.2.1 Set Up Passwordless SSH

Skip this step if passwordless SSH between the VMs has already been set up.

$ ssh-keygen -t rsa   # press Enter at each prompt to accept the defaults

# copy the key to each host; you will be prompted for the password
$ ssh-copy-id VM01
$ ssh-copy-id VM02
$ ssh-copy-id VM03
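A quick loop to verify that every host can now be reached without a password prompt (BatchMode makes ssh fail instead of asking for a password):

$ for h in VM01 VM02 VM03; do ssh -o BatchMode=yes $h hostname; done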

 

4.2.2 JDK Installation

Check and install Java on every VM in the Spark cluster.

Check whether Java is already installed; if so, skip this step.

# check whether Java is already installed
$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

Install Java:

$ tar -xvf jdk-8u171-linux-x64.tar.gz -C /apps/delta-lake/

4.2.3 Hadoop Installation

$ tar -xvf hadoop-2.7.7.tar.gz -C /apps/delta-lake

4.2.4 Scala Installation

$ tar -xvf scala-2.12.2.tgz -C /apps/delta-lake/

4.2.5 Spark Installation

1) Install Spark

$ tar -xvf spark-3.0.0-bin-hadoop2.7.tar.gz -C /apps/delta-lake/

2) Configure slaves (the Worker list)

$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/conf
$ vi slaves

# list every VM that should run a Worker
# A Spark Worker will be started on each of the machines listed below.
VM01
VM02
VM03

3) Configure spark-env.sh

$ vi spark-env.sh
export HADOOP_HOME=/apps/delta-lake/hadoop-2.7.7
export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
export SCALA_HOME=/apps/delta-lake/scala-2.12.2
​
SPARK_MASTER_HOST=VM01
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=9090
SPARK_MASTER_OPTS="-Djava.io.tmpdir=/data/spark-data/tmp/" # adjust to the actual path

SPARK_WORKER_WEBUI_PORT=9091
SPARK_WORKER_CORES=8  # set to the actual number of cores
SPARK_WORKER_MEMORY=5g # set to the actual available memory
SPARK_WORKER_DIR=/nfsmount/spark-data/$HOSTNAME/work # NAS path
SPARK_LOCAL_DIRS=/nfsmount/spark-data/$HOSTNAME/data # NAS path
SPARK_WORKER_OPTS="-Djava.io.tmpdir=/nfsmount/spark-data/$HOSTNAME/tmp/"
SPARK_LOG_DIR=/nfsmount/spark-data/$HOSTNAME/logs # NAS path
SPARK_PID_DIR=/nfsmount/spark-data/$HOSTNAME/tmp # NAS path

# HADOOP_USER_NAME=hdfs  # not needed when Kerberos is configured
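The local and NAS directories referenced above are not created automatically; assuming /nfsmount is mounted on every node, they can be prepared with:

$ for h in VM01 VM02 VM03; do ssh $h 'mkdir -p /nfsmount/spark-data/$HOSTNAME/{work,data,tmp,logs} /data/spark-data/tmp'; done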
​

4) Sync the installation files to the other nodes

$ scp -r /apps/delta-lake/jdk1.8.0_171/ VM02:/apps/delta-lake/
$ scp -r /apps/delta-lake/scala-2.12.2/ VM02:/apps/delta-lake/
$ scp -r /apps/delta-lake/hadoop-2.7.7/ VM02:/apps/delta-lake/
$ scp -r /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/ VM02:/apps/delta-lake/
​
$ scp -r /apps/delta-lake/jdk1.8.0_171/ VM03:/apps/delta-lake/
$ scp -r /apps/delta-lake/scala-2.12.2/ VM03:/apps/delta-lake/
$ scp -r /apps/delta-lake/hadoop-2.7.7/ VM03:/apps/delta-lake/
$ scp -r /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/ VM03:/apps/delta-lake/

4.2.6 Cluster Start and Stop

$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/sbin/
$ ./start-all.sh  # start everything
$ ./stop-all.sh   # stop everything
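After start-all.sh, the daemons can be verified on each node with jps (a sketch; a Master plus a Worker are expected on VM01 and one Worker on each of the other nodes):

$ for h in VM01 VM02 VM03; do echo "== $h =="; ssh $h '/apps/delta-lake/jdk1.8.0_171/bin/jps | grep -E "Master|Worker"'; done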

4.2.7 Service Test

Open the web console at http://<VM01>:9090/ and check that the services are up.

 

5. Thrift Server Installation

5.1 Required Packages

jdk-8u171-linux-x64.tar.gz

hadoop-2.7.7.tar.gz

scala-2.12.2.tgz

spark-3.0.0-bin-hadoop2.7.tar.gz

ivy2.tar.gz

5.2 Thrift Server Installation

The installation steps are the same as in [4. Spark Installation]. If the current VM already has a Spark node installed and the Thrift Server has not been started yet, that installation can be reused directly.

5.2.1 Install the Delta Lake Dependency Jars

# check whether the ivy cache already exists
$ ll -a /home/adaas/ | grep ivy
drwxrwxr-x.  4 adaas adaas       54 Jul  8 16:00 .ivy2

If it does not exist, install it with the following command:

$ tar -xvf ivy2.tar.gz -C $HOME

5.2.2 Thrift Server Configuration

If startThriftServer.sh already exists under Spark's sbin directory, adjust its settings; otherwise create the file.

Note: set the hive.server2.thrift.port and SPARK_MASTER_URL values in the script to your environment.

$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/sbin
​
$ vi startThriftServer.sh
#!/bin/bash
​
BASE_HOME="$(cd "`dirname "$0"`"/..; pwd)"
​
SPARK_MASTER_URL="spark://VM01:7077" # replace with the actual master address
EXECUTOR_CORES=12 # maximum number of cores this service may use

$BASE_HOME/sbin/start-thriftserver.sh --master ${SPARK_MASTER_URL} \
--packages io.delta:delta-core_2.12:0.7.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf "spark.delta.logStore.class=org.apache.spark.sql.delta.storage.HDFSLogStore" \
--hiveconf "hive.server2.thrift.port=10005" \
--driver-memory 3g \
--conf "spark.scheduler.mode=FAIR" \
--executor-memory 1g \
--total-executor-cores ${EXECUTOR_CORES}
​

5.2.3 Service Start and Stop

$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/sbin
$ ./startThriftServer.sh # start the service
$ ./stop-thriftserver.sh # stop the service
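Once the service is up, connectivity can be checked with the beeline client shipped with Spark; a sketch assuming the 10005 port configured above (with Kerberos enabled, the JDBC URL additionally needs the server principal and a valid ticket from kinit):

$ cd /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/bin
$ ./beeline -u "jdbc:hive2://VM01:10005" -e "show databases;"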

 

6. Zeppelin Installation

6.1 Required Packages

zeppelin-0.9.0.tgz

Python3

ivy2.tar.gz

Spark packages (for Spark deployment see [4. Spark Installation])

6.2 Install Zeppelin

6.2.1 Installation Steps

1) Extract the archive

$ tar -xvf zeppelin-0.9.0.tgz -C /apps/delta-lake/

2) Edit zeppelin-env.sh

$ cd /apps/delta-lake/zeppelin-0.9.0/conf
$ vi zeppelin-env.sh
​
export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
# replace with the actual Spark master address
export MASTER=spark://<VM01>:7077
export ZEPPELIN_ADDR=${HOSTNAME}
# use the planned port
export ZEPPELIN_PORT=9093

# maximum number of CPU cores Zeppelin may use
export ZEPPELIN_JAVA_OPTS="-Dspark.cores.max=9"
# memory available to Zeppelin
export ZEPPELIN_MEM="-Xms2048m -Xmx2048m"
# memory available to the Zeppelin interpreters
export ZEPPELIN_INTP_MEM="-Xms512m -Xmx512m"

# data and log paths
export ZEPPELIN_LOG_DIR=/data/zeppelin-data/logs
export ZEPPELIN_PID_DIR=/data/zeppelin-data/tmp
export ZEPPELIN_WAR_TEMPDIR=/data/zeppelin-data/webserver
export ZEPPELIN_NOTEBOOK_DIR=/data/zeppelin-data/notebook

# Spark installation path
export SPARK_HOME=/apps/delta-lake/spark-3.0.0-bin-hadoop2.7
export SPARK_SUBMIT_OPTIONS="--driver-memory 1024m --executor-memory 1g"
export SPARK_APP_NAME="zeppelin"

# Hadoop configuration path
export HADOOP_CONF_DIR=/apps/delta-lake/hadoop-2.7.7/etc/hadoop

export ZEPPELIN_SPARK_CONCURRENTSQL=true
# Python3 (Anaconda3) path
export PYTHONPATH=/apps/delta-lake/anaconda3
export PYSPARK_PYTHON=/apps/delta-lake/anaconda3/bin/python
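The log, PID, webserver and notebook directories configured above must exist before the daemon is started; they can be created with:

$ mkdir -p /data/zeppelin-data/{logs,tmp,webserver,notebook}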

3) Install Python3

$ tar -xvf anaconda3.tar.gz -C /apps/delta-lake/

4) Install the Delta Lake dependency jars

# check whether the ivy cache already exists
$ ll -a /home/adaas/ | grep ivy
drwxrwxr-x.  4 adaas adaas       54 Jul  8 16:00 .ivy2

If it does not exist, install it with the following command:

$ tar -xvf ivy2.tar.gz -C $HOME

5) Login configuration

$ cd /apps/delta-lake/zeppelin-0.9.0/conf/
$ vi shiro.ini
# configure the users allowed to log in
[users]
admin = 1qaz@WSX, admin
sappydataadmin = Tkjy2019, admin

6.2.2 Service Start and Stop

$ cd /apps/delta-lake/zeppelin-0.9.0/bin/
$ ./zeppelin-daemon.sh start # start the service
$ ./zeppelin-daemon.sh stop  # stop the service
$ ./zeppelin-daemon.sh status  # check the status

6.2.3 Service Test

http://VM01:9093

6.3 Using Zeppelin

1) Log in to Zeppelin with a username and password configured in shiro.ini

2) Check and configure the Python path in the interpreter settings

 

 

3) Create a note and connect to the Spark cluster

 

Write the script and run it:

 

from pyspark import SparkContext, SparkConf, SQLContext
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql import functions as F 
from os.path import join, abspath
import os
​
​
# adjust the Spark master address, the maximum number of cores, the HDFS warehouse path,
# the Kerberos principal and keytab, and the Hive metastore URI to your environment
spark = SparkSession.builder.master("spark://VM01:7077") \
    .appName("zeppelin-python").config("spark.cores.max", "3") \
    .enableHiveSupport() \
    .config("spark.sql.warehouse.dir", "hdfs://***/usr/hive/warehouse") \
    .config("principal", "****") \
    .config("keytab", "***") \
    .config("hive.metastore.uris", "thrift://VM01:9083") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.7.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.HDFSLogStore") \
    .getOrCreate()
​
from delta.tables import *
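For ad-hoc testing outside Zeppelin, roughly the same session can be started from the command line with the pyspark shell; a sketch using the same Delta Lake package and catalog settings (adjust the master address and add the Hive/Kerberos settings as needed):

$ /apps/delta-lake/spark-3.0.0-bin-hadoop2.7/bin/pyspark \
  --master spark://VM01:7077 \
  --packages io.delta:delta-core_2.12:0.7.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.delta.logStore.class=org.apache.spark.sql.delta.storage.HDFSLogStore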

4) Example scripts

a) Create a Delta Lake table from existing Parquet files

df=spark.read.format("parquet").load("hdfs://***/data/snappydata/test/SYS_CUSTOMER/TOTAL_MERGED")
df.write.format("delta").saveAsTable("hr.SYS_CUSTOMER")

b) Compact the Delta Lake partition files to improve query performance

from delta.tables import *
​
tablepath = "hdfs://***/usr/hive/warehouse/hr.db/sys_customer"
# repartition(1): choose an appropriate number of output files
spark.read.format("delta") \
    .load(tablepath) \
    .repartition(1) \
    .write \
    .option("dataChange", "true") \
    .format("delta") \
    .mode("overwrite") \
    .save(tablepath)
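To confirm that the compaction reduced the number of data files, the table directory can be listed before and after (a sketch using the Hadoop client installed above):

$ /apps/delta-lake/hadoop-2.7.7/bin/hdfs dfs -ls hdfs://***/usr/hive/warehouse/hr.db/sys_customer | grep -c parquet   # number of parquet data files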

 

7. Ranger Installation

7.1 Required Packages

ranger-1.2.0-admin.tar.gz

ranger-1.2.0-usersync.tar.gz

ranger-1.2.0-metastore-plugin.tar.gz

7.2 Ranger Admin Installation

7.2.1 Installation and Deployment

1) Extract the archive

$ tar -xvf ranger-1.2.0-admin.tar.gz -C /apps/delta-lake

2) Edit the install.properties file

Note: adjust install.properties to the actual environment.

Create the ranger database and account in MySQL first (see the sketch below), then run the setup script.
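A minimal sketch of that preparation, following the db_name/db_user/db_host values used in install.properties below (replace the elided passwords with real ones):

$ mysql -u root -p -h <host name> -e "CREATE DATABASE IF NOT EXISTS ranger; CREATE USER 'rangeradmin'@'%' IDENTIFIED BY '***'; GRANT ALL PRIVILEGES ON ranger.* TO 'rangeradmin'@'%'; FLUSH PRIVILEGES;"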

DB_FLAVOR=MYSQL
#
​
#
# Location of DB client library (please check the location of the jar file)
#
SQL_CONNECTOR_JAR=/apps/ranger/ranger-1.2.0-admin/jdbc/mysql-connector-java-5.1.38.jar
​
​
#
# DB password for the DB admin user-id
# **************************************************************************
# ** If the password is left empty or not-defined here,
# ** it will try with blank password during installation process
# **************************************************************************
#
#db_root_user=root|SYS|postgres|sa|dba
#db_host=host:port              # for DB_FLAVOR=MYSQL|POSTGRES|SQLA|MSSQL       #for example: db_host=localhost:3306
#db_host=host:port:SID          # for DB_FLAVOR=ORACLE                          #for SID example: db_host=localhost:1521:ORCL
#db_host=host:port/ServiceName  # for DB_FLAVOR=ORACLE                          #for Service example: db_host=localhost:1521/XE
db_root_user=root
db_root_password=****
db_host=<host name>:<port>
#SSL config
db_ssl_enabled=false
db_ssl_required=false
db_ssl_verifyServerCertificate=false
#db_ssl_auth_type=1-way|2-way, where 1-way represents standard one way ssl authentication and 2-way represents mutual ssl authentication
db_ssl_auth_type=2-way
javax_net_ssl_keyStore=
javax_net_ssl_keyStorePassword=
javax_net_ssl_trustStore=
javax_net_ssl_trustStorePassword=
#
# DB UserId used for the Ranger schema
#
db_name=ranger
db_user=rangeradmin
db_password=***  # MySQL database password
​
​
#Source for Audit Store. Currently only solr is supported.
# * audit_store is solr
#audit_store=solr
audit_store=
​
#
# ------- PolicyManager CONFIG ----------------
#
​
policymgr_external_url=http://<VM03>:6080
policymgr_http_enabled=true
policymgr_https_keystore_file=
policymgr_https_keystore_keyalias=rangeradmin
policymgr_https_keystore_password=
​
#Add Supported Components list below separated by semi-colon, default value is empty string to support all components
#Example :  policymgr_supportedcomponents=hive,hbase,hdfs
policymgr_supportedcomponents=
​
#
# ------- PolicyManager CONFIG - END ---------------
#
​
​
#
# ------- UNIX User CONFIG ----------------
#
unix_user=adaas
# Linux VM user password
unix_user_pwd=adaas  
unix_group=adaas
​
# PID file path
RANGER_PID_DIR_PATH=/apps/public/ranger-1.2.0-admin/tmp
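With install.properties in place, the admin setup script is run once before the first start; a sketch assuming the layout extracted above (run as root, it generates the configuration and loads the Ranger database schema):

$ cd /apps/delta-lake/ranger-1.2.0-admin
$ sudo ./setup.sh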

7.2.2 Service Start and Stop

$ cd /apps/delta-lake/ranger-1.2.0-admin/ews
$ ./start-ranger-admin.sh
$ ./stop-ranger-admin.sh
​

7.2.3 Service Test

http://VM03:6080/

7.3 Ranger Usersync Plug-in Installation

The Ranger usersync plug-in synchronizes users and groups from LDAP into Ranger.

7.3.1 Installation and Deployment

1) Extract the plug-in

$ tar -xvf ranger-1.2.0-usersync.tar.gz -C /apps/delta-lake

2) Edit the install.properties file

Review the settings below and adjust them to the actual environment.

$ cd /apps/delta-lake/ranger-1.2.0-usersync
$ vi install.properties
​
#
# The following URL should be the base URL for connecting to the policy manager web application
# For example:
#
#  POLICY_MGR_URL = http://policymanager.xasecure.net:6080
#
POLICY_MGR_URL = http://VM03:6080
​
# sync source,  only unix and ldap are supported at present
# defaults to unix
SYNC_SOURCE = ldap
​
#User and group for the usersync process
unix_user=adaas
unix_group=adaas
​
​
# URL of source ldap 
# a sample value would be:  ldap://ldap.example.com:389
# Must specify a value if SYNC_SOURCE is ldap
SYNC_LDAP_URL = ldap://<LDAP Host>:1389/
​
# ldap bind dn used to connect to ldap and query for users and groups
# a sample value would be cn=admin,ou=users,dc=hadoop,dc=apache,dc=org
# Must specify a value if SYNC_SOURCE is ldap
SYNC_LDAP_BIND_DN = cn=Directory manager
​
# ldap bind password for the bind dn specified above
# please ensure read access to this file  is limited to root, to protect the password
# Must specify a value if SYNC_SOURCE is ldap
# unless anonymous search is allowed by the directory on users and group
SYNC_LDAP_BIND_PASSWORD = qwert12345
​
# ldap delta sync flag used to periodically sync users and groups based on the updates in the server
# please customize the value to suit your deployment
# default value is set to true when is SYNC_SOURCE is ldap
SYNC_LDAP_DELTASYNC =true
​
# search base for users and groups
# sample value would be dc=hadoop,dc=apache,dc=org
SYNC_LDAP_SEARCH_BASE = dc=nrap,dc=net
​
# search base for users
# sample value would be ou=users,dc=hadoop,dc=apache,dc=org
# overrides value specified in SYNC_LDAP_SEARCH_BASE
SYNC_LDAP_USER_SEARCH_BASE = ou=people,dc=nrap,dc=net
​
# attribute from user entry that would be treated as user name
# please customize the value to suit your deployment
# default value: cn
SYNC_LDAP_USER_NAME_ATTRIBUTE = uid
​
# attribute from user entry whose values would be treated as 
# group values to be pushed into Policy Manager database
# You could provide multiple attribute names separated by comma
# default value: memberof, ismemberof
SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE = ismemberof
#
# UserSync - Case Conversion Flags
# possible values:  none, lower, upper
SYNC_LDAP_USERNAME_CASE_CONVERSION=none
SYNC_LDAP_GROUPNAME_CASE_CONVERSION=none
​
#user sync log path
logdir=logs
#/var/log/ranger/usersync
​
# PID DIR PATH
USERSYNC_PID_DIR_PATH=/apps/ranger/ranger-1.2.0-usersync/temp
​
# do we want to do ldapsearch to find groups instead of relying on user entry attributes
# valid values: true, false
# any value other than true would be treated as false
# default value: false
SYNC_GROUP_SEARCH_ENABLED=true
​
# do we want to do ldapsearch to find groups instead of relying on user entry attributes and
# sync memberships of those groups
# valid values: true, false
# any value other than true would be treated as false
# default value: false
SYNC_GROUP_USER_MAP_SYNC_ENABLED=true
​
# search base for groups
# sample value would be ou=groups,dc=hadoop,dc=apache,dc=org
# overrides value specified in SYNC_LDAP_SEARCH_BASE,  SYNC_LDAP_USER_SEARCH_BASE
# if a value is not specified, takes the value of  SYNC_LDAP_SEARCH_BASE
# if  SYNC_LDAP_SEARCH_BASE is also not specified, takes the value of SYNC_LDAP_USER_SEARCH_BASE
SYNC_GROUP_SEARCH_BASE=ou=groups,dc=nrap,dc=net
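The bind DN, password and search bases above can be verified with an ldapsearch query before running the setup; a connectivity sketch (ldapsearch comes from the openldap-clients package):

$ ldapsearch -x -H ldap://<LDAP Host>:1389 -D "cn=Directory manager" -w qwert12345 -b "ou=people,dc=nrap,dc=net" "(uid=*)" uid | head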

3) Install the plug-in

Run the plug-in setup as root:

$ cd /apps/delta-lake/ranger-1.2.0-usersync
​
$ sudo ./setup.sh
Direct Key not found:hadoop_conf
Direct Key not found:ranger_base_dir
Direct Key not found:USERSYNC_PID_DIR_PATH
Direct Key not found:rangerUsersync_password
[I] ranger.usersync.ldap.ldapbindpassword property is verified.
...
usersync.ssl.key.password has been successfully created.
usersync.ssl.truststore.password has been successfully created.
Creating ranger-usersync-env-hadoopconfdir.sh file
Creating ranger-usersync-env-piddir.sh file
Creating ranger-usersync-env-confdir.sh file
WARN: core-site.xml file not found in provided hadoop conf path...
​

Troubleshooting

Fix: if the setup script reports that JAVA_HOME cannot be found, edit setup.sh and add the following line (adjust to the actual JDK path):

export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171

export JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
​
if [ "${JAVA_HOME}" == "" ]
then
    echo "JAVA_HOME environment property not defined, aborting installation."
    exit 1
elif [ ! -d "${JAVA_HOME}" ]
then
    echo "JAVA_HOME environment property was set incorrectly, aborting installation."
    exit 1
else
    export JAVA_HOME
    PATH="${JAVA_HOME}/bin:${PATH}"
    export PATH
fi
​
./setup.py

 

7.3.2 Service Start and Stop

$ ./ranger-usersync-services.sh start
$ ./ranger-usersync-services.sh stop

7.4 Ranger Metastore Plug-in Installation

The Ranger metastore plug-in enforces data authorization on the Hive Metastore side; it runs inside the Hive Metastore process.

7.4.1 Installation and Deployment

1) Extract the archive

$ tar -xvf ranger-1.2.0-metastore.tar.gz -C /apps/delta-lake/

2) Edit install.properties

$ cd /apps/delta-lake/ranger-1.2.0-metastore/
$ vi install.properties

Adjust the settings below to the actual values:

POLICY_MGR_URL=http://VM03:6080/
​
# JAVA_HOME
JAVA_HOME=/apps/delta-lake/jdk1.8.0_171
​
#
# This is the repository name created within policy manager
#
# Example:
# REPOSITORY_NAME=hivedev
#
REPOSITORY_NAME=thrift111-10002
​
#
# Hive installation directory
#
# Example:
# COMPONENT_INSTALL_DIR_NAME=/var/local/apache-hive-2.1.0-bin
#
COMPONENT_INSTALL_DIR_NAME=/apps/delta-lake/apache-hive-2.3.7-bin
​
​
XAAUDIT.SOLR.ENABLE=false
XAAUDIT.SOLR.URL=NONE
XAAUDIT.SOLR.USER=NONE
XAAUDIT.SOLR.PASSWORD=NONE
XAAUDIT.SOLR.ZOOKEEPER=NONE
XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/hive/audit/solr/spool
​
#Solr Audit Provider
XAAUDIT.SOLR.IS_ENABLED=false
XAAUDIT.SOLR.MAX_QUEUE_SIZE=1
XAAUDIT.SOLR.MAX_FLUSH_INTERVAL_MS=1000
XAAUDIT.SOLR.SOLR_URL=http://localhost:6083/solr/ranger_audits
​
CUSTOM_USER=adaas
​
CUSTOM_GROUP=adaas
​

3) Deploy the plug-in into Hive

$ sudo ./enable-metastore-plugin.sh

4) Restart the Hive Metastore so that the configuration takes effect
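A restart using the start/stop scripts from section 3 (adjust the path if your layout differs):

$ cd /apps/delta-lake/apache-hive-2.3.7-bin/bin
$ ./stop-hive.sh && ./start-hive.sh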

 

7.5 Ranger User Permission Configuration

To be added.

 

8. Kerberos Configuration

The Kerberos client packages must be installed on every Spark VM.

1) Check whether the krb5-libs package is installed

$ rpm -qa | grep krb5-libs
krb5-libs-1.15.1-8.el7.x86_64

2) If the Kerberos client packages are not installed, install them with the following command (krb5-workstation provides kinit/klist):

$ sudo yum install krb5-libs krb5-workstation

3) Log in with kinit

$ kinit -kt /data/kerberos/xxx_user1.service.keytab XXX_user1/XXX_user1@KES.COM
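The resulting ticket can be inspected with klist to confirm the principal and expiry time:

$ klist   # should show a ticket for XXX_user1/XXX_user1@KES.COM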
​

 
