Background: the cluster runs Hive and Spark2; there is no Impala.
Requirement: access Hive and submit Spark jobs through Hue.
Status: Hue already manages Hive permissions via Sentry.
Environment:
CDH 5.15.1
CentOS 7.4
livy-0.5.0-incubating-bin
Goal: interactive Scala and PySpark in Hue, plus the ability to submit Python scripts and Spark jar packages to Spark.
1. Install Livy. Many blog posts build Livy from source; I tried building from GitHub but could not get it to compile (possibly an environment problem, I am on JDK 1.8), so in the end I skipped the build and downloaded the binary release from the Livy website.
[root@dip livy]# cd conf/
[root@dip conf]# cp livy-env.sh.template livy-env.sh
[root@dip conf]# cp spark-blacklist.conf.template spark-blacklist.conf
[root@dip conf]# cp livy.conf.template livy.conf
[root@dip conf]# chown hdfs:hdfs livy.conf livy-env.sh spark-blacklist.conf
[root@dip conf]# ll
Create Livy's home directory on HDFS:
[root@dip conf]# sudo -u hdfs hadoop fs -mkdir /user/livy
[root@dip conf]# sudo -u hdfs hadoop fs -chown hdfs:hdfs /user/livy
[root@dip conf]# sudo -u hdfs hadoop fs -ls /user
vim livy.conf
livy.spark.master = yarn
livy.spark.deploy-mode = cluster
livy.environment = production
livy.impersonation.enabled = true
livy.server.port = 8998
livy.server.session.timeout = 3600000
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = filesystem
livy.server.recovery.state-store.url = /opt/cslc/livy-0.5.0-incubating-bin
vim livy-env.sh
export JAVA_HOME=/opt/cslc/jdk1.8.0_151
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export SPARK_CONF_DIR=/etc/spark2/conf
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export HADOOP_CONF_DIR=/etc/hadoop/conf
export LIVY_LOG_DIR=/home/cloudera_data_lib_log_tmp/log/livy/
export LIVY_SERVER_JAVA_OPTS="-Xmx2g"
Start the server as the hdfs user: /opt/cloudera/livy-0.5.0-incubating-bin/bin/livy-server start
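Once the server is up, everything Hue does goes through Livy's REST API on port 8998. A minimal sketch of that API, assuming the server is reachable at http://dip007:8998 (the helper names here are my own, not part of Livy):

```python
import json
import urllib.request

LIVY_URL = "http://dip007:8998"  # assumption: host/port from livy.conf above

def session_payload(kind="pyspark"):
    # Body for POST /sessions. "kind" selects the interpreter:
    # "spark" for Scala, "pyspark" for Python -- the same two kinds
    # the Hue notebook exposes.
    return {"kind": kind}

def statement_payload(code):
    # Body for POST /sessions/{id}/statements; "code" runs in the session.
    return {"code": code}

def post(path, payload):
    # Requires a running Livy server; in real use, poll GET /sessions/{id}
    # until the session state is "idle" before posting statements.
    req = urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, post("/sessions", session_payload("pyspark")) creates an interactive PySpark session, which is what the Hue notebook does behind the scenes.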
2. Integrate Livy with CDH
Add to the cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml:
<property>
  <name>hadoop.proxyuser.livy.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.livy.hosts</name>
  <value>*</value>
</property>
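These proxyuser entries let the livy user impersonate whoever is logged in to Hue. With livy.impersonation.enabled = true in livy.conf, a client requests impersonation by adding proxyUser to the session body; a sketch (the user name "alice" is a placeholder):

```python
import json

def impersonated_session(kind, proxy_user):
    # With livy.impersonation.enabled = true and the hadoop.proxyuser.livy.*
    # settings above, the YARN application runs as proxy_user instead of
    # the account the Livy server itself runs under.
    return {"kind": kind, "proxyUser": proxy_user}

# "alice" is a placeholder for the Hue login name.
body = json.dumps(impersonated_session("pyspark", "alice"))
```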
3. Configure Hue to reach Spark through Livy
Add to hue_safety_valve.ini, the Hue Service Advanced Configuration Snippet (Safety Valve):

[desktop]
app_blacklist=

[spark]
livy_server_host=dip007
livy_server_port=8998
livy_server_session_kind=yarn

[notebook]
show_notebooks=true
enable_batch_execute=true
enable_query_builder=true
enable_query_scheduling=false

[[interpreters]]
[[[hive]]]
# The name of the snippet.
name=Hive
# The backend connection to use to communicate with the server.
interface=hiveserver2
[[[spark]]]
name=Scala
interface=livy
[[[pyspark]]]
name=PySpark
interface=livy
[[[jar]]]
name=Spark Submit Jar
interface=livy-batch
[[[py]]]
name=Spark Submit Python
interface=livy-batch
[[[text]]]
name=Text
interface=text
[[[spark2]]]
name=Spark
interface=oozie
[[[markdown]]]
name=Markdown
interface=text
[[[java]]]
name=Java
interface=oozie
[[[mapreduce]]]
name=MapReduce
interface=oozie
[[[distcp]]]
name=Distcp
interface=oozie
[[[shell]]]
name=Shell
interface=oozie
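The two livy-batch interpreters above (Spark Submit Jar and Spark Submit Python) map to Livy's POST /batches endpoint, which mirrors spark-submit. A sketch of the request bodies (the paths and class name below are placeholders for illustration):

```python
def jar_batch(hdfs_jar, main_class, args=()):
    # Body for POST /batches -- equivalent to:
    #   spark-submit --class <main_class> <hdfs_jar> <args...>
    # The jar must already be on HDFS so the YARN containers can fetch it.
    return {"file": hdfs_jar, "className": main_class, "args": list(args)}

def pyfile_batch(hdfs_py, args=()):
    # For a Python script, "file" points at the .py on HDFS; no className.
    return {"file": hdfs_py, "args": list(args)}

# Placeholder paths/class for illustration:
jar_job = jar_batch("hdfs:///user/livy/app.jar", "com.example.Main", ["2018-01-01"])
py_job = pyfile_batch("hdfs:///user/livy/job.py")
```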
Problems encountered:
1. Livy kept connecting to a NameNode that was in standby state.
Attempt 1: configuring Hue to use the highly available YARN had no effect.
Attempt 2: the original installation had been done as a regular user; redoing everything as the hdfs user resolved the problem.