Hive cluster configuration: Hive on Spark
Hive on Spark
Compile Spark
Hive on Spark requires that Spark be built without Hive support. The build command is shown below; Maven must be installed, and the Hadoop version in the command should be adjusted to match your environment.
# Spark 2.0.0 and later
./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
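After the build completes, it is worth confirming that the distribution really excludes Hive. A minimal sanity check, assuming the tarball has been extracted; the directory name below follows the --name flag above, and the exact name depends on your Spark version:
# list any Hive jars in the distribution; the output should be empty
ls spark-2.0.0-bin-hadoop2-without-hive/jars/ | grep -i hive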
Environment variables
Add the following environment variables to /etc/profile:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
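Reload the profile so the variables take effect in the current shell, and verify:
# apply the new environment variables without logging out
source /etc/profile
echo $HADOOP_CONF_DIR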
Hive configuration
Add the following configuration to hive-site.xml, and upload the jar files from Spark's jars directory to the corresponding HDFS directory:
# Create the HDFS directory
hadoop fs -mkdir /spark
# Upload the /application/spark/jars directory to /spark on HDFS
hadoop fs -put /application/spark/jars/ /spark/
<property>
  <name>spark.yarn.jars</name>
  <value>hdfs://xxxx:9000/spark/jars/*.jar</value>
</property>
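To confirm that the upload matches the spark.yarn.jars path configured above, list the directory on HDFS:
# the listing should show the Spark jar files
hadoop fs -ls /spark/jars | head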
Spark configuration (YARN mode)
Add the following configuration to spark-env.sh. Note that it must not contain standalone Spark cluster settings; those will cause Hive on Spark to fail.
# Note: $(hadoop classpath) requires the hadoop command to be executable;
# it can be replaced with an absolute path, e.g. $(/application/hadoop-2.6.4/bin/hadoop classpath)
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
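Before wiring Hive in, it helps to verify that this Spark build can submit to YARN at all. A quick smoke test using the bundled SparkPi example, assuming $SPARK_HOME points at the hadoop2-without-hive build:
# submit the example to YARN in client mode; it should print "Pi is roughly ..."
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode client \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 10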
Add the following configuration to spark-defaults.conf; this file must be placed in the $HIVE_HOME/conf directory:
spark.master yarn
spark.submit.deployMode client
spark.eventLog.enabled true
spark.eventLog.dir hdfs://dashuju174:9000/spark/logs
spark.driver.memory 512m
spark.driver.cores 1
spark.executor.memory 512m
spark.executor.cores 1
spark.executor.instances 2
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.yarn.jars hdfs://dashuju174:9000/spark/jars/*.jar
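Finally, verify the whole chain end to end. The execution engine can be set per session as below, or globally via the hive.execution.engine property in hive-site.xml; the table name t is hypothetical, so substitute one of your own:
# run a query through Hive on Spark; the first query starts a Spark application on YARN
hive -e "set hive.execution.engine=spark; select count(*) from t;"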