After working through the official docs and various other guides online, I hit assorted problems getting this installed. Here is the final working setup, in the hope that it helps others.

Hive version: 2.2.0
Spark version: 1.6.0
1. Build the Spark 1.6.0 source into a distribution without the Hive artifacts
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
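Once the build succeeds, `make-distribution.sh` leaves a tarball named after the `--name` flag in the Spark source root. A sketch of unpacking it to the install path used later in this post (the exact tarball name follows the script's `spark-<version>-bin-<name>.tgz` convention; adjust if yours differs):

```shell
# Unpack the freshly built distribution and move it to the install
# location referenced in step 5 (/data/soft/spark1.6.0).
tar -zxf spark-1.6.0-bin-hadoop2-without-hive.tgz -C /data/soft/
mv /data/soft/spark-1.6.0-bin-hadoop2-without-hive /data/soft/spark1.6.0
```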
2. Configure hive-site.xml
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>spark.master</name>
    <value>yarn-cluster</value> <!-- client mode throws exceptions here; use yarn-cluster -->
  </property>
  <property>
    <name>spark.eventLog.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>spark.eventLog.dir</name>
    <value>hdfs://s2:9000/user/hive/tmp/sparkeventlog</value>
  </property>
  <property>
    <name>spark.serializer</name>
    <value>org.apache.spark.serializer.KryoSerializer</value>
  </property>
  <property>
    <name>spark.executor.memory</name>
    <value>3g</value>
  </property>
  <property>
    <name>spark.executor.cores</name>
    <value>2</value>
  </property>
  <property>
    <name>spark.executor.instances</name>
    <value>30</value>
  </property>
  <property>
    <name>spark.driver.memory</name>
    <value>3g</value>
  </property>
  <property>
    <name>hive.spark.client.server.connect.timeout</name>
    <value>300000</value>
  </property>
</configuration>
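The `spark.eventLog.dir` directory must exist on HDFS before any query runs, or the Spark job fails at startup. Creating it up front (assuming the Hive user has write access under `/user/hive`):

```shell
# Pre-create the event log directory configured above.
hdfs dfs -mkdir -p hdfs://s2:9000/user/hive/tmp/sparkeventlog
```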
3. Add the Spark assembly jar to Hive's lib directory
spark-assembly-1.6.0-hadoop2.6.0.jar
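Hive needs this jar on its classpath to launch Spark jobs. A minimal sketch of the copy, assuming the Spark install path from step 5 and a hypothetical Hive install directory (adjust `HIVE_HOME` to your environment):

```shell
# Copy the Spark assembly jar into Hive's lib directory.
# HIVE_HOME below is an assumption; set it to your actual Hive install.
HIVE_HOME=/data/soft/hive-2.2.0
cp /data/soft/spark1.6.0/lib/spark-assembly-1.6.0-hadoop2.6.0.jar "$HIVE_HOME/lib/"
```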
4. Configure spark-env.sh
export HADOOP_HOME=/data/soft/hadoop-2.6.0
export HADOOP_CONF_DIR=/data/soft/hadoop-2.6.0/etc/hadoop
export YARN_CONF_DIR=/data/soft/hadoop-2.6.0/etc/hadoop
export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)
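Because the distribution was built with `-Phadoop-provided`, Spark ships without Hadoop's classes; `SPARK_DIST_CLASSPATH` supplies them at launch. You can inspect what it expands to (assuming Hadoop is installed at the path above):

```shell
# Print the classpath Spark will pick up; it should list the Hadoop jars
# under /data/soft/hadoop-2.6.0.
/data/soft/hadoop-2.6.0/bin/hadoop classpath
```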
5. Set the global Spark environment variables
sudo vi /etc/profile
export SPARK_HOME=/data/soft/spark1.6.0
export PATH=$PATH:$SPARK_HOME/bin
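After reloading the profile, a quick smoke test confirms Hive is actually routing queries through Spark (run on a node with Hive on the PATH; the table name is a placeholder):

```shell
source /etc/profile
# Should report spark as the execution engine.
hive -e "set hive.execution.engine;"
# Running any aggregating query should now launch a Spark job on YARN,
# visible in the ResourceManager UI. Replace your_table with a real table.
hive -e "select count(1) from your_table;"
```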