Spark requires a matching Hive version:
Spark 1.6.1 corresponds to Hive 1.2.1
Spark 1.6.0 corresponds to Hive 1.2.1
The versions installed here are:
Spark 1.6.0, Hadoop 2.6.0, Hive 1.2.1, MySQL 5.6.35
First install MySQL (see my other post: https://blog.csdn.net/qq_16563637/article/details/81774699).
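Optionally, the metastore database and user can be prepared up front in the MySQL shell. A minimal sketch, assuming the database name (hive) and credentials (root / 123456) used in the hive-site.xml below; note that createDatabaseIfNotExist=true in the JDBC URL would create the database automatically anyway:
CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET latin1;
GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%' IDENTIFIED BY '123456';
FLUSH PRIVILEGES;
(latin1 is commonly recommended for older Hive metastore schemas on MySQL.)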
Upload the Hive tarball (apache-hive-1.2.1-bin.tar.gz) to the Docker container (or server) that will run Hive:
docker cp apache-hive-1.2.1-bin.tar.gz <container_name>:/root/apache-hive-1.2.1-bin.tar.gz
Enter the container:
docker exec -it <container_name> /bin/bash
cd /root
Extract the archive:
tar -zxf apache-hive-1.2.1-bin.tar.gz
Rename the directory:
mv apache-hive-1.2.1-bin hive
Edit the configuration file:
cd hive/conf
vi hive-site.xml
Paste in the following:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mini1:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
Save and exit.
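Before starting Hive, the JDBC settings can be sanity-checked from inside the container. This assumes the mysql client is installed there and that mini1 resolves to the MySQL host:
mysql -h mini1 -P 3306 -u root -p123456 -e "SELECT VERSION();"
If this command cannot connect, Hive will hit the same error the first time it touches the metastore.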
Upload the MySQL JDBC driver jar into hive/lib.
My MySQL install here is 5.6.40, so I use mysql-connector-java-5.1.41.jar (there is no 5.6-numbered connector jar; the 5.x connector series only goes up to 5.1.46).
The jar can be downloaded via Maven or straight from the repository.
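A minimal example, assuming Maven Central's standard layout for groupId mysql / artifactId mysql-connector-java:
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.41/mysql-connector-java-5.1.41.jar
On a non-Docker install, move it straight into hive/lib: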
mv mysql-connector-java-5.1.41.jar /home/hadoop/app/hive/lib/mysql-connector-java-5.1.41.jar
For this Docker setup, copy the jar into the container instead:
docker cp mysql-connector-java-5.1.41.jar 容器名:/root/mysql-connector-java-5.1.41.jar
cp /root/mysql-connector-java-5.1.41.jar /root/hive/lib/mysql-connector-java-5.1.41.jar
Align Hadoop's jline jar with the one Hive ships (otherwise Hive fails on startup; Hadoop does not need a restart for this):
rm /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar
cp /root/hive/lib/jline-2.12.jar /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/jline-2.12.jar
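A quick way to confirm that only the 2.12 jar remains (an optional check, not part of the original steps):
ls /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/ | grep jline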
Start Hive:
cd /root/hive/bin
./hive
Hive is now installed.
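As a sanity check that the metastore wired up correctly: the first Hive start should have created its schema tables in the hive database on MySQL. From a MySQL shell (assuming the credentials above):
mysql -h mini1 -u root -p123456 -e "use hive; show tables;"
You should see metastore tables such as DBS, TBLS and COLUMNS_V2.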
2. Copy the configured hive-site.xml into the $SPARK_HOME/conf directory:
cp /root/hive/conf/hive-site.xml /usr/local/spark-1.6.0-bin-hadoop2.6/conf/hive-site.xml
Open a cloned (new) terminal session and start Hive, then create the person table:
show databases;
use default;
create table person (id int,name string,age int) row format delimited fields terminated by ',';
In another cloned session, create the person data file:
cd /root
vi person
1,zhangsan,18
2,lisi,19
3,wangwu,20
4,zhaoliu,21
Save the file, then upload person to HDFS:
hadoop fs -put person /user/hive/warehouse/person/person
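To verify, the file should now sit under the table's warehouse directory, and the table should return the four rows. For example:
hadoop fs -cat /user/hive/warehouse/person/person
and, back in the Hive session:
select * from person;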
3. In another session, start spark-shell, specifying the location of the MySQL driver jar:
spark-shell \
--master spark://192.168.1.103:7077 \
--executor-memory 1g \
--total-executor-cores 2 \
--driver-class-path /root/hive/lib/mysql-connector-java-5.1.41.jar
4. Use sqlContext.sql to run HQL:
sqlContext.sql("select * from default.person")
sqlContext.sql("select * from default.person limit 2")
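In the 1.6 shell, sqlContext.sql returns a DataFrame, and merely evaluating it only prints the schema; append .show() to actually print the rows:
sqlContext.sql("select * from default.person").show()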
Or use org.apache.spark.sql.hive.HiveContext explicitly:
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
hiveContext.sql("select * from default.person")
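The returned DataFrame also supports the regular DataFrame API; a small illustrative follow-on (the variable names are mine):
val df = hiveContext.sql("select * from default.person")
df.filter(df("age") > 18).show()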
Alternatively, start the spark-sql CLI with the same options:
spark-sql \
--master spark://192.168.1.103:7077 \
--executor-memory 1g \
--total-executor-cores 2 \
--driver-class-path /root/hive/lib/mysql-connector-java-5.1.41.jar
spark-sql is compatible with Hive syntax.
In another cloned session, create person.txt:
cd /root
vi person.txt
1,zhangsan,18
2,lisi,19
3,wangwu,20
4,zhaoliu,21
Save the file, then upload person.txt to HDFS:
hadoop fs -put person.txt /person.txt
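An optional check that the file landed:
hadoop fs -ls /person.txt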
Run the following in spark-sql:
show databases;
use default;
create table person2 (id int,name string,age int) row format delimited fields terminated by ',';
Load the data into the table (not recommended under Docker; see the alternative after this command):
load data inpath “hdfs://192.168.1.103:9000/person.txt” into table person2;
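Under Docker, an alternative that mirrors what was done for the first table is to skip the load and put the file directly into the table's warehouse directory:
hadoop fs -put person.txt /user/hive/warehouse/person2/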
select * from person order by age desc limit 2;
select * from person2;
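With the sample data, the first query should print the two oldest rows in spark-sql's tab-separated output, roughly:
4	zhaoliu	21
3	wangwu	20
and the second should return all four rows loaded into person2.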