Spark & Hive Integration
Code
- Modify hive-site.xml
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://CentOS:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
</property>
<!-- Enable the MetaStore service so that Spark can read Hive's metadata -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://CentOS:9083</value>
</property>
<property>
    <name>hive.metastore.local</name>
    <value>false</value>
</property>
<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
</property>
- Start the metastore service
[root@CentOS apache-hive-1.2.2-bin]# ./bin/hive --service metastore >/dev/null 2>&1 &
[1] 55017
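Since the command above silences all output, it is worth confirming that the metastore actually came up and is listening on port 9083 (the port configured in hive.metastore.uris). A quick check, assuming net-tools is installed on the host:

```shell
# Verify the Hive metastore is listening on port 9083
# (use `ss -tlnp` instead of netstat on newer systems without net-tools)
netstat -tlnp | grep 9083
```

If nothing is printed, check the metastore log rather than redirecting it to /dev/null.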
- Add the following dependencies
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.5</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.5</version>
</dependency>
- Write the following code
// Configure the Spark session with Hive support
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Spark Hive Example")
  .master("local[*]")
  .config("hive.metastore.uris", "thrift://CentOS:9083")
  .enableHiveSupport() // enable Hive support
  .getOrCreate()

spark.sql("show databases").show()
spark.sql("use baizhi")
spark.sql("select * from t_emp").na.fill(0.0).show()
spark.close()
+-----+------+---------+----+-------------------+-------+-------+------+
|empno| ename| job| mgr| hiredate| sal| comm|deptno|
+-----+------+---------+----+-------------------+-------+-------+------+
| 7369| SMITH| CLERK|7902|1980-12-17 00:00:00| 800.00| 0.00| 20|
| 7499| ALLEN| SALESMAN|7698|1981-02-20 00:00:00|1600.00| 300.00| 30|
| 7521| WARD| SALESMAN|7698|1981-02-22 00:00:00|1250.00| 500.00| 30|
| 7566| JONES| MANAGER|7839|1981-04-02 00:00:00|2975.00| 0.00| 20|
| 7654|MARTIN| SALESMAN|7698|1981-09-28 00:00:00|1250.00|1400.00| 30|
| 7698| BLAKE| MANAGER|7839|1981-05-01 00:00:00|2850.00| 0.00| 30|
| 7782| CLARK| MANAGER|7839|1981-06-09 00:00:00|2450.00| 0.00| 10|
| 7788| SCOTT| ANALYST|7566|1987-04-19 00:00:00|1500.00| 0.00| 20|
| 7839| KING|PRESIDENT| 0|1981-11-17 00:00:00|5000.00| 0.00| 10|
| 7844|TURNER| SALESMAN|7698|1981-09-08 00:00:00|1500.00| 0.00| 30|
| 7876| ADAMS| CLERK|7788|1987-05-23 00:00:00|1100.00| 0.00| 20|
| 7900| JAMES| CLERK|7698|1981-12-03 00:00:00| 950.00| 0.00| 30|
| 7902| FORD| ANALYST|7566|1981-12-03 00:00:00|3000.00| 0.00| 20|
| 7934|MILLER| CLERK|7782|1982-01-23 00:00:00|1300.00| 0.00| 10|
+-----+------+---------+----+-------------------+-------+-------+------+
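Besides issuing SQL strings, the same Hive table can be worked with through the DataFrame API. The following is a minimal sketch, assuming the baizhi.t_emp table from the session above; the result table name avg_sal_by_dept is made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

val spark = SparkSession.builder()
  .appName("Spark Hive DataFrame Example")
  .master("local[*]")
  .config("hive.metastore.uris", "thrift://CentOS:9083")
  .enableHiveSupport()
  .getOrCreate()

// Read the Hive table directly as a DataFrame (no SQL string needed)
val emp = spark.table("baizhi.t_emp")

// Average salary per department, written back to Hive as a new table
// (avg_sal_by_dept is a hypothetical table name for this example)
emp.groupBy("deptno")
  .agg(avg("sal").alias("avg_sal"))
  .write
  .mode("overwrite")
  .saveAsTable("baizhi.avg_sal_by_dept")

spark.close()
```

saveAsTable registers the result in the metastore, so it becomes visible to Hive and later Spark sessions as well.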
Interactive use
1. Copy spark-hive_2.11-2.4.5.jar and spark-hive-thriftserver_2.11-2.4.5.jar into Spark's jars directory, then restart Spark
jars:
Link: https://pan.baidu.com/s/1nkKesJyRitfvjO7bINlHdA
Extraction code: 9nkb
2. Copy the hive-site.xml file into Spark's conf directory
3. Add Hive's jars to Hadoop's classpath
That is, all the jars under Hive's lib directory
SPARK_HOME=/usr/spark-2.4.5
KE_HOME=/usr/kafka-eagle
M2_HOME=/usr/apache-maven-3.6.3
SQOOP_HOME=/usr/sqoop-1.4.7
HIVE_HOME=/usr/apache-hive-1.2.2-bin
JAVA_HOME=/usr/java/latest
HADOOP_HOME=/usr/hadoop-2.9.2/
HBASE_HOME=/usr/hbase-1.2.4/
ZOOKEEPER_HOME=/usr/zookeeper-3.4.6
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$M2_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$KE_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
CLASSPATH=.
export JAVA_HOME
export PATH
export CLASSPATH
export HADOOP_HOME
export HBASE_HOME
HBASE_CLASSPATH=$(/usr/hbase-1.2.4/bin/hbase classpath)
HADOOP_CLASSPATH=/root/mysql-connector-java-5.1.49.jar:/usr/spark-2.4.5/jars/spark-hive_2.11-2.4.5.jar:/usr/spark-2.4.5/jars/spark-hive-thriftserver_2.11-2.4.5.jar:$HIVE_HOME/lib/*
export HADOOP_CLASSPATH
export M2_HOME
export HIVE_HOME
export SQOOP_HOME
export ZOOKEEPER_HOME
export KE_HOME
export SPARK_HOME
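After editing the profile, the new variables still have to be loaded into the current shell. Assuming the variables above live in /etc/profile, this can be verified as follows:

```shell
# Reload the environment, then confirm the Hive jars are visible to Hadoop
source /etc/profile
hadoop classpath | tr ':' '\n' | grep -i hive
```

If the grep prints nothing, Hadoop is not picking up HADOOP_CLASSPATH and spark-sql will fail to find the Hive classes.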
4. Run the following command
[root@CentOS spark-2.4.5]# ./bin/spark-sql --master spark://CentOS:7077 --total-executor-cores 6 --packages org.apache.spark:spark-hive-thriftserver_2.11:2.4.5
spark-sql> show databases;
20/11/04 12:06:33 INFO codegen.CodeGenerator: Code generated in 748.341192 ms
baizhi
default
test
Time taken: 5.818 seconds, Fetched 3 row(s)
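From here, ordinary HiveQL runs against the warehouse. For example, an aggregation over the t_emp table used earlier (output depends on your data):

```sql
-- Average salary per department over the Hive table t_emp
use baizhi;
select deptno, avg(sal) as avg_sal
from t_emp
group by deptno;
```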