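Purpose of the integration
Spark SQL is integrated with Hive so that Spark SQL can read the metadata of the tables stored in Hive.
Hive's HQL runs on MapReduce underneath, which makes queries slow.
With the integration in place, the same analysis runs on the faster Spark engine instead.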
I. Environment setup
Copy Hive's configuration file hive-site.xml into Spark's configuration directory.
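The copy step can be sketched as a small shell function. This is only an illustration: copy_hive_site is a hypothetical helper name, and it assumes the conventional layout in which each installation keeps its configuration in a conf subdirectory.

```shell
# Copy Hive's hive-site.xml into Spark's conf directory so that
# Spark SQL can locate the Hive metastore.
# Arguments (both are assumptions about your layout):
#   $1  Hive installation directory  (expects $1/conf/hive-site.xml)
#   $2  Spark installation directory (expects $2/conf to exist)
copy_hive_site() {
    hive_home="$1"
    spark_home="$2"
    src="$hive_home/conf/hive-site.xml"

    # Fail early with a clear message if the source file is missing.
    if [ ! -f "$src" ]; then
        echo "hive-site.xml not found at $src" >&2
        return 1
    fi

    cp "$src" "$spark_home/conf/"
}

# Example call, assuming HIVE_HOME and SPARK_HOME are set:
# copy_hive_site "$HIVE_HOME" "$SPARK_HOME"
```

If Spark runs on several nodes, the file would presumably need to be present in the configuration directory of each node that starts a driver.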
II. Startup
1. Start via spark-shell
[root@ant1 bin]# ./spark-shell --master local --jars ~/apps/mysql-connector-java-5.1.27-bin.jar
scala> spark.sql("show tables").show
20/03/07 09:59:18 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|   people|      false|
| default|       t1|      false|
+--------+---------+-----------+
Exit
scala> :quit
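The same query can also be run without an interactive session. The sketch below assumes spark-shell is on the PATH and relies on the REPL reading Scala statements from standard input. run_spark_shell_query is a hypothetical helper name, not part of Spark, and the query is assumed to contain no double quotes.

```shell
# Run a single SQL statement through spark-shell and let the REPL exit
# when standard input is exhausted.
# Arguments:
#   $1  path to the MySQL JDBC driver jar (needed to reach the metastore)
#   $2  SQL statement, without double quotes
run_spark_shell_query() {
    jar="$1"
    query="$2"

    # Build one Scala statement and pipe it into the REPL.
    printf 'spark.sql("%s").show()\n' "$query" |
        spark-shell --master local --jars "$jar"
}

# Example call, using the jar path from the session above:
# run_spark_shell_query ~/apps/mysql-connector-java-5.1.27-bin.jar "show tables"
```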
2. Start via spark-sql
spark-sql --master local --jars apps/mysql-connector-java-5.1.27-bin.jar --driver-class-path apps/mysql-connector-java-5.1.27-bin.jar
spark-sql> show tables;
20/03/07 10:09:40 INFO metastore.HiveMetaStore: 0: get_database: global_temp
20/03/07 10:09:40 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
20/03/07 10:09:40 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
20/03/07 10:09:40 INFO metastore.HiveMetaStore: 0: get_database: default
20/03/07 10:09:40 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
20/03/07 10:09:40 INFO metastore.HiveMetaStore: 0: get_database: default
20/03/07 10:09:40 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
20/03/07 10:09:40 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=*
20/03/07 10:09:40 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_tables: db=default pat=*
20/03/07 10:09:41 INFO codegen.CodeGenerator: Code generated in 357.03804 ms
default people false
default t1 false
Time taken: 3.708 seconds, Fetched 2 row(s)
20/03/07 10:09:41 INFO thriftserver.SparkSQLCLIDriver: Time taken: 3.708 seconds, Fetched 2 row(s)
spark-sql>
Exit
exit;
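For scripted use, the spark-sql CLI accepts a statement on the command line through its -e option, so no interactive prompt is needed. A minimal sketch, assuming spark-sql is on the PATH. run_spark_sql is a hypothetical wrapper name.

```shell
# Run one SQL statement through the spark-sql CLI and exit.
# Arguments:
#   $1  path to the MySQL JDBC driver jar
#   $2  SQL statement to run
run_spark_sql() {
    jar="$1"
    query="$2"

    # Same options as the interactive session above, plus -e to pass
    # the statement directly instead of opening the prompt.
    spark-sql --master local \
        --jars "$jar" \
        --driver-class-path "$jar" \
        -e "$query"
}

# Example call, using the jar path from the session above:
# run_spark_sql apps/mysql-connector-java-5.1.27-bin.jar "show tables;"
```

The INFO and WARN log lines seen in the interactive session are likely to appear here as well. The result rows themselves are printed as plain tab-separated text.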