Integrating Spark with Hive (Hive on Spark): Configuration Steps for Querying Data with Spark SQL

This article describes how to query data with Spark SQL on open-source Hadoop.
The component versions used in this article are as follows:
1. Hadoop version

[root@hadoop01 ~]# hadoop version
Hadoop 2.7.7

2. Hive version

Hive 2.1.1

3. Spark and Scala versions

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161)

Note: as other tutorials point out, first verify that the Spark build installed on your machine supports Hive (see [this tutorial](http://dblab.xmu.edu.cn/blog/1086-2/)).
Start spark-shell and enter the following at the Scala prompt:

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

If the import succeeds as shown, your Spark build supports Hive.
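
As a further check: in Spark 2.x, Hive support is normally accessed through SparkSession rather than the older HiveContext. Below is a minimal standalone sketch (the application name is illustrative, and the query assumes the metastore configuration described in the next section is in place):

import org.apache.spark.sql.SparkSession

// Build (or reuse) a session with Hive support enabled; this fails at
// runtime if the Spark build does not include the Hive classes.
val spark = SparkSession.builder()
  .appName("hive-support-check")  // illustrative name
  .enableHiveSupport()
  .getOrCreate()

// Route a SQL statement through the Hive-aware catalog.
spark.sql("show databases").show()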

Configuration steps:
1. Copy apache-hive-2.1.1-bin/conf/hive-site.xml into the conf directory under the Spark installation, and make sure it contains the following property, pointing at the metastore service:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoop01:9083</value>
</property>

2. Start the Hive metastore service:

nohup hive --service metastore &
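
Because the command is started with nohup and no explicit redirection, its output is appended to nohup.out in the current directory; you can watch that file for startup errors:

tail -f nohup.out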

If the log shows the following error, port 9083 is already occupied:

org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.

In that case, kill the process occupying the port, then start the metastore again:

[root@hadoop01 conf]# netstat -apn|grep 9083
tcp        0      0 0.0.0.0:9083            0.0.0.0:*               LISTEN      5482/java           
[root@hadoop01 conf]# kill -9 5482

3. Start spark-sql:

[root@hadoop01 bin]# pwd
/usr/local/software/spark/bin
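
By default, running ./spark-sql with no arguments starts a local master (local[*], as the log below shows). spark-sql accepts the standard spark-submit options, so to run against a cluster you can pass --master explicitly; a sketch, assuming a configured YARN cluster:

./spark-sql --master yarn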

[root@hadoop01 bin]# ./spark-sql 
21/03/29 05:39:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/29 05:39:02 INFO SparkContext: Running Spark version 2.4.7
21/03/29 05:39:02 INFO SparkContext: Submitted application: SparkSQL::192.168.83.3
21/03/29 05:39:02 INFO SecurityManager: Changing view acls to: root
21/03/29 05:39:02 INFO SecurityManager: Changing modify acls to: root
21/03/29 05:39:02 INFO SecurityManager: Changing view acls groups to: 
21/03/29 05:39:02 INFO SecurityManager: Changing modify acls groups to: 
21/03/29 05:39:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/03/29 05:39:02 INFO Utils: Successfully started service 'sparkDriver' on port 44255.
21/03/29 05:39:02 INFO SparkEnv: Registering MapOutputTracker
21/03/29 05:39:02 INFO SparkEnv: Registering BlockManagerMaster
21/03/29 05:39:02 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/03/29 05:39:02 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/03/29 05:39:02 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-805d97c7-63cd-40ab-93d7-1c685b3abc9e
21/03/29 05:39:02 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
21/03/29 05:39:02 INFO SparkEnv: Registering OutputCommitCoordinator
21/03/29 05:39:03 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/03/29 05:39:03 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop01:4040
21/03/29 05:39:03 INFO Executor: Starting executor ID driver on host localhost
21/03/29 05:39:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37256.
21/03/29 05:39:03 INFO NettyBlockTransferService: Server created on hadoop01:37256
21/03/29 05:39:03 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/29 05:39:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop01, 37256, None)
21/03/29 05:39:03 INFO BlockManagerMasterEndpoint: Registering block manager hadoop01:37256 with 413.9 MB RAM, BlockManagerId(driver, hadoop01, 37256, None)
21/03/29 05:39:03 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop01, 37256, None)
21/03/29 05:39:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop01, 37256, None)
21/03/29 05:39:03 INFO SharedState: loading hive config file: file:/usr/local/software/spark/conf/hive-site.xml
21/03/29 05:39:03 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
21/03/29 05:39:03 INFO SharedState: Warehouse path is '/user/hive/warehouse'.
21/03/29 05:39:04 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
21/03/29 05:39:04 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
21/03/29 05:39:04 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
Spark master: local[*], Application Id: local-1617010743159
21/03/29 05:39:04 INFO SparkSQLCLIDriver: Spark master: local[*], Application Id: local-1617010743159
spark-sql>

When the

spark-sql>

prompt appears, the Spark SQL CLI is ready and you can query data, as in the examples below.
Show the databases:

spark-sql> show databases;
21/03/29 05:42:24 INFO CodeGenerator: Code generated in 172.263094 ms
dblab
default
Time taken: 1.68 seconds, Fetched 2 row(s)
21/03/29 05:42:24 INFO SparkSQLCLIDriver: Time taken: 1.68 seconds, Fetched 2 row(s)
spark-sql>
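
To query tables in dblab (which appears in the output above) without qualifying their names, you can first switch the current database:

spark-sql> use dblab;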

Show the tables:

spark-sql> show tables;
21/03/29 05:43:16 INFO ContextCleaner: Cleaned accumulator 1
21/03/29 05:43:16 INFO ContextCleaner: Cleaned accumulator 2
21/03/29 05:43:16 INFO CodeGenerator: Code generated in 20.374405 ms
dblab	sstest1	false
dblab	testtab1	false
dblab	testtab1_copy	false
Time taken: 0.071 seconds, Fetched 3 row(s)
21/03/29 05:43:16 INFO SparkSQLCLIDriver: Time taken: 0.071 seconds, Fetched 3 row(s)
spark-sql>
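
In the SHOW TABLES output, each row gives the database, the table name, and whether the table is temporary. With the metastore connected, these Hive tables can be queried directly from the same prompt; a sketch, assuming testtab1 actually contains rows:

spark-sql> select * from dblab.testtab1 limit 5;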

This article draws on the tutorial linked above; thanks to the original author.
