SparkSQL (3) ----------- Working with Hive

I. Embedded Hive

If you use Spark's embedded Hive, nothing needs to be configured; just use it directly.
Hive's metadata is stored in Derby, and the default warehouse location is $SPARK_HOME/spark-warehouse.
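To quickly confirm which warehouse the embedded setup is using, you can read the setting and list the databases from spark-shell; this is only a sanity-check sketch, and the exact values depend on your installation:

scala> spark.conf.get("spark.sql.warehouse.dir")      // warehouse location used by the embedded Hive

scala> spark.catalog.listDatabases.show(false)        // a fresh install only has the default database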

(1) Listing tables

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+

(2) Using a temporary view

scala> val df=spark.read.json("file:///usr/hadoop/spark-2.1.1/bin/data/user.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, username: string]

scala> df.create
createGlobalTempView   createOrReplaceTempView   createTempView

scala> df.createOrReplaceTempView("user")

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
|        |     user|       true|
+--------+---------+-----------+
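As the tab completion above hints, there is also createGlobalTempView. A global temporary view is registered in the reserved global_temp database and stays visible to other sessions, so it must be queried with that prefix. A minimal sketch (the view name globaluser is just an illustration):

scala> df.createGlobalTempView("globaluser")

scala> spark.sql("select * from global_temp.globaluser").show

scala> spark.newSession().sql("select * from global_temp.globaluser").show   // still visible from a new session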

(3) Creating a table yourself

scala> spark.sql("create table user(id int)")

(4) Loading local data into the table
Create user.txt under the data directory of the Spark installation, containing the values 1 2 3 4 (one value per line).

scala> spark.sql("load data local inpath '/usr/hadoop/spark-2.1.1/data/user.txt' into table user")
21/02/03 11:21:58 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
res3: org.apache.spark.sql.DataFrame = []

scala> spark.sql("select * from user").show
+---+
| id|
+---+
|  1|
|  2|
|  3|
|  4|
+---+

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|     user|      false|
|        |     user|       true|
+--------+---------+-----------+
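Besides LOAD DATA, an existing DataFrame (such as the df read from user.json above) can be written straight into a managed table through the DataFrameWriter API; a minimal sketch (the table name user_json is just an illustration):

scala> df.write.mode("overwrite").saveAsTable("user_json")

scala> spark.sql("select * from user_json").show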

II. External Hive

To connect to an externally deployed Hive, the following steps are needed:
➢ For Spark to take over Hive, copy hive-site.xml into the conf/ directory
➢ Copy the MySQL driver (mysql-connector-java-xxxx-bin.jar) into Spark's jars/ directory. (If you installed MySQL and Hive yourself, this driver can be found under lib in the Hive installation directory.)
➢ If HDFS cannot be reached, also copy core-site.xml and hdfs-site.xml into the conf/ directory
➢ Restart spark-shell (a quick verification sketch follows below)
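After the restart you can check that spark-shell is really using the Hive catalog instead of the embedded one. This verification sketch is my own addition, not part of the original steps:

scala> spark.conf.get("spark.sql.catalogImplementation")   // expected to return "hive"

scala> spark.sql("show databases").show                    // should list the databases that already exist in Hive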

1. Startup errors

My own first attempt at starting it failed and Hive was not picked up. The startup output looked the same as a plain spark-shell launch, but I noticed the following warnings:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/02/03 13:02:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/02/03 13:02:18 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

Following a method I found online, I introduced JAVA_LIBRARY_PATH in conf/spark-env.sh under the Spark installation directory:

export LD_LIBRARY_PATH=$JAVA_LIBRARY_PATH

Here $JAVA_LIBRARY_PATH is the lib/native directory under the Hadoop installation directory.
For details see: https://blog.csdn.net/someby/article/details/82930140

Alternatively, the cause may simply have been that I had not started the Hive metastore service (hive --service metastore).
I did both of these things, so I am not sure which one actually fixed it, but after that the warnings printed by spark-shell changed, and the connection to Hive succeeded:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/02/03 13:11:01 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
21/02/03 13:11:01 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
21/02/03 13:11:02 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.137.128:4040
Spark context available as 'sc' (master = local[*], app id = local-1612329052731).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Be sure to check that the following services are running:

  • Start the Hive metastore service: hive --service metastore
  • Start the Hadoop services: sh $HADOOP_HOME/sbin/start-all.sh
  • Start the Spark services: sh $SPARK_HOME/sbin/start-all.sh

2. Working with Hive

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val hiveContext = new HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@ac6da02

scala> hiveContext.sql("create table user(id int)")
21/02/03 13:28:31 WARN metastore.HiveMetaStore: Location: file:/usr/hadoop/spark-2.1.1/spark-warehouse/user specified for non-external table:user
res3: org.apache.spark.sql.DataFrame = []

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|     user|      false|
+--------+---------+-----------+


scala> spark.sql("show databases").show
+------------+
|databaseName|
+------------+
|     default|
+------------+
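A side note: HiveContext is deprecated in Spark 2.x (hence the warning above). Once hive-site.xml is in place, the spark session created by spark-shell already has Hive support, so the same operations can go through spark.sql directly. A sketch that mirrors the embedded-Hive commands from earlier (the table name user2 is just an illustration):

scala> spark.sql("create table user2(id int)")

scala> spark.sql("load data local inpath '/usr/hadoop/spark-2.1.1/data/user.txt' into table user2")

scala> spark.sql("select * from user2").show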


III. Working with Hive from IDEA

1. Import the dependencies

My versions are Hive 2.3.5 and Spark 2.1.1. (hive-exec is declared as 1.2.1 below, matching the Hive client version Spark 2.1.1 is built against rather than the installed Hive.)


    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.2.1</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.32</version>
    </dependency>

2. Copy the configuration files

It is best to copy all three configuration files (hive-site.xml, core-site.xml and hdfs-site.xml) into both the resources directory and the classes output directory.

3. Write the code

Note: a database created from the development tool goes into the local warehouse by default; the warehouse location can be changed with a parameter:
config("spark.sql.warehouse.dir", "hdfs://linux1:8020/user/hive/warehouse")

import org.apache.spark.SparkConf
import org.apache.spark.sql._

object SparkSQL_Hive {

  def main(args: Array[String]): Unit = {
    // avoid HDFS permission errors when the local user differs from the hadoop user (see the note at the end)
    System.setProperty("HADOOP_USER_NAME", "root")

    // TODO create the SparkSQL runtime environment with Hive support enabled
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("sparkSQL")
    val spark = SparkSession.builder().enableHiveSupport().config(sparkConf).getOrCreate()

    spark.sql("show tables").show()

    // TODO close the environment
    spark.close()
  }
}
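Once this runs, the session can do more than show tables. Below is a hedged sketch of writing a small DataFrame into Hive and reading it back from IDEA; the table name user_from_idea and the sample rows are purely illustrative and not from the original post. These lines would go inside main(), before spark.close():

    import spark.implicits._

    // build a tiny DataFrame in code and save it as a Hive table
    val data = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    data.write.mode("overwrite").saveAsTable("user_from_idea")

    // read it back through Spark SQL
    spark.sql("select * from user_from_idea").show()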

If you get the error:

Exception in thread "main" java.lang.IllegalArgumentException: Error
while instantiating 'org.apache.spark.sql.hive.HiveSessionState'

see my previous post: https://blog.csdn.net/smallworldxyl/article/details/113608967
If an error comes up while executing operations (screenshot omitted here), it can be resolved by adding the following line at the very beginning of the code:

System.setProperty("HADOOP_USER_NAME", "root")

Change root here to your own hadoop user name.
