Connecting Spark SQL to Hive

I. Configuration

1. Main pom.xml dependencies

<dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-reflect</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.2.1</version>
            <!--<scope>provided</scope>-->
        </dependency>
        
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>
        
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive-thriftserver -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive-thriftserver_2.11</artifactId>
            <version>2.2.1</version>
            <scope>provided</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
    </dependencies>
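
To keep all the Spark artifact versions in lockstep (which matters for the "Hive classes are not found" error discussed in section IV), one option is to factor the versions out into Maven properties and reference them from every Spark dependency. A minimal sketch; the property names are only illustrative:

    <properties>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.1</spark.version>
    </properties>

    <!-- then each Spark dependency points at the shared property -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>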

2. Copy hive-site.xml, hdfs-site.xml, and core-site.xml into the resources directory

Add the hive.metastore.uris property to the copied hive-site.xml configuration file:

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://master:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>

Note: if you prefer not to copy these files, the equivalent configuration can be set through the API instead:

      .config("hive.metastore.uris","thrift://master:9083")
      .config("fs.defaultFS","hdfs://master:9000")

3. Start the Hive metastore service

[root@master bin]# ./hive --service metastore &
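
Starting the service with a plain & ties it to the current terminal session, and it may be killed when that session closes. One common alternative (assuming the same $HIVE_HOME/bin directory, and that net-tools is installed so netstat is available) is to start it with nohup and then confirm that the metastore port 9083 is listening:

    [root@master bin]# nohup ./hive --service metastore > /tmp/metastore.log 2>&1 &
    [root@master bin]# netstat -nltp | grep 9083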

II. Hive shell operations

1. Create a database

hive> create database db_hive_test;

2. Create a table in the database

hive> use db_hive_test;
OK
hive> create table student(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.233 seconds
hive> show tables;
OK
student
Time taken: 0.029 seconds, Fetched: 1 row(s)

3. Load sample data into the table

[root@master conf]# cat /home/test/stu.txt 
001,xiaohong
002,xiaolan

hive> load data local inpath '/home/test/stu.txt' into table db_hive_test.student;
Loading data to table db_hive_test.student

hive> select * from student;
OK
1	xiaohong
2	xiaolan
Time taken: 2.885 seconds, Fetched: 2 row(s)
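
Because student is a Hive-managed table, LOAD DATA moves the file (keeping its original name) under the Hive warehouse directory on HDFS. Assuming the default warehouse location /user/hive/warehouse, the data can be double-checked with:

    [root@master conf]# hdfs dfs -ls /user/hive/warehouse/db_hive_test.db/student
    [root@master conf]# hdfs dfs -cat /user/hive/warehouse/db_hive_test.db/student/stu.txt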

III. Code

package com.cn.sparkSql

import java.io.File

import org.apache.spark.sql.SparkSession

object SparkSql_Hive {
  def main(args: Array[String]): Unit = {
    // Warehouse directory on HDFS; a local path could be used instead:
    // val warehouseLocation = new File("spark-warehouse").getAbsolutePath
    val warehouseLocation = "/spark-warehouse01"
    val spark = SparkSession
      .builder()
      .master("local[*]")
      // Point the Hive client at the remote metastore and the HDFS namenode
      .config("hive.metastore.uris", "thrift://master:9083")
      .config("fs.defaultFS", "hdfs://master:9000")
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")
    // Query the table created in the Hive shell above
    spark.sql("select * from db_hive_test.student").show()
    spark.stop()
  }

}
Console output from a local run:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/03/18 11:44:20 INFO SparkContext: Running Spark version 2.2.1
20/03/18 11:44:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/18 11:44:21 INFO SparkContext: Submitted application: Spark Hive Example
20/03/18 11:44:21 INFO SecurityManager: Changing view acls to: haoyajun
20/03/18 11:44:21 INFO SecurityManager: Changing modify acls to: haoyajun
20/03/18 11:44:21 INFO SecurityManager: Changing view acls groups to: 
20/03/18 11:44:21 INFO SecurityManager: Changing modify acls groups to: 
20/03/18 11:44:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(haoyajun); groups with view permissions: Set(); users  with modify permissions: Set(haoyajun); groups with modify permissions: Set()
20/03/18 11:44:21 INFO Utils: Successfully started service 'sparkDriver' on port 64070.
20/03/18 11:44:21 INFO SparkEnv: Registering MapOutputTracker
20/03/18 11:44:21 INFO SparkEnv: Registering BlockManagerMaster
20/03/18 11:44:21 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/18 11:44:21 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/18 11:44:21 INFO DiskBlockManager: Created local directory at C:\Users\haoyajun\AppData\Local\Temp\blockmgr-cf1d2c4c-cc69-4c13-9610-28417f88609a
20/03/18 11:44:21 INFO MemoryStore: MemoryStore started with capacity 1992.0 MB
20/03/18 11:44:21 INFO SparkEnv: Registering OutputCommitCoordinator
20/03/18 11:44:21 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/03/18 11:44:21 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.230.1:4040
20/03/18 11:44:21 INFO Executor: Starting executor ID driver on host localhost
20/03/18 11:44:21 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 64112.
20/03/18 11:44:21 INFO NettyBlockTransferService: Server created on 192.168.230.1:64112
20/03/18 11:44:21 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/18 11:44:21 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.230.1, 64112, None)
20/03/18 11:44:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.230.1:64112 with 1992.0 MB RAM, BlockManagerId(driver, 192.168.230.1, 64112, None)
20/03/18 11:44:21 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.230.1, 64112, None)
20/03/18 11:44:21 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.230.1, 64112, None)
+---+--------+
| id|    name|
+---+--------+
|  1|xiaohong|
|  2| xiaolan|
+---+--------+
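
With enableHiveSupport the same session can write to Hive as well as read from it. A minimal sketch that could be appended to the main method above; the extra row and the copy-table name are only illustrative:

    import spark.implicits._

    // Append one more row to the existing Hive table
    Seq((3, "xiaolv")).toDF("id", "name")
      .write
      .mode("append")
      .insertInto("db_hive_test.student")

    // Or materialize a query result as a new managed table
    spark.sql("select * from db_hive_test.student")
      .write
      .saveAsTable("db_hive_test.student_copy")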

IV. Notes

1. If you use a hostname (such as master) instead of the remote host's IP address, make sure the hostname-to-IP mapping is configured in the hosts file, for example:
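
When the driver runs on Windows (as in the log above) the mapping goes into C:\Windows\System32\drivers\etc\hosts, on Linux into /etc/hosts; the IP address below is only illustrative:

    192.168.230.129  master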

2. Error 1

Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
	at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:854)
	at com.cn.sparkSql.SparkSql_Hive$.main(SparkSql_Hive.scala:24)
	at com.cn.sparkSql.SparkSql_Hive.main(SparkSql_Hive.scala)

Analysis:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.2.0</version>
            <scope>provided</scope>
        </dependency>

The version of this dependency must match the Spark version (and, when running directly from the IDE, the provided scope should be removed, otherwise the Hive classes are not on the runtime classpath):

 <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.2.1</version>
        </dependency>

3. Error 2

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
	at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
	at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
	at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
	at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
	at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
	at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
	at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
	at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
	at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
	at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
	at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
	at com.cn.sparkSql.SparkSql_Hive$.main(SparkSql_Hive.scala:21)
	at com.cn.sparkSql.SparkSql_Hive.main(SparkSql_Hive.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
	... 34 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

Analysis:

(1) The metastore service has not been started;

(2) The remote server's IP address or hostname mapping may be written incorrectly; see the quick check below.
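
A quick way to rule out both causes is to check, from the machine running the Spark code, that the metastore port is actually reachable (assuming the default port 9083 configured above):

    telnet master 9083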
