I. Configuration
1. Main pom.xml
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-compiler</artifactId>
    <version>2.11.8</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>2.11.8</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.2.1</version>
    <!--<scope>provided</scope>-->
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.2.1</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive-thriftserver -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive-thriftserver_2.11</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
  </dependency>
</dependencies>
2. Copy hive-site.xml, hdfs-site.xml, and core-site.xml into the project's resources directory.
In the copied hive-site.xml, add the hive.metastore.uris property:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://master:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
Note: if you do not copy these files, you can set the equivalent values directly through the API instead:
.config("hive.metastore.uris","thrift://master:9083")
.config("fs.defaultFS","hdfs://master:9000")
3. Start the Hive metastore service
[root@master bin]# ./hive --service metastore &
II. Hive shell operations
1. Create a database
hive> create database db_hive_test;
2. Create a table in the database
hive> use db_hive_test;
OK
hive> create table student(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.233 seconds
hive> show tables;
OK
student
Time taken: 0.029 seconds, Fetched: 1 row(s)
3. Load data into the table
[root@master conf]# cat /home/test/stu.txt
001,xiaohong
002,xiaolan
hive> load data local inpath '/home/test/stu.txt' into table db_hive_test.student;
Loading data to table db_hive_test.student
hive> select * from student;
OK
1 xiaohong
2 xiaolan
Time taken: 2.885 seconds, Fetched: 2 row(s)
III. Code
package com.cn.sparkSql

import java.io.File
import org.apache.spark.sql.SparkSession

object SparkSql_Hive {
  def main(args: Array[String]): Unit = {
    // Warehouse directory on HDFS used by Spark SQL for managed tables.
    val warehouseLocation = "/spark-warehouse01"
    //val warehouseLocation = new File("spark-warehouse").getAbsolutePath
    val spark = SparkSession
      .builder()
      .master("local[*]")
      // Point Spark at the remote Hive metastore and HDFS instead of relying
      // on hive-site.xml / core-site.xml copied into resources.
      .config("hive.metastore.uris", "thrift://master:9083")
      .config("fs.defaultFS", "hdfs://master:9000")
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")
    spark.sql("select * from db_hive_test.student").show()
  }
}

Console output from running the program locally:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/03/18 11:44:20 INFO SparkContext: Running Spark version 2.2.1
20/03/18 11:44:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/18 11:44:21 INFO SparkContext: Submitted application: Spark Hive Example
20/03/18 11:44:21 INFO SecurityManager: Changing view acls to: haoyajun
20/03/18 11:44:21 INFO SecurityManager: Changing modify acls to: haoyajun
20/03/18 11:44:21 INFO SecurityManager: Changing view acls groups to:
20/03/18 11:44:21 INFO SecurityManager: Changing modify acls groups to:
20/03/18 11:44:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(haoyajun); groups with view permissions: Set(); users with modify permissions: Set(haoyajun); groups with modify permissions: Set()
20/03/18 11:44:21 INFO Utils: Successfully started service 'sparkDriver' on port 64070.
20/03/18 11:44:21 INFO SparkEnv: Registering MapOutputTracker
20/03/18 11:44:21 INFO SparkEnv: Registering BlockManagerMaster
20/03/18 11:44:21 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/18 11:44:21 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/18 11:44:21 INFO DiskBlockManager: Created local directory at C:\Users\haoyajun\AppData\Local\Temp\blockmgr-cf1d2c4c-cc69-4c13-9610-28417f88609a
20/03/18 11:44:21 INFO MemoryStore: MemoryStore started with capacity 1992.0 MB
20/03/18 11:44:21 INFO SparkEnv: Registering OutputCommitCoordinator
20/03/18 11:44:21 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/03/18 11:44:21 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.230.1:4040
20/03/18 11:44:21 INFO Executor: Starting executor ID driver on host localhost
20/03/18 11:44:21 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 64112.
20/03/18 11:44:21 INFO NettyBlockTransferService: Server created on 192.168.230.1:64112
20/03/18 11:44:21 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/18 11:44:21 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.230.1, 64112, None)
20/03/18 11:44:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.230.1:64112 with 1992.0 MB RAM, BlockManagerId(driver, 192.168.230.1, 64112, None)
20/03/18 11:44:21 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.230.1, 64112, None)
20/03/18 11:44:21 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.230.1, 64112, None)
+---+--------+
| id| name|
+---+--------+
| 1|xiaohong|
| 2| xiaolan|
+---+--------+
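As a follow-up, here is a minimal sketch of reading the same table through the DataFrame API and writing a result back to Hive. It reuses the connection settings shown above; the object name SparkSql_Hive_Write, the filter condition, and the target table db_hive_test.student_copy are made up for illustration.

package com.cn.sparkSql

import org.apache.spark.sql.{SaveMode, SparkSession}

object SparkSql_Hive_Write {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://master:9083")
      .config("fs.defaultFS", "hdfs://master:9000")
      .appName("Spark Hive Write Example")
      .enableHiveSupport()
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    // Read the Hive table through the DataFrame API instead of a SQL string.
    val students = spark.table("db_hive_test.student")

    // Any transformation works here; filtering on id is only an illustration.
    val filtered = students.filter("id > 1")

    // Write the result back as a managed Hive table (table name is hypothetical).
    filtered.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("db_hive_test.student_copy")

    spark.sql("select * from db_hive_test.student_copy").show()
    spark.stop()
  }
}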
IV. Notes
1. If you do not use the remote host's IP address directly, make sure the hostname mapping is configured in the local hosts file;
2. Error 1
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:854)
at com.cn.sparkSql.SparkSql_Hive$.main(SparkSql_Hive.scala:24)
at com.cn.sparkSql.SparkSql_Hive.main(SparkSql_Hive.scala)
Analysis:
The project originally declared spark-hive like this:
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
  </dependency>
The version of this dependency must match the Spark version, and the provided scope should be dropped when running locally (provided dependencies are typically not on the runtime classpath when launching from the IDE):
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.2.1</version>
  </dependency>
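A quick way to double-check for this kind of mismatch is to print the Scala and Spark versions that are actually on the classpath. This is only an illustrative sketch; the object name VersionCheck is made up for this example.

import org.apache.spark.sql.SparkSession

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Scala runtime version; should match the _2.11 suffix of the Spark artifacts.
    println(s"Scala: ${scala.util.Properties.versionString}")
    // Spark version on the classpath; should match the 2.2.1 dependency versions.
    println(s"Spark: ${org.apache.spark.SPARK_VERSION}")

    val spark = SparkSession.builder().master("local[*]").appName("VersionCheck").getOrCreate()
    println(s"SparkSession reports: ${spark.version}")
    spark.stop()
  }
}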
3. Error 2
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
at com.cn.sparkSql.SparkSql_Hive$.main(SparkSql_Hive.scala:21)
at com.cn.sparkSql.SparkSql_Hive.main(SparkSql_Hive.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
... 34 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
Analysis:
(1) The Hive metastore service has not been started;
(2) The remote server's IP address or the hostname mapping is wrong.
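Both causes can be ruled out quickly with a plain TCP connectivity test against the metastore before involving Spark at all. The sketch below assumes the hostname master and the default metastore port 9083 used in the configuration above; the object name MetastoreCheck is made up for this example.

import java.net.{InetSocketAddress, Socket}

object MetastoreCheck {
  def main(args: Array[String]): Unit = {
    val host = "master" // must resolve via DNS or the local hosts file
    val port = 9083     // Hive metastore thrift port from hive.metastore.uris

    val socket = new Socket()
    try {
      // A successful TCP connect means the metastore process is up and the
      // hostname mapping is correct; it does not validate the Hive schema.
      socket.connect(new InetSocketAddress(host, port), 5000)
      println(s"Metastore reachable at $host:$port")
    } catch {
      case e: Exception =>
        println(s"Cannot reach metastore at $host:$port -> ${e.getMessage}")
    } finally {
      socket.close()
    }
  }
}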