How to Read Hive Data with SparkSQL and Run It Locally in IDEA: A Detailed Guide

This article describes how to read Hive data with SparkSQL in the IDEA environment. First, set up Hadoop (2.6.5), Spark (2.3.0), and Hive (1.2.2), and add the required dependencies to pom.xml. Make sure the hive-site.xml configuration file is in the project's resources directory and contains settings such as the Hive metastore URI. The main class code shows how to create a SparkSession and list Hive's databases and tables. If an `AnalysisException` occurs, it can be resolved by setting the HADOOP_HOME environment variable.

Environment preparation:

Hadoop version: 2.6.5

Spark version: 2.3.0

Hive version: 1.2.2

Master host: 192.168.100.201

Slave1 host: 192.168.100.201

The pom.xml dependencies are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.spark</groupId>
    <artifactId>spark_practice</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <spark.core.version>2.3.0</spark.core.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.core.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.core.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.3.0</version>
        </dependency>
    </dependencies>
</project>

Note: make sure to place the hive-site.xml configuration file in the project's resources directory.
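A quick way to confirm at runtime that the file was actually picked up from the classpath is a plain resource lookup. A small sketch, meant to be placed at the top of main:

// Prints the URL of hive-site.xml if it is on the classpath, or null if it is missing.
println(getClass.getResource("/hive-site.xml"))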

The hive-site.xml configuration is as follows:

<?xml version="1.0"?>
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://192.168.100.201:9083</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.zookeeper.quorum</name>
        <value>node01,node02,node03</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node01,node02,node03</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.100.201:9000</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoStartMechanism</name>
        <value>checked</value>
    </property>
</configuration>
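As an alternative to shipping hive-site.xml on the classpath, the key settings can also be passed to the SparkSession builder directly. A minimal sketch, assuming the same metastore URI and warehouse directory as in the file above:

import org.apache.spark.sql.SparkSession

// Sketch: supply the metastore connection in code instead of hive-site.xml.
// The URI and warehouse path are taken from the configuration shown above.
val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("Hive config in code")
  .config("hive.metastore.uris", "thrift://192.168.100.201:9083")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()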

Main class code:

import org.apache.spark.sql.SparkSession

object SparksqlTest2 {

  def main(args: Array[String]): Unit = {
    // Build a local SparkSession with Hive support enabled; the metastore
    // connection details are picked up from hive-site.xml on the classpath.
    val spark: SparkSession = SparkSession
      .builder()
      .master("local[*]")
      .appName("Java Spark Hive Example")
      .enableHiveSupport()
      .getOrCreate()

    // List the databases and tables known to the Hive metastore,
    // then query the person table in the default database.
    spark.sql("show databases").show()
    spark.sql("show tables").show()
    spark.sql("select * from person").show()

    spark.stop()
  }
}

Prerequisite: the queries use the default database, and the person table contains three rows (a sketch for creating such a table follows).
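If the table does not exist yet, it can be created and seeded through SparkSQL itself, reusing the spark session from the main class above. A minimal sketch; the column names and the three rows are hypothetical, since the original article does not show the schema:

// Hypothetical schema and rows: the actual columns of person are not shown in the article.
spark.sql("create table if not exists default.person (id int, name string, age int)")
spark.sql("insert into default.person values (1, 'zhangsan', 20), (2, 'lisi', 25), (3, 'wangwu', 30)")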

Before testing, first make sure the Hadoop cluster is up and running, then start the Hive metastore service:

./bin/hive --service metastore

Run the program. If everything is configured correctly, the Hive databases, tables, and the three rows of person are printed to the console.

If you see the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: (null) entry in command string: null chmod 0700 C:\Users\dell\AppData\Local\Temp\c530fb25-b267-4dd2-b24d-741727a6fbf3_resources;

at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)

at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)

at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)

at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)

at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)

at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)

at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)

at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.&lt;init&gt;(HiveSessionStateBuilder.scala:69)

at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)

at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)

at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)

at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)

at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)

at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)

at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)

at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)

at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)

at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)

at com.tongfang.learn.spark.hive.HiveTest.main(HiveTest.java:15)

Solution:

1. Download the Hadoop Windows binary package from https://github.com/steveloughran/winutils

2. In the run configuration of the launcher class, set the environment variable HADOOP_HOME=D:\winutils\hadoop-2.6.4, where the value is the directory of the Hadoop Windows binary package (see the sketch below for an in-code alternative).
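Alternatively, the same fix can be applied in code by setting the hadoop.home.dir system property before any Spark or Hadoop code runs; Hadoop's Shell utility reads this property (or the HADOOP_HOME environment variable) to locate winutils.exe. A minimal sketch, assuming the same winutils path as above:

// Must run before the SparkSession is created, e.g. as the first line of main.
System.setProperty("hadoop.home.dir", "D:\\winutils\\hadoop-2.6.4")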

This concludes this detailed guide to reading Hive data with SparkSQL and running it locally in IDEA. For more on this topic, please search Script Home's earlier articles or continue browsing the related articles below, and we hope you will keep supporting Script Home!
