Vanilla Spark can operate on Hive tables directly through the thrift service. The latest HDP and CDH releases bundle Hive 3, where Spark can no longer manipulate Hive data directly over thrift; it can only read the Hive metadata. Connecting on HDP 3 requires several extra configuration items.
First, test with spark-shell:
spark-shell --master yarn \
  --jars /usr/hdp/3.1.5.0-152/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar \
  --conf spark.security.credentials.hiveserver2.enabled=false \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://node1:2181,node2:2181,node3:2181,node4:2181,node5:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
  --conf spark.hadoop.hive.llap.daemon.service.hosts=@llap0 \
  --conf spark.datasource.hive.warehouse.load.staging.dir=/tmp \
  --conf spark.datasource.hive.warehouse.metastoreUri="thrift://node2:9083" \
  --conf spark.hadoop.hive.zookeeper.quorum="node1:2181,node2:2181,node3:2181,node4:2181,node5:2181"
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.setDatabase("test_db")
val df = hive.executeQuery("select * from test_acid")
df.show(20)
After the test succeeds, connect from code.
Previously, importing the spark-sql and spark-hive dependencies was enough. Now, as the spark-shell session above shows, you must use the hive-warehouse-connector jar that HDP ships.
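If the project is built with sbt, the connector can be pulled from the Hortonworks repository. Here is a sketch of the build configuration; the repository URL and the artifact coordinates are assumptions tied to this HDP build (3.1.5.0-152) and should be checked against your cluster. Marking the dependency provided keeps it out of the fat jar, since the assembly already sits on the cluster:

```scala
// build.sbt sketch -- repository URL and coordinates are assumptions, verify per cluster
resolvers += "hortonworks-releases" at "https://repo.hortonworks.com/content/repositories/releases/"
libraryDependencies +=
  "com.hortonworks.hive" % "hive-warehouse-connector_2.11" % "1.0.0.3.1.5.0-152" % "provided"
```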
After adding this jar, running the code failed with an error about a missing shutdownHook method (a NoSuchMethodError). No matter which dependency versions I tried, the error persisted. It only disappeared once I put all of the cluster's Spark jars on the classpath.
Then a new error appeared: unknown host host.name.
This is clearly a hostname-resolution problem.
The log prints:
INFO ZooKeeper: Client environment:host.name=DESKTOP-7HQKIT8
which means the client identifies itself to ZooKeeper with a hostname the cluster cannot resolve. My fix was to add a mapping for my own IP and hostname on the cluster nodes. Since the access is remote, I also had to turn off my local firewall.
Success.
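Concretely, every cluster node gets an /etc/hosts entry mapping the workstation's hostname (the one from the log line above) to its IP; the address here is a placeholder:

```
# /etc/hosts on each cluster node -- 192.168.1.100 is a placeholder for your workstation's IP
192.168.1.100   DESKTOP-7HQKIT8
```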
Finally, submit the code to the cluster:
spark-submit --master yarn \
  --conf spark.driver.extraClassPath=/usr/hdp/3.1.5.0-152/spark2/jars/* \
  --class SparkSQL /tmp/sparksql.jar
I placed hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar into spark2/jars/ along with the rest.
Success.
Demo
import java.util.concurrent.TimeUnit
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
/** Property descriptions (from the HDP docs):
 * spark.sql.hive.hiveserver2.jdbc.url: URL for HiveServer2 Interactive. In Ambari, copy the value from Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL.
 * spark.datasource.hive.warehouse.metastoreUri: URI for the metastore. Copy the value from hive.metastore.uris, e.g. thrift://mycluster-1.com:9083.
 * spark.datasource.hive.warehouse.load.staging.dir: HDFS temp directory for batch writes to Hive, e.g. /tmp.
 * spark.hadoop.hive.llap.daemon.service.hosts: Application name for the LLAP service. Copy the value from Advanced hive-interactive-site > hive.llap.daemon.service.hosts.
 * spark.hadoop.hive.zookeeper.quorum: ZooKeeper hosts used by LLAP. Copy the value from Advanced hive-site > hive.zookeeper.quorum.
 */
object SparkSQL {
// org.apache.hadoop.hive.llap.LlapBaseInputFormat
// org.apache.spark.sql.catalyst.catalog.SessionCatalog()
//org.apache.hive.common.util.ShutdownHookManager.addShutdownHook(Runnable a,TimeUnit a)
def main(args: Array[String]): Unit = {
System.setProperty("HADOOP_USER_NAME", "hdfs")
val sparkConfiguration = new SparkConf
sparkConfiguration.set("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://node1:2181,node2:2181,node3:2181,node4:2181,node5:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive")
sparkConfiguration.set("spark.datasource.hive.warehouse.metastoreUri", "thrift://node2:9083")
sparkConfiguration.set("spark.datasource.hive.warehouse.load.staging.dir", "hdfs://node1:8020/tmp")
sparkConfiguration.set("spark.hadoop.hive.llap.daemon.service.hosts", "@llap0")
sparkConfiguration.set("spark.hadoop.hive.zookeeper.quorum", "node1:2181,node2:2181,node3:2181,node4:2181,node5:2181")
// sparkConfiguration.set("spark.sql.hive.metastore.version", "2.3.3")
val spark = SparkSession.builder().appName("spark2Test")
  .config(sparkConfiguration)
  .master("local[*]") // "local[]" is not a valid master URL; use local[*] (or yarn on the cluster)
  .enableHiveSupport()
  .getOrCreate()
// spark.sql("show databases").show()
// spark.sql("show tables").show()
val hive = HiveWarehouseSession.session(spark).build()
hive.setDatabase("test_db")
val df = hive.executeQuery("select * from test_acid3")
df.show(20)
df.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .mode("append").option("table", "test_acid2").save()
hive.close()
spark.stop() // close() is just an alias for stop(), so one call is enough
println(">>>>>>>> close <<<<<<<<<<")
System.exit(0)
}
}
HiveWarehouseSession does not shut down on its own (it appears to leave non-daemon threads running), so here I force the JVM to stop with System.exit(0).
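A minimal, Spark-free sketch of why the JVM hangs and why System.exit helps; the sleeping thread stands in for the connector's leftover worker threads (an assumption about the cause, not taken from the HWC source):

```scala
object ExitDemo {
  def main(args: Array[String]): Unit = {
    // A non-daemon thread would normally keep the JVM alive after main() returns.
    val worker = new Thread(new Runnable {
      def run(): Unit = Thread.sleep(60000)
    })
    worker.setDaemon(false)
    worker.start()
    println("main finished")
    // Force the JVM down instead of waiting up to 60s for the worker thread.
    sys.exit(0)
  }
}
```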
Official documentation:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html