Connecting Spark to HDP Hive ACID Tables

Vanilla Spark can operate on Hive tables directly through the thrift service. The latest HDP and CDH releases both bundle Hive 3, where Spark can no longer manipulate Hive tables directly over thrift; it can only read the Hive metadata. Connecting on HDP 3 requires the configuration items below.

First, test with spark-shell:

spark-shell --master yarn \
  --jars /usr/hdp/3.1.5.0-152/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar \
  --conf spark.security.credentials.hiveserver2.enabled=false \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://node1:2181,node2:2181,node3:2181,node4:2181,node5:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
  --conf spark.hadoop.hive.llap.daemon.service.hosts=@llap0 \
  --conf spark.datasource.hive.warehouse.load.staging.dir=/tmp \
  --conf spark.datasource.hive.warehouse.metastoreUri="thrift://node2:9083" \
  --conf spark.hadoop.hive.zookeeper.quorum="node1:2181,node2:2181,node3:2181,node4:2181,node5:2181"

import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.setDatabase("test_db")
val df = hive.executeQuery("select * from test_acid")
df.show(20)
Once the test succeeds, connect from application code.

Previously it was enough to import spark-sql and spark-hive. Now, as the spark-shell invocation above shows, you must use the hive-warehouse-connector jar that HDP ships.
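For reference, a sketch of the corresponding sbt dependency. The Hortonworks repository URL and the Maven coordinates below are my assumptions about how HDP publishes the connector, so verify them against your repository:

// build.sbt (coordinates assumed; match the version to your HDP release)
resolvers += "hortonworks-releases" at "https://repo.hortonworks.com/content/repositories/releases/"
libraryDependencies += "com.hortonworks.hive" % "hive-warehouse-connector_2.11" % "1.0.0.3.1.5.0-152" % "provided"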

After pulling this jar in, my code failed at runtime with a "no such method shutdownHook..." error. No matter how I changed dependency versions, the error persisted. It only disappeared after I put all of the cluster's Spark jars on the classpath, as in the sketch below.
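One way to reproduce "all of the cluster's Spark jars" in a local build (a sketch, assuming you have copied /usr/hdp/3.1.5.0-152/spark2/jars/*.jar from the cluster into a local lib/ directory) is sbt's unmanaged classpath:

// build.sbt — lib/ holds the HDP Spark jars copied from the cluster (assumption: that local copy exists)
unmanagedBase := baseDirectory.value / "lib"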
Then a new error appeared: unknown host host.name. This is clearly a hostname-resolution problem. The log showed:

INFO ZooKeeper: Client environment:host.name=DESKTOP-7HQKIT8

So the client reports its desktop hostname to ZooKeeper, and the cluster cannot resolve that name. My fix was to add a mapping from my own IP to my hostname on the cluster. Finally, since remote access is needed, I also had to turn off my local firewall.
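For example (a sketch; the IP address is a placeholder for your workstation's address), add a line like this to /etc/hosts on each cluster node:

192.168.1.100   DESKTOP-7HQKIT8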

Success.

Finally, the code needs to be submitted to the cluster:

spark-submit --master yarn \
  --conf spark.driver.extraClassPath=/usr/hdp/3.1.5.0-152/spark2/jars/* \
  --class SparkSQL /tmp/sparksql.jar

I copied hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar into spark2/jars/ on every node, so it is picked up uniformly.

Success.
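An alternative I did not test here (a sketch): ship the connector with the job via --jars instead of copying it into spark2/jars/ on each node:

spark-submit --master yarn \
  --jars /usr/hdp/3.1.5.0-152/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar \
  --class SparkSQL /tmp/sparksql.jar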
Demo

import java.util.concurrent.TimeUnit

import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/** Configuration properties (from the HDP documentation):
 *
 * spark.sql.hive.hiveserver2.jdbc.url              - URL for HiveServer2 Interactive.
 *   In Ambari, copy the value from Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL.
 * spark.datasource.hive.warehouse.metastoreUri     - URI for the metastore.
 *   Copy the value from hive.metastore.uris, e.g. thrift://mycluster-1.com:9083.
 * spark.datasource.hive.warehouse.load.staging.dir - HDFS temp directory for batch writes to Hive,
 *   e.g. /tmp.
 * spark.hadoop.hive.llap.daemon.service.hosts      - Application name for the LLAP service.
 *   Copy the value from Advanced hive-interactive-site > hive.llap.daemon.service.hosts.
 * spark.hadoop.hive.zookeeper.quorum               - ZooKeeper hosts used by LLAP.
 *   Copy the value from Advanced hive-site > hive.zookeeper.quorum.
 */
object SparkSQL {
  // Classes that must resolve on the runtime classpath (from the cluster's Spark/Hive jars):
  // org.apache.hadoop.hive.llap.LlapBaseInputFormat
  // org.apache.spark.sql.catalyst.catalog.SessionCatalog
  // org.apache.hive.common.util.ShutdownHookManager.addShutdownHook(Runnable, TimeUnit)
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "hdfs")
    val sparkConfiguration = new SparkConf
    sparkConfiguration.set("spark.sql.hive.hiveserver2.jdbc.url",
      "jdbc:hive2://node1:2181,node2:2181,node3:2181,node4:2181,node5:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive")
    sparkConfiguration.set("spark.datasource.hive.warehouse.metastoreUri", "thrift://node2:9083")
    sparkConfiguration.set("spark.datasource.hive.warehouse.load.staging.dir", "hdfs://node1:8020/tmp")
    sparkConfiguration.set("spark.hadoop.hive.llap.daemon.service.hosts", "@llap0")
    sparkConfiguration.set("spark.hadoop.hive.zookeeper.quorum", "node1:2181,node2:2181,node3:2181,node4:2181,node5:2181")
    // sparkConfiguration.set("spark.sql.hive.metastore.version", "2.3.3")
    val spark = SparkSession.builder().appName("spark2Test")
      .config(sparkConfiguration)
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()
    // spark.sql("show databases").show()
    // spark.sql("show tables").show()
    val hive = HiveWarehouseSession.session(spark).build()
    hive.setDatabase("test_db")
    // Read an ACID table through LLAP
    val df = hive.executeQuery("select * from test_acid3")
    df.show(20)
    // Write the result back to another ACID table through the connector
    df.write
      .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
      .mode("append").option("table", "test_acid2").save()
    hive.close()
    spark.stop()
    println(">>>>>>>> close <<<<<<<<<<")
    System.exit(0)
  }
}

HiveWarehouseSession does not stop on its own, so here I use System.exit(0) to force the process to terminate.
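A slightly more defensive pattern (just a sketch of the same teardown; the idea that lingering non-daemon threads keep the JVM alive is my reading of the behavior above) is to guarantee cleanup in a finally block:

try {
  val df = hive.executeQuery("select * from test_acid3")
  df.show(20)
} finally {
  hive.close()    // close the HWC session first
  spark.stop()    // then stop Spark itself
  System.exit(0)  // force exit; the process otherwise keeps running (assumption: leftover HWC threads)
}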

Official documentation:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
