Spark reports an error when writing data to a Hive table:
Exception in thread "main" org.apache.spark.SparkException: Job aborted.
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2)
Code:
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SaveMode, SparkSession}

val conf = new SparkConf().setMaster("local[*]").setAppName("four")
val spark = SparkSession.builder()
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()

// Write the DataFrame into the Hive table
df.write
  .format("hive")
  .mode(SaveMode.Append)
  .saveAsTable("user.user_info")
Error analysis:
I suspected the write was failing because of an unstable connection during the write itself: the task is lost mid-write, which points at the data transfer to the cluster rather than at the Hive query.
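A quick way to confirm that the problem sits in the HDFS write path, not in Spark or Hive, is to write a small file through the bare Hadoop FileSystem API from the same machine. A minimal sketch, assuming a placeholder NameNode URI (hdfs://namenode:8020) and test path that you would replace with your own:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsWriteCheck {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Placeholder NameNode address; use your cluster's fs.defaultFS value
    val fs = FileSystem.get(new URI("hdfs://namenode:8020"), conf)
    // Create a small test file; this exercises the client-to-DataNode path
    val out = fs.create(new Path("/tmp/spark-write-check"))
    out.write("ok".getBytes("UTF-8"))
    out.close()
    fs.close()
  }
}

If this standalone write also hangs or fails with a DataNode connection error, the root cause is HDFS client connectivity.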
Solution: set the HDFS client option
dfs.client.use.datanode.hostname=true
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SaveMode, SparkSession}

val conf = new SparkConf().setMaster("local[*]").setAppName("four")
  // The spark.hadoop. prefix is required for Spark to copy this entry into
  // the Hadoop Configuration that the HDFS client reads
  .set("spark.hadoop.dfs.client.use.datanode.hostname", "true")
val spark = SparkSession.builder()
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()

// Write the DataFrame into the Hive table
df.write
  .format("hive")
  .mode(SaveMode.Append)
  .saveAsTable("user.user_info")
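The same flag can also be applied to an existing session directly on its Hadoop Configuration, or passed on the command line as --conf spark.hadoop.dfs.client.use.datanode.hostname=true. A minimal sketch against the spark session created above:

// Takes effect for FileSystem instances created after this call
spark.sparkContext.hadoopConfiguration
  .set("dfs.client.use.datanode.hostname", "true")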
This makes the HDFS client connect to DataNodes by hostname rather than by IP address. By default the client is handed DataNode IP addresses and dials those directly. In some setups, however, the advertised IP is not the right one to dial from the client side (for example when several IP addresses map to the same hostname, or when the DataNodes sit behind NAT or inside containers), so the client ends up connecting to the wrong address. Setting dfs.client.use.datanode.hostname to true makes the client resolve the DataNode's hostname itself and connect to the resulting address, which avoids the problem.
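Two follow-up notes. First, connecting by hostname only works if the client machine can actually resolve the DataNode hostnames, for example through DNS or /etc/hosts entries. Second, because the spark.hadoop. prefix is easy to get wrong, it is worth reading the setting back from the running session to confirm it was propagated; a minimal check:

// Should print "true" if the flag reached the HDFS client configuration
println(spark.sparkContext.hadoopConfiguration
  .get("dfs.client.use.datanode.hostname"))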