Spark with Hive in IntelliJ IDEA cannot read files from HDFS: fixing the error "LOAD DATA input path does not exist: hdfs://"

Arknights Expert

Last modified 2022-05-11 08:44:16

Category: Notes. Tags: spark, hive, hdfs, intellij-idea, big data

First published 2022-05-11 08:22:25

Article link: https://blog.csdn.net/Quin22/article/details/124701953

While learning Spark and Hive integration in IDEA, importing data from HDFS suddenly failed with an error.

Full error message:

Exception in thread "main" org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: hdfs://master1:9000/sql/athlete_events.csv;
	at org.apache.spark.sql.execution.command.LoadDataCommand.run(tables.scala:389)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
	at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
	at hive$.main(hive.scala:32)
	at hive.main(hive.scala)

At first I thought the file did not exist, but it actually does.

Then I suspected Hive, but the same command works when run in Hive.

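I searched online for a long time without finding a fix. Finally, while reading the official documentation, I noticed a relevant passage (shown in a screenshot in the original post, not reproduced here).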

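So I copied the three configuration files (the Spark SQL guide on Hive tables names hive-site.xml, core-site.xml and hdfs-site.xml) into Spark's conf directory and into the resources directory of my code (I used symbolic links for testing), and added one line to the code that sets spark.sql.warehouse.dir to specify the default warehouse location.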

After that the error should go away. If it still appears, try also adding the two lines above the circled one in the screenshot.
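Since the screenshot is not available, here is a minimal sketch of the setup described above, not the author's exact code. It assumes the spark-hive dependency is on the classpath and that the Hive and Hadoop configuration files are visible from the resources directory. The NameNode address master1:9000 comes from the error message; the directory /user/hive/warehouse, the object name and the application name are assumptions to adapt to your own cluster.

```scala
import org.apache.spark.sql.SparkSession

object HiveSessionSketch {
  def main(args: Array[String]): Unit = {
    // Assumed value: replace host, port and directory with your own
    // Hive data directory on HDFS.
    val warehouseDir = "hdfs://master1:9000/user/hive/warehouse"

    val spark = SparkSession.builder()
      .appName("hive-integration-sketch") // hypothetical name
      .master("local[*]")                 // run the driver inside the IDE
      .config("spark.sql.warehouse.dir", warehouseDir)
      .enableHiveSupport()                // needs spark-hive on the classpath
      .getOrCreate()

    // If the metastore configuration was picked up, the databases created
    // from the Hive CLI should be listed here.
    spark.sql("SHOW DATABASES").show()

    spark.stop()
  }
}
```

The warehouse location has to be set before the first session is created; changing it on an already running session has no effect.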

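Also note that, unlike when working directly in Hive, you cannot omit the file system here. The address must be given in full as hdfs://{hostname or IP}:9000/{Hive data directory on HDFS}

Otherwise you may get a file-not-found error.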
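The address format above can be sketched as a small helper. This is plain Scala with no Spark dependency; the helper names hdfsUri and loadDataSql and the table name test1.athlete_events are hypothetical, while the host, port and file path are taken from the error message. Whether the note applies to the warehouse directory, the input file, or both is not stated in the post, so the helper is kept generic.

```scala
object HdfsPathSketch {
  // Builds a fully qualified HDFS URI: hdfs://{host}:{port}/{path}.
  def hdfsUri(host: String, port: Int, path: String): String = {
    // Ensure exactly one leading slash on the path part.
    val normalized = if (path.startsWith("/")) path else "/" + path
    s"hdfs://$host:$port$normalized"
  }

  // Builds a LOAD DATA statement for a file that already sits on HDFS.
  def loadDataSql(uri: String, table: String): String =
    s"LOAD DATA INPATH '$uri' INTO TABLE $table"

  def main(args: Array[String]): Unit = {
    val input = hdfsUri("master1", 9000, "/sql/athlete_events.csv")
    println(input)
    // Hypothetical table name, used for illustration only.
    println(loadDataSql(input, "test1.athlete_events"))
  }
}
```

The resulting statement string can then be passed to spark.sql(...) on a Hive-enabled session.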

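I don't know exactly how this works. My guess is that the default warehouse location is not on HDFS: when browsing my files I found that a spark-warehouse directory had been created at some point, and inside it was test1, the database my code was supposed to create.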
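One way to check this guess is to print the settings the session actually resolved. The sketch below is a diagnostic under assumptions, not a confirmed explanation: it assumes spark-hive is on the classpath, and the object and application names are hypothetical. When spark.sql.warehouse.dir is not set, Spark falls back to a spark-warehouse directory under the current working directory, which would match the directory found above.

```scala
import org.apache.spark.sql.SparkSession

object WarehouseCheckSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("warehouse-check-sketch") // hypothetical name
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // Where Spark creates managed databases and tables.
    println("spark.sql.warehouse.dir = " +
      spark.conf.get("spark.sql.warehouse.dir"))

    // Default file system seen by Hadoop. A file:/// value suggests the
    // HDFS configuration files were not found on the classpath.
    println("fs.defaultFS = " +
      spark.sparkContext.hadoopConfiguration.get("fs.defaultFS"))

    spark.stop()
  }
}
```

If the first value points at a local path rather than an hdfs:// address, the session is using a local warehouse instead of the one Hive uses.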

My understanding may well be wrong. If anyone knows the real reason, please share it in the comments so we can learn from each other.
