ERROR InsertIntoHadoopFsRelationCommand: Aborting job. ...please set spark.sql.crossJoin.enabled

Here is the error message:

18/01/18 10:28:00 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
org.apache.spark.sql.AnalysisException: Cartesian joins could be prohibitively expensive and are disabled by default. To explicitly enable them, please set spark.sql.crossJoin.enabled = true;
	at org.apache.spark.sql.execution.joins.CartesianProductExec.doPrepare(CartesianProductExec.scala:96)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:199)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:134)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:143)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
	at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:573)
	at wangsheng.sibat.highway.cal$.saveFile$1(cal.scala:50)
	at wangsheng.sibat.highway.cal$.main(cal.scala:47)
	at wangsheng.sibat.highway.cal.main(cal.scala)
Looking at my code:

    val IDroadFlow = data2.filter($"InRoadNo" === "3|4|5").groupBy("carID", "InRoadNo").count()
      .toDF("carID", "InRoadNo", "count")
      .join(GPSdata, $"InRoadNo" === "Road")   // "Road" here is a plain string literal, not a column reference
      .toDF("carID", "InRoadNo", "count", "InRoad", "Node", "InRoadName", "NodeName")
The error message mentions
please set spark.sql.crossJoin.enabled = true
which points at a join problem, so I looked at where the join in my code was going wrong.
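For reference, here is a minimal sketch that triggers the same error. The local SparkSession and the two small DataFrames below are made up for illustration and are not part of my job; broadcast is disabled so the planner falls back to CartesianProduct, matching the plan in the stack trace, and the exact exception wording varies slightly by Spark version.

    import org.apache.spark.sql.SparkSession
    import scala.util.Try

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cross-join-repro")
      // disable broadcast so the planner picks CartesianProduct, as in the stack trace above
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()
    import spark.implicits._

    val flows = Seq(("A", "3", 2L), ("B", "4", 5L)).toDF("carID", "InRoadNo", "count")
    val roads = Seq(("3", "Node1"), ("4", "Node2")).toDF("Road", "Node")

    // "Road" is a string literal, so the condition only references the left side:
    // Spark finds no equi-join key, plans a cartesian product, and rejects it by default.
    println(Try(flows.join(roads, $"InRoadNo" === "Road").count()))
    // prints Failure(org.apache.spark.sql.AnalysisException: ... spark.sql.crossJoin.enabled ...)

    // $"Road" refers to the Road column of roads, so this is a normal equi-join.
    flows.join(roads, $"InRoadNo" === $"Road").show()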

In the join condition I had not prefixed the column name with $, so "Road" was a plain string literal rather than a reference to the Road column of GPSdata. Because the condition only touches columns of the left-hand DataFrame, Spark has no equi-join key and falls back to a cartesian product, which is disabled by default. I had also not specified a join type (the default is "inner", so naming it just makes the intent explicit). Adding both:

    val IDroadFlow = data2.filter($"InRoadNo" === "3|4|5").groupBy("carID", "InRoadNo").count()
      .toDF("carID", "InRoadNo", "count")
      .join(GPSdata, $"InRoadNo" === $"Road", "inner")   // $"Road" now refers to GPSdata's Road column
      .toDF("carID", "InRoadNo", "count", "InRoad", "Node", "InRoadName", "NodeName")
It now runs fine.
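For completeness: if a full cartesian product were actually what the query needed, the other route the error message offers is to enable it explicitly. A sketch, assuming the session is built with the usual builder (the appName below is just a placeholder):

    import org.apache.spark.sql.SparkSession

    // Only do this when the cartesian product is really intended: the result has
    // |left| x |right| rows, which is why Spark disables it by default.
    val spark = SparkSession.builder()
      .appName("highway-cal")
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreate()

    // or turn it on at runtime for an existing session
    spark.conf.set("spark.sql.crossJoin.enabled", "true")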


