ERROR InsertIntoHadoopFsRelationCommand: Aborting job. ...please set spark.sql.crossJoin.enabled

Here is the error message:

18/01/18 10:28:00 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
org.apache.spark.sql.AnalysisException: Cartesian joins could be prohibitively expensive and are disabled by default. To explicitly enable them, please set spark.sql.crossJoin.enabled = true;
	at org.apache.spark.sql.execution.joins.CartesianProductExec.doPrepare(CartesianProductExec.scala:96)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:199)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:134)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:143)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
	at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:573)
	at wangsheng.sibat.highway.cal$.saveFile$1(cal.scala:50)
	at wangsheng.sibat.highway.cal$.main(cal.scala:47)
	at wangsheng.sibat.highway.cal.main(cal.scala)
Looking at my code:

    val IDroadFlow = data2.filter($"InRoadNo" === "3|4|5").groupBy("carID", "InRoadNo").count()
      .toDF("carID", "InRoadNo", "count")
      .join(GPSdata, $"InRoadNo" === "Road")   // "Road" here is a plain string literal, not a column reference
      .toDF("carID", "InRoadNo", "count", "InRoad", "Node", "InRoadName", "NodeName")
The error message mentions
please set spark.sql.crossJoin.enabled = true
which points at a join problem, so I looked at where the join in my code was going wrong.
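For reference, here is a minimal sketch that triggers the same error. The local SparkSession and the two small DataFrames below are made up for illustration and are not part of my job; broadcast is disabled so the planner falls back to CartesianProduct, matching the plan in the stack trace, and the exact exception wording varies slightly by Spark version.

    import org.apache.spark.sql.SparkSession
    import scala.util.Try

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cross-join-repro")
      // disable broadcast so the planner picks CartesianProduct, as in the stack trace above
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()
    import spark.implicits._

    val flows = Seq(("A", "3", 2L), ("B", "4", 5L)).toDF("carID", "InRoadNo", "count")
    val roads = Seq(("3", "Node1"), ("4", "Node2")).toDF("Road", "Node")

    // "Road" is a string literal, so the condition only references the left side:
    // Spark finds no equi-join key, plans a cartesian product, and rejects it by default.
    println(Try(flows.join(roads, $"InRoadNo" === "Road").count()))
    // prints Failure(org.apache.spark.sql.AnalysisException: ... spark.sql.crossJoin.enabled ...)

    // $"Road" refers to the Road column of roads, so this is a normal equi-join.
    flows.join(roads, $"InRoadNo" === $"Road").show()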

In the join condition I had not prefixed the column name with $, so "Road" was a plain string literal rather than a reference to the Road column of GPSdata. Because the condition only touches columns of the left-hand DataFrame, Spark has no equi-join key and falls back to a cartesian product, which is disabled by default. I had also not specified a join type (the default is "inner", so naming it just makes the intent explicit). Adding both:

    val IDroadFlow = data2.filter($"InRoadNo" === "3|4|5").groupBy("carID", "InRoadNo").count()
      .toDF("carID", "InRoadNo", "count")
      .join(GPSdata, $"InRoadNo" === $"Road", "inner")   // $"Road" now refers to GPSdata's Road column
      .toDF("carID", "InRoadNo", "count", "InRoad", "Node", "InRoadName", "NodeName")
It now runs fine.
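For completeness: if a full cartesian product were actually what the query needed, the other route the error message offers is to enable it explicitly. A sketch, assuming the session is built with the usual builder (the appName below is just a placeholder):

    import org.apache.spark.sql.SparkSession

    // Only do this when the cartesian product is really intended: the result has
    // |left| x |right| rows, which is why Spark disables it by default.
    val spark = SparkSession.builder()
      .appName("highway-cal")
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreate()

    // or turn it on at runtime for an existing session
    spark.conf.set("spark.sql.crossJoin.enabled", "true")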


