Running a Spark SQL statement of the following form:
select xxx from TABLENAME WHERE x=1 having CONDITION = 1
fails with this error:
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object, tree: ArrayBuffer(a).*
at org.apache.spark.sql.catalyst.analysis.Star.toAttribute(unresolved.scala:245)
at org.apache.spark.sql.catalyst.plans.logical.Project$$anonfun$output$1.apply(basicLogicalOperators.scala:52)
at org.apache.spark.sql.catalyst.plans.logical.Project$$anonfun$output$1.apply(basicLogicalOperators.scala:52)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at org.apache.spark.sql.catalyst.plans.logical.Project.output(basicLogicalOperators.scala:52)
at org.apache.spark.sql.hive.HiveAnalysis$$anonfun$apply$3.applyOrElse(HiveStrategies.scala:160)
at org.apache.spark.sql.hive.HiveAnalysis$$anonfun$apply$3.applyOrElse(HiveStrategies.scala:148)
Cause:
https://spark.apache.org/docs/2.4.0/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24
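Spark SQL handles HAVING differently in 2.3 and 2.4.
In 2.3 and earlier, a HAVING clause without GROUP BY is treated as WHERE.
In 2.4, HAVING without GROUP BY is instead treated as a global aggregate (see the migration guide linked above), and a query like the one above fails with this error.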
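To make the behavior change concrete, here is the small example given in the migration guide linked above. It is only an illustration of the semantics and is not the failing query from this note:

```sql
-- Example from the Spark 2.4 migration guide (HAVING without GROUP BY).
SELECT 1 FROM range(10) HAVING true;
-- Spark 2.3 and earlier: executed as SELECT 1 FROM range(10) WHERE true, returns 10 rows.
-- Spark 2.4: treated as a global aggregate, returns a single row.
```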
Possible solutions:
1. Change the SQL statement and add a GROUP BY.
2. Set spark.sql.legacy.parser.havingWithoutGroupByAsWhere to true at startup, i.e. ./bin/spark-sql --conf spark.sql.legacy.parser.havingWithoutGroupByAsWhere=true
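A rough sketch of what rewriting the statement could look like, reusing the placeholders xxx, TABLENAME and CONDITION from the query at the top. It assumes CONDITION is an ordinary (non-aggregate) column, which the original query does not confirm. Option A is not one of the fixes listed above; it simply spells out by hand what Spark 2.3 did implicitly.

```sql
-- Original form, which fails on Spark 2.4 with default settings:
--   select xxx from TABLENAME WHERE x=1 having CONDITION = 1

-- Option A (assumption: CONDITION is a plain column, not an aggregate):
-- fold the HAVING predicate into WHERE, matching how 2.3 evaluated the query.
SELECT xxx FROM TABLENAME WHERE x = 1 AND CONDITION = 1;

-- Option B: add an explicit GROUP BY so that HAVING is valid.
-- Note that grouping collapses duplicate rows, so the result can differ from Option A.
SELECT xxx FROM TABLENAME WHERE x = 1 GROUP BY xxx, CONDITION HAVING CONDITION = 1;
```

If the query was relying on the old "HAVING as WHERE" behavior, Option A is likely the closer match; Option B changes the result whenever the selected columns contain duplicate rows.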