The following Spark DataFrame code:

```scala
df.groupBy("name").min("date")
```
fails with the following error:

```
Exception in thread "main" org.apache.spark.sql.AnalysisException: "to_account_date" is not a numeric column. Aggregation function can only be applied on a numeric column.;
at org.apache.spark.sql.RelationalGroupedDataset$$anonfun$3.apply(RelationalGroupedDataset.scala:103)
at org.apache.spark.sql.RelationalGroupedDataset$$anonfun$3.apply(RelationalGroupedDataset.scala:100)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.RelationalGroupedDataset.aggregateNumericColumns(RelationalGroupedDataset.scala:100)
at org.apache.spark.sql.RelationalGroupedDataset.min(RelationalGroupedDataset.scala:286)
at com.bdt.doep.dhe.ba.huitui.Achievement$.load(Achievement.scala:65)
at com.bdt.doep.dhe.ba.huitui.Main.run(Main.scala:27)
at com.bdt.doep.dhe.ba.huitui.Main$.main(Main.scala:84)
at com.bdt.doep.dhe.ba.huitui.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
The error message is clear: the `min` shortcut on a grouped DataFrame (like the other shortcuts on `RelationalGroupedDataset`) only supports numeric columns.

With that understood, two fixes are known so far:
Option 1: express the aggregation through `agg` instead of the `min` shortcut:

```scala
df.groupBy("name").agg("date" -> "min")
```
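A runnable sketch of option 1, written spark-shell style; the sample names and date strings are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Local session just for the demo
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("groupby-min-date")
  .getOrCreate()
import spark.implicits._

// "date" is a string column, i.e. non-numeric, which is exactly
// what makes df.groupBy("name").min("date") throw AnalysisException
val df = Seq(
  ("a", "2020-01-02"),
  ("a", "2020-01-01"),
  ("b", "2020-03-05")
).toDF("name", "date")

// agg with a (columnName -> aggregateName) pair does not go through
// the numeric-only code path, so it works on non-numeric columns;
// the result column is named min(date)
val result = df.groupBy("name").agg("date" -> "min")
result.show()
```

For string dates in `yyyy-MM-dd` form, lexicographic `min` coincides with chronological `min`, which is why this gives the expected earliest date per group.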
Option 2: cast the column to a numeric type first, then call `min` on it.
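A sketch of option 2, assuming the date column is a `yyyy-MM-dd` string (the sample data and the format string are assumptions, not from the original code):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_unixtime, unix_timestamp}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("min-via-cast")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("a", "2020-01-02"),
  ("a", "2020-01-01"),
  ("b", "2020-03-05")
).toDF("name", "date")

// Convert the string date to epoch seconds: a numeric LongType column,
// so the min shortcut on the grouped DataFrame is now allowed
val numeric = df.withColumn("date_sec", unix_timestamp(col("date"), "yyyy-MM-dd"))
val minSec = numeric.groupBy("name").min("date_sec")

// Convert back to a readable date string for output
val readable = minSec.withColumn("min_date", from_unixtime(col("min(date_sec)"), "yyyy-MM-dd"))
readable.show()
```

The round trip through `unix_timestamp`/`from_unixtime` is one concrete way to "make the column numeric"; casting to `TimestampType` up front and aggregating with `agg` would avoid the detour entirely.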
If other approaches turn up later, I'll keep adding them here.
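One more variant that may be worth adding: the `min` function from `org.apache.spark.sql.functions`, used inside `agg`, also accepts non-numeric (orderable) columns, and lets you name the result column (sample data again made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.min

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("min-via-agg-column")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("a", "2020-01-02"),
  ("a", "2020-01-01"),
  ("b", "2020-03-05")
).toDF("name", "date")

// functions.min builds a Column expression, which agg accepts for any
// orderable type; alias gives the result column a clean name
val result = df.groupBy("name").agg(min("date").alias("min_date"))
result.show()
```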