Preface
This article belongs to the column "Spark 异常问题汇总" (Spark Exception Problems Collection), which is the author's original work. Please cite the source when quoting; corrections and suggestions are welcome in the comments. Thanks!
For the column's table of contents and references, see Spark 异常问题汇总.
Main text
Problem description
Spark fails to compile with the following error:
Error:(34, 25) overloaded method foreachBatch with alternatives:
  (function: org.apache.spark.api.java.function.VoidFunction2[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row],java.lang.Long])org.apache.spark.sql.streaming.DataStreamWriter[org.apache.spark.sql.Row]
  (function: (org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], scala.Long) => Unit)org.apache.spark.sql.streaming.DataStreamWriter[org.apache.spark.sql.Row]
cannot be applied to ((org.apache.spark.sql.DataFrame, scala.Long) => org.apache.spark.sql.DataFrame)
askDF.writeStream.foreachBatch { (askDF: DataFrame, batchId: Long) =>
My code is shown below:
val properties = new java.util.Properties()
properties.setProperty("user", "root")
properties.setProperty("password", "123456")

val query = wordCounts.writeStream
  .outputMode("complete")
  .foreachBatch((ds, batchID) => {
    println("BatchID:" + batchID)
    if (ds.count() != 0) {
      ds.cache()
      ds.write.json(PATH_PREFIX + batchID)
      ds.write.mode(SaveMode.Overwrite).jdbc("jdbc:mysql://node1:3306/spark_bigdata_analyze", "t_word_count", properties)
      ds.unpersist()
    }
  }).start()
query.awaitTermination()
}
Root cause
This is caused by upgrading Scala from 2.11 to 2.12.
Because of changes in Scala 2.12 (lambdas can now be converted to Java single-abstract-method interfaces), code calling DataStreamWriter.foreachBatch needs to be updated, otherwise this ambiguity occurs. In the code above, the compiler also infers the lambda's result type as DataFrame rather than Unit (the last expression of the if branch, ds.unpersist(), returns the Dataset), so the lambda matches neither overload.
Both foreachBatch overloads can be seen here:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/streaming/DataStreamWriter.html
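The ambiguity can be reproduced without Spark. The sketch below uses hypothetical stand-ins (`VoidFunction2`, `Writer`, `Demo`, `handle` are all made up for illustration) that mirror the shape of the two `foreachBatch` overloads: under Scala 2.12, a lambda whose last expression is not `Unit` applies to neither alternative, while a lambda that explicitly returns `Unit` resolves cleanly to the Scala-function overload.

```scala
// Hypothetical stand-ins mirroring the shape of the two foreachBatch overloads.
trait VoidFunction2[T, U] { def call(t: T, u: U): Unit }

object Writer {
  // Java-style overload: a SAM type with a java.lang.Long batch id.
  def foreachBatch(f: VoidFunction2[String, java.lang.Long]): String = "java overload"
  // Scala-style overload: a Function2 returning Unit with a scala.Long batch id.
  def foreachBatch(f: (String, Long) => Unit): String = "scala overload"
}

object Demo {
  // Fix: delegate to a method declared to return Unit, so the lambda's
  // result type is unambiguously Unit.
  def handle(s: String, n: Long): Unit = println(s"$n: $s")

  def main(args: Array[String]): Unit = {
    // Mirrors the failing call from the article: the lambda's last
    // expression would be an Int, so neither overload applies:
    // Writer.foreachBatch((s: String, n: Long) => s.length)  // does not compile

    val chosen = Writer.foreachBatch((s: String, n: Long) => handle(s, n))
    println(chosen) // prints "scala overload"
  }
}
```

The Scala-function overload is picked because the lambda's declared parameter type `scala.Long` does not conform to the SAM overload's `java.lang.Long`, leaving only one applicable alternative.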
Solution
Either switch back to Scala 2.11, or update the code as described in the Databricks Runtime 7.0 release notes, which address this issue:
https://docs.databricks.com/release-notes/runtime/7.0.html
Code changes
val properties = new java.util.Properties()
properties.setProperty("user", "root")
properties.setProperty("password", "123456")

val query = wordCounts.writeStream
  .outputMode("complete")
  .foreachBatch((ds: Dataset[Row], batchId: Long) => myFunc(ds, batchId, properties))
  .start()
query.awaitTermination()

private def myFunc(ds: Dataset[Row], batchID: Long, properties: java.util.Properties): Unit = {
  println("BatchID:" + batchID)
  if (ds.count() != 0) {
    ds.cache()
    ds.write.json(PATH_PREFIX + batchID)
    ds.write.mode(SaveMode.Overwrite).jdbc("jdbc:mysql://node1:3306/spark_bigdata_analyze", "t_word_count", properties)
    ds.unpersist()
  }
}
}
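An alternative to extracting a helper method is to end the lambda with the unit value `()`, so its inferred result type is `Unit` and the Scala-function overload applies. A minimal sketch, again using hypothetical stand-in overloads (`UnitFunction2`, `Sink`, `InlineDemo` are illustrative names, not Spark's actual classes):

```scala
// Hypothetical stand-ins with the same shape as the foreachBatch overloads.
trait UnitFunction2[T, U] { def call(t: T, u: U): Unit }

object Sink {
  def foreachBatch(f: UnitFunction2[String, java.lang.Long]): String = "java overload"
  def foreachBatch(f: (String, Long) => Unit): String = "scala overload"
}

object InlineDemo {
  def main(args: Array[String]): Unit = {
    val chosen = Sink.foreachBatch { (s: String, n: Long) =>
      val length = s.length // non-Unit work; on its own this would make the lambda return Int
      ()                    // trailing unit value makes the lambda (String, Long) => Unit
    }
    println(chosen) // prints "scala overload"
  }
}
```

In the article's code this would mean keeping the inline lambda from the failing version but appending `()` after `ds.unpersist()`, instead of moving the body into `myFunc`.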