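Writing a Spark DataFrame to MySQL fails with the following error:
java.sql.BatchUpdateException: Column 'name' specified twice at sun.reflect
Cause: the schema of the DataFrame being written does not match the schema of the table created in MySQL.
The result of joining the two DataFrames contains two columns that are both named "name", so the INSERT statement generated for the write lists that column twice.
Solution: change the structure of the DataFrame before writing it. The core code is as follows: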
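import org.apache.spark.sql.{Dataset, Row}

// Joining on an equality expression keeps the "name" column from both sides.
val res1: Dataset[Row] = studentInfoDF
  .join(stu_scoresDF, studentInfoDF.col("name") === stu_scoresDF.col("name"))
  .filter(stu_scoresDF.col("score") > 80)
res1.show(false)

import spark.implicits._
// Pick columns 0, 1, and 3 by position so the duplicate "name" column is left out.
val out: Dataset[(String, Int, String)] = res1.map(row =>
  (row.getAs[String](0), row.getAs[Int](1), row.getAs[String](3)))
// Rename the tuple fields to match the MySQL table, then append.
out.toDF("name", "age", "score").write.mode("append").jdbc(url, "good_stu", prop)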
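
An alternative that avoids picking columns by position is to join on the column name itself, which makes Spark keep a single "name" column in the result. The following is only a minimal sketch under stated assumptions: the sample rows, the object name JoinWithoutDuplicateColumns, the helper joinOnName, and the column types are made up for illustration, since the original schemas are not shown. It runs against a local SparkSession and does not touch MySQL.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinWithoutDuplicateColumns {

  // Hypothetical helper: a join on Seq("name") (a USING-style join)
  // emits the join column once instead of once per side.
  def joinOnName(students: DataFrame, scores: DataFrame): DataFrame =
    students
      .join(scores, Seq("name"))
      .filter(scores.col("score") > 80)
      .select("name", "age", "score")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("join-dedup-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data; the real tables may have different columns and types.
    val studentInfoDF = Seq(("Tom", 18), ("Amy", 19)).toDF("name", "age")
    val stu_scoresDF  = Seq(("Tom", 92), ("Amy", 75)).toDF("name", "score")

    val joined = joinOnName(studentInfoDF, stu_scoresDF)
    joined.printSchema() // lists name, age, score with no second "name"
    joined.show(false)

    // The write would then look the same as in the original code:
    // joined.write.mode("append").jdbc(url, "good_stu", prop)

    spark.stop()
  }
}
```

With this form of join the column order no longer matters, so the positional map step is unnecessary. If the two DataFrames shared other column names besides "name", those would still need to be renamed or dropped before the write.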