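I used to delete the existing output by hand before every `spark.write`, but that isn't necessary: you can set a save mode on the writer instead.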
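```scala
// input and output are the source and destination paths
val df = spark.read.parquet(input)
// "overwrite" replaces whatever already exists at output, so no manual delete is needed
df.write.mode("overwrite").parquet(output)
```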
A DataFrame can be written in one of four save modes:
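- overwrite: replace any data that already exists at the target path
- append: add the new data alongside the existing files
- ignore: if data already exists, silently skip the save (nothing is written)
- error / default: if data already exists, throw an exception; this is also the behavior when no mode is set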
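To see the four modes side by side, here is a minimal sketch that writes the same 3-row DataFrame to one path under each mode and reads the row count back after every step. It assumes a local `SparkSession` and a scratch directory under the system temp folder; the object name `SaveModeDemo` and the helper `run` are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{AnalysisException, SparkSession}

object SaveModeDemo {
  // Writes the same 3-row DataFrame to one path under each mode and returns
  // the row count read back after every step, plus whether a final write
  // with no mode set raised an AnalysisException.
  def run(spark: SparkSession): (Seq[(String, Long)], Boolean) = {
    // Scratch location: the temp directory exists, its "out" child does not yet.
    val output = Files.createTempDirectory("savemode-demo").resolve("out").toString
    val df = spark.range(3).toDF("id")
    def rows(): Long = spark.read.parquet(output).count()

    df.write.mode("error").parquet(output)     // path absent, so the write succeeds
    val afterError = "error" -> rows()         // 3 rows
    df.write.mode("append").parquet(output)    // new part files next to the old ones
    val afterAppend = "append" -> rows()       // 6 rows
    df.write.mode("ignore").parquet(output)    // path exists, so nothing is written
    val afterIgnore = "ignore" -> rows()       // still 6 rows
    df.write.mode("overwrite").parquet(output) // old files are replaced
    val afterOverwrite = "overwrite" -> rows() // 3 rows again

    // No mode set: the default (ErrorIfExists) refuses to write to an existing path.
    val defaultFailed =
      try { df.write.parquet(output); false }
      catch { case _: AnalysisException => true }

    (Seq(afterError, afterAppend, afterIgnore, afterOverwrite), defaultFailed)
  }
}
```

On a local session, `SaveModeDemo.run(spark)` should report counts of 3, 6, 6 and 3, and `true` for the final default-mode write.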
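Under the hood, `DataFrameWriter.mode(String)` lower-cases the string and maps it to the `SaveMode` enum, so `"Overwrite"` works as well as `"overwrite"`. The excerpt below is from a Spark 1.x release; newer releases also accept `"errorifexists"`.

```scala
def mode(saveMode: String): DataFrameWriter = {
  this.mode = saveMode.toLowerCase match {
    case "overwrite" => SaveMode.Overwrite
    case "append" => SaveMode.Append
    case "ignore" => SaveMode.Ignore
    case "error" | "default" => SaveMode.ErrorIfExists
    case _ => throw new IllegalArgumentException(s"Unknown save mode: $saveMode. " +
      "Accepted modes are 'overwrite', 'append', 'ignore', 'error'.")
  }
  this
}
```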
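`DataFrameWriter` also has an overload that takes the `org.apache.spark.sql.SaveMode` enum directly, which bypasses the string matching above: a misspelled constant becomes a compile error instead of an `IllegalArgumentException` at runtime. A minimal sketch, where the object `TypedSaveMode` and the helper `overwriteParquet` are hypothetical names:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object TypedSaveMode {
  // Same effect as df.write.mode("overwrite").parquet(output), but the mode is
  // checked by the compiler rather than matched from a string at runtime.
  def overwriteParquet(df: DataFrame, output: String): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(output)
}
```

Calling `overwriteParquet` twice on the same path leaves only the second write's rows, exactly as the string form `"overwrite"` would.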