MySQL save mode: Spark SQL and MySQL - SaveMode.Overwrite does not insert the modified data

I have a test table in MySQL with id and name columns, like below:

+----+-------+
| id | name  |
+----+-------+
|  1 | Name1 |
|  2 | Name2 |
|  3 | Name3 |
+----+-------+

I am using a Spark DataFrame to read this data (over JDBC) and modifying it like this:

    Dataset<Row> modified = sparkSession.sql(
            "select id, concat(name, ' - new') as name from test");
    modified.write().mode("overwrite")
            .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties);

My problem is that with overwrite mode, Spark drops the previous table and creates a new one, but does not insert any data.

I tried the same program reading from a CSV file (with the same data as the test table) and overwriting. That worked for me.

Am I missing something here?

Thank you!

Solution

The problem is in your code. Because you overwrite the very table you're trying to read from, you effectively obliterate all the data before Spark can actually access it.

Remember that Spark is lazy. When you create a Dataset, Spark fetches the required metadata but doesn't load the data, so there is no magic cache that preserves the original content. The data is loaded only when it is actually required; here, that happens when you execute the write action, and by the time the write starts (the table having already been dropped and recreated empty), there is no data left to fetch.
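
To make the laziness concrete, here is a minimal sketch reusing the question's `sparkSession` and connection settings (reading the MySQL table directly over JDBC rather than through a temp view):

    // Creating the Dataset only fetches schema metadata over JDBC; no rows yet.
    Dataset<Row> df = sparkSession.read()
            .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties);

    // Transformations are lazy too; still nothing has been read.
    Dataset<Row> modified = df.selectExpr("id", "concat(name, ' - new') as name");

    // Only an action pulls rows. If "test" was already dropped and recreated
    // empty by an overwrite before this point, there is nothing left to read.
    long rows = modified.count();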

What you need is something like this:

1. Create a Dataset.
2. Apply the required transformations and write the data to an intermediate MySQL table.
3. TRUNCATE the original table and INSERT INTO ... SELECT from the intermediate table, or DROP the original table and RENAME the intermediate one (see the sketch below).
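
A minimal sketch of this approach in Java, reusing `sparkSession`, `AppProperties.MYSQL_CONNECTION_URL` and `connectionProperties` from the question; the intermediate table name `test_tmp` is made up for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // 1. Create the Dataset and apply the transformation (both are lazy).
    Dataset<Row> modified = sparkSession.read()
            .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties)
            .selectExpr("id", "concat(name, ' - new') as name");

    // 2. Materialize the result into an intermediate table. Overwrite is safe
    //    here because test_tmp is not the table being read.
    modified.write().mode("overwrite")
            .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test_tmp", connectionProperties);

    // 3. Swap the data back on the MySQL side with plain JDBC.
    try (Connection conn = DriverManager.getConnection(
                 AppProperties.MYSQL_CONNECTION_URL, connectionProperties);
         Statement stmt = conn.createStatement()) {
        stmt.executeUpdate("TRUNCATE TABLE test");
        stmt.executeUpdate("INSERT INTO test SELECT * FROM test_tmp");
        stmt.executeUpdate("DROP TABLE test_tmp");
    }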

An alternative, but less favorable, approach would be:

1. Create a Dataset.
2. Apply the required transformations and write the data to a persistent Spark table (df.write.saveAsTable(...) or equivalent).
3. TRUNCATE the original table.
4. Read the data back and save it (spark.table(...).write.jdbc(...)).
5. Drop the Spark table.
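
A sketch of this alternative in Java, with the same imports as the previous sketch; the staging table name `test_staging` is again only illustrative, and saveAsTable writes to the Spark warehouse (or Hive, if enabled):

    // 1. Create the Dataset and apply the transformation.
    Dataset<Row> modified = sparkSession.read()
            .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties)
            .selectExpr("id", "concat(name, ' - new') as name");

    // 2. Persist to a Spark-managed table; this action forces the data to be
    //    read from MySQL before the original table is touched.
    modified.write().mode("overwrite").saveAsTable("test_staging");

    // 3. Now it is safe to empty the original table.
    try (Connection conn = DriverManager.getConnection(
                 AppProperties.MYSQL_CONNECTION_URL, connectionProperties);
         Statement stmt = conn.createStatement()) {
        stmt.executeUpdate("TRUNCATE TABLE test");
    }

    // 4. Read the data back from the Spark table and append it to MySQL.
    sparkSession.table("test_staging").write().mode("append")
            .jdbc(AppProperties.MYSQL_CONNECTION_URL, "test", connectionProperties);

    // 5. Drop the Spark staging table.
    sparkSession.sql("DROP TABLE test_staging");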

We cannot stress enough that using Spark cache / persist is not the way to go. Even with a conservative StorageLevel (MEMORY_AND_DISK_2 / MEMORY_AND_DISK_SER_2), cached data can be lost on node failure, leading to silent correctness errors.
