Spark: Writing a DataFrame to MySQL with Custom SQL


信息门下跑狗


Article link: https://blog.csdn.net/weixin_31978571/article/details/112894655


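This article explains how to write a Spark DataFrame to a MySQL database with custom SQL statements through a c3p0 connection pool. It covers creating the connection pool, working around the primary-key constraint, writing only specified columns, and performing updates, with example code that walks through the write process.

(Summary generated automatically by CSDN.)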

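Business scenario:

In our project, Spark computes results from raw data and the results have to be written to MySQL, but the write is subject to some constraints:

1. The target MySQL table already exists and has an auto-increment primary key, id.

2. When the DataFrame is written to the table, the id column must not be supplied manually, because it is auto-increment.

Requirements:

1. Only specified columns are written to the database, i.e. a subset of the table's columns.

2. When a record with the same primary key as an existing row is written, the existing row must be updated instead of a new row being inserted.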
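Both requirements map naturally onto MySQL's INSERT ... ON DUPLICATE KEY UPDATE. Below is a minimal sketch of a SQL builder. It assumes the target table has a UNIQUE key on the business columns: MySQL only turns an insert into an update when the row collides with the PRIMARY KEY or a UNIQUE index, and since the auto-increment id is never written here, it cannot be the key that collides. The object UpsertSql and the table and column names in the example are hypothetical, not taken from the article.

```scala
// Hypothetical helper (not from the article): builds a MySQL upsert that writes only
// the listed columns and leaves the auto-increment `id` for MySQL to fill in.
object UpsertSql {

  /** columns: columns to insert, in the order their values will be bound.
    * updateColumns: columns to overwrite when the row hits an existing PRIMARY/UNIQUE key. */
  def build(table: String, columns: Seq[String], updateColumns: Seq[String]): String = {
    require(columns.nonEmpty, "at least one column is required")
    // Identifiers are wrapped in backticks but not escaped: they must come from code, not user input
    val colList = columns.map(c => s"`$c`").mkString(", ")
    val placeholders = columns.map(_ => "?").mkString(", ")
    val insert = s"INSERT INTO `$table` ($colList) VALUES ($placeholders)"
    if (updateColumns.isEmpty) insert
    else {
      // VALUES(col) refers to the value this statement tried to insert into col
      val updates = updateColumns.map(c => s"`$c` = VALUES(`$c`)").mkString(", ")
      s"$insert ON DUPLICATE KEY UPDATE $updates"
    }
  }
}
```

For example, UpsertSql.build("result", Seq("user_id", "day", "cnt"), Seq("cnt")) yields INSERT INTO `result` (`user_id`, `day`, `cnt`) VALUES (?, ?, ?) ON DUPLICATE KEY UPDATE `cnt` = VALUES(`cnt`). On MySQL 8.0.20 and later, VALUES() inside this clause is deprecated in favor of a row alias, although it is still accepted.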

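Analysis:

Spark itself can write a DataFrame to a database (for example df.write.mode(SaveMode.Append).jdbc(url, table, properties)), and the SaveMode enum controls what happens when the target already exists: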

// Excerpt from Spark's org.apache.spark.sql.SaveMode (Java)
/**
 * SaveMode is used to specify the expected behavior of saving a DataFrame to a data source.
 *
 * @since 1.3.0
 */
public enum SaveMode {
  /**
   * Append mode means that when saving a DataFrame to a data source, if data/table already exists,
   * contents of the DataFrame are expected to be appended to existing data.
   *
   * @since 1.3.0
   */
  Append,
  /**
   * Overwrite mode means that when saving a DataFrame to a data source,
   * if data/table already exists, existing data is expected to be overwritten by the contents of
   * the DataFrame.
   *
   * @since 1.3.0
   */
  Overwrite,
  /**
   * ErrorIfExists mode means that when saving a DataFrame to a data source, if data already exists,
   * an exception is expected to be thrown.
   *
   * @since 1.3.0
   */
  ErrorIfExists,
  /**
   * Ignore mode means that when saving a DataFrame to a data source, if data already exists,
   * the save operation is expected to not save the contents of the DataFrame and to not
   * change the existing data.
   *
   * @since 1.3.0
   */
  Ignore
}

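However, with this approach every column of the DataFrame has to correspond to a column of the target MySQL table, and the built-in JDBC writer only issues plain INSERT statements, so a row whose key already exists cannot be updated instead of inserted. The approach is simple, but it does not meet our business requirements. We therefore take a different route: write the data ourselves over JDBC with a custom INSERT statement, which gives us control over exactly which columns are written and what happens on a key conflict.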

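Implementation (development environment: IntelliJ IDEA):

Approach: write the data through a c3p0 connection pool. This lets us assemble the SQL ourselves and insert values only into the columns we choose; the trade-off is that it takes more code to implement.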
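The core of this approach can be sketched with plain JDBC. The helper below is hypothetical (JdbcBatchWriter and writeRows are not names from the article): it takes one connection, an INSERT statement with ? placeholders, and the rows of one Spark partition, binds each row's values in order, and sends them in batches inside a single transaction. Where the connection comes from (for example the c3p0 pool built later) is left to the caller.

```scala
import java.sql.Connection

// Hypothetical helper, not the article's code: writes the rows of one partition
// through a single PreparedStatement, in batches, inside one transaction.
object JdbcBatchWriter {

  /** Binds each row's values, in order, to the `?` placeholders of `sql` and returns
    * the number of rows written. The caller owns the connection and closes it. */
  def writeRows(conn: Connection, sql: String, rows: Iterator[Seq[Any]], batchSize: Int = 500): Int = {
    require(batchSize > 0, "batchSize must be positive")
    val oldAutoCommit = conn.getAutoCommit
    conn.setAutoCommit(false) // commit once at the end instead of after every row
    val ps = conn.prepareStatement(sql)
    var count = 0
    try {
      rows.foreach { row =>
        // JDBC parameter indexes start at 1
        row.zipWithIndex.foreach { case (value, i) => ps.setObject(i + 1, value.asInstanceOf[AnyRef]) }
        ps.addBatch()
        count += 1
        if (count % batchSize == 0) ps.executeBatch() // send a full batch
      }
      if (count % batchSize != 0) ps.executeBatch() // send the last, partial batch
      conn.commit()
      count
    } catch {
      case e: Exception =>
        conn.rollback() // undo every uncommitted row from this call
        throw e
    } finally {
      ps.close()
      conn.setAutoCommit(oldAutoCommit)
    }
  }
}
```

In Spark, each executor would call writeRows from inside df.foreachPartition, passing the partition's rows converted with row.toSeq; the column order of the DataFrame (e.g. after df.select(...)) must match the column list in the SQL. With MySQL Connector/J, adding rewriteBatchedStatements=true to the JDBC URL lets the driver rewrite batched inserts into multi-row statements, which is often faster.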

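Preparation:

First, add the required dependencies:

sbt projects:

Open build.sbt and add the MySQL Connector/J and c3p0 dependencies to libraryDependencies, using the same coordinates and versions as in the Maven configuration.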
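A sketch of the sbt form, assuming the same coordinates and versions as the article's Maven dependencies (these are the article's versions, not a recommendation):

```scala
// build.sbt: same coordinates/versions as the article's Maven dependencies
libraryDependencies ++= Seq(
  "mysql" % "mysql-connector-java" % "6.0.6",
  "com.mchange" % "c3p0" % "0.9.5"
)
```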

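Maven projects:

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>6.0.6</version>
</dependency>
<dependency>
    <groupId>com.mchange</groupId>
    <artifactId>c3p0</artifactId>
    <version>0.9.5</version>
</dependency>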

I usually put the database helper classes in a separate DBUtils package.

Step 1: define a class that reads the configuration file

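package cn.com.xxx.audit.DBUtils

import java.util.Properties

object PropertiyUtils {

  // Reads one key from a properties file on the classpath (e.g. src/main/resources)
  def getFileProperties(fileName: String, propertyKey: String): String = {
    val stream = this.getClass.getClassLoader.getResourceAsStream(fileName)
    if (stream == null) throw new IllegalArgumentException(s"resource not found: $fileName")
    val prop = new Properties()
    try prop.load(stream) finally stream.close() // always release the file handle
    prop.getProperty(propertyKey)
  }
}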

Step 2: create the configuration file db.properties in the resources directory (src/main/resources), writing each entry as key=value:

db.properties

mysql.jdbc.url=jdbc:mysql://localhost:3306/test?serverTimezone=UTC
mysql.jdbc.host=127.0.0.1
mysql.jdbc.port=3306
mysql.jdbc.user=root
mysql.jdbc.password=123456
mysql.pool.jdbc.minPoolSize=20
mysql.pool.jdbc.maxPoolSize=50
mysql.pool.jdbc.acquireIncrement=10
mysql.pool.jdbc.maxStatements=50
mysql.driver=com.mysql.jdbc.Driver
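The pool class below calls PropertiyUtils.getFileProperties once per setting, which reopens db.properties every time. A hedged alternative is to load the file once into a typed configuration that fails fast on a missing key or a non-numeric pool size. DbConfig, fromProperties and load are hypothetical names, not part of the article; the keys are the ones from db.properties above. Separately, note that Connector/J 6.x and later name the driver class com.mysql.cj.jdbc.Driver and keep com.mysql.jdbc.Driver only as a deprecated alias.

```scala
import java.util.Properties

// Hypothetical typed view of db.properties (not from the article), read once.
final case class DbConfig(
    url: String, driver: String, user: String, password: String,
    minPoolSize: Int, maxPoolSize: Int, acquireIncrement: Int, maxStatements: Int)

object DbConfig {
  // Fails with a clear message instead of returning null for a missing key
  private def required(p: Properties, key: String): String =
    Option(p.getProperty(key)).getOrElse(throw new IllegalArgumentException(s"missing property: $key"))

  def fromProperties(p: Properties): DbConfig = DbConfig(
    url = required(p, "mysql.jdbc.url"),
    driver = required(p, "mysql.driver"),
    user = required(p, "mysql.jdbc.user"),
    password = required(p, "mysql.jdbc.password"),
    minPoolSize = required(p, "mysql.pool.jdbc.minPoolSize").toInt,
    maxPoolSize = required(p, "mysql.pool.jdbc.maxPoolSize").toInt,
    acquireIncrement = required(p, "mysql.pool.jdbc.acquireIncrement").toInt,
    maxStatements = required(p, "mysql.pool.jdbc.maxStatements").toInt)

  /** Loads a properties file from the classpath (e.g. src/main/resources) exactly once. */
  def load(resource: String): DbConfig = {
    val in = getClass.getClassLoader.getResourceAsStream(resource)
    if (in == null) throw new IllegalArgumentException(s"resource not found: $resource")
    val p = new Properties()
    try p.load(in) finally in.close()
    fromProperties(p)
  }
}
```

With this, the pool could be configured from a single DbConfig.load("db.properties") call instead of one file read per setter.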

Step 3: define a connection-pool class that reads the configuration file and creates the database connection pool

package cn.com.xxx.audit.DBUtils

import java.sql.Connection

import com.mchange.v2.c3p0.ComboPooledDataSource

class MySqlPool extends Serializable {

  private val cpds: ComboPooledDataSource = new ComboPooledDataSource(true)

  try {
    cpds.setJdbcUrl(PropertiyUtils.getFileProperties("db.properties", "mysql.jdbc.url"))
    cpds.setDriverClass(PropertiyUtils.getFileProperties("db.properties", "mysql.driver"))
    cpds.setUser(PropertiyUtils.getFileProperties("db.properties", "mysql.jdbc.user"))
    cpds.setPassword(PropertiyUtils.getFileProperties("db.properties", "mysql.jdbc.password"))
    cpds.setMinPoolSize(PropertiyUtils.getFileProperties("db.properties", "mysql.pool.jdbc.minPoolSize").toInt)
    cpds.setMaxPoolSize(PropertiyUtils.getFileProperties("db.properties", 
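Once a pool like MySqlPool exists, every write has to borrow a connection and hand it back, including when the write throws. A minimal loan-pattern sketch, assuming only that the pool is reachable as a javax.sql.DataSource (c3p0's ComboPooledDataSource implements that interface); the object Connections and the method withConnection are hypothetical names:

```scala
import java.sql.Connection
import javax.sql.DataSource

// Hypothetical helper (not from the article): borrow a connection, use it, always return it.
object Connections {
  def withConnection[A](ds: DataSource)(f: Connection => A): A = {
    val conn = ds.getConnection // borrow from the pool
    try f(conn)
    finally conn.close() // for a pooled connection, close() returns it to the pool
  }
}
```

Keeping borrow and return in one place means a failed partition write cannot leak a connection and slowly exhaust maxPoolSize.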
