Reading and Writing Large MySQL Data Volumes with Spark: Writing Data from Spark to MySQL

Latest recommended article published on 2021-02-11 10:14:26

weixin_39575648

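Tags: reading and writing large MySQL data volumes with Spark

Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39575648/article/details/113111974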

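Summary (auto-generated by CSDN): This article describes how to use the c3p0 connection pool in Spark to optimize reading and writing large volumes of MySQL data. It uses the `foreachPartition` operator to reduce the number of database connections, combined with batch inserts and disabled auto-commit to improve efficiency. It also stresses configuring the database URL to enable batch processing and closing resources properly once the work is done.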

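1. Use c3p0

The main reason is that c3p0's data source implements serialization, so it can be shipped directly to the Workers.

ComboPooledDataSource

This class produces database connection instances. Once it has been passed to a Worker, it can be used there directly.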
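
Spark serializes the objects captured by a closure with Java serialization before sending them to the Workers, which is why a serializable data source matters. The following is a minimal sketch of how one might check that an object survives such a transfer. `SerializationCheck` and `roundTrip` are hypothetical names introduced for illustration, and a `java.util.Properties` object can stand in for the data source so that the check needs no database or c3p0 on the classpath.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper, not part of the original article or of c3p0
object SerializationCheck {
  /** Writes `value` with Java serialization and reads it back, as a driver-to-Worker transfer would. */
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Calling `SerializationCheck.roundTrip(settings)` on a populated `Properties` object should return an equal but distinct copy. An object that is not serializable is rejected at compile time by the type bound, and a serializable wrapper holding a non-serializable field fails with `NotSerializableException` when `writeObject` runs.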

2. Business code

Getting the data source

    import com.mchange.v2.c3p0.ComboPooledDataSource

    // Builds a c3p0 pool from the settings stored under `config` in `filename`.
    // FileUtils.readJsonFile2Prop is the author's own helper (not shown here).
    def getC3p0DateSource(filename: String, config: String): ComboPooledDataSource = {
      val dataSource: ComboPooledDataSource = new ComboPooledDataSource(true)
      val conf = FileUtils.readJsonFile2Prop(filename, config)
      // Connection settings
      dataSource.setJdbcUrl(conf.getProperty("url"))
      dataSource.setDriverClass(conf.getProperty("driverClassName"))
      dataSource.setUser(conf.getProperty("username"))
      dataSource.setPassword(conf.getProperty("password"))
      // Pool sizing
      dataSource.setMaxPoolSize(Integer.valueOf(conf.getProperty("maxPoolSize")))
      dataSource.setMinPoolSize(Integer.valueOf(conf.getProperty("minPoolSize")))
      dataSource.setAcquireIncrement(Integer.valueOf(conf.getProperty("acquireIncrement")))
      dataSource.setInitialPoolSize(Integer.valueOf(conf.getProperty("initialPoolSize")))
      dataSource.setMaxIdleTime(Integer.valueOf(conf.getProperty("maxIdleTime")))
      dataSource
    }

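Note that InitialPoolSize must not be set too large here: a pool tries to open that many connections as soon as it starts up.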
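
To see why the initial pool size adds up quickly, here is a rough sizing sketch. It assumes that each executor JVM builds its own pool once the data source has been deserialized there, so the startup connection count is roughly the number of executors times `initialPoolSize`. `PoolSizing` and its methods are hypothetical names used only for this illustration.

```scala
// Hypothetical helper for a back-of-the-envelope check, not part of c3p0 or Spark
object PoolSizing {
  /** Connections opened at startup, assuming one pool per executor JVM. */
  def initialConnections(executors: Int, initialPoolSize: Int): Int =
    executors * initialPoolSize

  /** True when the startup connections stay within the server's connection limit. */
  def fits(executors: Int, initialPoolSize: Int, maxConnections: Int): Boolean =
    initialConnections(executors, initialPoolSize) <= maxConnections
}
```

For example, under this assumption 20 executors with an initial pool size of 10 would ask for 200 connections, which exceeds MySQL's default `max_connections` of 151, while an initial pool size of 5 would stay within it.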

    // Requires: import java.util.{Calendar, Date, UUID}
    .foreachPartition(it => {
      // One pooled connection and one prepared statement per partition
      val conn = comboPooledDataSource.getConnection
      val statement = conn.prepareStatement("insert into tb_eventclass_min values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)")
      // Commit manually so the whole batch is written in one transaction
      conn.setAutoCommit(false)
      it.foreach(x => {
        statement.setString(1, UUID.randomUUID().toString)
        statement.setLong(2, x._1._1.toString.toLong)
        statement.setLong(3, x._1._2.toString.toLong)
        statement.setString(4, x._1._3.toString)
        statement.setString(5, x._1._4.toString)
        statement.setString(6, x._1._5.toString)
        statement.setString(7, x._1._6.toString)
        statement.setString(8, x._1._7.toString)
        statement.setString(9, x._1._8.toString)
        statement.setLong(10, x._2)
        statement.setShort(11, x._1._10.toString.toShort)
        // Timestamp in milliseconds converted to minutes since the epoch
        statement.setLong(12, x._1._9 / 60000L)
        // Split the timestamp into calendar fields
        val calendar = Calendar.getInstance()
        calendar.setTime(new Date(x._1._9))
        val year = calendar.get(Calendar.YEAR)
        val month = calendar.get(Calendar.MONTH) + 1 // Calendar.MONTH is zero-based
        val day = calendar.get(Calendar.DAY_OF_MONTH)
        val hour = calendar.get(Calendar.HOUR_OF_DAY)
        val min = calendar.get(Calendar.MINUTE)
        statement.setInt(13, year)
        statement.setInt(14, month)
        statement.setInt(15, day)
        statement.setInt(16, hour)
        statement.setInt(17, min)
        statement.addBatch()
      })
      try {
        // Send the accumulated rows, then commit them together
        statement.executeBatch()
        conn.commit()
      } catch {
        case e: Exception => e.printStackTrace()
      } finally {
        statement.close()
        conn.close() // returns the connection to the pool
      }
    })
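
The code above adds every row of a partition to a single batch. For very large partitions, one possible variation is to flush the batch every fixed number of rows. The sketch below is not the author's method: `ChunkedBatchWriter`, `write`, `batchSize` and `bind` are hypothetical names, and it relies only on standard `java.sql` calls plus Scala's `Iterator.grouped`. The caller keeps ownership of the connection and closes it.

```scala
import java.sql.{Connection, PreparedStatement}

// Hypothetical helper illustrating chunked batch writes
object ChunkedBatchWriter {
  /** Writes `rows` in batches of `batchSize`, committing after each batch.
    * Returns the number of batches executed. */
  def write[T](conn: Connection, sql: String, rows: Iterator[T], batchSize: Int)
              (bind: (PreparedStatement, T) => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    val previousAutoCommit = conn.getAutoCommit
    conn.setAutoCommit(false)
    val statement = conn.prepareStatement(sql)
    try {
      var batches = 0
      rows.grouped(batchSize).foreach { chunk =>
        chunk.foreach { row =>
          bind(statement, row) // caller sets the statement parameters for this row
          statement.addBatch()
        }
        statement.executeBatch()
        conn.commit()
        batches += 1
      }
      batches
    } catch {
      case e: Exception =>
        conn.rollback() // discard the batch that failed, then let the task fail
        throw e
    } finally {
      statement.close()
      conn.setAutoCommit(previousAutoCommit)
    }
  }
}
```

Inside `foreachPartition`, this could be called as `ChunkedBatchWriter.write(conn, sql, it, 1000) { (ps, x) => ... }`, with the parameter binding from the code above moved into the last argument. The value 1000 is only an example. Batches committed before a failure stay committed, so this variation trades all-or-nothing writes per partition for bounded memory use.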

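There are four points to note here:

1. Use the foreachPartition operator to reduce the number of database connections

This way the number of connections obtained through dataSource.getConnection matches the number of partitions, so there will not be many of them.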

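2. Use batch inserts to improve efficiency

Note that batch inserts have to be enabled explicitly.

Append rewriteBatchedStatements=true to the database connection URL (this turns on batch processing):

String dbUrl = "jdbc:mysql://localhost:3306/User?rewriteBatchedStatements=true";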

3. Turn off auto-commit to prevent deadlocks

conn.setAutoCommit(false)

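4. Close the statement and the connection when execution finishes

Statements that are left open keep accumulating and consuming memory; closing the connection returns it to the pool.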
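
For the second point, a stray space or a wrong separator in the URL is easy to introduce when the parameter is appended by hand. A small sketch of a helper that appends a parameter with the correct separator follows. `JdbcUrls` and `withParam` are hypothetical names, and the helper only manipulates strings; it does not validate the URL or the property name.

```scala
// Hypothetical helper for building JDBC URLs, not part of any driver API
object JdbcUrls {
  /** Appends `key=value` to `url`, using `?` for the first parameter and `&` for later ones. */
  def withParam(url: String, key: String, value: String): String = {
    val separator =
      if (url.endsWith("?") || url.endsWith("&")) ""
      else if (url.contains("?")) "&"
      else "?"
    s"$url$separator$key=$value"
  }
}
```

With this, `JdbcUrls.withParam("jdbc:mysql://localhost:3306/User", "rewriteBatchedStatements", "true")` produces the URL shown in point 2, and a URL that already carries parameters gets the new one joined with `&`.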
