Spark JDBC (Java): Reusing the JDBC Connection

This post looks at how to avoid repeatedly creating database connections when a Spark application reads from a database over JDBC, in order to improve performance. The author argues for relying on the existing JDBC machinery, particularly in a multi-node cluster, so that every partition has a usable database connection, reducing resource overhead and improving efficiency.

In my Spark application, I use the following code to retrieve data from a SQL Server database using the JDBC driver:

Dataset<Row> dfResult = sparkSession.read().jdbc("jdbc:sqlserver://server\\dbname", tableName, partitionColumn, lowerBound, upperBound, numberOfPartitions, properties);

and then apply a map operation on the dfResult Dataset.
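For context, here is a minimal, self-contained sketch of that partitioned read. The connection URL, credentials, table name, and bounds are placeholders, not values from the original question:

import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JdbcReadExample {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
                .appName("jdbc-partitioned-read")
                .getOrCreate();

        Properties properties = new Properties();
        properties.setProperty("user", "user");         // placeholder credentials
        properties.setProperty("password", "password");

        // Spark opens one JDBC connection per partition; numberOfPartitions
        // controls how many parallel reads (and connections) are created.
        Dataset<Row> dfResult = sparkSession.read().jdbc(
                "jdbc:sqlserver://server;databaseName=dbname", // placeholder URL
                "tableName",
                "partitionColumn",
                0L,       // lowerBound
                10000L,   // upperBound
                4,        // numberOfPartitions
                properties);

        dfResult.show();
        sparkSession.stop();
    }
}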

While running the application in standalone mode, I see that Spark creates a separate connection for each RDD partition. From the API description, I understand that Spark takes care of closing the connection.

May I know whether there is a way to reuse the connection, instead of opening and closing a JDBC connection for each RDD partition?

Thanks

Solution

Even when you're pushing data into a database manually over an API, the recommendation I often see is to create one connection per partition:

// pseudo-code: one connection per partition
rdd.foreachPartition(iterator => {
    connection = SomeAPI.connect()   // opened on the executor running this partition
    for (i <- iterator) {
        connection.insert(i)
    }
    connection.close()               // released once the partition is done
})

So if the jdbc reader is already opening one connection per partition, that confirms this is the intended pattern.
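As a concrete illustration of the pattern with plain JDBC, here is a minimal sketch. The URL, credentials, and target table schema are hypothetical, not from the original answer:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class PerPartitionWrite {
    // Writes each partition of df through its own JDBC connection.
    static void writePerPartition(Dataset<Row> df) {
        df.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            // The connection is opened on whichever executor processes this
            // partition, and closed automatically when the partition is done.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://server;databaseName=dbname", "user", "password");
                 PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO target_table (id, value) VALUES (?, ?)")) {
                while (rows.hasNext()) {
                    Row row = rows.next();
                    stmt.setInt(1, row.getInt(0));
                    stmt.setString(2, row.getString(1));
                    stmt.addBatch();   // batch inserts to cut network round trips
                }
                stmt.executeBatch();
            } // try-with-resources closes the statement and the connection
        });
    }
}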

Here's another example of this pattern being recommended:

[screenshot omitted; it showed the same one-connection-per-partition recommendation]

I presume this is the recommended pattern because, in a multi-node cluster, you never know on which node a particular partition will be evaluated, so you want each partition to establish its own DB connection on whatever node it runs.
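One way to see why the connection has to be created inside the partition: java.sql.Connection is not serializable, so a connection opened on the driver cannot be captured in the closure and shipped to an executor. A sketch of the failure mode, with hypothetical names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class DriverSideConnectionAntiPattern {
    // ANTI-PATTERN: do not do this.
    static void broken(Dataset<Row> df) throws Exception {
        // Opened on the driver...
        Connection driverSideConn = DriverManager.getConnection(
                "jdbc:sqlserver://server;databaseName=dbname", "user", "password");

        // ...but captured by the closure below. Spark serializes the closure
        // before shipping it to executors, and java.sql.Connection is not
        // Serializable, so the job fails with "Task not serializable"
        // before any task runs.
        df.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            try (PreparedStatement stmt = driverSideConn.prepareStatement(
                    "INSERT INTO target_table (id) VALUES (?)")) {
                while (rows.hasNext()) {
                    stmt.setInt(1, rows.next().getInt(0));
                    stmt.executeUpdate();
                }
            }
        });
    }
}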
