Spark JDBC (Java): Reusing the JDBC Connection

This post looks at how to avoid repeatedly creating database connections when a Spark application reads from a database over JDBC, in order to improve performance. The author argues for relying on the existing JDBC machinery, particularly in a multi-node cluster, so that every partition has a usable database connection, reducing resource overhead and improving efficiency.

In my Spark application, I use the following code to retrieve data from a SQL Server database using the JDBC driver:

Dataset<Row> dfResult = sparkSession.read().jdbc("jdbc:sqlserver://server\\dbname", tableName, partitionColumn, lowerBound, upperBound, numberOfPartitions, properties);

and then apply a map operation on the dfResult Dataset.
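For context, here is a minimal, self-contained sketch of that partitioned read. The connection URL, credentials, table name, and bounds are placeholders, not values from the original question:

import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JdbcReadExample {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
                .appName("jdbc-partitioned-read")
                .getOrCreate();

        Properties properties = new Properties();
        properties.setProperty("user", "user");         // placeholder credentials
        properties.setProperty("password", "password");

        // Spark opens one JDBC connection per partition; numberOfPartitions
        // controls how many parallel reads (and connections) are created.
        Dataset<Row> dfResult = sparkSession.read().jdbc(
                "jdbc:sqlserver://server;databaseName=dbname", // placeholder URL
                "tableName",
                "partitionColumn",
                0L,       // lowerBound
                10000L,   // upperBound
                4,        // numberOfPartitions
                properties);

        dfResult.show();
        sparkSession.stop();
    }
}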

While running the application in standalone mode, I see that Spark creates a separate connection for each RDD partition. From the API description, I understand that Spark takes care of closing the connection.

May I know whether there is a way to reuse the connection, instead of opening and closing a JDBC connection for each RDD partition?

Thanks

Solution

Even when you're pushing data into a database manually over an API, the recommendation I often see is to create one connection per partition:

// pseudo-code: one connection per partition
rdd.foreachPartition(iterator => {
    connection = SomeAPI.connect()   // opened on the executor running this partition
    for (i <- iterator) {
        connection.insert(i)
    }
    connection.close()               // released once the partition is done
})

So if the jdbc reader is already opening one connection per partition, that confirms this is the intended pattern.
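As a concrete illustration of the pattern with plain JDBC, here is a minimal sketch. The URL, credentials, and target table schema are hypothetical, not from the original answer:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class PerPartitionWrite {
    // Writes each partition of df through its own JDBC connection.
    static void writePerPartition(Dataset<Row> df) {
        df.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            // The connection is opened on whichever executor processes this
            // partition, and closed automatically when the partition is done.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://server;databaseName=dbname", "user", "password");
                 PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO target_table (id, value) VALUES (?, ?)")) {
                while (rows.hasNext()) {
                    Row row = rows.next();
                    stmt.setInt(1, row.getInt(0));
                    stmt.setString(2, row.getString(1));
                    stmt.addBatch();   // batch inserts to cut network round trips
                }
                stmt.executeBatch();
            } // try-with-resources closes the statement and the connection
        });
    }
}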

Here's another example of this pattern being recommended:

[screenshot omitted; it showed the same one-connection-per-partition recommendation]

I presume this is the recommended pattern because, in a multi-node cluster, you never know on which node a particular partition will be evaluated, so you want each partition to establish its own DB connection on whatever node it runs.
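One way to see why the connection has to be created inside the partition: java.sql.Connection is not serializable, so a connection opened on the driver cannot be captured in the closure and shipped to an executor. A sketch of the failure mode, with hypothetical names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class DriverSideConnectionAntiPattern {
    // ANTI-PATTERN: do not do this.
    static void broken(Dataset<Row> df) throws Exception {
        // Opened on the driver...
        Connection driverSideConn = DriverManager.getConnection(
                "jdbc:sqlserver://server;databaseName=dbname", "user", "password");

        // ...but captured by the closure below. Spark serializes the closure
        // before shipping it to executors, and java.sql.Connection is not
        // Serializable, so the job fails with "Task not serializable"
        // before any task runs.
        df.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            try (PreparedStatement stmt = driverSideConn.prepareStatement(
                    "INSERT INTO target_table (id) VALUES (?)")) {
                while (rows.hasNext()) {
                    stmt.setInt(1, rows.next().getInt(0));
                    stmt.executeUpdate();
                }
            }
        });
    }
}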
