Spark SQL parallel reads in practice

当当是个程序员

Category: spark. Tags: spark sql, big data

Article link: https://blog.csdn.net/u012601009/article/details/120993406

Spark SQL parallel queries

Method 1: specify a partition column

http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
partitionColumn must be a numeric, date, or timestamp column from the table in question.

partitionColumn, lowerBound, upperBound These options must all be specified if any of them is specified. In addition, numPartitions must be specified. They describe how to partition the table when reading in parallel from multiple workers. partitionColumn must be a numeric, date, or timestamp column from the table in question. Notice that lowerBound and upperBound are just used to decide the partition stride, not for filtering the rows in the table. So all rows in the table will be partitioned and returned. This option applies only to reading.

numPartitions The maximum number of partitions that can be used for parallelism in table reading and writing. This also determines the maximum number of concurrent JDBC connections. If the number of partitions to write exceeds this limit, we decrease it to this limit by calling coalesce(numPartitions) before writing.

fetchsize The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to a low fetch size (e.g. Oracle with 10 rows). This option applies only to reading.

batchsize The JDBC batch size, which determines how many rows to insert per round trip. This can help performance on JDBC drivers. This option applies only to writing. It defaults to 1000.
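
To make the "stride, not filter" point concrete, here is a minimal sketch in plain Java of how a numeric range can be split into per-partition WHERE predicates. It approximates the behaviour described above and is not Spark's actual code; the real implementation differs between Spark versions (for example in how the stride is rounded). The class name `PartitionStrideSketch`, the method `predicates`, and the column name `id` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical sketch: split [lowerBound, upperBound) into per-partition predicates. */
class PartitionStrideSketch {

    static List<String> predicates(String column, long lowerBound, long upperBound, int numPartitions) {
        if (lowerBound >= upperBound) {
            throw new IllegalArgumentException("lowerBound must be smaller than upperBound");
        }
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be at least 1");
        }
        // Never use more partitions than there are distinct values in the range.
        int parts = (int) Math.min((long) numPartitions, upperBound - lowerBound);
        long stride = upperBound / parts - lowerBound / parts;

        List<String> result = new ArrayList<>();
        long current = lowerBound;
        for (int i = 0; i < parts; i++) {
            // The first partition has no lower limit and the last has no upper limit,
            // so rows outside [lowerBound, upperBound) are still read.
            String lower = (i == 0) ? null : column + " >= " + current;
            current += stride;
            String upper = (i == parts - 1) ? null : column + " < " + current;

            String where;
            if (lower == null && upper == null) {
                where = "1 = 1"; // single partition: scan the whole table
            } else if (lower == null) {
                where = upper + " OR " + column + " IS NULL"; // NULLs go to the first partition
            } else if (upper == null) {
                where = lower;
            } else {
                where = lower + " AND " + upper;
            }
            result.add(where);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values: column "id", bounds 0..100, 4 partitions.
        for (String p : predicates("id", 0, 100, 4)) {
            System.out.println(p);
        }
    }
}
```

With bounds 0 and 100 and 4 partitions the stride is 25. Because the first and last predicates are open-ended, a row with `id = 500` still lands in the last partition, which is why badly chosen bounds cause skew rather than missing rows.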

Dataset<Row> pgJdbcDF2 = spark.read()
    .format("jdbc")
    .option("url", "jdbc:postgresql://ip:15400/postgres?gssEncMode=disable")
    .
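
The snippet above is cut off in the source. As a hedged sketch of what a complete set of options for a partitioned read could look like, the plain-Java helper below collects the options described earlier into a map. The table name `public.orders`, the column `id`, the bounds, and the helper `partitionedReadOptions` are all assumptions, not values from the original article; only the URL is taken from it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles JDBC data source options for a partitioned read. */
class JdbcReadOptionsSketch {

    static Map<String, String> partitionedReadOptions(String url, String table, String partitionColumn,
                                                      long lowerBound, long upperBound,
                                                      int numPartitions, int fetchSize) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("url", url);
        opts.put("dbtable", table);
        // These four must be set together for a partitioned read.
        opts.put("partitionColumn", partitionColumn);
        opts.put("lowerBound", Long.toString(lowerBound));
        opts.put("upperBound", Long.toString(upperBound));
        opts.put("numPartitions", Integer.toString(numPartitions));
        // Rows fetched per round trip; applies to reading only.
        opts.put("fetchsize", Integer.toString(fetchSize));
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = partitionedReadOptions(
                "jdbc:postgresql://ip:15400/postgres?gssEncMode=disable", // URL from the article
                "public.orders", // assumed table name
                "id",            // assumed numeric partition column
                0L, 1_000_000L,  // assumed bounds
                8,               // assumed partition count
                1000);           // assumed fetch size
        opts.forEach((k, v) -> System.out.println(k + " = " + v));

        // With Spark on the classpath, the map could be passed to the reader, e.g.:
        //   Dataset<Row> df = spark.read().format("jdbc").options(opts).load();
    }
}
```

Keeping the options in a map makes it easy to check that partitionColumn, lowerBound, upperBound, and numPartitions are always supplied together. Credentials such as `user` and `password` are left out here and would need to be added for a real connection.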
