How does Spark read a single month of data?

卢子墨

Article link: https://blog.csdn.net/lukabruce/article/details/101757466

If provided paths are partition directories, 
please set "basePath" in the options of the data source to specify the root directory of the table. 
If there are multiple root directories, please load them separately and then union them.

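That is the advice Spark gives in its error message. In other words: if the paths you pass in are partition directories, set the "basePath" option on the data source to point at the table's root directory; if the data lives under several different root directories, load each one separately and then combine the results with union.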

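// Method 1: set basePath to the table root and match the month with a glob

val basePath = "hdfs://hadoop01:9000/user/hive/warehouse/"

// With basePath set, partition discovery starts at the table root,
// so "date" is kept as a partition column in the result.
sparkSession.read
  .option("basePath", basePath)
  .parquet(basePath + "date=2019-09-*")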

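// Method 2: load each partition directory separately and union the results
// Note: a partition directory read directly, without basePath, does not
// yield "date" as a column in the result.

val HDFS_PATH = "hdfs://hadoop01:9000/user/hive/warehouse/date=2019-"
sparkSession.read.parquet(HDFS_PATH + "09-13")
  .union(sparkSession.read.parquet(HDFS_PATH + "09-09"))
  .union(sparkSession.read.parquet(HDFS_PATH + "09-07"))
  .union(sparkSession.read.parquet(HDFS_PATH + "09-08"))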
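
Writing out one union per day gets tedious for a whole month. As a rough sketch (not part of the original post), the per-day partition paths could be generated instead of typed by hand. The object name `MonthPaths` and the helper `dayPaths` are hypothetical, and the sketch assumes the directories follow the `date=yyyy-MM-dd` layout used above, directly under the root.

```scala
import java.time.YearMonth

object MonthPaths {
  // Hypothetical helper: builds one partition directory path per calendar day
  // of the given month, assuming a "date=yyyy-MM-dd" layout under `root`.
  def dayPaths(root: String, month: YearMonth): Seq[String] =
    (1 to month.lengthOfMonth).map { day =>
      // LocalDate.toString is ISO-8601 (yyyy-MM-dd), matching the directory names
      s"${root}date=${month.atDay(day)}"
    }

  def main(args: Array[String]): Unit = {
    val root = "hdfs://hadoop01:9000/user/hive/warehouse/"
    // Print the generated paths for September 2019
    dayPaths(root, YearMonth.of(2019, 9)).foreach(println)
  }
}
```

The resulting paths could then be loaded one at a time and combined with union as in Method 2, or passed together to the reader with basePath set, for example `sparkSession.read.option("basePath", root).parquet(paths: _*)`. Either way, this assumes every generated directory actually exists; Spark fails on a path that does not.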
