If provided paths are partition directories,
please set "basePath" in the options of the data source to specify the root directory of the table.
If there are multiple root directories, please load them separately and then union them.
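In other words: if the paths you pass in are partition directories, set "basePath" in the data source options to point at the root directory of the table; if the data lives under several different root directories, load each one separately and then merge the results with union.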
// Method 1: set basePath to the table root and select the partitions with a glob
val basePath = "hdfs://hadoop01:9000/user/hive/warehouse/"
sparkSession.read
  .option("basePath", basePath)
  .parquet(basePath + "date=2019-09-*")
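// Method 2: load each partition directory separately, then union the results.
// Without basePath, Spark does not treat "date" as a partition column here,
// so the resulting DataFrame has no "date" column.
val HDFS_PATH = "hdfs://hadoop01:9000/user/hive/warehouse/date=2019-"
sparkSession.read.parquet(HDFS_PATH + "09-13")
  .union(sparkSession.read.parquet(HDFS_PATH + "09-09"))
  .union(sparkSession.read.parquet(HDFS_PATH + "09-07"))
  .union(sparkSession.read.parquet(HDFS_PATH + "09-08"))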
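
The two approaches above can also be combined when only a handful of specific dates are needed: `DataFrameReader.parquet` accepts several paths in one call, so the selected partition directories can be passed together with `basePath` set. The following is only a sketch under assumptions: the object name `ReadSelectedPartitions` and the helpers `partitionPaths` and `readPartitions` are hypothetical names introduced for illustration, and the HDFS root and dates are simply the ones used in the snippets above. With `basePath` set, `date` should be kept as a column in the result.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadSelectedPartitions {
  // Table root, taken from the examples above.
  val basePath = "hdfs://hadoop01:9000/user/hive/warehouse/"

  // Hypothetical helper: build one "date=..." partition directory per date.
  // Tolerates a root with or without a trailing slash.
  def partitionPaths(root: String, dates: Seq[String]): Seq[String] =
    dates.map(d => root.stripSuffix("/") + "/date=" + d)

  // Hypothetical helper: read only the listed partitions in a single call.
  // basePath tells Spark where partition discovery starts, so "date"
  // is still recognised as a partition column.
  def readPartitions(spark: SparkSession, root: String, dates: Seq[String]): DataFrame =
    spark.read
      .option("basePath", root)
      .parquet(partitionPaths(root, dates): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-selected-partitions")
      .getOrCreate()

    val dates = Seq("2019-09-07", "2019-09-08", "2019-09-09", "2019-09-13")
    val df = readPartitions(spark, basePath, dates)

    // Inspect the schema to check that the "date" column is present.
    df.printSchema()

    spark.stop()
  }
}
```

Compared with chaining `union` calls, this keeps a single read and a single schema, but it only applies when all partitions share the same root directory; for data under different roots, loading separately and calling `union` remains the approach the message recommends.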