The difference between reading a single file and multiple files in Spark

夜月行者

Last modified on 2022-07-23 16:49:23


Category: spark. Tags: hdfs, spark, hadoop

First published on 2020-09-03 14:11:38

Article link: https://blog.csdn.net/u013200380/article/details/108381673


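import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;

public class UserProfileTest {

    // Multi-file variant: a glob that matches every Parquet file in the directory.
    //static String filePath = "hdfs:///user/daily/20200828/*.parquet";
    // Single-file variant: the path used in this test.
    static String filePath = "/user/daily/20200828/part-00057-0e0dc5b5-5061-41ca-9fa6-9fb7b3e09e98-c000.snappy.parquet";

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setMaster("local")
                .setAppName("user_profile_test")
                .set(ConfigurationOptions.ES_NODES, "")
                .set(ConfigurationOptions.ES_PORT, "")
                .set(ConfigurationOptions.ES_MAPPING_ID, "uid");

        SparkSession sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
        Dataset<Row> userProfileSource = sparkSession.read().parquet(filePath);
        // count() is an action, so it forces the data to be read.
        userProfileSource.count();
        userProfileSource.write().parquet("hdfs:///user/daily/result2020082808/");
    }
}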

With this single-file path, the corresponding Parquet read runs as one job with one stage.

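It seems that when the number of files is greater than 32, Spark adds one more job that executes "Listing leaf files and directories". This is consistent with the default value of 32 for the setting spark.sql.sources.parallelPartitionDiscovery.threshold, which caps how many input paths are listed on the driver before the listing is handed to a separate job.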
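The rule observed above can be sketched as a small decision function. This is a simplified model and not Spark's actual code: the class name ListingModeSketch and the method usesDistributedListing are hypothetical, and the only value taken from Spark is the default of 32 for spark.sql.sources.parallelPartitionDiscovery.threshold. It also assumes that a glob such as *.parquet is expanded to one input path per matched file before the comparison.

```java
/**
 * Simplified model of how the extra "Listing leaf files and directories"
 * job appears. Hypothetical helper for illustration, not Spark's own code.
 */
public class ListingModeSketch {

    // Default of spark.sql.sources.parallelPartitionDiscovery.threshold.
    static final int DEFAULT_THRESHOLD = 32;

    /** Returns true when the number of input paths is greater than the threshold. */
    static boolean usesDistributedListing(int pathCount, int threshold) {
        return pathCount > threshold;
    }

    public static void main(String[] args) {
        // Assumed example path counts; 1 corresponds to the single-file case.
        int[] pathCounts = {1, 32, 33, 200};
        for (int count : pathCounts) {
            String mode = usesDistributedListing(count, DEFAULT_THRESHOLD)
                    ? "extra listing job"
                    : "listed on the driver";
            System.out.println(count + " path(s): " + mode);
        }
    }
}
```

If this model holds, raising the setting on the SparkConf (for example with .set("spark.sql.sources.parallelPartitionDiscovery.threshold", "64")) should keep the listing on the driver for larger inputs, although that has not been verified here.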
