Spark File-Reading Source Code Analysis, Part 3

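This is the third post in a series analyzing the source code behind Spark reading a Parquet file. It covers the logic by which Spark sets the default number of partitions, though admittedly this post hardly counts as source code analysis.
Part 1
Part 2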

1. What determines the number of partitions of userProfileSource

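I had always read that the partition count is determined by file size, with each block becoming one partition (usually 128 MB, configurable in HDFS). Those analyses, however, usually look at a single file. Here I try the same analysis on multiple files.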

1. Add debugging code

To inspect the details of the RDD's partitions, I added the following line:

userProfileSource.javaRDD().getNumPartitions();

The code becomes:


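import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;

public class UserProfileTest {

    // Glob that matches every Parquet file in the daily directory.
    static String filePath = "hdfs://test:9000/user/daily/20200828/*.parquet";

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setMaster("local")
                .setAppName("user_profile_test")
                .set(ConfigurationOptions.ES_NODES, "")
                .set(ConfigurationOptions.ES_PORT, "")
                .set(ConfigurationOptions.ES_MAPPING_ID, "uid");

        SparkSession sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
        Dataset<Row> userProfileSource = sparkSession.read().parquet(filePath);
        // Debugging aid: set a breakpoint here to inspect the RDD's partitions.
        userProfileSource.javaRDD().getNumPartitions();
        userProfileSource.count();
        userProfileSource.write().parquet("hdfs:///user/daily/result2020091008/");
    }
}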

2. File information for the directory


# hsl -h hdfs://test:9000/user/daily/20200828/ |head
    184.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet
    182.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet
    183.3 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet
    183.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet
    183.6 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet
    182.7 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet
    183.0 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet
    182.1 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet

As the listing shows, each file is roughly 180 MB, and there are 100 such files in total.
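As a rough back-of-the-envelope sketch of what these sizes imply for splitting: the code below assumes Spark's default `spark.sql.files.maxPartitionBytes` of 128 MB and `spark.sql.files.openCostInBytes` of 4 MB, a default parallelism of 1 for the `local` master, and 100 files of exactly 183 MB each. The class and method names are illustrative and not Spark APIs, and the formula is a simplified reading of Spark's file-source planning rather than something traced in this post.

```java
import java.util.Arrays;

public class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Simplified split-size rule:
    // min(maxPartitionBytes, max(openCostInBytes, bytesPerCore)).
    static long maxSplitBytes(long[] fileSizes, long maxPartitionBytes,
                              long openCostInBytes, int parallelism) {
        long total = 0;
        for (long size : fileSizes) {
            total += size + openCostInBytes; // each file is padded with its open cost
        }
        long bytesPerCore = total / parallelism;
        return Math.min(maxPartitionBytes, Math.max(openCostInBytes, bytesPerCore));
    }

    // Number of chunks a single file is cut into for a given split size.
    static long chunksPerFile(long fileSize, long splitBytes) {
        return (fileSize + splitBytes - 1) / splitBytes; // ceiling division
    }

    public static void main(String[] args) {
        long[] sizes = new long[100];
        Arrays.fill(sizes, 183 * MB); // assumption: 100 files of about 183 MB each
        long split = maxSplitBytes(sizes, 128 * MB, 4 * MB, 1);
        System.out.println("split bytes = " + split);
        System.out.println("chunks per file = " + chunksPerFile(183 * MB, split));
    }
}
```

Under these assumptions the split size comes out to 134217728 bytes (128 MB), so every file is cut into two chunks: one full 128 MB chunk and a tail of about 55 MB.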

3. Partition information of the RDD

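Both in the debugger and while the program runs (in the corresponding job graph), the RDD shows 150 partitions. For convenience, I am pasting the DAG figure from the earlier post again here. In the figure below, the first stage of count runs 150 tasks.
[Figure: DAG of the job, from the earlier post]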
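The 150 can be reproduced with a small simulation. This is a minimal sketch, assuming the file source cuts each file into chunks of at most 128 MB, sorts the chunks from largest to smallest, and packs them in order into partitions, charging a 4 MB open cost per chunk. The class name, method name, and the uniform 183 MB file size are illustrative assumptions, not Spark APIs or measured values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FilePackingSketch {
    static final long MB = 1024L * 1024L;

    // Cut files into chunks, sort them largest first, then pack them in order.
    static int countPartitions(long[] fileSizes, long splitBytes, long openCostInBytes) {
        List<Long> chunks = new ArrayList<>();
        for (long size : fileSizes) {
            for (long offset = 0; offset < size; offset += splitBytes) {
                chunks.add(Math.min(splitBytes, size - offset));
            }
        }
        chunks.sort(Comparator.reverseOrder());

        int partitions = 0;
        long currentSize = 0;
        boolean open = false; // true once the current partition holds a chunk
        for (long chunk : chunks) {
            if (open && currentSize + chunk > splitBytes) {
                partitions++; // close the current partition and start a new one
                currentSize = 0;
                open = false;
            }
            currentSize += chunk + openCostInBytes;
            open = true;
        }
        if (open) {
            partitions++; // count the last, still-open partition
        }
        return partitions;
    }

    public static void main(String[] args) {
        long[] sizes = new long[100];
        java.util.Arrays.fill(sizes, 183 * MB); // assumption: 100 files of ~183 MB
        System.out.println(countPartitions(sizes, 128 * MB, 4 * MB)); // prints 150
    }
}
```

Each full 128 MB chunk fills a partition on its own, giving 100 partitions. The 100 tails of about 55 MB fit two to a partition, giving 50 more.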

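I will not step through the RDD creation logic in detail here. Instead, I give the RDD's partition information directly. It was read from the RDD in debug mode, and the listing below shows the details of the 150 partitions.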


result = {Partition[150]@13053}
 0 = {FilePartition@13054} "FilePartition(0,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 1 = {FilePartition@13055} "FilePartition(1,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 2 = {FilePartition@13056} "FilePartition(2,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 3 = {FilePartition@13057} "FilePartition(3,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 4 = {FilePartition@13058} "FilePartition(4,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 5 = {FilePartition@13059} "FilePartition(5,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 6 = {FilePartition@13060} "FilePartition(6,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 7 = {FilePartition@13061} "FilePartition(7,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 8 = {FilePartition@13062} "FilePartition(8,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00008-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 9 = {FilePartition@13063} "FilePartition(9,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00009-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 10 = {FilePartition@13064} "FilePartition(10,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00010-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 11 = {FilePartition@13065} "FilePartition(11,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00011-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 12 = {FilePartition@13066} "FilePartition(12,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00012-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 13 = {FilePartition@13067} "FilePartition(13,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00013-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 14 = {FilePartition@13068} "FilePartition(14,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00014-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 15 = {FilePartition@13069} "FilePartition(15,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00015-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 16 = {FilePartition@13070} "FilePartition(16,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00016-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 17 = {FilePartition@13071} "FilePartition(17,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00017-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 18 = {FilePartition@13072} "FilePartition(18,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00018-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 19 = {FilePartition@13073} "FilePartition(19,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00019-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 20 = {FilePartition@13074} "FilePartition(20,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00020-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 21 = {FilePartition@13075} "FilePartition(21,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00021-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 22 = {FilePartition@13076} "FilePartition(22,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00022-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 23 = {FilePartition@13077} "FilePartition(23,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00023-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 24 = {FilePartition@13078} "FilePartition(24,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00024-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 25 = {FilePartition@13079} "FilePartition(25,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00025-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 26 = {FilePartition@13080} "FilePartition(26,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00026-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 27 = {FilePartition@13081} "FilePartition(27,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00027-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 28 = {FilePartition@13082} "FilePartition(28,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00028-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 29 = {FilePartition@13083} "FilePartition(29,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00029-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 30 = {FilePartition@13084} "FilePartition(30,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00030-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 31 = {FilePartition@13085} "FilePartition(31,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00031-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 32 = {FilePartition@13086} "FilePartition(32,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00032-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 33 = {FilePartition@13087} "FilePartition(33,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00033-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 34 = {FilePartition@13088} "FilePartition(34,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00034-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 35 = {FilePartition@13089} "FilePartition(35,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00035-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 36 = {FilePartition@13090} "FilePartition(36,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00036-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 37 = {FilePartition@13091} "FilePartition(37,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00037-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 38 = {FilePartition@13092} "FilePartition(38,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00038-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 39 = {FilePartition@13093} "FilePartition(39,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00039-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 40 = {FilePartition@13094} "FilePartition(40,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00040-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 41 = {FilePartition@13095} "FilePartition(41,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00041-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 42 = {FilePartition@13096} "FilePartition(42,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00042-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 43 = {FilePartition@13097} "FilePartition(43,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00043-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 44 = {FilePartition@13098} "FilePartition(44,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00044-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 45 = {FilePartition@13099} "FilePartition(45,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00045-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 46 = {FilePartition@13100} "FilePartition(46,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00046-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 47 = {FilePartition@13101} "FilePartition(47,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00047-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 48 = {FilePartition@13102} "FilePartition(48,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00048-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 49 = {FilePartition@13103} "FilePartition(49,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00049-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 50 = {FilePartition@13104} "FilePartition(50,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00050-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 51 = {FilePartition@13105} "FilePartition(51,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00051-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 52 = {FilePartition@13106} "FilePartition(52,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00052-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 53 = {FilePartition@13107} "FilePartition(53,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00053-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 54 = {FilePartition@13108} "FilePartition(54,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00054-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 55 = {FilePartition@13109} "FilePartition(55,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00055-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 56 = {FilePartition@13110} "FilePartition(56,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00056-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 57 = {FilePartition@13111} "FilePartition(57,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00057-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 58 = {FilePartition@13112} "FilePartition(58,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00058-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 59 = {FilePartition@13113} "FilePartition(59,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00059-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 60 = {FilePartition@13114} "FilePartition(60,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00060-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 61 = {FilePartition@13115} "FilePartition(61,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00061-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 62 = {FilePartition@13116} "FilePartition(62,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00062-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 63 = {FilePartition@13117} "FilePartition(63,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00063-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 64 = {FilePartition@13118} "FilePartition(64,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00064-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 65 = {FilePartition@13119} "FilePartition(65,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00065-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 66 = {FilePartition@13120} "FilePartition(66,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00066-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 67 = {FilePartition@13121} "FilePartition(67,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00067-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 68 = {FilePartition@13122} "FilePartition(68,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00068-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 69 = {FilePartition@13123} "FilePartition(69,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00069-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 70 = {FilePartition@13124} "FilePartition(70,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00070-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 71 = {FilePartition@13125} "FilePartition(71,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00071-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 72 = {FilePartition@13126} "FilePartition(72,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00072-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 73 = {FilePartition@13127} "FilePartition(73,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00073-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 74 = {FilePartition@13128} "FilePartition(74,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00074-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 75 = {FilePartition@13129} "FilePartition(75,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00075-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 76 = {FilePartition@13130} "FilePartition(76,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 77 = {FilePartition@13131} "FilePartition(77,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00077-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 78 = {FilePartition@13132} "FilePartition(78,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00078-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 79 = {FilePartition@13133} "FilePartition(79,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00079-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 80 = {FilePartition@13134} "FilePartition(80,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00080-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 81 = {FilePartition@13135} "FilePartition(81,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00081-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 82 = {FilePartition@13136} "FilePartition(82,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 83 = {FilePartition@13137} "FilePartition(83,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00083-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 84 = {FilePartition@13138} "FilePartition(84,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00084-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 85 = {FilePartition@13139} "FilePartition(85,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00085-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 86 = {FilePartition@13140} "FilePartition(86,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00086-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 87 = {FilePartition@13141} "FilePartition(87,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00087-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 88 = {FilePartition@13142} "FilePartition(88,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00088-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 89 = {FilePartition@13143} "FilePartition(89,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00089-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 90 = {FilePartition@13144} "FilePartition(90,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00090-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 91 = {FilePartition@13145} "FilePartition(91,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00091-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 92 = {FilePartition@13146} "FilePartition(92,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00092-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 93 = {FilePartition@13147} "FilePartition(93,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00093-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 94 = {FilePartition@13148} "FilePartition(94,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00094-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 95 = {FilePartition@13149} "FilePartition(95,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00095-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 96 = {FilePartition@13150} "FilePartition(96,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00096-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 97 = {FilePartition@13151} "FilePartition(97,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00097-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 98 = {FilePartition@13152} "FilePartition(98,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00098-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 99 = {FilePartition@13153} "FilePartition(99,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00099-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
 100 = {FilePartition@13265} "FilePartition(100,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00020-41ca-c000.snappy.parquet, range: 134217728-195123990, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00009-41ca-c000.snappy.parquet, range: 134217728-195046711, partition values: [empty row]))"
 101 = {FilePartition@13266} "FilePartition(101,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00051-41ca-c000.snappy.parquet, range: 134217728-194146721, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00078-41ca-c000.snappy.parquet, range: 134217728-193958286, partition values: [empty row]))"
 102 = {FilePartition@13267} "FilePartition(102,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00072-41ca-c000.snappy.parquet, range: 134217728-193883957, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00017-41ca-c000.snappy.parquet, range: 134217728-193740729, partition values: [empty row]))"
 103 = {FilePartition@13268} "FilePartition(103,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00052-41ca-c000.snappy.parquet, range: 134217728-193733952, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00035-41ca-c000.snappy.parquet, range: 134217728-193630805, partition values: [empty row]))"
 104 = {FilePartition@13269} "FilePartition(104,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00060-41ca-c000.snappy.parquet, range: 134217728-193595463, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00040-41ca-c000.snappy.parquet, range: 134217728-193582950, partition values: [empty row]))"
 105 = {FilePartition@13270} "FilePartition(105,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00097-41ca-c000.snappy.parquet, range: 134217728-193553823, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00088-41ca-c000.snappy.parquet, range: 134217728-193318554, partition values: [empty row]))"
 106 = {FilePartition@13271} "FilePartition(106,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00028-41ca-c000.snappy.parquet, range: 134217728-193247136, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00081-41ca-c000.snappy.parquet, range: 134217728-193238350, partition values: [empty row]))"
 107 = {FilePartition@13272} "FilePartition(107,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00083-41ca-c000.snappy.parquet, range: 134217728-193192582, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00057-41ca-c000.snappy.parquet, range: 134217728-193169659, partition values: [empty row]))"
 108 = {FilePartition@13273} "FilePartition(108,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet, range: 134217728-193154555, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00096-41ca-c000.snappy.parquet, range: 134217728-193056440, partition values: [empty row]))"
 109 = {FilePartition@13274} "FilePartition(109,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00027-41ca-c000.snappy.parquet, range: 134217728-193040434, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00089-41ca-c000.snappy.parquet, range: 134217728-193010886, partition values: [empty row]))"
 110 = {FilePartition@13275} "FilePartition(110,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00012-41ca-c000.snappy.parquet, range: 134217728-192989901, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00099-41ca-c000.snappy.parquet, range: 134217728-192957150, partition values: [empty row]))"
 111 = {FilePartition@13276} "FilePartition(111,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00021-41ca-c000.snappy.parquet, range: 134217728-192890064, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00091-41ca-c000.snappy.parquet, range: 134217728-192871474, partition values: [empty row]))"
 112 = {FilePartition@13277} "FilePartition(112,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00030-41ca-c000.snappy.parquet, range: 134217728-192862266, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00058-41ca-c000.snappy.parquet, range: 134217728-192846129, partition values: [empty row]))"
 113 = {FilePartition@13278} "FilePartition(113,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00092-41ca-c000.snappy.parquet, range: 134217728-192831718, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00049-41ca-c000.snappy.parquet, range: 134217728-192826946, partition values: [empty row]))"
 114 = {FilePartition@13279} "FilePartition(114,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00087-41ca-c000.snappy.parquet, range: 134217728-192797029, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00053-41ca-c000.snappy.parquet, range: 134217728-192723879, partition values: [empty row]))"
 115 = {FilePartition@13280} "FilePartition(115,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00064-41ca-c000.snappy.parquet, range: 134217728-192715391, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00042-41ca-c000.snappy.parquet, range: 134217728-192710676, partition values: [empty row]))"
 116 = {FilePartition@13281} "FilePartition(116,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00055-41ca-c000.snappy.parquet, range: 134217728-192685175, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00033-41ca-c000.snappy.parquet, range: 134217728-192652367, partition values: [empty row]))"
 117 = {FilePartition@13282} "FilePartition(117,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00056-41ca-c000.snappy.parquet, range: 134217728-192639026, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00036-41ca-c000.snappy.parquet, range: 134217728-192636886, partition values: [empty row]))"
 118 = {FilePartition@13283} "FilePartition(118,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00077-41ca-c000.snappy.parquet, range: 134217728-192628108, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00037-41ca-c000.snappy.parquet, range: 134217728-192619533, partition values: [empty row]))"
 119 = {FilePartition@13284} "FilePartition(119,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00016-41ca-c000.snappy.parquet, range: 134217728-192610267, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet, range: 134217728-192553300, partition values: [empty row]))"
 120 = {FilePartition@13285} "FilePartition(120,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00038-41ca-c000.snappy.parquet, range: 134217728-192550224, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00079-41ca-c000.snappy.parquet, range: 134217728-192528242, partition values: [empty row]))"
 121 = {FilePartition@13286} "FilePartition(121,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00048-41ca-c000.snappy.parquet, range: 134217728-192518314, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00014-41ca-c000.snappy.parquet, range: 134217728-192492734, partition values: [empty row]))"
 122 = {FilePartition@13287} "FilePartition(122,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00043-41ca-c000.snappy.parquet, range: 134217728-192488828, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00075-41ca-c000.snappy.parquet, range: 134217728-192457423, partition values: [empty row]))"
 123 = {FilePartition@13288} "FilePartition(123,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00080-41ca-c000.snappy.parquet, range: 134217728-192436303, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00086-41ca-c000.snappy.parquet, range: 134217728-192422791, partition values: [empty row]))"
 124 = {FilePartition@13289} "FilePartition(124,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00010-41ca-c000.snappy.parquet, range: 134217728-192401383, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00031-41ca-c000.snappy.parquet, range: 134217728-192390802, partition values: [empty row]))"
 125 = {FilePartition@13290} "FilePartition(125,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00073-41ca-c000.snappy.parquet, range: 134217728-192384328, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00068-41ca-c000.snappy.parquet, range: 134217728-192373709, partition values: [empty row]))"
 126 = {FilePartition@13291} "FilePartition(126,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00045-41ca-c000.snappy.parquet, range: 134217728-192324641, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00074-41ca-c000.snappy.parquet, range: 134217728-192320288, partition values: [empty row]))"
 127 = {FilePartition@13292} "FilePartition(127,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00054-41ca-c000.snappy.parquet, range: 134217728-192310762, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00013-41ca-c000.snappy.parquet, range: 134217728-192309053, partition values: [empty row]))"
 128 = {FilePartition@13293} "FilePartition(128,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00029-41ca-c000.snappy.parquet, range: 134217728-192307665, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00085-41ca-c000.snappy.parquet, range: 134217728-192300557, partition values: [empty row]))"
 129 = {FilePartition@13294} "FilePartition(129,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00023-41ca-c000.snappy.parquet, range: 134217728-192287705, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00026-41ca-c000.snappy.parquet, range: 134217728-192244675, partition values: [empty row]))"
 130 = {FilePartition@13295} "FilePartition(130,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet, range: 134217728-192242883, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00032-41ca-c000.snappy.parquet, range: 134217728-192218855, partition values: [empty row]))"
 131 = {FilePartition@13296} "FilePartition(131,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00062-41ca-c000.snappy.parquet, range: 134217728-192200827, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00050-41ca-c000.snappy.parquet, range: 134217728-192191229, partition values: [empty row]))"
 132 = {FilePartition@13297} "FilePartition(132,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00069-41ca-c000.snappy.parquet, range: 134217728-192169417, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00098-41ca-c000.snappy.parquet, range: 134217728-192160782, partition values: [empty row]))"
 133 = {FilePartition@13298} "FilePartition(133,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00025-41ca-c000.snappy.parquet, range: 134217728-192152594, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet, range: 134217728-192146313, partition values: [empty row]))"
 134 = {FilePartition@13299} "FilePartition(134,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00095-41ca-c000.snappy.parquet, range: 134217728-192096023, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00047-41ca-c000.snappy.parquet, range: 134217728-192084102, partition values: [empty row]))"
 135 = {FilePartition@13300} "FilePartition(135,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00008-41ca-c000.snappy.parquet, range: 134217728-192046162, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00046-41ca-c000.snappy.parquet, range: 134217728-192036123, partition values: [empty row]))"
 136 = {FilePartition@13301} "FilePartition(136,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00090-41ca-c000.snappy.parquet, range: 134217728-191994182, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00066-41ca-c000.snappy.parquet, range: 134217728-191989716, partition values: [empty row]))"
 137 = {FilePartition@13302} "FilePartition(137,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet, range: 134217728-191934315, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00061-41ca-c000.snappy.parquet, range: 134217728-191865624, partition values: [empty row]))"
 138 = {FilePartition@13303} "FilePartition(138,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00015-41ca-c000.snappy.parquet, range: 134217728-191799380, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00065-41ca-c000.snappy.parquet, range: 134217728-191686604, partition values: [empty row]))"
 139 = {FilePartition@13304} "FilePartition(139,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00011-41ca-c000.snappy.parquet, range: 134217728-191638428, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00094-41ca-c000.snappy.parquet, range: 134217728-191613916, partition values: [empty row]))"
 140 = {FilePartition@13305} "FilePartition(140,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00093-41ca-c000.snappy.parquet, range: 134217728-191569247, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet, range: 134217728-191566051, partition values: [empty row]))"
 141 = {FilePartition@13306} "FilePartition(141,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00070-41ca-c000.snappy.parquet, range: 134217728-191540698, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00022-41ca-c000.snappy.parquet, range: 134217728-191510871, partition values: [empty row]))"
 142 = {FilePartition@13307} "FilePartition(142,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00024-41ca-c000.snappy.parquet, range: 134217728-191501951, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00041-41ca-c000.snappy.parquet, range: 134217728-191431095, partition values: [empty row]))"
 143 = {FilePartition@13308} "FilePartition(143,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00019-41ca-c000.snappy.parquet, range: 134217728-191423121, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00063-41ca-c000.snappy.parquet, range: 134217728-191381412, partition values: [empty row]))"
 144 = {FilePartition@13309} "FilePartition(144,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00067-41ca-c000.snappy.parquet, range: 134217728-191378487, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00039-41ca-c000.snappy.parquet, range: 134217728-191304847, partition values: [empty row]))"
 145 = {FilePartition@13310} "FilePartition(145,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00018-41ca-c000.snappy.parquet, range: 134217728-191205335, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00034-41ca-c000.snappy.parquet, range: 134217728-191131303, partition values: [empty row]))"
 146 = {FilePartition@13311} "FilePartition(146,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00084-41ca-c000.snappy.parquet, range: 134217728-191112887, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet, range: 134217728-191061715, partition values: [empty row]))"
 147 = {FilePartition@13312} "FilePartition(147,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00044-41ca-c000.snappy.parquet, range: 134217728-191023822, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet, range: 134217728-190977337, partition values: [empty row]))"
 148 = {FilePartition@13313} "FilePartition(148,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00059-41ca-c000.snappy.parquet, range: 134217728-190961642, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00071-41ca-c000.snappy.parquet, range: 134217728-190749907, partition values: [empty row]))"
 149 = {FilePartition@13314} "FilePartition(149,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 134217728-190698048, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 134217728-190665081, partition values: [empty row]))"

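Look first at the first 100 partitions (indices 0 to 99). Each of them holds exactly one file, and every range is range: 0-134217728, i.e. 128 MB, which matches one HDFS block. 128 MB is also the default value of spark.sql.files.maxPartitionBytes, the upper bound Spark SQL uses when it cuts files into splits, so the two sizes coincide here.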

Now look at the remaining 50 partitions (indices 100 to 149).
Each of these holds two files. Take the last one, partition 149, as an example:

 149 = {FilePartition@13314}
"FilePartition(149,WrappedArray(

path: hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 134217728-190698048, partition values: [empty row],
path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 134217728-190665081, partition values: [empty row]

))"

It contains two file ranges: range: 134217728-190698048 (53.8M) and range: 134217728-190665081 (53.8M).
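Both are the second block of their files, the part beyond the first 128 MB. Why were these two assigned to the same partition? My guess was that they sit on the same HDFS data node, so the data could be treated as co-located. The dump fits a size-based explanation better, though: from partition 100 onward the tail ranges appear in descending order of length, which is what Spark SQL's file scan produces when it sorts splits by size and packs them into partitions of at most spark.sql.files.maxPartitionBytes (128 MB by default), charging spark.sql.files.openCostInBytes (4 MB by default) for each file. Two tails of about 54 MB plus two open costs come to roughly 116 MB, which fits under the limit; a third tail would not.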
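
The two-files-per-partition grouping can be reproduced without any HDFS locality information by replaying a size-based packing. The sketch below is a simplified re-implementation and does not use Spark classes: SplitPackingSketch, splitLengths and pack are hypothetical names. It assumes the Spark 2.x behaviour for non-bucketed file scans (cut each file at the maximum split size, sort the splits by length in descending order, then fill partitions greedily) together with the default limits of 128 MB per partition and 4 MB open cost per file. The four file lengths are the end offsets shown in partitions 148 and 149 of the dump above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitPackingSketch {
    // Assumed defaults: spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes.
    static final long MAX_SPLIT_BYTES = 128L * 1024 * 1024;
    static final long OPEN_COST_BYTES = 4L * 1024 * 1024;

    /** Cuts every file into chunks of at most MAX_SPLIT_BYTES and returns the chunk lengths. */
    static List<Long> splitLengths(long[] fileLengths) {
        List<Long> splits = new ArrayList<>();
        for (long fileLength : fileLengths) {
            for (long offset = 0; offset < fileLength; offset += MAX_SPLIT_BYTES) {
                splits.add(Math.min(MAX_SPLIT_BYTES, fileLength - offset));
            }
        }
        return splits;
    }

    /** Packs splits, largest first, into partitions; each split also costs OPEN_COST_BYTES. */
    static List<List<Long>> pack(List<Long> splits) {
        List<Long> sorted = new ArrayList<>(splits);
        sorted.sort(Collections.reverseOrder());

        List<List<Long>> partitions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long length : sorted) {
            // Close the current partition when the next split would not fit.
            if (currentSize + length > MAX_SPLIT_BYTES && !current.isEmpty()) {
                partitions.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            currentSize += length + OPEN_COST_BYTES;
            current.add(length);
        }
        if (!current.isEmpty()) {
            partitions.add(current);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // File lengths taken from the ranges listed in partitions 148 and 149.
        long[] fileLengths = {190961642L, 190749907L, 190698048L, 190665081L};
        List<List<Long>> partitions = pack(splitLengths(fileLengths));
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println(i + " -> " + partitions.get(i));
        }
    }
}
```

For these four files the sketch yields six partitions: four that each hold a single 134217728-byte split, followed by two that each hold two tail splits, paired in the same order as partitions 148 and 149 in the dump. Scaled up to 100 files of this size, the same rule gives 100 single-split partitions plus 50 two-split partitions, i.e. the 150 observed here.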
