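This is the third post in a series analyzing the source code behind a Spark read of a parquet file. It covers how Spark decides the default number of partitions, although strictly speaking this post hardly counts as source-code analysis.
Part 1
Part 2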
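1. What determines the number of partitions of userProfileSource
I had always read that the partition count is determined by file size, with each block becoming one partition (a block is usually 128 MB and can be configured on HDFS). Those analyses, however, usually look at a single file; here I try the same analysis on multiple files.
1. Adding debug code
To inspect the RDD's partition details while debugging, I added the following line: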
userProfileSource.javaRDD().getNumPartitions();
The code becomes:
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;

public class UserProfileTest {
    static String filePath = "hdfs://test:9000/user/daily/20200828/*.parquet";

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setMaster("local")
                .setAppName("user_profile_test")
                .set(ConfigurationOptions.ES_NODES, "")
                .set(ConfigurationOptions.ES_PORT, "")
                .set(ConfigurationOptions.ES_MAPPING_ID, "uid");
        SparkSession sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
        Dataset<Row> userProfileSource = sparkSession.read().parquet(filePath);
        // Debug only: set a breakpoint here to inspect the RDD's partitions.
        userProfileSource.javaRDD().getNumPartitions();
        userProfileSource.count();
        userProfileSource.write().parquet("hdfs:///user/daily/result2020091008/");
    }
}
2. Files in the corresponding directory
# hsl -h hdfs://test:9000/user/daily/20200828/ |head
184.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet
182.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet
183.3 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet
183.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet
183.6 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet
182.7 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet
183.0 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet
182.1 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet
As the listing shows, each file is roughly 180 MB, and there are 100 such files in total.
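The one-partition-per-block rule mentioned at the start of this section can be turned into a quick estimate for this directory. The sketch below is only arithmetic, not Spark code: the class and method names are made up for illustration, 183 MB is taken as a representative file size from the listing, and 128 MB is the block size quoted earlier.

```java
// Hypothetical helper, not part of Spark or HDFS: it only applies the
// "one partition per block" rule of thumb to the directory listed above.
class BlockCountEstimate {
    // Number of blocks needed to hold fileBytes (ceiling division).
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockBytes = 128 * mb;   // block size quoted in the text
        long fileBytes = 183 * mb;    // assumption: a typical file from the listing
        int fileCount = 100;          // file count stated in the text

        // Each file spans 2 blocks, so the naive rule predicts 100 * 2 partitions.
        System.out.println(fileCount * blocksFor(fileBytes, blockBytes)); // prints 200
    }
}
```

If the per-block rule held on its own, this directory would therefore be read as about 200 partitions.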
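3. Partition information of the RDD
Both the debugger and the running program (see the corresponding job graph) show that the RDD has 150 partitions. For convenience, the DAG figure from the earlier post is reproduced here. In the figure below, the first stage of count runs 150 tasks.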
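One way to see where a count of 150 could come from is to simulate the split-and-pack step that, as far as I understand it, Spark 2.x applies to file-based sources: cut each file into chunks of at most maxSplitBytes, sort the chunks from largest to smallest, then pack them into partitions while charging a fixed "open cost" per chunk. The sketch below is a simplified stand-alone model, not Spark's code. The class and method names are hypothetical, and it assumes the default values of spark.sql.files.maxPartitionBytes (128 MB) and spark.sql.files.openCostInBytes (4 MB), a default parallelism of 1 (master set to "local"), and 100 files of exactly 183 MB each.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of split-and-pack partition planning.
class SplitPackingSketch {
    static int countPartitions(long[] fileSizes, long maxPartitionBytes,
                               long openCostInBytes, int parallelism) {
        // Target split size: capped by maxPartitionBytes, floored by the open cost.
        long totalBytes = 0;
        for (long size : fileSizes) {
            totalBytes += size + openCostInBytes;
        }
        long bytesPerCore = totalBytes / parallelism;
        long maxSplitBytes = Math.min(maxPartitionBytes, Math.max(openCostInBytes, bytesPerCore));

        // Cut every file into chunks of at most maxSplitBytes.
        List<Long> splits = new ArrayList<>();
        for (long size : fileSizes) {
            for (long offset = 0; offset < size; offset += maxSplitBytes) {
                splits.add(Math.min(maxSplitBytes, size - offset));
            }
        }
        // Largest chunks first.
        splits.sort(Collections.reverseOrder());

        // Pack chunks in order; start a new partition when the next chunk would overflow.
        int partitions = 0;
        int chunksInCurrent = 0;
        long currentSize = 0;
        for (long split : splits) {
            if (currentSize + split > maxSplitBytes) {
                if (chunksInCurrent > 0) {
                    partitions++;
                }
                chunksInCurrent = 0;
                currentSize = 0;
            }
            currentSize += split + openCostInBytes;
            chunksInCurrent++;
        }
        if (chunksInCurrent > 0) {
            partitions++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long[] files = new long[100];
        Arrays.fill(files, 183 * mb); // assumption: every file is exactly 183 MB
        System.out.println(countPartitions(files, 128 * mb, 4 * mb, 1)); // prints 150
    }
}
```

Under these assumptions each file contributes one full 128 MB chunk, which fills a partition by itself, plus a tail of about 55 MB. Two tails plus their open costs fit within 128 MB but three do not, so the 100 tails pack into 50 partitions, for 100 + 50 = 150 in total.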
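I won't step through the RDD creation logic in detail here and will just give the RDD's partition information, which was read from the RDD in debug mode. The dump below shows the details of the 150 partitions.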
result = {Partition[150]@13053}
0 = {FilePartition@13054} "FilePartition(0,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
1 = {FilePartition@13055} "FilePartition(1,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
2 = {FilePartition@13056} "FilePartition(2,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
3 = {FilePartition@13057} "FilePartition(3,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
4 = {FilePartition@13058} "FilePartition(4,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
5 = {FilePartition@13059} "FilePartition(5,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
6 = {FilePartition@13060} "FilePartition(6,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
7 = {FilePartition@13061} "FilePartition(7,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
8 = {FilePartition@13062} "FilePartition(8,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00008-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
9 = {FilePartition@13063} "FilePartition(9,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00009-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
10 = {FilePartition@13064} "FilePartition(10,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00010-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
11 = {FilePartition@13065} "FilePartition(11,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00011-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
12 = {FilePartition@13066} "FilePartition(12,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00012-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
13 = {FilePartition@13067} "FilePartition(13,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00013-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
14 = {FilePartition@13068} "FilePartition(14,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00014-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
15 = {FilePartition@13069} "FilePartition(15,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00015-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
16 = {FilePartition@13070} "FilePartition(16,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00016-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
17 = {FilePartition@13071} "FilePartition(17,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00017-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
18 = {FilePartition@13072} "FilePartition(18,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00018-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
19 = {FilePartition@13073} "FilePartition(19,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00019-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
20 = {FilePartition@13074} "FilePartition(20,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00020-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
21 = {FilePartition@13075} "FilePartition(21,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00021-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
22 = {FilePartition@13076} "FilePartition(22,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00022-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
23 = {FilePartition@13077} "FilePartition(23,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00023-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
24 = {FilePartition@13078} "FilePartition(24,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00024-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
25 = {FilePartition@13079} "FilePartition(25,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00025-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
26 = {FilePartition@13080} "FilePartition(26,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00026-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
27 = {FilePartition@13081} "FilePartition(27,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00027-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
28 = {FilePartition@13082} "FilePartition(28,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00028-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
29 = {FilePartition@13083} "FilePartition(29,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00029-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
30 = {FilePartition@13084} "FilePartition(30,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00030-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
31 = {FilePartition@13085} "FilePartition(31,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00031-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
32 = {FilePartition@13086} "FilePartition(32,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00032-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
33 = {FilePartition@13087} "FilePartition(33,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00033-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
34 = {FilePartition@13088} "FilePartition(34,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00034-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
35 = {FilePartition@13089} "FilePartition(35,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00035-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
36 = {FilePartition@13090} "FilePartition(36,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00036-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
37 = {FilePartition@13091} "FilePartition(37,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00037-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
38 = {FilePartition@13092} "FilePartition(38,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00038-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
39 = {FilePartition@13093} "FilePartition(39,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00039-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
40 = {FilePartition@13094} "FilePartition(40,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00040-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
41 = {FilePartition@13095} "FilePartition(41,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00041-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
42 = {FilePartition@13096} "FilePartition(42,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00042-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
43 = {FilePartition@13097} "FilePartition(43,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00043-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
44 = {FilePartition@13098} "FilePartition(44,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00044-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
45 = {FilePartition@13099} "FilePartition(45,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00045-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
46 = {FilePartition@13100} "FilePartition(46,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00046-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
47 = {FilePartition@13101} "FilePartition(47,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00047-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
48 = {FilePartition@13102} "FilePartition(48,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00048-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
49 = {FilePartition@13103} "FilePartition(49,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00049-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
50 = {FilePartition@13104} "FilePartition(50,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00050-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
51 = {FilePartition@13105} "FilePartition(51,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00051-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
52 = {FilePartition@13106} "FilePartition(52,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00052-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
53 = {FilePartition@13107} "FilePartition(53,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00053-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
54 = {FilePartition@13108} "FilePartition(54,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00054-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
55 = {FilePartition@13109} "FilePartition(55,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00055-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
56 = {FilePartition@13110} "FilePartition(56,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00056-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
57 = {FilePartition@13111} "FilePartition(57,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00057-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
58 = {FilePartition@13112} "FilePartition(58,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00058-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
59 = {FilePartition@13113} "FilePartition(59,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00059-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
60 = {FilePartition@13114} "FilePartition(60,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00060-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
61 = {FilePartition@13115} "FilePartition(61,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00061-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
62 = {FilePartition@13116} "FilePartition(62,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00062-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
63 = {FilePartition@13117} "FilePartition(63,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00063-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
64 = {FilePartition@13118} "FilePartition(64,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00064-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
65 = {FilePartition@13119} "FilePartition(65,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00065-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
66 = {FilePartition@13120} "FilePartition(66,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00066-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
67 = {FilePartition@13121} "FilePartition(67,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00067-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
68 = {FilePartition@13122} "FilePartition(68,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00068-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
69 = {FilePartition@13123} "FilePartition(69,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00069-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
70 = {FilePartition@13124} "FilePartition(70,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00070-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
71 = {FilePartition@13125} "FilePartition(71,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00071-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
72 = {FilePartition@13126} "FilePartition(72,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00072-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
73 = {FilePartition@13127} "FilePartition(73,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00073-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
74 = {FilePartition@13128} "FilePartition(74,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00074-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
75 = {FilePartition@13129} "FilePartition(75,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00075-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
76 = {FilePartition@13130} "FilePartition(76,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
77 = {FilePartition@13131} "FilePartition(77,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00077-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
78 = {FilePartition@13132} "FilePartition(78,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00078-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
79 = {FilePartition@13133} "FilePartition(79,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00079-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
80 = {FilePartition@13134} "FilePartition(80,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00080-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
81 = {FilePartition@13135} "FilePartition(81,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00081-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
82 = {FilePartition@13136} "FilePartition(82,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
83 = {FilePartition@13137} "FilePartition(83,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00083-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
84 = {FilePartition@13138} "FilePartition(84,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00084-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
85 = {FilePartition@13139} "FilePartition(85,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00085-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
86 = {FilePartition@13140} "FilePartition(86,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00086-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
87 = {FilePartition@13141} "FilePartition(87,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00087-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
88 = {FilePartition@13142} "FilePartition(88,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00088-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
89 = {FilePartition@13143} "FilePartition(89,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00089-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
90 = {FilePartition@13144} "FilePartition(90,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00090-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
91 = {FilePartition@13145} "FilePartition(91,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00091-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
92 = {FilePartition@13146} "FilePartition(92,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00092-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
93 = {FilePartition@13147} "FilePartition(93,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00093-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
94 = {FilePartition@13148} "FilePartition(94,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00094-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
95 = {FilePartition@13149} "FilePartition(95,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00095-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
96 = {FilePartition@13150} "FilePartition(96,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00096-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
97 = {FilePartition@13151} "FilePartition(97,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00097-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
98 = {FilePartition@13152} "FilePartition(98,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00098-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
99 = {FilePartition@13153} "FilePartition(99,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00099-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"
100 = {FilePartition@13265} "FilePartition(100,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00020-41ca-c000.snappy.parquet, range: 134217728-195123990, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00009-41ca-c000.snappy.parquet, range: 134217728-195046711, partition values: [empty row]))"
101 = {FilePartition@13266} "FilePartition(101,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00051-41ca-c000.snappy.parquet, range: 134217728-194146721, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00078-41ca-c000.snappy.parquet, range: 134217728-193958286, partition values: [empty row]))"
102 = {FilePartition@13267} "FilePartition(102,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00072-41ca-c000.snappy.parquet, range: 134217728-193883957, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00017-41ca-c000.snappy.parquet, range: 134217728-193740729, partition values: [empty row]))"
103 = {FilePartition@13268} "FilePartition(103,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00052-41ca-c000.snappy.parquet, range: 134217728-193733952, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00035-41ca-c000.snappy.parquet, range: 134217728-193630805, partition values: [empty row]))"
104 = {FilePartition@13269} "FilePartition(104,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00060-41ca-c000.snappy.parquet, range: 134217728-193595463, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00040-41ca-c000.snappy.parquet, range: 134217728-193582950, partition values: [empty row]))"
105 = {FilePartition@13270} "FilePartition(105,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00097-41ca-c000.snappy.parquet, range: 134217728-193553823, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00088-41ca-c000.snappy.parquet, range: 134217728-193318554, partition values: [empty row]))"
106 = {FilePartition@13271} "FilePartition(106,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00028-41ca-c000.snappy.parquet, range: 134217728-193247136, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00081-41ca-c000.snappy.parquet, range: 134217728-193238350, partition values: [empty row]))"
107 = {FilePartition@13272} "FilePartition(107,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00083-41ca-c000.snappy.parquet, range: 134217728-193192582, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00057-41ca-c000.snappy.parquet, range: 134217728-193169659, partition values: [empty row]))"
108 = {FilePartition@13273} "FilePartition(108,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet, range: 134217728-193154555, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00096-41ca-c000.snappy.parquet, range: 134217728-193056440, partition values: [empty row]))"
109 = {FilePartition@13274} "FilePartition(109,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00027-41ca-c000.snappy.parquet, range: 134217728-193040434, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00089-41ca-c000.snappy.parquet, range: 134217728-193010886, partition values: [empty row]))"
110 = {FilePartition@13275} "FilePartition(110,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00012-41ca-c000.snappy.parquet, range: 134217728-192989901, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00099-41ca-c000.snappy.parquet, range: 134217728-192957150, partition values: [empty row]))"
111 = {FilePartition@13276} "FilePartition(111,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00021-41ca-c000.snappy.parquet, range: 134217728-192890064, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00091-41ca-c000.snappy.parquet, range: 134217728-192871474, partition values: [empty row]))"
112 = {FilePartition@13277} "FilePartition(112,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00030-41ca-c000.snappy.parquet, range: 134217728-192862266, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00058-41ca-c000.snappy.parquet, range: 134217728-192846129, partition values: [empty row]))"
113 = {FilePartition@13278} "FilePartition(113,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00092-41ca-c000.snappy.parquet, range: 134217728-192831718, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00049-41ca-c000.snappy.parquet, range: 134217728-192826946, partition values: [empty row]))"
114 = {FilePartition@13279} "FilePartition(114,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00087-41ca-c000.snappy.parquet, range: 134217728-192797029, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00053-41ca-c000.snappy.parquet, range: 134217728-192723879, partition values: [empty row]))"
115 = {FilePartition@13280} "FilePartition(115,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00064-41ca-c000.snappy.parquet, range: 134217728-192715391, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00042-41ca-c000.snappy.parquet, range: 134217728-192710676, partition values: [empty row]))"
116 = {FilePartition@13281} "FilePartition(116,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00055-41ca-c000.snappy.parquet, range: 134217728-192685175, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00033-41ca-c000.snappy.parquet, range: 134217728-192652367, partition values: [empty row]))"
117 = {FilePartition@13282} "FilePartition(117,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00056-41ca-c000.snappy.parquet, range: 134217728-192639026, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00036-41ca-c000.snappy.parquet, range: 134217728-192636886, partition values: [empty row]))"
118 = {FilePartition@13283} "FilePartition(118,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00077-41ca-c000.snappy.parquet, range: 134217728-192628108, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00037-41ca-c000.snappy.parquet, range: 134217728-192619533, partition values: [empty row]))"
119 = {FilePartition@13284} "FilePartition(119,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00016-41ca-c000.snappy.parquet, range: 134217728-192610267, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet, range: 134217728-192553300, partition values: [empty row]))"
120 = {FilePartition@13285} "FilePartition(120,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00038-41ca-c000.snappy.parquet, range: 134217728-192550224, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00079-41ca-c000.snappy.parquet, range: 134217728-192528242, partition values: [empty row]))"
121 = {FilePartition@13286} "FilePartition(121,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00048-41ca-c000.snappy.parquet, range: 134217728-192518314, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00014-41ca-c000.snappy.parquet, range: 134217728-192492734, partition values: [empty row]))"
122 = {FilePartition@13287} "FilePartition(122,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00043-41ca-c000.snappy.parquet, range: 134217728-192488828, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00075-41ca-c000.snappy.parquet, range: 134217728-192457423, partition values: [empty row]))"
123 = {FilePartition@13288} "FilePartition(123,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00080-41ca-c000.snappy.parquet, range: 134217728-192436303, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00086-41ca-c000.snappy.parquet, range: 134217728-192422791, partition values: [empty row]))"
124 = {FilePartition@13289} "FilePartition(124,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00010-41ca-c000.snappy.parquet, range: 134217728-192401383, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00031-41ca-c000.snappy.parquet, range: 134217728-192390802, partition values: [empty row]))"
125 = {FilePartition@13290} "FilePartition(125,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00073-41ca-c000.snappy.parquet, range: 134217728-192384328, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00068-41ca-c000.snappy.parquet, range: 134217728-192373709, partition values: [empty row]))"
126 = {FilePartition@13291} "FilePartition(126,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00045-41ca-c000.snappy.parquet, range: 134217728-192324641, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00074-41ca-c000.snappy.parquet, range: 134217728-192320288, partition values: [empty row]))"
127 = {FilePartition@13292} "FilePartition(127,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00054-41ca-c000.snappy.parquet, range: 134217728-192310762, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00013-41ca-c000.snappy.parquet, range: 134217728-192309053, partition values: [empty row]))"
128 = {FilePartition@13293} "FilePartition(128,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00029-41ca-c000.snappy.parquet, range: 134217728-192307665, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00085-41ca-c000.snappy.parquet, range: 134217728-192300557, partition values: [empty row]))"
129 = {FilePartition@13294} "FilePartition(129,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00023-41ca-c000.snappy.parquet, range: 134217728-192287705, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00026-41ca-c000.snappy.parquet, range: 134217728-192244675, partition values: [empty row]))"
130 = {FilePartition@13295} "FilePartition(130,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet, range: 134217728-192242883, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00032-41ca-c000.snappy.parquet, range: 134217728-192218855, partition values: [empty row]))"
131 = {FilePartition@13296} "FilePartition(131,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00062-41ca-c000.snappy.parquet, range: 134217728-192200827, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00050-41ca-c000.snappy.parquet, range: 134217728-192191229, partition values: [empty row]))"
132 = {FilePartition@13297} "FilePartition(132,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00069-41ca-c000.snappy.parquet, range: 134217728-192169417, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00098-41ca-c000.snappy.parquet, range: 134217728-192160782, partition values: [empty row]))"
133 = {FilePartition@13298} "FilePartition(133,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00025-41ca-c000.snappy.parquet, range: 134217728-192152594, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet, range: 134217728-192146313, partition values: [empty row]))"
134 = {FilePartition@13299} "FilePartition(134,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00095-41ca-c000.snappy.parquet, range: 134217728-192096023, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00047-41ca-c000.snappy.parquet, range: 134217728-192084102, partition values: [empty row]))"
135 = {FilePartition@13300} "FilePartition(135,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00008-41ca-c000.snappy.parquet, range: 134217728-192046162, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00046-41ca-c000.snappy.parquet, range: 134217728-192036123, partition values: [empty row]))"
136 = {FilePartition@13301} "FilePartition(136,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00090-41ca-c000.snappy.parquet, range: 134217728-191994182, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00066-41ca-c000.snappy.parquet, range: 134217728-191989716, partition values: [empty row]))"
137 = {FilePartition@13302} "FilePartition(137,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet, range: 134217728-191934315, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00061-41ca-c000.snappy.parquet, range: 134217728-191865624, partition values: [empty row]))"
138 = {FilePartition@13303} "FilePartition(138,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00015-41ca-c000.snappy.parquet, range: 134217728-191799380, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00065-41ca-c000.snappy.parquet, range: 134217728-191686604, partition values: [empty row]))"
139 = {FilePartition@13304} "FilePartition(139,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00011-41ca-c000.snappy.parquet, range: 134217728-191638428, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00094-41ca-c000.snappy.parquet, range: 134217728-191613916, partition values: [empty row]))"
140 = {FilePartition@13305} "FilePartition(140,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00093-41ca-c000.snappy.parquet, range: 134217728-191569247, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet, range: 134217728-191566051, partition values: [empty row]))"
141 = {FilePartition@13306} "FilePartition(141,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00070-41ca-c000.snappy.parquet, range: 134217728-191540698, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00022-41ca-c000.snappy.parquet, range: 134217728-191510871, partition values: [empty row]))"
142 = {FilePartition@13307} "FilePartition(142,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00024-41ca-c000.snappy.parquet, range: 134217728-191501951, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00041-41ca-c000.snappy.parquet, range: 134217728-191431095, partition values: [empty row]))"
143 = {FilePartition@13308} "FilePartition(143,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00019-41ca-c000.snappy.parquet, range: 134217728-191423121, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00063-41ca-c000.snappy.parquet, range: 134217728-191381412, partition values: [empty row]))"
144 = {FilePartition@13309} "FilePartition(144,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00067-41ca-c000.snappy.parquet, range: 134217728-191378487, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00039-41ca-c000.snappy.parquet, range: 134217728-191304847, partition values: [empty row]))"
145 = {FilePartition@13310} "FilePartition(145,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00018-41ca-c000.snappy.parquet, range: 134217728-191205335, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00034-41ca-c000.snappy.parquet, range: 134217728-191131303, partition values: [empty row]))"
146 = {FilePartition@13311} "FilePartition(146,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00084-41ca-c000.snappy.parquet, range: 134217728-191112887, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet, range: 134217728-191061715, partition values: [empty row]))"
147 = {FilePartition@13312} "FilePartition(147,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00044-41ca-c000.snappy.parquet, range: 134217728-191023822, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet, range: 134217728-190977337, partition values: [empty row]))"
148 = {FilePartition@13313} "FilePartition(148,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00059-41ca-c000.snappy.parquet, range: 134217728-190961642, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00071-41ca-c000.snappy.parquet, range: 134217728-190749907, partition values: [empty row]))"
149 = {FilePartition@13314} "FilePartition(149,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 134217728-190698048, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 134217728-190665081, partition values: [empty row]))"
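Look first at the first 100 partitions. Each of them covers a single file, and each has range: 0-134217728, i.e. the first 128 MB of the file (134217728 = 128 × 1024 × 1024 bytes), which is exactly one HDFS block.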
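To make the range: 0-134217728 value concrete, here is a minimal sketch of cutting one file into byte ranges. It assumes a split size of 128 MiB, which matches the HDFS block size mentioned above and also the default value of spark.sql.files.maxPartitionBytes. The class and method names (SplitRanges, splitRanges) are made up for illustration and are not Spark APIs. The file length is that of part-00045, read off the dump above.

```java
import java.util.ArrayList;
import java.util.List;

class SplitRanges {
    // Assumed split size: 128 MiB = 134217728 bytes.
    static final long MAX_SPLIT_BYTES = 128L * 1024 * 1024;

    /** Cuts a file of the given length into {start, end} ranges of at most maxSplitBytes each. */
    static List<long[]> splitRanges(long fileLength, long maxSplitBytes) {
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += maxSplitBytes) {
            long end = Math.min(start + maxSplitBytes, fileLength);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Length of part-00045, taken from the end of its second range in the dump.
        long fileLength = 192324641L;
        for (long[] r : splitRanges(fileLength, MAX_SPLIT_BYTES)) {
            System.out.println("range: " + r[0] + "-" + r[1] + ", bytes: " + (r[1] - r[0]));
        }
    }
}
```

For this file the sketch produces two ranges, 0-134217728 and 134217728-192324641, which are the two ranges that appear for part-00045 in the partition list.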
Now look at the remaining 50 partitions.
Each of these covers two files. Take the last one, partition 149, as an example:
149 = {FilePartition@13314}
"FilePartition(149,WrappedArray(
path: hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 134217728-190698048, partition values: [empty row],
path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 134217728-190665081, partition values: [empty row]
))"
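It covers two files: range: 134217728-190698048 (53.8 MB) and range: 134217728-190665081 (53.8 MB). In both cases this is the second block of the file. Why would these two pieces be assigned to the same partition? My guess is that they sit on the same DataNode in HDFS, so the two pieces of data can be treated as being stored together.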
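The dump is also consistent with a purely size-based explanation: the tail ranges appear in descending order of length, and two tails of roughly 54 to 56 MB fit together under 128 MB. The following is a simplified sketch of such a packing, modelled on my understanding of how Spark's file source scan groups splits. It is not copied from the Spark source. It assumes a 128 MiB limit (the default of spark.sql.files.maxPartitionBytes) and a 4 MiB per-file open cost (the default of spark.sql.files.openCostInBytes). The names SplitPacking and pack are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SplitPacking {
    /** Greedily packs split lengths, largest first, into partitions of at most maxSplitBytes. */
    static List<List<Long>> pack(List<Long> splitLengths, long maxSplitBytes, long openCostInBytes) {
        List<Long> sorted = new ArrayList<>(splitLengths);
        sorted.sort(Comparator.reverseOrder()); // largest splits first

        List<List<Long>> partitions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long length : sorted) {
            // Close the current partition if this split would push it over the limit.
            if (!current.isEmpty() && currentSize + length > maxSplitBytes) {
                partitions.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(length);
            currentSize += length + openCostInBytes; // assumed per-file open cost
        }
        if (!current.isEmpty()) {
            partitions.add(current);
        }
        return partitions;
    }

    public static void main(String[] args) {
        long maxSplitBytes = 128L * 1024 * 1024; // assumed default, 128 MiB
        long openCost = 4L * 1024 * 1024;        // assumed default, 4 MiB
        // Lengths of part-00076 and part-00082, taken from partition 149 in the dump.
        long[] fileLengths = {190698048L, 190665081L};
        List<Long> splits = new ArrayList<>();
        for (long len : fileLengths) {
            splits.add(maxSplitBytes);       // first split: 0-134217728
            splits.add(len - maxSplitBytes); // tail split: 134217728-len
        }
        List<List<Long>> partitions = pack(splits, maxSplitBytes, openCost);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

For the two files of partition 149 this yields three partitions: two holding a single 128 MiB split each, and one holding both tails. That is 1.5 partitions per file, the same ratio as the 150 partitions observed for 100 files. If this model is right, co-location on a DataNode is not needed to explain the pairing.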