Spark analysis slow on two nodes: stale HBase meta entries left behind after region splits cause "There is an overlap in the region chain."


Environment: HBase 2.0, Spark 2.2

Problem description

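When running a Spark job over HBase data, two Executors were far slower than the rest, as shown in the figure.

Normally the tasks finish in a little over 10 minutes. After more than an hour, one of the two slow tasks had finished and the other was still running.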

Root cause analysis

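1. Check the HBase data distribution for skew. The average region size is about 6.8 GB, but some regions are around 1.71 GB or 3.4 GB. Still, earlier runs of this Spark job with 80+ Executors never showed a time gap this large.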

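In fact this is not skew. The 1.71 GB regions in the table below come from a 6.8 GB region that had just split into four: 1.71 × 4 = 6.84 GB, exactly the average size.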

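Likewise, the regions whose start keys run from bc27300 to bf2a300 add up to 3.42 + 1.71 + 1.71 = 6.84 GB, which is normal (the row keys were preprocessed before the data was stored).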

| ReadRequests | WriteRequests | StorefileSize | Num. Storefiles | MemSize | Locality | Start Key | End Key |
|---|---|---|---|---|---|---|---|
| (1,887,103,211) | (18,159,897) | (437.56 GB) | (171) | (315 MB) | | | |
| 0 | 0 | 0 B | 0 | 0 B | 0 | | 4003300_c126f5cc6cef33e84de0cafc9e52d44f |
| 39,112,630 | 327,294 | 6.83 GB | 3 | 5 MB | 1 | | 0400300_a439d4f851fa7540d87ae33454a7f892 |
| 37,563,645 | 326,734 | 6.83 GB | 2 | 6 MB | 1 | 0400300_a439d4f851fa7540d87ae33454a7f892 | 07fe300_4130b717f89f58b3b6f530eec7a54b05 |
| 37,212,501 | 327,619 | 6.84 GB | 2 | 5 MB | 1 | 07fe300_4130b717f89f58b3b6f530eec7a54b05 | 0bfd300_4a2dc6b7b1a17472484041e49382b11d |
| 37,392,603 | 57,911 | 6.84 GB | 2 | 5 MB | 1 | 0bfd300_4a2dc6b7b1a17472484041e49382b11d | 0ffd300_74fdb5eebd54ee32e19abb31bccec158 |
| 0 | 0 | 0 B | 0 | 0 B | 0 | 0ffd300_74fdb5eebd54ee32e19abb31bccec158 | 11fe300_2aa404cbc291ffd913bab13d9339085c |
| 13,882,247 | 14,525 | 1.71 GB | 2 | 1 MB | 1 | 0ffd300_74fdb5eebd54ee32e19abb31bccec158 | 10fd300_7a83ffc34c076654e69d558f1c9f1038 |
| 12,799,609 | 82,395 | 1.71 GB | 2 | 1 MB | 1 | 10fd300_7a83ffc34c076654e69d558f1c9f1038 | 11fe300_2aa404cbc291ffd913bab13d9339085c |
| 10,021,069 | 81,978 | 1.71 GB | 3 | 1 MB | 1 | 11fe300_2aa404cbc291ffd913bab13d9339085c | 12fe300_3f46699293dbeaa042e6856597d26dca |
| 8,877,982 | 16,428 | 1.71 GB | 3 | 1 MB | 1 | 12fe300_3f46699293dbeaa042e6856597d26dca | 13fd300_c223d551c26e191ffacb9981c5fb2cfd |
| 36,137,439 | 328,938 | 6.84 GB | 3 | 5 MB | 1 | 13fd300_c223d551c26e191ffacb9981c5fb2cfd | 1800300_0c10469f3ca178e9d5d81b507cc1f684 |
| 37,475,208 | 328,758 | 6.83 GB | 3 | 5 MB | 1 | 1800300_0c10469f3ca178e9d5d81b507cc1f684 | 1c00300_b0e2e6ce62897dbd1e033f88bcf22336 |
| 35,610,681 | 329,050 | 6.83 GB | 3 | 5 MB | 1 | 1c00300_b0e2e6ce62897dbd1e033f88bcf22336 | 2002300_cb4302be610bfedebaedce4a53cd0e4e |
| 36,154,352 | 328,413 | 6.86 GB | 3 | 5 MB | 1 | 2002300_cb4302be610bfedebaedce4a53cd0e4e | 2404300_806ce32f00569ec34eb1fdc008c574cc |
| 35,637,405 | 57,144 | 6.85 GB | 2 | 5 MB | 1 | 2404300_806ce32f00569ec34eb1fdc008c574cc | 2805300_3063b4db3bc49e0cd2d2e2430b8f92e8 |
| 35,617,576 | 328,429 | 6.85 GB | 3 | 5 MB | 1 | 2805300_3063b4db3bc49e0cd2d2e2430b8f92e8 | 2c04300_c3f674b39aeb8f289313f87d8d21bfe7 |
| 37,567,336 | 328,392 | 6.85 GB | 2 | 5 MB | 1 | 2c04300_c3f674b39aeb8f289313f87d8d21bfe7 | 3005300_be2766fa12b179733c9a55801367d56f |
| 33,905,338 | 329,393 | 6.84 GB | 2 | 5 MB | 1 | 3005300_be2766fa12b179733c9a55801367d56f | 3406300_ec238a180804b26ea232125ff11fcf68 |
| 33,344,294 | 328,009 | 6.84 GB | 3 | 5 MB | 1 | 3406300_ec238a180804b26ea232125ff11fcf68 | 3806300_886cf9e6502abd928a8c063ca517f54e |
| 32,536,915 | 326,671 | 6.82 GB | 2 | 5 MB | 1 | 3806300_886cf9e6502abd928a8c063ca517f54e | 3c05300_7ea6294dd2cd312d63d846bf5de5cd35 |
| 29,736,155 | 327,215 | 6.82 GB | 2 | 5 MB | 1 | 3c05300_7ea6294dd2cd312d63d846bf5de5cd35 | 4003300_c126f5cc6cef33e84de0cafc9e52d44f |
| 22,272,474 | 328,354 | 6.85 GB | 2 | 5 MB | 1 | 4003300_c126f5cc6cef33e84de0cafc9e52d44f | 4406300_18e8e63d5a80f546cdcadda551531dfc |
| 22,267,884 | 329,515 | 6.84 GB | 2 | 5 MB | 1 | 4406300_18e8e63d5a80f546cdcadda551531dfc | 4807300_8fb3f14d16049e78c52c1c1c416dda4f |
| 22,259,844 | 56,744 | 6.84 GB | 2 | 5 MB | 1 | 4807300_8fb3f14d16049e78c52c1c1c416dda4f | 4c08300_e44ffb40f82cf98c9e0802ee74070cc6 |
| 22,258,382 | 327,812 | 6.84 GB | 2 | 4 MB | 1 | 4c08300_e44ffb40f82cf98c9e0802ee74070cc6 | 500a300_fff0f934443b857109d1dbece53c770f |
| 22,269,549 | 328,870 | 6.85 GB | 3 | 5 MB | 1 | 500a300_fff0f934443b857109d1dbece53c770f | 540c300_b9bbb1361667ab3ac5669a3144a2fa71 |
| 22,250,497 | 328,277 | 6.84 GB | 2 | 5 MB | 1 | 540c300_b9bbb1361667ab3ac5669a3144a2fa71 | 580c300_c7d45406bd17aace3ad5f8586b9dff80 |
| 22,224,993 | 328,254 | 6.84 GB | 3 | 6 MB | 1 | 580c300_c7d45406bd17aace3ad5f8586b9dff80 | 5c0c300_0526a7b197b249ca24427063cf16657a |
| 22,209,147 | 327,629 | 6.83 GB | 2 | 5 MB | 1 | 5c0c300_0526a7b197b249ca24427063cf16657a | 600a300_5523694821dfcaab1fc9ff5ed40aab87 |
| 22,106,815 | 56,473 | 6.81 GB | 2 | 5 MB | 1 | 600a300_5523694821dfcaab1fc9ff5ed40aab87 | 6405300_0030bf135d4c2c20e9915c1941bd8cc7 |
| 22,136,156 | 327,095 | 6.80 GB | 3 | 5 MB | 1 | 6405300_0030bf135d4c2c20e9915c1941bd8cc7 | 67ff300_beefe04a96f547703165d057642c7a6e |
| 21,805,246 | 325,501 | 6.81 GB | 2 | 5 MB | 1 | 67ff300_beefe04a96f547703165d057642c7a6e | 6bfa300_c556fbef5490c3ca3168c7eb02015d16 |
| 21,160,315 | 56,503 | 6.81 GB | 3 | 5 MB | 1 | 6bfa300_c556fbef5490c3ca3168c7eb02015d16 | 6ff8300_471ab4d62d3bfe6ae86fe08b15524ee2 |
| 22,647,694 | 328,901 | 6.83 GB | 2 | 5 MB | 1 | 6ff8300_471ab4d62d3bfe6ae86fe08b15524ee2 | 73f7300_84b51415b091cdf93f380303f2a53658 |
| 22,209,724 | 56,882 | 6.83 GB | 2 | 5 MB | 1 | 73f7300_84b51415b091cdf93f380303f2a53658 | 77f5300_a6447ed7062c78c6f2216ec46ee47593 |
| 22,194,551 | 327,634 | 6.83 GB | 2 | 5 MB | 1 | 77f5300_a6447ed7062c78c6f2216ec46ee47593 | 7bf3300_3eab3e0f5794997bffffa596ca42d78d |
| 22,150,573 | 326,600 | 6.81 GB | 3 | 5 MB | 1 | 7bf3300_3eab3e0f5794997bffffa596ca42d78d | 7fef300_18b81d0e3245848ddfaf6a97ea868de5 |
| 0 | 0 | 0 B | 0 | 0 B | 0 | 7fef300_18b81d0e3245848ddfaf6a97ea868de5 | c02b300_52789a8b10e2d896e9be1272d9169afd |
| 37,290,422 | 328,707 | 6.87 GB | 2 | 6 MB | 1 | 7fef300_18b81d0e3245848ddfaf6a97ea868de5 | 83f3300_50ee6021166da9b30d27205aa93c7a47 |
| 35,724,649 | 329,463 | 6.87 GB | 2 | 5 MB | 1 | 83f3300_50ee6021166da9b30d27205aa93c7a47 | 87f7300_3a43c0b12b4c893d99bc9795c6fedde4 |
| 35,732,883 | 329,070 | 6.87 GB | 3 | 5 MB | 1 | 87f7300_3a43c0b12b4c893d99bc9795c6fedde4 | 8bfb300_4143081d05ed98403cf157074bd12a5d |
| 17,903,439 | 34,080 | 3.43 GB | 2 | 2 MB | 1 | 8bfb300_4143081d05ed98403cf157074bd12a5d | 8dff300_146416041229f7ddc0d92a8f3ba72e22 |
| 17,856,516 | 28,246 | 3.43 GB | 2 | 2 MB | 1 | 8dff300_146416041229f7ddc0d92a8f3ba72e22 | 9001300_1a9a07fdf1ba346228d63f9a4db9ebbf |
| 35,734,987 | 330,855 | 6.87 GB | 2 | 5 MB | 1 | 9001300_1a9a07fdf1ba346228d63f9a4db9ebbf | 9404300_f5e0017433b307e15eb645c82ba69eb1 |
| 34,355,672 | 330,896 | 6.87 GB | 2 | 5 MB | 1 | 9404300_f5e0017433b307e15eb645c82ba69eb1 | 980a300_eb3ea8174076a9343b9fefc131da4bfb |
| 35,863,280 | 57,528 | 6.89 GB | 2 | 5 MB | 1 | 980a300_eb3ea8174076a9343b9fefc131da4bfb | 9c13300_b4c65bb87a51729cdc51720b0cb0f0b7 |
| 35,775,342 | 328,782 | 6.87 GB | 3 | 5 MB | 1 | 9c13300_b4c65bb87a51729cdc51720b0cb0f0b7 | a019 |
| 35,636,124 | 328,914 | 6.86 GB | 2 | 5 MB | 1 | a019 | a41a300_3756e9713d68d9e36f1dc7fe07829779 |
| 32,629,922 | 329,001 | 6.85 GB | 3 | 5 MB | 1 | a41a300_3756e9713d68d9e36f1dc7fe07829779 | a81d300_2f82d2b7ea67878b8c1ef86bd0e1c169 |
| 33,134,529 | 328,822 | 6.84 GB | 2 | 5 MB | 1 | a81d300_2f82d2b7ea67878b8c1ef86bd0e1c169 | ac1e300_0bd8ddff51d175087f6cda84b271b0eb |
| 33,375,442 | 328,668 | 6.85 GB | 2 | 5 MB | 1 | ac1e300_0bd8ddff51d175087f6cda84b271b0eb | b01e300_c52d0d2a792286a1b88d7b91c488813b |
| 33,476,456 | 329,592 | 6.86 GB | 3 | 5 MB | 1 | b01e300_c52d0d2a792286a1b88d7b91c488813b | b423300_8a37f3ac542446d2c3e16b069b08e09d |
| 31,318,752 | 328,389 | 6.85 GB | 3 | 5 MB | 1 | b423300_8a37f3ac542446d2c3e16b069b08e09d | b826300_1cdc89eeef046475883578cf0fc1d950 |
| 30,799,781 | 329,079 | 6.85 GB | 2 | 5 MB | 1 | b826300_1cdc89eeef046475883578cf0fc1d950 | bc27300_2b9315f844b8d5e1a253f18c4d2eabe1 |
| 14,482,007 | 29,199 | 3.42 GB | 3 | 2 MB | 1 | bc27300_2b9315f844b8d5e1a253f18c4d2eabe1 | be28300_f3a448e233baeddf68cc03027bf45a74 |
| 7,271,606 | 82,854 | 1.71 GB | 2 | 1 MB | 1 | be28300_f3a448e233baeddf68cc03027bf45a74 | bf2a300_5023e51cd969429bc3f008d780b46f61 |
| 7,269,349 | 82,563 | 1.71 GB | 2 | 1 MB | 1 | bf2a300_5023e51cd969429bc3f008d780b46f61 | c02b300_52789a8b10e2d896e9be1272d9169afd |
| 20,762,362 | 325,903 | 6.82 GB | 2 | 5 MB | 1 | c02b300_52789a8b10e2d896e9be1272d9169afd | c429300_6c3de304911e18799fb01d20f69be03f |
| 22,220,126 | 327,000 | 6.82 GB | 3 | 5 MB | 1 | c429300_6c3de304911e18799fb01d20f69be03f | c828300_c50e85412bf2cdfcb005505377a0ba73 |
| 21,767,188 | 327,870 | 6.83 GB | 2 | 4 MB | 1 | c828300_c50e85412bf2cdfcb005505377a0ba73 | cc27300_2e0fea9d4e083026bf18ea43642a73d0 |
| 21,537,300 | 326,781 | 6.83 GB | 2 | 4 MB | 1 | cc27300_2e0fea9d4e083026bf18ea43642a73d0 | d023300_fa22e502bb83df688bb331aaeafc4ee6 |
| 11,061,319 | 28,100 | 3.40 GB | 3 | 2 MB | 1 | d023300_fa22e502bb83df688bb331aaeafc4ee6 | d221300_4eb87797b5869a19a7bf18100abf50ec |
| 11,081,110 | 162,680 | 3.40 GB | 3 | 2 MB | 1 | d221300_4eb87797b5869a19a7bf18100abf50ec | d41e300_961e28d64cbf6e460820a7d47b9efa12 |
| 20,708,894 | 326,154 | 6.81 GB | 3 | 5 MB | 1 | d41e300_961e28d64cbf6e460820a7d47b9efa12 | d81a300_5229559bbe763cc7ebd875db357002cd |
| 22,158,241 | 327,000 | 6.81 GB | 2 | 5 MB | 1 | d81a300_5229559bbe763cc7ebd875db357002cd | dc17300_7ac1eebea0c79430e0b0b0af4dfeffc2 |
| 22,058,558 | 327,673 | 6.82 GB | 2 | 5 MB | 1 | dc17300_7ac1eebea0c79430e0b0b0af4dfeffc2 | e014300_ec7ef53bab65a01697e2c51175c52435 |
| 0 | 0 | 0 B | 0 | 0 B | 0 | e014300_ec7ef53bab65a01697e2c51175c52435 | e813300_d1e7c088b528940a93a289f9e66e88d7 |
| 37,529,102 | 57,605 | 6.83 GB | 2 | 5 MB | 1 | e014300_ec7ef53bab65a01697e2c51175c52435 | e416300_0edad48cf0a12f8e2b1370bd6e3fc151 |
| 35,533,233 | 327,217 | 6.83 GB | 3 | 5 MB | 1 | e416300_0edad48cf0a12f8e2b1370bd6e3fc151 | e813300_d1e7c088b528940a93a289f9e66e88d7 |
| 22,219,616 | 328,019 | 6.83 GB | 2 | 5 MB | 1 | e813300_d1e7c088b528940a93a289f9e66e88d7 | ec12300_f9576788ea38623c324f1fd1ff84385c |
| 22,188,307 | 327,943 | 6.82 GB | 2 | 5 MB | 1 | ec12300_f9576788ea38623c324f1fd1ff84385c | f010300_2053a69a8c06dd64c1a7d749ba16ef65 |
| 0 | 0 | 0 B | 0 | 0 B | 0 | f010300_2053a69a8c06dd64c1a7d749ba16ef65 | f808300_3bfbe199c6b2cd26acc4a94e98a410d1 |
| 37,715,743 | 327,565 | 6.81 GB | 2 | 5 MB | 1 | f010300_2053a69a8c06dd64c1a7d749ba16ef65 | f40d300_152e1c39039240e45a842b1e6bf955d4 |
| 9,448,945 | 82,183 | 1.70 GB | 2 | 1 MB | 1 | f40d300_152e1c39039240e45a842b1e6bf955d4 | f50c300_0cc62bd5f3b5d1fcdba2ff0cc95fff37 |
| 9,236,841 | 81,378 | 1.70 GB | 2 | 1 MB | 1 | f50c300_0cc62bd5f3b5d1fcdba2ff0cc95fff37 | f60a300_e9bd0585d0aff0c3cbd93b1cf59b14b6 |
| 17,719,497 | 163,774 | 3.40 GB | 2 | 2 MB | 1 | f60a300_e9bd0585d0aff0c3cbd93b1cf59b14b6 | f808300_3bfbe199c6b2cd26acc4a94e98a410d1 |
| 20,755,049 | 327,957 | 6.81 GB | 2 | 5 MB | 1 | f808300_3bfbe199c6b2cd26acc4a94e98a410d1 | fc05300_e476fad884176fe964420de97e5cdaaf |
| 20,761,793 | 326,047 | 6.80 GB | 2 | 5 MB | 1 | fc05300_e476fad884176fe964420de97e5cdaaf | |
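The five zero-size rows in this table are the leftover parents: each one's key range still contains live child regions. As a rough sketch of spotting this from a region list, here is a simplified pairwise check written for illustration (it is not how hbck is implemented). It treats each region as a half-open [start, end) key range, with an empty key meaning unbounded, and reports every intersecting pair. The keys in the demo are the 7-character prefixes of the first rows of the table.

```python
from typing import List, Tuple

# (start_key, end_key). An empty start key means "from the first row";
# an empty end key means "to the last row", as in HBase.
Region = Tuple[str, str]


def overlaps(a: Region, b: Region) -> bool:
    """True if the half-open key ranges [start, end) of a and b intersect."""
    a_start, a_end = a
    b_start, b_end = b
    # An empty end key is unbounded, so it never lies before the other start.
    a_before_b = a_end != "" and a_end <= b_start
    b_before_a = b_end != "" and b_end <= a_start
    return not (a_before_b or b_before_a)


def find_overlaps(regions: List[Region]) -> List[Tuple[Region, Region]]:
    """Return every pair of regions whose key ranges intersect (O(n^2) sketch)."""
    ordered = sorted(regions)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if overlaps(a, b):
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # ("", "4003300") is the zero-size entry from the first table row.
    sample = [
        ("", "4003300"),
        ("", "0400300"),
        ("0400300", "07fe300"),
        ("3c05300", "4003300"),
        ("4003300", "4406300"),
    ]
    for a, b in find_overlaps(sample):
        print(f"overlap in the region chain: {a} and {b}")
```

Each reported pair pairs the zero-size entry with one of its children, which is the same pattern as the hbck errors shown further down.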

2. Check the Spark logs for problems

The logs show startRow=0000 and stopRow=fffg, which are the scan bounds the Spark job sets when scanning HBase.

19/04/15 09:29:41 INFO rdd.NewHadoopRDD: Input split: HBase table split(table name: zh_ams_ns:wechat_article, scan: {"loadColumnFamiliesOnDemand":null,"startRow":"0000","stopRow":"fffg","batch":-1,"cacheBlocks":true,"totalColumns":2,"maxResultSize":-1,"families":{"fn":["article_type","download_type"]},"caching":-1,"maxVersions":1,"timeRange":[0,9223372036854775807]}, start row: 7fef300_18b81d0e3245848ddfaf6a97ea868de1, end row: c02b300_52789a8b10e2d896e9be1272d9169af1, region location: hostname1, encoded region name: 8943fc0bd38fd292d9acb1c6bb4b7a6)
19/04/15 09:29:41 INFO rdd.NewHadoopRDD: Input split: HBase table split(table name: zh_ams_ns:wechat_article, scan: {"loadColumnFamiliesOnDemand":null,"startRow":"0000","stopRow":"fffg","batch":-1,"cacheBlocks":true,"totalColumns":2,"maxResultSize":-1,"families":{"fn":["article_type","download_type"]},"caching":-1,"maxVersions":1,"timeRange":[0,9223372036854775807]}, start row: 6405300_0030bf135d4c2c20e9915c1941bd8cc3, end row: 67ff300_beefe04a96f547403165d057642c7a6e, region location: hostname1, encoded region name: 96c9963bdd044ebdf2bd883435735d5)
19/04/15 09:29:41 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0

The splits this node processed were:

start row: 7fef300_18b81d0e3245848ddfaf6a97ea868de5, end row: c02b300_52789a8b10e2d896e9be1272d9169afd, region location: hostname1
start row: 6405300_0030bf135d4c2c20e9915c1941bd8cc3, end row: 67ff300_beefe04a96f547403165d057642c7a6e, region location: hostname1

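The range 7fef300~c02b300 is clearly far wider than 6405300~67ff300, and the splits on the other Executors do not span such wide ranges either. Checking the table's region status shows that the region 7fef300~c02b300 holds no data: all of its metrics are zero.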
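This comparison can be automated by pulling the split boundaries out of the NewHadoopRDD log lines. The sketch below is a minimal example that assumes the "Input split" log format shown above and assumes, as in this table, that row keys begin with hex characters, so the first four characters can be read as a number to approximate how wide a split is. `parse_splits` and `key_span` are hypothetical helper names.

```python
import re
from typing import Dict, List

# Matches the tail of a NewHadoopRDD "Input split" INFO line as printed above:
# "... start row: X, end row: Y, region location: H, encoded region name: Z)"
SPLIT_RE = re.compile(
    r"start row: (?P<start>[^,]*), end row: (?P<end>[^,]*), "
    r"region location: (?P<host>[^,]*), encoded region name: (?P<region>[^)]*)\)"
)


def parse_splits(log_text: str) -> List[Dict[str, str]]:
    """Return one dict (start, end, host, region) per input split in the log."""
    return [m.groupdict() for m in SPLIT_RE.finditer(log_text)]


def key_span(split: Dict[str, str], prefix_len: int = 4) -> int:
    """Approximate split width by reading the first prefix_len key chars as hex.

    Assumes hex-prefixed row keys; an empty start/end key is treated as the
    lowest/highest possible prefix.
    """
    lo = int(split["start"][:prefix_len] or "0", 16)
    hi = int(split["end"][:prefix_len] or "f" * prefix_len, 16)
    return hi - lo


if __name__ == "__main__":
    log = (
        "Input split: ... start row: 7fef300_18b81d0e3245848ddfaf6a97ea868de1, "
        "end row: c02b300_52789a8b10e2d896e9be1272d9169af1, region location: hostname1, "
        "encoded region name: 8943fc0bd38fd292d9acb1c6bb4b7a6)\n"
        "Input split: ... start row: 6405300_0030bf135d4c2c20e9915c1941bd8cc3, "
        "end row: 67ff300_beefe04a96f547403165d057642c7a6e, region location: hostname1, "
        "encoded region name: 96c9963bdd044ebdf2bd883435735d5)\n"
    )
    # Widest splits first: these are the candidates for stale meta entries.
    for s in sorted(parse_splits(log), key=key_span, reverse=True):
        print(key_span(s), s["start"][:7], "->", s["end"][:7], s["host"])
```

On these two lines the spans come out as 16444 and 1018, so the suspicious split is about 16 times wider than a normal one.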

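A first conclusion: after the region split, the parent region's meta entry was not deleted, so when Spark read the table it read the row keys in 7fef300~c02b300 a second time. That range covers many regions, which is why the Executor processing it was so slow.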
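Why a leftover parent entry doubles the data can be shown with a toy model. The sketch below is a simplification under stated assumptions, not Spark's or HBase's actual input-format code: every hbase:meta entry becomes one input split, and each split's scan returns every stored row in its [start, end) range, whichever live region holds it. The row keys are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def rows_read(row_keys: List[str], meta_entries: List[Tuple[str, str]]) -> int:
    """Total rows read when every meta entry becomes its own scan.

    Toy model: each (start, end) entry is scanned over [start, end);
    an empty end key means "to the end of the table".
    """
    keys = sorted(row_keys)
    total = 0
    for start, end in meta_entries:
        lo = bisect_left(keys, start)
        hi = len(keys) if end == "" else bisect_left(keys, end)
        total += hi - lo
    return total


if __name__ == "__main__":
    keys = ["10", "30", "50", "70", "90"]            # hypothetical stored rows
    healthy = [("", "40"), ("40", "80"), ("80", "")]
    stale_parent = ("40", "")                         # was split into the last two
    print(rows_read(keys, healthy))                   # 5
    print(rows_read(keys, healthy + [stale_parent]))  # 8: rows 50, 70, 90 read twice
```

In this model the extra work grows with the width of the stale entry's key range, which matches only the widest entries producing visibly slow tasks.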

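Many regions had stale meta entries left behind after splitting, so why were only two tasks extremely slow? The region list shows that only two stale entries span wide key ranges; the others cover narrow ranges, so their extra work was not noticeable. The one in the figure below, for example, spans only two regions.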
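To see which stale entries matter, one can measure how much live data each zero-size entry covers. The sketch below treats zero-size rows as stale (true for this table, an assumption in general) and uses a slice of the table above, with keys abbreviated to their 7-character prefixes and sizes in GB; `covered_by_stale` is a hypothetical helper. Counting from the full table, the two wide entries ('' to 4003300 and 7fef300 to c02b300) cover 20 and 19 live regions, while the other three stale entries cover only 2 to 4.

```python
from typing import Dict, List, Tuple

# (start_key, end_key, storefile_size_gb). Keys are 7-character prefixes
# from the table above; "" would mean an unbounded start or end.
Region = Tuple[str, str, float]


def covered_by_stale(regions: List[Region]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For every zero-size (assumed stale) entry, count the live regions inside
    its key range and the GB they hold, i.e. what that entry makes a job re-read."""
    stale = [r for r in regions if r[2] == 0]
    live = [r for r in regions if r[2] > 0]
    result = {}
    for s_start, s_end, _ in stale:
        inside = [
            r for r in live
            if r[0] >= s_start and (s_end == "" or (r[1] != "" and r[1] <= s_end))
        ]
        result[(s_start, s_end)] = (len(inside), round(sum(r[2] for r in inside), 2))
    return result


if __name__ == "__main__":
    # Rows e014300 .. fc05300 of the table.
    table_slice = [
        ("e014300", "e813300", 0.0),
        ("e014300", "e416300", 6.83),
        ("e416300", "e813300", 6.83),
        ("e813300", "ec12300", 6.83),
        ("ec12300", "f010300", 6.82),
        ("f010300", "f808300", 0.0),
        ("f010300", "f40d300", 6.81),
        ("f40d300", "f50c300", 1.70),
        ("f50c300", "f60a300", 1.70),
        ("f60a300", "f808300", 3.40),
        ("f808300", "fc05300", 6.81),
    ]
    for (start, end), (n, gb) in covered_by_stale(table_slice).items():
        print(f"stale {start}..{end}: re-reads {n} regions, {gb} GB")
```

For this slice it reports 2 regions (13.66 GB) and 4 regions (13.61 GB), roughly two regions' worth of data each, which is why these entries did not stand out.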

 

With this diagnosis, we checked the table with hbase hbck. In the excerpt of errors below, region tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. overlaps other regions, which confirms the hypothesis.

ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,0ffd300_74fdb5eebd54ee32e19abb31bccec158,1542774878705.a860b3230759d601187e9dcf6a4bdad0.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,10fd300_7a83ffc34c076654e69d558f1c9f1038,1550065067486.62a2f5707f49a536755d8a357d30bfd0.) There is an overlap in the region chain.
ERROR: (regions tableName,0ffd300_74fdb5eebd54ee32e19abb31bccec158,1542774878705.a860b3230759d601187e9dcf6a4bdad0. and tableName,10fd300_7a83ffc34c076654e69d558f1c9f1038,1550065067486.62a2f5707f49a536755d8a357d30bfd0.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,11fe300_2aa404cbc291ffd913bab13d9339085c,1543086502643.f4c52419535c5574f7b79c05f82a3bd2.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,12fe300_3f46699293dbeaa042e6856597d26dca,1543086502643.dcfff4e27445fd7ef7d54e931fc3fee1.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,13fd300_c223d551c26e191ffacb9981c5fb2cfd,1542770662897.f0c75de535b324f02267b2e2d0020c75.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,1800300_0c10469f3ca178e9d5d81b507cc1f684,1548215404844.8546360f0bb0b7d0dcbd2c07772127ab.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,1c00300_b0e2e6ce62897dbd1e033f88bcf22336,1548215404844.b12cda6b9cb167c54ed47f62cc0aa7a2.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,2002300_cb4302be610bfedebaedce4a53cd0e4e,1549815723592.9b62e6522d5c687a088d95fbd1ca0c1b.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,2404300_806ce32f00569ec34eb1fdc008c574cc,1549815723592.dada7d00670779d1aaf6c0d47ebb50c7.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,2805300_3063b4db3bc49e0cd2d2e2430b8f92e8,1551035005714.2a6ad9c50e10527e2f08cad1d5189e4a.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,2c04300_c3f674b39aeb8f289313f87d8d21bfe7,1551035005714.df8ba688e677572214f75b270dedcd1b.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,3005300_be2766fa12b179733c9a55801367d56f,1551240890445.e479bcacf255f34bd0a98540e6e02ac5.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,3406300_ec238a180804b26ea232125ff11fcf68,1551240890445.21a2461c5242d970cc46486b14817e12.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,3806300_886cf9e6502abd928a8c063ca517f54e,1548178323143.01d1784953e10b53ed9a41953302b3bc.) There is an overlap in the region chain.
ERROR: (regions tableName,,1539422767596.259a2349c4f07b7d625142d323bede56. and tableName,3c05300_7ea6294dd2cd312d63d846bf5de5cd35,1548178323143.f861eb4119dc4dc3f7c5e73555824552.) There is an overlap in the region chain.
19/04/16 13:28:43 WARN util.HBaseFsck: reached end of problem group: 4003300_c126f5cc6cef33e84de0cafc9e52d44f

 

 

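Serious consequence: when the statistics finished, we found that rows were being read more than once. The correct row count is 142,825,979, but the query returned 224,847,552.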

Solution

1. Besides being slow, the job later failed because the Executor handling the oversized split ran out of memory. For how that was fixed, see:

https://blog.csdn.net/zhangshenghang/article/details/89313245

2. Delete the parent (pre-split) regions

Delete by region name. Region metadata lives in the hbase:meta table; first check the entry:

get 'hbase:meta','tableName,7fef300_18b81d0e3245848ddfaf6a97ea868de5,1539422555091.dee970bfd95a946d9a9268db01f7ec77.'

Then delete it:

 deleteall 'hbase:meta','tableName,7fef300_18b81d0e3245848ddfaf6a97ea868de5,1539422555091.dee970bfd95a946d9a9268db01f7ec77.'
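The hbase:meta row key is the full region name (`table,startKey,regionId.encodedName.`), so it is easy to mistype part of it by hand. The sketch below, with hypothetical helper names, parses such a name and only emits the get/deleteall shell commands above when the region reports zero bytes, following the warning later in this post never to delete a region that holds data. The byte count is assumed to be taken from the region list in the UI.

```python
from typing import List, NamedTuple


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: str
    encoded: str


def parse_region_name(name: str) -> RegionName:
    """Split 'table,startKey,regionId.encodedName.' into its parts."""
    if not name.endswith("."):
        raise ValueError(f"not a region name: {name!r}")
    table, rest = name.split(",", 1)        # table names cannot contain ','
    start_key, tail = rest.rsplit(",", 1)   # start keys may, so cut from the right
    region_id, encoded = tail[:-1].split(".", 1)
    return RegionName(table, start_key, region_id, encoded)


def meta_cleanup_commands(name: str, store_file_bytes: int) -> List[str]:
    """HBase shell commands to inspect, then remove, one stale hbase:meta row.

    Refuses to emit anything for a region that still holds data.
    """
    parse_region_name(name)  # fail fast on a malformed name
    if store_file_bytes != 0:
        raise ValueError(f"region still holds {store_file_bytes} bytes; not deleting")
    return [
        f"get 'hbase:meta','{name}'",
        f"deleteall 'hbase:meta','{name}'",
    ]


if __name__ == "__main__":
    stale = ("tableName,7fef300_18b81d0e3245848ddfaf6a97ea868de5,"
             "1539422555091.dee970bfd95a946d9a9268db01f7ec77.")
    print(parse_region_name(stale))
    for cmd in meta_cleanup_commands(stale, store_file_bytes=0):
        print(cmd)
```

Generating both commands from one string guarantees that the row you inspect is the row you delete.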

After the deletion, rerunning the Spark job skips this region, which resolves the problem.

 

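There is a hidden problem, however: after the table was enabled again, the deleted region reappeared, even though ZooKeeper held no such configuration. How was it restored? After consulting references, we found that the Master caches this information. Deleting that metadata (the HDFS directory /hbase/MasterProcWALs) and restarting the HBase Master fixes the cluster metadata.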

 

Caveat:

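Never delete a region that holds data or is otherwise healthy. After such a deletion and a restart, the table is broken and its data cannot be read: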

hbase(main):062:0> count 'tableName'

ERROR: Unknown table tableName!

hbase hbck then reports that the first region's row key range should start with the empty string ''. Deleting a healthy region leads to errors like this:

ERROR: (region tableName_201615,201604097053999999_38599d53219f66c8605abbe144b33844,1494968976496.08e0b835edffba9c7f47c68d800f297a.) First region should start with an empty key.  You need to  create a new region and regioninfo in HDFS to plug the hole.
ERROR: Found inconsistency in table tableName_201615
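The hbck error above is a chain-boundary check: sorted by start key, the first region must begin with the empty key and each region must end where the next one begins. A minimal sketch of such checks, simplified for illustration (only the first message reuses hbck's wording; the hole, overlap and last-region messages are our own), could be run over a table's region boundaries before and after editing hbase:meta.

```python
from typing import List, Tuple

Region = Tuple[str, str]  # (start_key, end_key); "" means unbounded


def chain_problems(regions: List[Region]) -> List[str]:
    """Report boundary problems in a table's region chain."""
    if not regions:
        return ["table has no regions"]
    ordered = sorted(regions)
    problems = []
    if ordered[0][0] != "":
        problems.append(f"First region should start with an empty key: {ordered[0]}")
    for prev, cur in zip(ordered, ordered[1:]):
        # An unbounded or too-late end key overlaps the next region;
        # an end key before the next start leaves a hole.
        if prev[1] == "" or prev[1] > cur[0]:
            problems.append(f"overlap between {prev} and {cur}")
        elif prev[1] < cur[0]:
            problems.append(f"hole between {prev[1]!r} and {cur[0]!r}")
    if ordered[-1][1] != "":
        problems.append(f"last region does not end with an empty key: {ordered[-1]}")
    return problems


if __name__ == "__main__":
    healthy = [("", "b"), ("b", "d"), ("d", "")]
    first_deleted = healthy[1:]          # a healthy first region removed from meta
    print(chain_problems(healthy))       # []
    print(chain_problems(first_deleted))
```

Removing the first healthy region yields exactly the "First region should start with an empty key" complaint, which is what happened to the table above.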

The normal state is shown in the figure below.

 

 

 
