HDFS Node Maintenance

1. Adding a DataNode

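Expanding a cluster means adding new DataNodes. The usual reason is more storage, although sometimes the goal is more I/O bandwidth or a smaller impact when a single machine fails.
Adding a DataNode to a running HDFS cluster is an online (hot) operation.
The steps are:
1. Add the DataNode's IP address to the file named by the dfs.hosts parameter, one IP address per line.
2. As the HDFS superuser, or a user with equivalent privileges, run hadoop dfsadmin -refreshNodes
3. If rack awareness is in use, update the rack information for the new host.
4. Start the DataNode process.
5. Check the NameNode web UI, or the output of hadoop dfsadmin -report, to confirm that the new node has connected.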
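Steps 1, 2 and 5 lend themselves to a small script. The following is only a sketch: the helper names (`add_host_line`, `node_in_report`, `allow_datanode`, `check_datanode`) are made up for illustration, and it assumes that `hadoop dfsadmin -report` lists each DataNode on a `Name: <ip>:<port>` line. Steps 3 and 4 are left out because they depend on how the cluster is set up.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; paths and addresses come from the caller.

# Step 1: append an address to a hosts file (one per line) unless it is already listed.
add_host_line() {
    hosts_file=$1
    address=$2
    touch "$hosts_file"
    if ! grep -qxF -e "$address" "$hosts_file"; then
        printf '%s\n' "$address" >> "$hosts_file"
    fi
}

# Step 5: read a dfsadmin report on stdin and succeed if the address appears
# on a "Name: <ip>:<port>" line (assumed report format).
node_in_report() {
    grep -qF -e "Name: $1:"
}

# Steps 1 and 2: list the node in the include file, then make the NameNode re-read it.
# Must run as the HDFS superuser.
allow_datanode() {
    add_host_line "$1" "$2"
    hadoop dfsadmin -refreshNodes
}

# Step 5: run this after the DataNode process has been started on the new host.
check_datanode() {
    if hadoop dfsadmin -report | node_in_report "$1"; then
        echo "DataNode $1 has connected"
    else
        echo "DataNode $1 is not in the report yet"
    fi
}
```

A call might look like `allow_datanode /path/to/include-file 192.168.137.133`, followed later by `check_datanode 192.168.137.133`. Both the path and the address here are placeholders, not values from the cluster in this article.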

2. Decommissioning a DataNode

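To keep every block safe, use the graceful decommissioning feature. Decommissioning relies on the HDFS include and exclude host files; without them, a DataNode cannot be decommissioned safely.
The steps are:
1. Add the DataNode's IP address to the file named by the dfs.hosts.exclude parameter, one IP address per line.
2. As the HDFS superuser, or a user with equivalent privileges, run hadoop dfsadmin -refreshNodes
3. Watch the NameNode web UI to make sure decommissioning is under way. The display sometimes lags by a few seconds.
4. A DataNode usually holds a lot of data, so go get a coffee or head home and sleep. Decommissioning can take hours or even days! When it finishes, the NameNode UI shows the DataNode as decommissioned.
5. Stop the DataNode process.
6. If the machine is not going back into the cluster, remove the DataNode from the HDFS include and exclude files and update the rack topology database.
7. Run hadoop dfsadmin -refreshNodes so the NameNode picks up the change.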
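The waiting in steps 3 and 4 can be done from the command line instead of the web UI. This sketch makes several assumptions: the helper names are hypothetical, the report is assumed to print a `Name: <ip>:<port>` line followed by a `Decommission Status : ...` line for each DataNode, and the final state is assumed to read `Decommissioned`. Check the output of `hadoop dfsadmin -report` on your own version before relying on it.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the report format is an assumption.

# Step 1: list the address in the exclude file (one per line, no duplicates).
exclude_host() {
    exclude_file=$1
    address=$2
    touch "$exclude_file"
    grep -qxF -e "$address" "$exclude_file" || printf '%s\n' "$address" >> "$exclude_file"
}

# Read a dfsadmin report on stdin and print the decommission status of one node.
decommission_status() {
    awk -v ip="$1" '
        # A "Name:" line starts a new node section; remember whether it is ours.
        /^Name: / { current = (index($2, ip ":") == 1) }
        # Inside our section, print the text after "Decommission Status :".
        current && /^Decommission Status/ { sub(/^[^:]*: */, ""); print; exit }
    '
}

# Steps 3-4: poll until the node is reported as decommissioned.
# Loops forever if the node never reaches that state, so interrupt it by hand if needed.
wait_for_decommission() {
    address=$1
    interval=${2:-60}
    while :; do
        status=$(hadoop dfsadmin -report | decommission_status "$address")
        echo "$(date): $address -> ${status:-not found in report}"
        [ "$status" = "Decommissioned" ] && break
        sleep "$interval"
    done
}
```

After `exclude_host` and `hadoop dfsadmin -refreshNodes`, something like `wait_for_decommission 192.168.137.132 300` would check every five minutes. Only stop the DataNode process (step 5) once the loop has finished.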

3. Checking filesystem consistency with fsck

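Occasionally HDFS ends up in a pathological state. If every replica of one or more blocks of a file becomes unreadable, the file is corrupt. This leaves a hole in the file, sometimes as large as a full block, and any attempt to read a file in this state fails with an exception. The cause is that all replicas of a block failed at almost the same moment, before the system could detect the failure and re-replicate the data. Although this is rare, administrators need a tool that detects such problems and helps track down the missing data, so that this kind of disaster can be prevented.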

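fsck produces a summary report on the overall health of the filesystem. The filesystem is healthy if and only if every file in HDFS has at least the minimum number of replicas available. A dot is printed for each file checked; the summary includes the total number of blocks, the average replication factor, the total size, the number of missing blocks, and other key metrics.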

[root@hadoop ~]# hadoop fsck /
FSCK started by root from /192.168.137.130 for path / at Wed Aug 19 22:44:03 CST 2015
...........................................Status: HEALTHY
 Total size:    292283696 B
 Total dirs:    58
 Total files:   43
 Total blocks (validated):      40 (avg. block size 7307092 B)
 Minimally replicated blocks:   40 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          2
 Number of racks:               1
FSCK ended at Wed Aug 19 22:44:03 CST 2015 in 20 milliseconds


The filesystem under path '/' is HEALTHY
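
For a scheduled health check it can be handy to pull single values out of this summary rather than read it by eye. The helpers below are a sketch with hypothetical names (`fsck_is_healthy`, `fsck_field`). They parse only the text layout shown in the output above, which may differ in other Hadoop versions.

```shell
#!/bin/sh
# Sketch only: parses the fsck summary layout shown in this article.

# Succeed if the fsck output on stdin reports a healthy filesystem.
# The status can follow the progress dots on the same line, so the match is not anchored.
fsck_is_healthy() {
    grep -q 'Status: HEALTHY'
}

# Print the first value of one summary field, e.g. "Corrupt blocks".
fsck_field() {
    awk -F':' -v key="$1" '
        { name = $1; sub(/^[ \t]+/, "", name) }      # field label without indentation
        name == key {
            val = $2
            sub(/^[ \t]+/, "", val)                  # drop the padding before the value
            split(val, parts, /[ \t]+/)              # keep the number, drop "(0.0 %)" or "B"
            print parts[1]
            exit
        }
    '
}
```

One possible use is `hadoop fsck / > /tmp/fsck.out`, followed by `fsck_is_healthy < /tmp/fsck.out` and `fsck_field "Corrupt blocks" < /tmp/fsck.out`. The temporary file path is just an example.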

Some of the available options in use:

[root@hadoop ~]# hadoop fsck / -files -blocks -locations
FSCK started by root from /192.168.137.130 for path / at Wed Aug 19 22:56:59 CST 2015
/ <dir>
/apache_logs <dir>
/apache_logs/access_2013_05_30.log 61084192 bytes, 1 block(s):  OK
0. blk_-4833243339324073037_1229 len=61084192 repl=1 [192.168.137.131:50010]

/ext <dir>
/external <dir>
/external/id 53 bytes, 1 block(s):  OK
0. blk_2036756907093028766_1132 len=53 repl=1 [192.168.137.132:50010]

/hbase <dir>
/hbase/-ROOT- <dir>
/hbase/-ROOT-/.tableinfo.0000000001 728 bytes, 1 block(s):  OK
0. blk_1805412309673802489_1022 len=728 repl=1 [192.168.137.131:50010]

/hbase/-ROOT-/.tmp <dir>
/hbase/-ROOT-/70236052 <dir>
/hbase/-ROOT-/70236052/.oldlogs <dir>
/hbase/-ROOT-/70236052/.oldlogs/hlog.1437495295893 421 bytes, 1 block(s):  OK
0. blk_-414492115728353317_1020 len=421 repl=1 [192.168.137.131:50010]

/hbase/-ROOT-/70236052/.regioninfo 109 bytes, 1 block(s):  OK
0. blk_1129077452203097409_1017 len=109 repl=1 [192.168.137.131:50010]

/hbase/-ROOT-/70236052/info <dir>
/hbase/-ROOT-/70236052/info/34b60c58dfce47c0b313332129b882a9 796 bytes, 1 block(s):  OK
0. blk_-1017769386369979229_1021 len=796 repl=1 [192.168.137.131:50010]

/hbase/.META. <dir>
/hbase/.META./1028785192 <dir>
/hbase/.META./1028785192/.oldlogs <dir>
/hbase/.META./1028785192/.oldlogs/hlog.1437495296776 134 bytes, 1 block(s):  OK
0. blk_-8134176369857809736_1021 len=134 repl=1 [192.168.137.132:50010]

/hbase/.META./1028785192/.regioninfo 111 bytes, 1 block(s):  OK
0. blk_1249614448727799934_1019 len=111 repl=1 [192.168.137.131:50010]

/hbase/.META./1028785192/info <dir>
/hbase/.logs <dir>
/hbase/.logs/hadoop1,60020,1437495301957 <dir>
/hbase/.logs/hadoop1,60020,1437495301957/hadoop1%2C60020%2C1437495301957.1437495303837 0 bytes, 0 block(s):  OK

/hbase/.logs/hadoop2,60020,1437495300483 <dir>
/hbase/.logs/hadoop2,60020,1437495300483/hadoop2%2C60020%2C1437495300483.1437495302124 304 bytes, 1 block(s):  OK
0. blk_-5969305522960748136_1080 len=304 repl=1 [192.168.137.132:50010]

/hbase/.oldlogs <dir>
/hbase/.tmp <dir>
/hbase/hbase.id 38 bytes, 1 block(s):  OK
0. blk_8308249816346163537_1016 len=38 repl=1 [192.168.137.131:50010]

/hbase/hbase.version 3 bytes, 1 block(s):  OK
0. blk_-7232964833652814569_1015 len=3 repl=1 [192.168.137.132:50010]

/hive <dir>
/hive/hmbbs_2013_05_30 <dir>
/hive/hmbbs_2013_05_30/000000_0 32 bytes, 1 block(s):  OK
0. blk_-4331825443819995755_1428 len=32 repl=1 [192.168.137.132:50010]

/hive/hmbbs_ip_2013_05_30 <dir>
/hive/hmbbs_ip_2013_05_30/000000_0 6 bytes, 1 block(s):  OK
0. blk_7703313283621377524_1347 len=6 repl=1 [192.168.137.132:50010]

/hive/hmbbs_jumper_2013_05_30 <dir>
/hive/hmbbs_jumper_2013_05_30/000000_0 5 bytes, 1 block(s):  OK
0. blk_9022556680828031083_1374 len=5 repl=1 [192.168.137.132:50010]

/hive/hmbbs_pv_2013_05_30 <dir>
/hive/hmbbs_pv_2013_05_30/000000_0 7 bytes, 1 block(s):  OK
0. blk_2783858293669003544_1301 len=7 repl=1 [192.168.137.132:50010]

/hive/hmbbs_reguser_2013_05_30 <dir>
/hive/hmbbs_reguser_2013_05_30/000000_0 3 bytes, 1 block(s):  OK
0. blk_6647468447483036669_1320 len=3 repl=1 [192.168.137.132:50010]

/hive/t1 <dir>
/hive/t1/id 13 bytes, 1 block(s):  OK
0. blk_9090054515115022211_1082 len=13 repl=1 [192.168.137.131:50010]

/hive/t1/id2 13 bytes, 1 block(s):  OK
0. blk_-7364529739357885380_1083 len=13 repl=1 [192.168.137.131:50010]

/hive/t2 <dir>
/hive/t2/id 53 bytes, 1 block(s):  OK
0. blk_4456042822126421025_1084 len=53 repl=1 [192.168.137.132:50010]

/hive/t2/mingdan 53 bytes, 1 block(s):  OK
0. blk_-4655964205569421870_1085 len=53 repl=1 [192.168.137.132:50010]

/hive/t3 <dir>
/hive/t3/day=24 <dir>
/hive/t3/day=24/id 53 bytes, 1 block(s):  OK
0. blk_3173775080633849700_1129 len=53 repl=1 [192.168.137.131:50010]

/hmbbs_cleaned <dir>
/hmbbs_cleaned/2013_05_30 <dir>
/hmbbs_cleaned/2013_05_30/_SUCCESS 0 bytes, 0 block(s):  OK

/hmbbs_cleaned/2013_05_30/_logs <dir>
/hmbbs_cleaned/2013_05_30/_logs/history <dir>
/hmbbs_cleaned/2013_05_30/_logs/history/job_201508132139_0001_1439478232329_root_HmbbsCleaner 13692 bytes, 1 block(s):  OK
0. blk_8630808591879667815_1244 len=13692 repl=1 [192.168.137.131:50010]

/hmbbs_cleaned/2013_05_30/_logs/history/job_201508132139_0001_conf.xml 45450 bytes, 1 block(s):  OK
0. blk_-6362301113520590638_1241 len=45450 repl=1 [192.168.137.131:50010]

/hmbbs_cleaned/2013_05_30/part-r-00000 12794925 bytes, 1 block(s):  OK
0. blk_4228932799050096870_1243 len=12794925 repl=1 [192.168.137.131:50010]

/hmbbs_cleaned/2013_05_31 <dir>
/hmbbs_logs <dir>
/hmbbs_logs/access_2013_05_30.log 61084192 bytes, 1 block(s):  OK
0. blk_4224204346634080030_1234 len=61084192 repl=1 [192.168.137.131:50010]

/hmbbs_logs/access_2013_05_31.log 157069653 bytes, 3 block(s):  OK
0. blk_3706166188961946512_1226 len=67108864 repl=1 [192.168.137.132:50010]
1. blk_389773787940735549_1226 len=67108864 repl=1 [192.168.137.132:50010]
2. blk_-5356155428543672292_1226 len=22851925 repl=1 [192.168.137.131:50010]

/id 12 bytes, 1 block(s):  OK
0. blk_-5765217947919665010_1134 len=12 repl=1 [192.168.137.132:50010]

/ids <dir>
/ids/id 17 bytes, 1 block(s):  OK
0. blk_1729058135902306855_1163 len=17 repl=1 [192.168.137.131:50010]

/tmp <dir>
/tmp/hive-root <dir>
/user <dir>
/user/hive <dir>
/user/hive/warehouse <dir>
/user/hive/warehouse/t1 <dir>
/user/root <dir>
/user/root/TBLS <dir>
/user/root/TBLS/_SUCCESS 0 bytes, 0 block(s):  OK

/user/root/TBLS/_logs <dir>
/user/root/TBLS/_logs/history <dir>
/user/root/TBLS/_logs/history/job_201508072109_0001_1438954218575_root_TBLS.jar 16595 bytes, 1 block(s):  OK
0. blk_-3376785535960543130_1162 len=16595 repl=1 [192.168.137.131:50010]

/user/root/TBLS/_logs/history/job_201508072109_0001_conf.xml 52203 bytes, 1 block(s):  OK
0. blk_-8793725543070965963_1156 len=52203 repl=1 [192.168.137.131:50010]

/user/root/TBLS/part-m-00000 106 bytes, 1 block(s):  OK
0. blk_-8368707346793660309_1158 len=106 repl=1 [192.168.137.131:50010]

/user/root/TBLS/part-m-00001 0 bytes, 0 block(s):  OK

/user/root/TBLS/part-m-00002 55 bytes, 1 block(s):  OK
0. blk_-6466579814782096440_1161 len=55 repl=1 [192.168.137.131:50010]

/user/root/TBLS/part-m-00003 112 bytes, 1 block(s):  OK
0. blk_-1586982331520825397_1161 len=112 repl=1 [192.168.137.131:50010]

/usr <dir>
/usr/local <dir>
/usr/local/hadoop <dir>
/usr/local/hadoop/tmp <dir>
/usr/local/hadoop/tmp/mapred <dir>
/usr/local/hadoop/tmp/mapred/staging <dir>
/usr/local/hadoop/tmp/mapred/staging/root <dir>
/usr/local/hadoop/tmp/mapred/staging/root/.staging <dir>
/usr/local/hadoop/tmp/mapred/system <dir>
/usr/local/hadoop/tmp/mapred/system/jobtracker.info 4 bytes, 1 block(s):  OK
0. blk_4135630100726887975_1511 len=4 repl=1 [192.168.137.131:50010]

/wlan 2214 bytes, 1 block(s):  OK
0. blk_-131599298487211701_1027 len=2214 repl=1 [192.168.137.132:50010]

/wlan_result <dir>
/wlan_result/_SUCCESS 0 bytes, 0 block(s):  OK

/wlan_result/_logs <dir>
/wlan_result/_logs/history <dir>
/wlan_result/_logs/history/job_201507242155_0005_1437747401571_root_PigLatin%3ADefaultJobName 13410 bytes, 1 block(s):  OK
0. blk_1815765538230662949_1077 len=13410 repl=1 [192.168.137.131:50010]

/wlan_result/_logs/history/job_201507242155_0005_conf.xml 103373 bytes, 1 block(s):  OK
0. blk_-3659292233687112569_1074 len=103373 repl=1 [192.168.137.132:50010]

/wlan_result/part-r-00000 556 bytes, 1 block(s):  OK
0. blk_6266982223961738595_1076 len=556 repl=1 [192.168.137.131:50010]

Status: HEALTHY
 Total size:    292283696 B
 Total dirs:    58
 Total files:   43
 Total blocks (validated):      40 (avg. block size 7307092 B)
 Minimally replicated blocks:   40 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          2
 Number of racks:               1
FSCK ended at Wed Aug 19 22:56:59 CST 2015 in 11 milliseconds


The filesystem under path '/' is HEALTHY
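
A listing this long is easier to use once it is summarized. As a rough sketch, the hypothetical helper below counts how many block replicas each DataNode holds. It assumes the block lines look like the ones above: an index, a `blk_` identifier, and a bracketed list of `<ip>:<port>` locations at the end of the line, separated by commas when there is more than one replica.

```shell
#!/bin/sh
# Sketch only: counts block replicas per DataNode from
# `hadoop fsck <path> -files -blocks -locations` output read on stdin.
blocks_per_datanode() {
    awk '
        /^[0-9]+\. blk_/ {
            start = index($0, "[")
            if (start == 0) next                 # no location list on this line
            locs = substr($0, start + 1)
            sub(/\].*$/, "", locs)               # keep only what is inside the brackets
            n = split(locs, nodes, /, */)
            for (i = 1; i <= n; i++) count[nodes[i]]++
        }
        END { for (node in count) print count[node], node }
    ' | sort -k2
}
```

It would be used as `hadoop fsck / -files -blocks -locations | blocks_per_datanode`, which prints one line per DataNode with the replica count first. This can show, for instance, how much data a node holds before it is decommissioned.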