Installing a Hadoop Cluster
Version: hadoop-0.20.2
Set up passwordless SSH login (omitted)
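The omitted passwordless-login step usually amounts to generating an SSH key pair and copying the public key to every node. A minimal sketch (the key path is a temp dir here purely for illustration; the hostname is the slave from this doc):

```shell
# Generate an RSA key pair with no passphrase (temp dir used for illustration).
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$KEYDIR/id_rsa" -q
ls "$KEYDIR"
# On a real cluster, append the public key to each node's authorized_keys, e.g.:
#   ssh-copy-id -i "$KEYDIR/id_rsa.pub" root@idc01-vm-test-123
```

On a real deployment the key would go in ~/.ssh and be copied to both master and slave, so that start-all.sh can ssh to every node without a password.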
master: idc01-vm-test-124
slave: idc01-vm-test-123
Edit the configuration files:
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://idc01-vm-test-124:9019</value>
<description>The name of the default file system (URI).</description>
</property>
</configuration>
more hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>default block replication</description>
</property>
</configuration>
more hadoop-env.sh
Open it and set the JDK environment variable:
export JAVA_HOME=/usr/local/java/jdk1.7.0/
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>idc01-vm-test-124:9000</value>
</property>
</configuration>
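To double-check which NameNode URI a core-site.xml actually points at, a quick grep/sed works. A sketch run against the snippet above, written to a temp file for illustration (a crude text match, not a full XML parser):

```shell
# Write the core-site.xml snippet from above to a temp file for illustration.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://idc01-vm-test-124:9019</value>
</property>
</configuration>
EOF
# Pull out the value that follows the fs.default.name property.
URI=$(grep -A1 '<name>fs.default.name</name>' "$CONF" \
      | sed -n 's|.*<value>\(.*\)</value>.*|\1|p')
echo "$URI"
```

This is handy when client machines mysteriously connect to the wrong namenode: the URI printed here must match on every node in the cluster.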
more masters (hostname or IP)
idc01-vm-test-124
more slaves
(Note: the slaves file only needs to be configured on the namenode.)
(On the slave machines it can simply be set to localhost.)
idc01-vm-test-124
idc01-vm-test-123
Starting the NameNode
Format the NameNode:
./hadoop namenode -format
Start the services:
start-all.sh
Create a directory:
./hadoop fs -mkdir input
Upload a file:
./hadoop fs -put ./test.out input
List the directory:
./hadoop fs -ls input
If a DataNode failed to start, or when a new DataNode is added, first add the node to the slaves file on the master.
Then start the daemons on the DataNode itself:
./hadoop-daemon.sh start datanode
./hadoop-daemon.sh start tasktracker
After they start successfully, run the balancer on the master node to rebalance data across the DataNodes:
$bin/hadoop balancer
View the Hadoop cluster status report:
./hadoop dfsadmin -report
Configured Capacity: 42275692544 (39.37 GB)
Present Capacity: 20235141120 (18.85 GB)
DFS Remaining: 19849265152 (18.49 GB)
DFS Used: 385875968 (368 MB)
DFS Used%: 1.91%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 10.1.15.124:50010
Decommission Status : Normal
Configured Capacity: 21137846272 (19.69 GB)
DFS Used: 192937984 (184 MB)
Non DFS Used: 12496318464 (11.64 GB)
DFS Remaining: 8448589824(7.87 GB)
DFS Used%: 0.91%
DFS Remaining%: 39.97%
Last contact: Sat Apr 19 19:06:05 CST 2014
Name: 10.1.15.123:50010
Decommission Status : Normal
Configured Capacity: 21137846272 (19.69 GB)
DFS Used: 192937984 (184 MB)
Non DFS Used: 9544232960 (8.89 GB)
DFS Remaining: 11400675328(10.62 GB)
DFS Used%: 0.91%
DFS Remaining%: 53.93%
Last contact: Sat Apr 19 19:06:06 CST 2014
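The per-node report is easy to post-process with awk; a sketch that pairs each datanode with its remaining-space percentage, run here against sample lines captured from the output above (on a live cluster, pipe `./hadoop dfsadmin -report` in instead):

```shell
# Sample lines taken from the dfsadmin -report output above.
REPORT='Name: 10.1.15.124:50010
DFS Remaining%: 39.97%
Name: 10.1.15.123:50010
DFS Remaining%: 53.93%'
# Remember the current node name; print it next to its DFS Remaining%.
SUMMARY=$(echo "$REPORT" | awk '/^Name:/ {node=$2} /^DFS Remaining%/ {print node, $3}')
echo "$SUMMARY"
```

This kind of one-liner is useful for spotting skew before deciding whether the balancer needs to run.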
Set the owner of an HDFS directory:
>hadoop fs -chown username:username /user/username
Set a 1 TB space quota on a directory:
>hadoop dfsadmin -setSpaceQuota 1t /user/name
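To verify a quota after setting it, `hadoop fs -count -q` prints the quota and remaining-quota columns for the directory (shown as a fragment only; it needs a running cluster):

```shell
# Columns: QUOTA  REM_QUOTA  SPACE_QUOTA  REM_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATH
hadoop fs -count -q /user/name
```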
Check whether the namenode is in safe mode:
>hadoop dfsadmin -safemode get
Safe mode is ON
Enter safe mode:
>hadoop dfsadmin -safemode enter
Leave safe mode:
>hadoop dfsadmin -safemode leave
The wait option blocks until the namenode leaves safe mode:
>hadoop dfsadmin -safemode wait
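A typical use of `-safemode wait` is to gate a job script until HDFS is writable after a restart; a sketch, assuming the hadoop 0.20.2 binaries are on PATH (not runnable without a cluster):

```shell
#!/bin/sh
# Block until the namenode leaves safe mode, then submit a job.
hadoop dfsadmin -safemode wait
hadoop jar hadoop-0.20.2-examples.jar wordcount input output
```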
Dump information about blocks being replicated or waiting for replication (metasave writes to the named file under the namenode's log directory):
>hadoop dfsadmin -metasave metasave.txt
The fsck tool
Hadoop provides the fsck tool to check the health of files in HDFS:
>hadoop fsck /   # "/" means the root directory
fsck can also locate the data blocks of a specific file:
>hadoop fsck /user/log/part-0001 -files -blocks -racks
-files shows the file name, size, number of blocks, and health (whether any blocks are missing)
-blocks describes each block within the file
-racks shows the rack location of each block and the datanode addresses
Setting log levels
To enable debug logging for the JobTracker class, visit the jobtracker's web page at
http://jobtracker-host:50030/logLevel and set the org.apache.hadoop.mapred.JobTracker property to the DEBUG level.
Alternatively, use the following command:
>hadoop daemonlog -setlevel jobtracker-host:50030 org.apache.hadoop.mapred.JobTracker DEBUG
Log levels set in either of these ways are reset when the daemon restarts; to make a change permanent, edit log4j.properties.
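The permanent equivalent in conf/log4j.properties is a single line (logger name as used above; it takes effect on the next daemon restart):

```properties
log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
```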
Getting stack traces
Hadoop daemons expose a web page that performs a thread dump of the threads running in the daemon's JVM:
http://jobtracker-host:50030/stacks
When datanodes are added or removed, refresh the namenode's view of the cluster:
>hadoop dfsadmin -refreshNodes
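Removing a datanode gracefully is usually done through an exclude file rather than simply stopping the daemon. A hedged sketch (the exclude-file path below is hypothetical and must match whatever the dfs.hosts.exclude property in hdfs-site.xml points at; needs a running cluster):

```shell
# Add the node to the exclude file (path is hypothetical; must match dfs.hosts.exclude).
echo "idc01-vm-test-123" >> /usr/local/hadoop/conf/excludes
# Tell the namenode to re-read its include/exclude lists; the node then shows
# "Decommission in progress" in dfsadmin -report until its blocks are re-replicated.
hadoop dfsadmin -refreshNodes
```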
Run the wordcount example job:
./hadoop jar ../hadoop-0.20.2-examples.jar wordcount input output
View the results:
./hadoop fs -cat output/part-r-00000
Delete the output directory:
./hadoop fs -rmr output
The fsck command in HDFS can also display block information.
For example, the following lists the blocks that make up each file in the filesystem:
[root@idc01-vm-test-124 bin]# ./hadoop fsck / -files -blocks
/system <dir>
/tmp <dir>
/tmp/hadoop-root <dir>
/tmp/hadoop-root/mapred <dir>
/tmp/hadoop-root/mapred/system <dir>
/tmp/hadoop-root/mapred/system/jobtracker.info 4 bytes, 1 block(s): OK
0. blk_-2878039351999088155_1009 len=4 repl=2
/user <dir>
/user/root <dir>
/user/root/input <dir>
/user/root/input/activity.log.2013-12-10_0 96065578 bytes, 2 block(s): OK
0. blk_-3110298124513959180_1002 len=67108864 repl=2
1. blk_2451777541316345897_1002 len=28956714 repl=2
/user/root/input/activity.log.2013-12-10_2 95325775 bytes, 2 block(s): OK
0. blk_4492270836927550498_1003 len=67108864 repl=2
1. blk_-5677525524765262814_1003 len=28216911 repl=2
Status: HEALTHY
Total size: 191391357 B
Total dirs: 8
Total files: 3
Total blocks (validated): 5 (avg. block size 38278271 B)
Minimally replicated blocks: 5 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
The filesystem under path '/' is HEALTHY
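The Status line makes fsck convenient to use from a monitoring script. A sketch run here against sample summary lines captured from the output above (on a live cluster, substitute `FSCK_OUT=$(./hadoop fsck /)`):

```shell
# Sample summary lines from the fsck output above.
FSCK_OUT='Status: HEALTHY
 Total size: 191391357 B
 Corrupt blocks: 0'
# Report a clear message depending on the overall filesystem status.
if echo "$FSCK_OUT" | grep -q '^Status: HEALTHY'; then
  RESULT="HDFS OK"
else
  RESULT="HDFS needs attention"
fi
echo "$RESULT"
```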