Note: personal notes from the OSForce (开源力量) Hadoop Development online course, http://new.osforce.cn/course/52. For my own reference only.
Versions used: hadoop-1.2.1 and hadoop-2.0.3-alpha, downloaded from http://www.apache.org/dyn/closer.cgi/hadoop/common/
On Linux, unpack hadoop-1.2.1 with tar -zxvf; the tree command shows the directory layout.
Key directories in hadoop-1.2.1 (bin, conf, sbin, src) and hadoop-2.0.3-alpha (etc, share):
- bin: Hadoop start-up and command scripts
- conf: Hadoop configuration files
- sbin: Hadoop environment/setup scripts
- src: Hadoop source code
- etc: all Hadoop configuration files (replaces conf in hadoop-2.x)
- share: Hadoop jar files and the corresponding compiled classes
conf/core-site.xml (excerpt; sets the default file system URI):
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.128.3:9100</value>
    </property>
</configuration>
conf/hdfs-site.xml (excerpt; replication factor and NameNode/DataNode storage paths):
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.name.dir</name>
        <value>/tmp/hadoop/dfs/name</value>
    </property>

    <property>
        <name>dfs.data.dir</name>
        <value>/tmp/hadoop/dfs/data</value>
    </property>
</configuration>
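The two files above configure HDFS only. A pseudo-distributed hadoop-1.2.1 setup that also runs MapReduce jobs additionally sets the JobTracker address in conf/mapred-site.xml. A sketch, where the host mirrors the core-site example above and port 9001 is an assumption (any free port works):

```xml
<configuration>
  <!-- JobTracker address; host follows the core-site example, port is an assumption -->
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.128.3:9001</value>
  </property>
</configuration>
```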
Starting HDFS with bin/start-dfs.sh logs:
starting namenode, logging to /home/michaelchen/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-michaelchen-namenode-mars.clustertech.com.out
localhost: starting datanode, logging to /home/michaelchen/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-michaelchen-datanode-mars.clustertech.com.out
localhost: starting secondarynamenode, logging to /home/michaelchen/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-michaelchen-secondarynamenode-mars.clustertech.com.out
[michaelchen@mars bin]$ jps
6585 DataNode
6721 SecondaryNameNode
6797 Jps
6450 NameNode
Formatting the Namenode
The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You only need to do this the first time you set up a Hadoop installation. Do not format a running Hadoop filesystem, as this will erase all your data. Before formatting, ensure that the dfs.name.dir directory exists. If you just used the default, then mkdir -p /tmp/hadoop-username/dfs/name will create the directory. To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format
If asked whether to [re]format, you must reply with a capital Y (not lowercase y) if you want to reformat; otherwise Hadoop will abort the format.
./hadoop dfsadmin -report    (basic file system information and statistics)
The same information is also shown on the web UI at http://192.168.128.3:50070/
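The textual report can be post-processed with standard tools. A sketch: the sample report text below is an assumption standing in for real ./hadoop dfsadmin -report output (field names modeled on the 1.x report format), so the awk one-liner can run without a cluster.

```shell
#!/bin/sh
# Sample stand-in for `./hadoop dfsadmin -report` output (values are made up)
report='Configured Capacity: 52844687360 (49.22 GB)
DFS Remaining: 41843843072 (38.97 GB)
Datanodes available: 1 (1 total, 0 dead)'

# Split each line on ": " and print the value of the "Datanodes available" field
printf '%s\n' "$report" | awk -F': ' '/Datanodes available/ {print $2}'
```

The same pattern works for any field in the report, e.g. matching /Configured Capacity/ instead.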
./hadoop dfsadmin -safemode enter    (put HDFS into safe mode)
./hadoop dfsadmin -safemode leave    (take HDFS out of safe mode)
./hadoop dfsadmin -setQuota 1500 /xwchen    (set a namespace quota of 1500 names on /xwchen)
./hadoop fsck
./hadoop fsck /xwchen -files -blocks
What fsck does:
-- check the health of the file system
-- show which data blocks a file occupies
-- delete corrupt blocks
-- find missing blocks
Disk balancer (balancer):
./hadoop balancer    or    start-balancer.sh
File archiving with hadoop archive:
./hadoop archive -archiveName pack.har -p /xwchen hadoop archiveDir
./hadoop fs -lsr /user/michaelchen/archiveDir/pack.har
./hadoop fs -cat /user/michaelchen/archiveDir/pack.har/part-0
Shell script analysis:
1) start-dfs.sh
2) hadoop-daemon.sh, which launches the individual Java processes
Difference between hadoop-daemons.sh and hadoop-daemon.sh: the former starts daemons on multiple machines, while the latter starts a daemon on a single machine; the former calls the latter. The bridge between the two is sbin/slaves.sh, which logs in to each slave machine over ssh and runs hadoop-daemon.sh there.
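The dispatch loop at the heart of slaves.sh can be sketched in a few lines of POSIX shell. This is a simplified sketch, not the real script: the function name run_on_slaves and the echo standing in for ssh are my own, and the real script also handles HADOOP_SSH_OPTS, HADOOP_SLAVE_SLEEP, and backgrounding of each ssh.

```shell
#!/bin/sh
# run_on_slaves SLAVES_FILE COMMAND...
# Run COMMAND on every host listed in SLAVES_FILE (one host per line,
# '#' starts a comment), as slaves.sh does with the conf/slaves file.
run_on_slaves() {
  slaves_file=$1; shift
  # strip comments; empty lines disappear during word splitting
  for slave in $(sed 's/#.*//' "$slaves_file"); do
    # the real script does:  ssh $HADOOP_SSH_OPTS "$slave" "$@" &
    # we echo instead of ssh so the sketch runs without a cluster
    echo "would ssh to $slave and run: $*"
  done
}

# demo: a sample slaves file (in a real cluster this is conf/slaves)
printf 'node1\nnode2\n# comments are skipped\n' > /tmp/slaves.demo
run_on_slaves /tmp/slaves.demo hadoop-daemon.sh start datanode
```

This is why passwordless ssh to every slave is a prerequisite: the loop simply runs hadoop-daemon.sh remotely on each listed host.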
Study the bundled example programs: jar -tf hadoop-examples-1.2.1.jar lists the classes in the examples jar.