Progressive Notes

Don't tell me how educated you are. Tell me how much you have traveled.

[Hadoop Training Notes] 02 - HDFS Cluster Installation and Deployment

Note: from the 开源力量 (Open Source Force) Hadoop Development online course, link: http://new.osforce.cn/course/52 . These are personal notes, not meant as an authoritative reference.


Versions used: hadoop-1.2.1 and hadoop-2.0.3-alpha; download from: http://www.apache.org/dyn/closer.cgi/hadoop/common/

On Linux, unpack hadoop-1.2.1 with tar -zxvf; the tree command shows the directory layout.

Key directories in hadoop-1.2.1 (bin, conf, sbin, src):

  • bin: Hadoop's startup and command scripts
  • conf: Hadoop's configuration files
  • sbin: Hadoop's environment-setup scripts
  • src: Hadoop's source code
hadoop-2.0.3-alpha has no src directory; instead:
  • etc: all of Hadoop's configuration files
  • share: Hadoop's jars and their corresponding bytecode

Hadoop deployment modes: standalone, pseudo-distributed, and fully distributed.


HDFS configuration:
Mainly conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml


The following configures hadoop-1.2.1:

conf/core-site.xml
--------------------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.128.3:9100</value>
  </property>
</configuration>
--------------------

conf/hdfs-site.xml
--------------------
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/hadoop/dfs/name</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/hadoop/dfs/data</value>
  </property>
</configuration>
--------------------
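One caveat worth noting: both dfs.name.dir and dfs.data.dir point under /tmp here, and many systems clear /tmp on reboot; losing dfs.name.dir wipes the whole namespace (exactly the NameNode failure reproduced later in these notes). A sketch of more durable locations (the paths are assumptions; any directory outside /tmp works):

```xml
<property>
  <name>dfs.name.dir</name>
  <value>/home/michaelchen/hadoop/dfs/name</value>  <!-- assumed path; keep it out of /tmp -->
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/michaelchen/hadoop/dfs/data</value>  <!-- assumed path -->
</property>
```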

Running start-dfs.sh from hadoop-1.2.1/bin fails with the error: JAVA_HOME is not set

Download a JDK; pwd gives its location: /home/michaelchen/hadoop/jdk1.7.0_45
(Make sure the JDK's 32/64-bit build matches the OS, which uname can check; otherwise you get a "cannot execute" error.)
Add that path to hadoop-1.2.1/conf/hadoop-env.sh (find the JAVA_HOME line and remove the leading #).
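The same edit can be scripted with sed. The sketch below works on a scratch copy of the commented-out JAVA_HOME line (the default line in your tarball may differ); in practice, run the sed command against hadoop-1.2.1/conf/hadoop-env.sh directly.

```shell
# Create a scratch copy of the relevant line for demonstration purposes.
printf '# export JAVA_HOME=/usr/lib/j2sdk1.5-sun\n' > hadoop-env.sh.demo

# Uncomment the line and point it at our JDK (path from these notes).
sed -i 's|^# *export JAVA_HOME=.*|export JAVA_HOME=/home/michaelchen/hadoop/jdk1.7.0_45|' hadoop-env.sh.demo

cat hadoop-env.sh.demo   # prints: export JAVA_HOME=/home/michaelchen/hadoop/jdk1.7.0_45
```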

Run start-dfs.sh again; this time it succeeds.
----------------------------------
[michaelchen@mars bin]$ ./start-dfs.sh
starting namenode, logging to /home/michaelchen/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-michaelchen-namenode-mars.clustertech.com.out
localhost: starting datanode, logging to /home/michaelchen/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-michaelchen-datanode-mars.clustertech.com.out
localhost: starting secondarynamenode, logging to /home/michaelchen/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-michaelchen-secondarynamenode-mars.clustertech.com.out

[michaelchen@mars bin]$ jps
6585 DataNode
6721 SecondaryNameNode
6797 Jps
6450 NameNode
----------------------------------
The SecondaryNameNode does two things: it keeps a backup of the namespace image, and it periodically merges the edit log into that image.

Running jps once failed with: -bash: jps: command not found
Fix: alias it to the copy in the JDK's bin directory: alias jps='/home/michaelchen/hadoop/jdk1.7.0_45/bin/jps'
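The alias only lasts for the current shell. A more permanent fix is to put the JDK's bin directory on PATH in the shell startup file; a sketch (paths are the ones from these notes, adjust to your install):

```shell
# Append JAVA_HOME and PATH exports to ~/.bashrc, but only once
# (the grep guard keeps repeated runs from duplicating the lines).
JDK_HOME=/home/michaelchen/hadoop/jdk1.7.0_45
grep -q 'jdk1.7.0_45/bin' ~/.bashrc 2>/dev/null || {
  echo "export JAVA_HOME=$JDK_HOME"          >> ~/.bashrc
  echo "export PATH=$JDK_HOME/bin:\$PATH"    >> ~/.bashrc
}
```

After this, a new login shell finds jps, java, and jar without any alias.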

After running jps, DataNode and SecondaryNameNode both show up, but oddly the NameNode does not (it fails to start).
To reproduce the problem: delete the hadoop folder under /tmp.
Fix: in the bin directory, run: ./hadoop namenode -format
Why this works:
---------------------------------

Formatting the Namenode

The first step to starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You need to do this the first time you set up a Hadoop installation. Do not format a running Hadoop filesystem; this will cause all your data to be erased. Before formatting, ensure that the dfs.name.dir directory exists. If you just used the default, then mkdir -p /tmp/hadoop-username/dfs/name will create the directory. To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format

If asked to [re]format, you must reply Y (not just y) if you want to reformat, else Hadoop will abort the format.

---------------------------------
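The recovery steps above can be collected into one unit; the sketch below wraps them in a function (run from hadoop-1.2.1/bin, against a cluster you are willing to wipe):

```shell
# WARNING: formatting erases the existing namespace -- only do this on a
# fresh or already-broken installation, never on one holding real data.
recover_namenode() {
  ./stop-dfs.sh               # stop any daemons that are still running
  ./hadoop namenode -format   # re-initialize dfs.name.dir (answer capital Y when prompted)
  ./start-dfs.sh              # restart; jps should now list NameNode again
}
```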


./hadoop fs -put hadoop /  (copy the local file hadoop to the given HDFS path)
./hadoop fs -lsr /  (recursively list the copied files)
./hadoop fs -du /  (show disk space usage)
./hadoop fs -rm /hadoop  (delete a file)
./hadoop fs -rmr /xwchen  (delete a directory and everything under it)
./hadoop fs -mkdir /xwchen  (create a directory)
./hadoop fs -lsr /  (recursively list directories)

./hadoop dfsadmin -report  (basic information and statistics about the filesystem)

The same information is also shown in the web UI: http://192.168.128.3:50070/

./hadoop dfsadmin -safemode enter  (enter safe mode)
./hadoop dfsadmin -safemode leave  (leave safe mode)


./hadoop dfsadmin -setQuota 1500 /xwchen  (limit /xwchen to at most 1500 names; ./hadoop dfsadmin -clrQuota /xwchen removes the limit)


./hadoop fsck
./hadoop fsck /xwchen -files -blocks
fsck is used to:
-- check the overall health of the filesystem
-- see which data blocks a file occupies
-- delete a corrupt block
-- locate a missing block
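The repair-oriented fsck options can be sketched as follows, wrapped in a function so the commands read as one unit (run from hadoop-1.2.1/bin against a live cluster; /xwchen is the example directory from above):

```shell
fsck_repair_examples() {
  ./hadoop fsck /xwchen -files -blocks -locations  # also show which DataNodes hold each block
  ./hadoop fsck / -move                            # move files with missing blocks into /lost+found
  ./hadoop fsck / -delete                          # delete corrupt files outright
}
```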


Disk balancer:
./hadoop balancer  或  start-balancer.sh 


File archiving (Archive):
./hadoop archive -archiveName pack.har -p /xwchen hadoop archiveDir
./hadoop fs -lsr /user/michaelchen/archiveDir/pack.har
./hadoop fs -cat /user/michaelchen/archiveDir/pack.har/part-0
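Instead of cat'ing the raw part files, the archive can also be read back through the har:// filesystem; a sketch, wrapped in a function (the file name inside the archive is a hypothetical placeholder):

```shell
har_read_examples() {
  ./hadoop fs -lsr har:///user/michaelchen/archiveDir/pack.har           # list the archived tree
  ./hadoop fs -cat har:///user/michaelchen/archiveDir/pack.har/somefile  # somefile: placeholder name
}
```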


Shell script analysis:
1) start-dfs.sh
2)   hadoop-daemon.sh, which launches the individual Java daemons

Difference between hadoop-daemons.sh and hadoop-daemon.sh: the former starts daemons on multiple machines, while the latter starts a daemon on a single machine, and the former calls the latter. The bridge between the two is slaves.sh, which ssh-es into each slave machine and runs hadoop-daemon.sh there.
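The dispatch inside start-dfs.sh can be sketched roughly as below (paraphrased from the hadoop-1.2.1 script, not a verbatim excerpt; wrapped in a function here so the three calls read as one unit):

```shell
start_dfs_sketch() {
  bin=$(dirname "$0")
  "$bin"/hadoop-daemon.sh  start namenode                          # one daemon, on this machine only
  "$bin"/hadoop-daemons.sh start datanode                          # fans out over conf/slaves via slaves.sh + ssh
  "$bin"/hadoop-daemons.sh --hosts masters start secondarynamenode # same fan-out, over the masters file
}
```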


Study the bundled example jobs: jar -tf hadoop-examples-1.2.1.jar
