HDFS Installation and Deployment


1. Preparation

Prepare 3 machines and set up their hosts files.

One machine serves as the NameNode: cc-staging-session2, named master.

Two machines serve as DataNodes: cc-staging-front, named slave1, and cc-staging-imcenter, named slave2.
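
A minimal /etc/hosts sketch for all three machines; the IP addresses below are placeholders and must be replaced with the real ones:

192.168.1.11   cc-staging-session2   master
192.168.1.12   cc-staging-front      slave1
192.168.1.13   cc-staging-imcenter   slave2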


#Create the hadoop user on all 3 machines

useradd hadoop

passwd hadoop


# Install the JDK and set JAVA_HOME and PATH

#Download and install JDK 1.7

http://www.oracle.com/technetwork/java/javase/downloads/index.html

tar zxvf jdk-7u21-linux-x64.gz -C /usr/local/


#Add the environment variables to /etc/profile

pathmunge /usr/local/jdk1.7.0_21/bin

export JAVA_HOME=/usr/local/jdk1.7.0_21/

export JRE_HOME=/usr/local/jdk1.7.0_21/jre

export  CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
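
#Verify the JDK setup (a quick sanity check): re-read the profile and confirm the java version

source /etc/profile

java -version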



2. Download and install Hadoop

#Download Hadoop

Download page: https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs

wget  http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz

wget  http://archive.cloudera.com/cdh/3/hbase-0.90.6-cdh3u6.tar.gz

wget  http://archive.cloudera.com/cdh/3/hive-0.7.1-cdh3u6.tar.gz


#Create the same directory layout on all 3 machines. The name directory is only used on the master and its permissions must be 755, otherwise the format step later will fail

mkdir -p /hadoop/{install,name,data1,data2,tmp}


#Extract the package to /hadoop/install

tar zxvf hadoop-0.20.2-cdh3u6.tar.gz -C /hadoop/install/


#Change the owner to hadoop

chown -R hadoop:hadoop /hadoop



3. Set up SSH trust for the hadoop account

#Run on the master machine

su - hadoop

ssh-keygen

ssh-copy-id -i .ssh/id_rsa.pub  hadoop@cc-staging-front

ssh-copy-id -i .ssh/id_rsa.pub  hadoop@cc-staging-imcenter

ssh-copy-id -i .ssh/id_rsa.pub  hadoop@cc-staging-session2


#Test: every login should succeed without a password prompt

ssh hadoop@master

ssh hadoop@slave1

ssh hadoop@slave2



4. Edit the HDFS configuration files; all nodes must keep them identical

cd /hadoop/install/hadoop-0.20.2-cdh3u6/conf

#core-site.xml: core configuration

<configuration>

<property>

      <name>fs.default.name</name>

      <value>hdfs://master:9000</value>

 </property>

</configuration>


#hdfs-site.xml: HDFS site parameters

<configuration>

<property>

      <name>dfs.replication</name>

     <value>2</value>

</property>

<property>

     <name>dfs.name.dir</name>

     <value>/hadoop/name</value>

</property>

<property>

     <name>dfs.data.dir</name>

      <value>/hadoop/data1,/hadoop/data2</value>

</property>

<property>

     <name>dfs.tmp.dir</name>

     <value>/hadoop/tmp</value>

</property>

</configuration>


#Set the JAVA_HOME variable in hadoop-env.sh

export JAVA_HOME=/usr/local/jdk1.7.0_21/
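
#Since all nodes must have identical configuration, one way to keep them in sync is to copy the conf directory from the master to the slaves (a sketch, assuming the same install path on every node):

scp /hadoop/install/hadoop-0.20.2-cdh3u6/conf/* hadoop@slave1:/hadoop/install/hadoop-0.20.2-cdh3u6/conf/

scp /hadoop/install/hadoop-0.20.2-cdh3u6/conf/* hadoop@slave2:/hadoop/install/hadoop-0.20.2-cdh3u6/conf/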


5. Initialize the NameNode

#Run on the master to format the fsimage storage; the confirmation must be an uppercase Y

su - hadoop

cd  /hadoop/install/hadoop-0.20.2-cdh3u6/bin/

[hadoop@cc-staging-session2 bin]$ ./hadoop  namenode -format

13/04/27 01:46:40 INFO namenode.NameNode:  STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = cc-staging-session2/127.0.0.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 0.20.2-cdh3u6

STARTUP_MSG:   build = git://ubuntu-slave01/var/lib/jenkins/workspace/CDH3u6-Full-RC/build/cdh3/hadoop20/0.20.2-cdh3u6/source  -r efb405d2aa54039bdf39e0733cd0bb9423a1eb0a; compiled by 'jenkins' on Wed Mar  20 11:45:36 PDT 2013

************************************************************/

Re-format filesystem in  /hadoop/name ? (Y or N) Y

13/04/27 01:46:42 INFO util.GSet: VM  type       = 64-bit

13/04/27 01:46:42 INFO util.GSet: 2% max  memory = 17.77875 MB

13/04/27 01:46:42 INFO util.GSet:  capacity      = 2^21 = 2097152 entries

13/04/27 01:46:42 INFO util.GSet:  recommended=2097152, actual=2097152

13/04/27 01:46:42 INFO  namenode.FSNamesystem: fsOwner=hadoop (auth:SIMPLE)

13/04/27 01:46:42 INFO  namenode.FSNamesystem: supergroup=supergroup

13/04/27 01:46:42 INFO  namenode.FSNamesystem: isPermissionEnabled=true

13/04/27 01:46:42 INFO  namenode.FSNamesystem: dfs.block.invalidate.limit=1000

13/04/27 01:46:42 INFO  namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0  min(s), accessTokenLifetime=0 min(s)

13/04/27 01:46:43 INFO common.Storage:  Image file of size 112 saved in 0 seconds.

13/04/27 01:46:43 INFO common.Storage: Storage directory /hadoop/name has been successfully  formatted.

13/04/27 01:46:43 INFO namenode.NameNode:  SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at  cc-staging-session2/127.0.0.1

************************************************************/


#Start the NameNode and the DataNodes

cd  /hadoop/install/hadoop-0.20.2-cdh3u6/bin/

./hadoop-daemon.sh start namenode
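
#Optional: the jps output in the verification step below also shows a SecondaryNameNode and a DataNode on the master; to reproduce that layout, start them on the master the same way

./hadoop-daemon.sh start secondarynamenode

./hadoop-daemon.sh start datanode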


#/hadoop/install/hadoop-0.20.2-cdh3u6/bin/ contains a number of control scripts (a sketch of the conf/masters and conf/slaves files they rely on follows this list):
 * start-all.sh starts all Hadoop daemons: namenode, datanode, jobtracker, tasktracker, and secondarynamenode
 * stop-all.sh stops all Hadoop daemons
 * start-mapred.sh starts the Map/Reduce daemons: jobtracker and tasktrackers
 * stop-mapred.sh stops the Map/Reduce daemons
 * start-dfs.sh starts the HDFS daemons: namenode and datanodes
 * stop-dfs.sh stops the HDFS daemons
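
#start-dfs.sh and start-all.sh read the node lists from conf/masters (the secondary namenode host) and conf/slaves (the datanode hosts); a sketch of those two files for this cluster, assuming the secondary namenode stays on the master:

#conf/masters
master

#conf/slaves
slave1
slave2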


#Start the DataNode on slave1 and slave2

cd  /hadoop/install/hadoop-0.20.2-cdh3u6/bin/

./hadoop-daemon.sh start datanode


#Run jps on each node to check whether the daemons started successfully

[hadoop@cc-staging-session2 bin]$  jps

11926 NameNode

12566 Jps

12233 SecondaryNameNode

12066 DataNode


#The configured data directories must exist on the DataNode's disks, otherwise the DataNode will report errors

[hadoop@cc-staging-front bin]$ jps

14582 DataNode

14637 Jps


[hadoop@cc-staging-imcenter bin]$ jps

23355 DataNode

23419 Jps
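
#Besides jps, the NameNode web UI (port 50070 by default in this release) shows the live datanodes and capacity; a quick check from the command line:

curl -s http://master:50070/dfshealth.jsp | head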





6. Simple tests

#Create a directory from any node:

./hadoop dfs -mkdir test


#The directory can be seen from all nodes:

./hadoop dfs -ls

Found 2 items

drwxr-xr-x   - hadoop supergroup          0 2013-04-27 02:32  /user/hadoop/test


#Copy a file, i.e. put a local file into HDFS

./hadoop dfs -put /etc/services test
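
#To confirm the upload, list the directory and read part of the file back:

./hadoop dfs -ls test

./hadoop dfs -cat test/services | head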


#Delete a file

./hadoop dfs -rm test/services
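
#As a final check of the whole cluster, dfsadmin -report lists the registered datanodes and their capacity:

./hadoop dfsadmin -report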