This article is based on 忘情游天下's "CentOS 安装 hadoop(伪分布模式)" at http://zhans52.iteye.com/blog/1102649. My first Hadoop test environment was built by following that post; while setting it up I added detail to a few SSH steps the original covered only briefly. Thanks to 忘情游天下 for a good write-up.
I installed a CentOS 6 virtual machine.
Software: JDK 1.6u29
hadoop: hadoop-0.20.205.tar.gz
The Java installer and the Hadoop tarball can both be downloaded to a Windows machine first, with the download directory set up as a Windows share. Then, in the CentOS VM, run
a command like the one below to mount the shared Windows directory:
mount -t cifs -o username=******,password=****** //server/share /local/dir
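Before copying the installers over, it is worth confirming the share actually mounted. A small sketch: the helper name `is_mounted` is my own, and the paths mirror the example command above.

```shell
#!/bin/sh
# is_mounted: succeeds if the given mount point appears in a mount table
# read from stdin. Feed it the output of `mount` (or /proc/mounts).
is_mounted() {
    grep -q "[[:space:]]$1[[:space:]]"
}

# Example against a captured mount line (the CIFS mount from the command above):
echo "//server/share on /local/dir type cifs (rw)" | is_mounted /local/dir \
    && echo "share is mounted"
```

On the VM itself you would run `mount | is_mounted /local/dir` before copying files off the share.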
Step 1: Check/configure SSH (adding the authorized_keys permission fix the original glossed over, and using a non-root user)
1) First switch to the user that will run Hadoop later; in this article that user is dev
[root@localhost ~]# su dev
2) Run the command below and just press Enter at each prompt
[dev@localhost lib]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/dev/.ssh/id_rsa): (press Enter)
Enter passphrase (empty for no passphrase): (press Enter)
Enter same passphrase again: (press Enter)
Your identification has been saved in /home/dev/.ssh/id_rsa.
Your public key has been saved in /home/dev/.ssh/id_rsa.pub.
The key fingerprint is:
21:f8:47:01:e0:44:7c:d2:1d:4f:73:87:c4:c8:76:3f dev@localhost.localdomain
The key's randomart image is:
+--[ RSA 2048]----+
| o+o.oooo+o.. |
| oo.o .++o+. |
| oo. o... . |
| . o . E |
| . S . |
| . |
| |
| |
| |
+-----------------+
3) Go to the user's .ssh directory under /home
[dev@localhost lib]$ cd /home/dev/.ssh
4) Generate the authorized_keys file
[dev@localhost .ssh]$ cat id_rsa.pub > authorized_keys
5) Fix the permissions on authorized_keys (this was the step I had to dig through online material for quite a while to find)
[dev@localhost .ssh]$ chmod 600 authorized_keys
6) Verify: run ssh localhost; if you are no longer prompted for a password, the setup worked
[dev@localhost .ssh]$ ssh localhost
Last login: Tue Dec 6 17:23:13 2011 from localhost.localdomain
[dev@localhost ~]$ exit
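The six sub-steps above can be collapsed into one non-interactive script; this condensation is mine, not from the original post. The `-N ""` flag supplies the empty passphrase the prompts asked for. The sketch writes into a scratch directory so it is safe to run anywhere; substitute `~/.ssh` when doing it for real.

```shell
#!/bin/sh
# Scratch directory stands in for ~/.ssh so this demo never clobbers real keys.
SSH_DIR="$(mktemp -d)/ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"

# Step 2 above in one line: -N "" = empty passphrase, -q -f = no prompts
ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"

# Steps 4-5: authorized_keys must exist and must not be group/world writable,
# or sshd refuses it and falls back to password authentication.
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
ls -l "$SSH_DIR"
```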
Step 2: Install the JDK
Notes:
1) Pick the right Java build: on a 64-bit OS, choose the .bin with the x64 suffix
jdk-6u29-linux-i586.bin jdk-6u29-linux-x64.bin
2) Pick a Java installation directory, then
run commands like the following (the file name depends on the version you chose)
[root@localhost java]# chmod +x jdk-6u26-linux-i586.bin
[root@localhost java]# ./jdk-6u26-linux-i586.bin
......
......
......
For more information on what data Registration collects and
how it is managed and used, see:
http://java.sun.com/javase/registration/JDKRegistrationPrivacy.html
Press Enter to continue.....
Done.
When the installer finishes it creates the folder jdk1.6.0_26
Step 3: Configure environment variables (the paths below depend on the version and the actual install directory)
[root@localhost java]# vi /etc/profile
# add the following
# set java environment
export JAVA_HOME=/usr/java/jdk1.6.0_29
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
export HADOOP_HOME=/usr/local/hadoop/hadoop-0.20.205
export PATH=$PATH:$HADOOP_HOME/bin
[root@localhost java]# chmod +x /etc/profile
[root@localhost java]# source /etc/profile
[root@localhost java]#
[root@localhost java]# java -version
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Client VM (build 20.1-b02, mixed mode, sharing)
[root@localhost java]#
Step 4: Edit the hosts file
- [root@localhost conf]# vi /etc/hosts
- # Do not remove the following line, or various programs
- # that require network functionality will fail.
- 127.0.0.1 localhost.localdomain localhost
- ::1 localhost6.localdomain6 localhost6
- 127.0.0.1 namenode datanode01
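Since both aliases point at 127.0.0.1, a quick way to confirm the file is right is to scan it the way the resolver does. `resolve_alias` is a small helper of my own, not part of Hadoop:

```shell
#!/bin/sh
# resolve_alias: print the address that an /etc/hosts-style table (on stdin)
# maps a host name to; comment lines are skipped.
resolve_alias() {
    awk -v h="$1" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }'
}

# Check the two aliases used by the Hadoop configs against a sample table:
printf '127.0.0.1 namenode datanode01\n' | resolve_alias namenode      # 127.0.0.1
printf '127.0.0.1 namenode datanode01\n' | resolve_alias datanode01    # 127.0.0.1
```

On the VM you would run it against the real file: `resolve_alias namenode < /etc/hosts`.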
Step 5: Unpack and install Hadoop
- [root@localhost hadoop]# tar zxvf hadoop-0.20.203.tar.gz
- ......
- ......
- ......
- hadoop-0.20.203.0/src/contrib/ec2/bin/image/create-hadoop-image-remote
- hadoop-0.20.203.0/src/contrib/ec2/bin/image/ec2-run-user-data
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-cluster
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-master
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-slaves
- hadoop-0.20.203.0/src/contrib/ec2/bin/list-hadoop-clusters
- hadoop-0.20.203.0/src/contrib/ec2/bin/terminate-hadoop-cluster
- [root@localhost hadoop]#
Step 6: Go into Hadoop's conf directory and edit the configuration files (the directories set in them depend on where Hadoop was installed)
- ####################################
- [root@localhost conf]# vi hadoop-env.sh
- # add the following
- # set java environment
- export JAVA_HOME=/usr/java/jdk1.6.0_26
- #####################################
- [root@localhost conf]# vi core-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://namenode:9000/</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/usr/local/hadoop/hadooptmp</value>
- </property>
- </configuration>
- #######################################
- [root@localhost conf]# vi hdfs-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>dfs.name.dir</name>
- <value>/usr/local/hadoop/hdfs/name</value>
- </property>
- <property>
- <name>dfs.data.dir</name>
- <value>/usr/local/hadoop/hdfs/data</value>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- </configuration>
- #########################################
- [root@localhost conf]# vi mapred-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>namenode:9001</value>
- </property>
- <property>
- <name>mapred.local.dir</name>
- <value>/usr/local/hadoop/mapred/local</value>
- </property>
- <property>
- <name>mapred.system.dir</name>
- <value>/tmp/hadoop/mapred/system</value>
- </property>
- </configuration>
- #########################################
- [root@localhost conf]# vi masters
- #localhost
- namenode
- #########################################
- [root@localhost conf]# vi slaves
- #localhost
- datanode01
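The three site files above name several local directories (hadoop.tmp.dir, dfs.name.dir, dfs.data.dir, mapred.local.dir); creating them up front avoids first-start failures. A sketch: on the real VM you would run it as root with BASE=/usr/local/hadoop, while the default below is a scratch path so it runs anywhere.

```shell
#!/bin/sh
# Create every local directory referenced by core-site.xml, hdfs-site.xml and
# mapred-site.xml. BASE defaults to a scratch dir for this demo; set
# BASE=/usr/local/hadoop (as root, then chown -R to the dev user) for real.
BASE="${BASE:-$(mktemp -d)/hadoop}"
mkdir -p "$BASE/hadooptmp" \
         "$BASE/hdfs/name" \
         "$BASE/hdfs/data" \
         "$BASE/mapred/local"
ls -R "$BASE"
```

Note that mapred.system.dir (/tmp/hadoop/mapred/system) lives in HDFS, not on the local disk, so it is deliberately not created here.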
Step 7: Start Hadoop
- [root@localhost bin]# hadoop namenode -format
- 11/06/23 00:43:54 INFO namenode.NameNode: STARTUP_MSG:
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = localhost.localdomain/127.0.0.1
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 0.20.203.0
- STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
- ************************************************************/
- 11/06/23 00:43:55 INFO util.GSet: VM type = 32-bit
- 11/06/23 00:43:55 INFO util.GSet: 2% max memory = 19.33375 MB
- 11/06/23 00:43:55 INFO util.GSet: capacity = 2^22 = 4194304 entries
- 11/06/23 00:43:55 INFO util.GSet: recommended=4194304, actual=4194304
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: fsOwner=root
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: supergroup=supergroup
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
- 11/06/23 00:43:56 INFO namenode.NameNode: Caching file names occuring more than 10 times
- 11/06/23 00:43:57 INFO common.Storage: Image file of size 110 saved in 0 seconds.
- 11/06/23 00:43:57 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
- 11/06/23 00:43:57 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
- ************************************************************/
- [root@localhost bin]#
- ###########################################
- [root@localhost bin]# ./start-all.sh
- starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
- datanode01: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
- namenode: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
- starting jobtracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
- datanode01: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out
- [root@localhost bin]# jps
- 11971 TaskTracker
- 11807 SecondaryNameNode
- 11599 NameNode
- 12022 Jps
- 11710 DataNode
- 11877 JobTracker
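A healthy pseudo-distributed node shows all five daemons in jps. The helper below is my own, not part of Hadoop; on the VM you would pipe the live output in with `jps | check_daemons`:

```shell
#!/bin/sh
# check_daemons: read jps output on stdin and report any of the five Hadoop
# daemons missing from it. grep -w keeps "NameNode" from matching inside
# "SecondaryNameNode".
check_daemons() {
    out="$(cat)"
    missing=""
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        echo "$out" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -z "$missing" ]; then echo "all daemons up"; else echo "missing:$missing"; fi
}

# Against the jps listing shown above:
printf '11971 TaskTracker\n11807 SecondaryNameNode\n11599 NameNode\n11710 DataNode\n11877 JobTracker\n' \
    | check_daemons    # all daemons up
```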
Check the cluster status:
- [root@localhost bin]# hadoop dfsadmin -report
- Configured Capacity: 4055396352 (3.78 GB)
- Present Capacity: 464142351 (442.64 MB)
- DFS Remaining: 464089088 (442.59 MB)
- DFS Used: 53263 (52.01 KB)
- DFS Used%: 0.01%
- Under replicated blocks: 0
- Blocks with corrupt replicas: 0
- Missing blocks: 0
- -------------------------------------------------
- Datanodes available: 1 (1 total, 0 dead)
- Name: 127.0.0.1:50010
- Decommission Status : Normal
- Configured Capacity: 4055396352 (3.78 GB)
- DFS Used: 53263 (52.01 KB)
- Non DFS Used: 3591254001 (3.34 GB)
- DFS Remaining: 464089088(442.59 MB)
- DFS Used%: 0%
- DFS Remaining%: 11.44%
- Last contact: Thu Jun 23 01:11:15 PDT 2011
- [root@localhost bin]#
Other issues:
- #################### 1) error on startup ##########
- [root@localhost bin]# ./start-all.sh
- starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
- The authenticity of host 'datanode01 (127.0.0.1)' can't be established.
- RSA key fingerprint is 41:c8:d4:e4:60:71:6f:6a:33:6a:25:27:62:9b:e3:90.
- Are you sure you want to continue connecting (yes/no)? y
- Please type 'yes' or 'no': yes
- datanode01: Warning: Permanently added 'datanode01' (RSA) to the list of known hosts.
- datanode01: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
- datanode01: Unrecognized option: -jvm
- datanode01: Could not create the Java virtual machine.
- namenode: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
- starting jobtracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
- datanode01: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out
- [root@localhost bin]# jps
- 10442 JobTracker
- 10533 TaskTracker
- 10386 SecondaryNameNode
- 10201 NameNode
- 10658 Jps
- ################################################
- [root@localhost bin]# vi hadoop
- elif [ "$COMMAND" = "datanode" ] ; then
- CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
- if [[ $EUID -eq 0 ]]; then
- HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
- else
- HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
- fi
- #http://javoft.net/2011/06/hadoop-unrecognized-option-jvm-could-not-create-the-java-virtual-machine/
- # change this to
- elif [ "$COMMAND" = "datanode" ] ; then
- CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
- # if [[ $EUID -eq 0 ]]; then
- # HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
- # else
- HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
- # fi
- # or start the cluster as a non-root user instead
- # after this change, startup succeeds
2) Turn off the firewall before starting Hadoop (on CentOS 6: `service iptables stop`, and `chkconfig iptables off` to keep it off after a reboot).
Check that everything is running:
http://localhost:50070
- NameNode 'localhost.localdomain:9000'
- Started: Thu Jun 23 01:07:18 PDT 2011
- Version: 0.20.203.0, r1099333
- Compiled: Wed May 4 07:57:50 PDT 2011 by oom
- Upgrades: There are no upgrades in progress.
- Browse the filesystem
- Namenode Logs
- Cluster Summary
- 6 files and directories, 1 blocks = 7 total. Heap Size is 31.38 MB / 966.69 MB (3%)
- Configured Capacity : 3.78 GB
- DFS Used : 52.01 KB
- Non DFS Used : 3.34 GB
- DFS Remaining : 442.38 MB
- DFS Used% : 0 %
- DFS Remaining% : 11.44 %
- Live Nodes : 1
- Dead Nodes : 0
- Decommissioning Nodes : 0
- Number of Under-Replicated Blocks : 0
- NameNode Storage:
- Storage Directory Type State
- /usr/local/hadoop/hdfs/name IMAGE_AND_EDITS Active
http://localhost:50030
- namenode Hadoop Map/Reduce Administration
- Quick Links
- * Scheduling Info
- * Running Jobs
- * Retired Jobs
- * Local Logs
- State: RUNNING
- Started: Thu Jun 23 01:07:30 PDT 2011
- Version: 0.20.203.0, r1099333
- Compiled: Wed May 4 07:57:50 PDT 2011 by oom
- Identifier: 201106230107
- Cluster Summary (Heap Size is 15.31 MB/966.69 MB)
- Running Map Tasks: 0, Running Reduce Tasks: 0, Total Submissions: 0, Nodes: 1, Occupied Map Slots: 0, Occupied Reduce Slots: 0, Reserved Map Slots: 0, Reserved Reduce Slots: 0, Map Task Capacity: 2, Reduce Task Capacity: 2, Avg. Tasks/Node: 4.00, Blacklisted Nodes: 0, Graylisted Nodes: 0, Excluded Nodes: 0
- Scheduling Information
- Queue Name State Scheduling Information
- default running N/A
- Filter (Jobid, Priority, User, Name)
- Example: 'user:smith 3200' will filter by 'smith' only in the user field and '3200' in all fields
- Running Jobs
- none
- Retired Jobs
- none
- Local Logs
- Log directory, Job Tracker History This is Apache Hadoop release 0.20.203.0
Test:
- ########## create a directory ##########
- [root@localhost bin]# hadoop fs -mkdir testFolder
- ############### copy a file into the folder
- [root@localhost local]# ls
- bin etc games hadoop include lib libexec sbin share src SSH_key_file
- [root@localhost local]# hadoop fs -copyFromLocal SSH_key_file testFolder
- You can then browse to the web UI to see the file
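Beyond eyeballing the web page, the copy can be verified byte-for-byte by reading the file back out of HDFS. `verify_copy` is a small helper of my own; the commented `hadoop fs -cat` line shows how you would feed it on the real cluster:

```shell
#!/bin/sh
# verify_copy: compare stdin against a local file; identical content means
# HDFS stored the file intact.  Usage: some-producer | verify_copy local_file
verify_copy() {
    if diff - "$1" >/dev/null; then echo "round trip OK"; else echo "MISMATCH"; fi
}

# On the cluster:  hadoop fs -cat testFolder/SSH_key_file | verify_copy SSH_key_file
# Self-contained demo using a scratch file:
tmp="$(mktemp)"
echo "sample key data" > "$tmp"
cat "$tmp" | verify_copy "$tmp"    # round trip OK
```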
Reference: http://bxyzzy.blog.51cto.com/854497/352692