This tutorial has been tested and works.
Reposted from: http://zhans52.iteye.com/blog/1102649
Environment: a CentOS 5.5 virtual machine on the local host.
Software: JDK 1.6u26
Hadoop: hadoop-0.20.203.tar.gz
Check and configure SSH
- [root@localhost ~]# ssh-keygen -t rsa
- Generating public/private rsa key pair.
- Enter file in which to save the key (/root/.ssh/id_rsa):
- Created directory '/root/.ssh'.
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /root/.ssh/id_rsa.
- Your public key has been saved in /root/.ssh/id_rsa.pub.
- The key fingerprint is:
- a8:7a:3e:f6:92:85:b8:c7:be:d9:0e:45:9c:d1:36:3b root@localhost.localdomain
- [root@localhost ~]#
- [root@localhost ~]# cd ..
- [root@localhost /]# cd root
- [root@localhost ~]# ls
- anaconda-ks.cfg Desktop install.log install.log.syslog
- [root@localhost ~]# cd .ssh
- [root@localhost .ssh]# cat id_rsa.pub > authorized_keys
- [root@localhost .ssh]#
- [root@localhost .ssh]# ssh localhost
- The authenticity of host 'localhost (127.0.0.1)' can't be established.
- RSA key fingerprint is 41:c8:d4:e4:60:71:6f:6a:33:6a:25:27:62:9b:e3:90.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
- Last login: Tue Jun 21 22:40:31 2011
- [root@localhost ~]#
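A common failure mode after this step on CentOS is that `ssh localhost` still asks for a password, because sshd rejects key files that are group- or world-accessible. A minimal sketch of the expected permission bits, demonstrated on a temporary directory standing in for `/root/.ssh` so it is safe to run anywhere:

```shell
# Temp dir stands in for /root/.ssh; apply the same chmods to the real path
SSH_DIR=$(mktemp -d)
touch "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"                  # directory: owner-only access
chmod 600 "$SSH_DIR/authorized_keys"  # key file: owner read/write only
stat -c '%a' "$SSH_DIR" "$SSH_DIR/authorized_keys"
```

If sshd still prompts after fixing permissions, `StrictModes` in /etc/ssh/sshd_config is the setting that enforces these checks.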
Install the JDK
- [root@localhost java]# chmod +x jdk-6u26-linux-i586.bin
- [root@localhost java]# ./jdk-6u26-linux-i586.bin
- ......
- ......
- ......
- For more information on what data Registration collects and
- how it is managed and used, see:
- http://java.sun.com/javase/registration/JDKRegistrationPrivacy.html
- Press Enter to continue.....
- Done.
Installation creates the directory jdk1.6.0_26.
Configure environment variables
- [root@localhost java]# vi /etc/profile
- # add the following lines
- # set java environment
- export JAVA_HOME=/usr/java/jdk1.6.0_26
- export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
- export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
- export HADOOP_HOME=/usr/local/hadoop/hadoop-0.20.203
- export PATH=$PATH:$HADOOP_HOME/bin
- [root@localhost java]# chmod +x /etc/profile
- [root@localhost java]# source /etc/profile
- [root@localhost java]#
- [root@localhost java]# java -version
- java version "1.6.0_26"
- Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
- Java HotSpot(TM) Client VM (build 20.1-b02, mixed mode, sharing)
- [root@localhost java]#
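A script can verify the active JDK instead of eyeballing the output. A small sketch that parses the quoted version string, shown here against the line captured above (in a live shell, feed it `java -version 2>&1 | head -n1`):

```shell
# Captured first line of `java -version`; live: ver_line=$(java -version 2>&1 | head -n1)
ver_line='java version "1.6.0_26"'
# Extract the quoted version number
ver=$(echo "$ver_line" | sed 's/.*"\(.*\)"/\1/')
case "$ver" in
  1.6.*) result="JDK 1.6 detected: $ver" ;;
  *)     result="unexpected JDK: $ver" ;;
esac
echo "$result"
```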
Edit the hosts file
- [root@localhost conf]# vi /etc/hosts
- # Do not remove the following line, or various programs
- # that require network functionality will fail.
- 127.0.0.1 localhost.localdomain localhost
- ::1 localhost6.localdomain6 localhost6
- 127.0.0.1 namenode datanode01
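Both names used later in the Hadoop config (`namenode` in the masters file, `datanode01` in slaves) must resolve, or startup will fail. A quick sketch that checks a hosts-style line for the required names, fed the mapping added above (against the live file, use `getent hosts namenode`):

```shell
# The mapping line added above; live: hosts_line=$(grep namenode /etc/hosts)
hosts_line="127.0.0.1 namenode datanode01"
missing=""
for h in namenode datanode01; do
  # -w matches whole words so "namenode" will not match inside another name
  echo "$hosts_line" | grep -qw "$h" || missing="$missing $h"
done
if [ -z "$missing" ]; then status="hosts ok"; else status="missing:$missing"; fi
echo "$status"
```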
Unpack and install Hadoop
- [root@localhost hadoop]# tar zxvf hadoop-0.20.203.tar.gz
- ......
- ......
- ......
- hadoop-0.20.203.0/src/contrib/ec2/bin/image/create-hadoop-image-remote
- hadoop-0.20.203.0/src/contrib/ec2/bin/image/ec2-run-user-data
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-cluster
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-master
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-slaves
- hadoop-0.20.203.0/src/contrib/ec2/bin/list-hadoop-clusters
- hadoop-0.20.203.0/src/contrib/ec2/bin/terminate-hadoop-cluster
- [root@localhost hadoop]#
Enter Hadoop's conf directory and edit the configuration files
- ####################################
- [root@localhost conf]# vi hadoop-env.sh
- # add the following line
- # set java environment
- export JAVA_HOME=/usr/java/jdk1.6.0_26
- #####################################
- [root@localhost conf]# vi core-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://namenode:9000/</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/usr/local/hadoop/hadooptmp</value>
- </property>
- </configuration>
- #######################################
- [root@localhost conf]# vi hdfs-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>dfs.name.dir</name>
- <value>/usr/local/hadoop/hdfs/name</value>
- </property>
- <property>
- <name>dfs.data.dir</name>
- <value>/usr/local/hadoop/hdfs/data</value>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- </configuration>
- #########################################
- [root@localhost conf]# vi mapred-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>namenode:9001</value>
- </property>
- <property>
- <name>mapred.local.dir</name>
- <value>/usr/local/hadoop/mapred/local</value>
- </property>
- <property>
- <name>mapred.system.dir</name>
- <value>/tmp/hadoop/mapred/system</value>
- </property>
- </configuration>
- #########################################
- [root@localhost conf]# vi masters
- #localhost
- namenode
- #########################################
- [root@localhost conf]# vi slaves
- #localhost
- datanode01
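The conf edits above can be scripted for a repeatable setup. A sketch that writes `core-site.xml` from a heredoc into a temporary directory standing in for `$HADOOP_HOME/conf` (the other files follow the same pattern):

```shell
CONF_DIR=$(mktemp -d)   # stand-in for $HADOOP_HOME/conf
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadooptmp</value>
  </property>
</configuration>
EOF
# Sanity check: the NameNode URI made it into the file
grep -c 'hdfs://namenode:9000/' "$CONF_DIR/core-site.xml"
```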
Start Hadoop
- ##################### format the NameNode ##############
- [root@localhost bin]# hadoop namenode -format
- 11/06/23 00:43:54 INFO namenode.NameNode: STARTUP_MSG:
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = localhost.localdomain/127.0.0.1
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 0.20.203.0
- STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
- ************************************************************/
- 11/06/23 00:43:55 INFO util.GSet: VM type = 32-bit
- 11/06/23 00:43:55 INFO util.GSet: 2% max memory = 19.33375 MB
- 11/06/23 00:43:55 INFO util.GSet: capacity = 2^22 = 4194304 entries
- 11/06/23 00:43:55 INFO util.GSet: recommended=4194304, actual=4194304
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: fsOwner=root
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: supergroup=supergroup
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
- 11/06/23 00:43:56 INFO namenode.NameNode: Caching file names occuring more than 10 times
- 11/06/23 00:43:57 INFO common.Storage: Image file of size 110 saved in 0 seconds.
- 11/06/23 00:43:57 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
- 11/06/23 00:43:57 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
- ************************************************************/
- [root@localhost bin]#
- ###########################################
- [root@localhost bin]# ./start-all.sh
- starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
- datanode01: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
- namenode: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
- starting jobtracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
- datanode01: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out
- [root@localhost bin]# jps
- 11971 TaskTracker
- 11807 SecondaryNameNode
- 11599 NameNode
- 12022 Jps
- 11710 DataNode
- 11877 JobTracker
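On a single-node setup, all five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) should appear in `jps`. A sketch that checks each one against the output captured above (in a live session, use `jps_output=$(jps)`):

```shell
# Captured jps output from above; live: jps_output=$(jps)
jps_output="11971 TaskTracker
11807 SecondaryNameNode
11599 NameNode
12022 Jps
11710 DataNode
11877 JobTracker"
missing=""
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  # -w prevents "NameNode" matching inside "SecondaryNameNode"
  echo "$jps_output" | grep -qw "$d" || missing="$missing $d"
done
if [ -z "$missing" ]; then status="all daemons up"; else status="missing:$missing"; fi
echo "$status"
```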
Check the cluster status
- [root@localhost bin]# hadoop dfsadmin -report
- Configured Capacity: 4055396352 (3.78 GB)
- Present Capacity: 464142351 (442.64 MB)
- DFS Remaining: 464089088 (442.59 MB)
- DFS Used: 53263 (52.01 KB)
- DFS Used%: 0.01%
- Under replicated blocks: 0
- Blocks with corrupt replicas: 0
- Missing blocks: 0
- -------------------------------------------------
- Datanodes available: 1 (1 total, 0 dead)
- Name: 127.0.0.1:50010
- Decommission Status : Normal
- Configured Capacity: 4055396352 (3.78 GB)
- DFS Used: 53263 (52.01 KB)
- Non DFS Used: 3591254001 (3.34 GB)
- DFS Remaining: 464089088(442.59 MB)
- DFS Used%: 0%
- DFS Remaining%: 11.44%
- Last contact: Thu Jun 23 01:11:15 PDT 2011
- [root@localhost bin]#
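The report is easy to consume from a script, for example to warn when HDFS space runs low. A sketch that extracts `DFS Remaining%` from the line captured above (live, pipe from `hadoop dfsadmin -report`; the 10% threshold is an arbitrary example):

```shell
# One line from the report above; live: line=$(hadoop dfsadmin -report | grep 'DFS Remaining%')
line='DFS Remaining%: 11.44%'
pct=$(echo "$line" | sed -n 's/.*DFS Remaining%: \([0-9.]*\)%.*/\1/p')
# Compare on the whole-number part (POSIX sh has no float arithmetic)
if [ "${pct%.*}" -lt 10 ]; then status="low space: ${pct}%"; else status="ok: ${pct}%"; fi
echo "$status"
```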
Other issues: 1
- #################### error on startup ##########
- [root@localhost bin]# ./start-all.sh
- starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
- The authenticity of host 'datanode01 (127.0.0.1)' can't be established.
- RSA key fingerprint is 41:c8:d4:e4:60:71:6f:6a:33:6a:25:27:62:9b:e3:90.
- Are you sure you want to continue connecting (yes/no)? y
- Please type 'yes' or 'no': yes
- datanode01: Warning: Permanently added 'datanode01' (RSA) to the list of known hosts.
- datanode01: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
- datanode01: Unrecognized option: -jvm
- datanode01: Could not create the Java virtual machine.
- namenode: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
- starting jobtracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
- datanode01: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out
- [root@localhost bin]# jps
- 10442 JobTracker
- 10533 TaskTracker
- 10386 SecondaryNameNode
- 10201 NameNode
- 10658 Jps
- ################################################
- [root@localhost bin]# vi hadoop
- elif [ "$COMMAND" = "datanode" ] ; then
- CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
- if [[ $EUID -eq 0 ]]; then
- HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
- else
- HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
- fi
- #http://javoft.net/2011/06/hadoop-unrecognized-option-jvm-could-not-create-the-java-virtual-machine/
- # change to
- elif [ "$COMMAND" = "datanode" ] ; then
- CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
- # if [[ $EUID -eq 0 ]]; then
- # HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
- # else
- HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
- # fi
- # or start Hadoop as a non-root user instead
- # after this change, startup succeeds
2. Turn off the firewall before starting Hadoop.
Check that everything is running:
http://localhost:50070
- NameNode 'localhost.localdomain:9000'
- Started: Thu Jun 23 01:07:18 PDT 2011
- Version: 0.20.203.0, r1099333
- Compiled: Wed May 4 07:57:50 PDT 2011 by oom
- Upgrades: There are no upgrades in progress.
- Browse the filesystem
- Namenode Logs
- Cluster Summary
- 6 files and directories, 1 blocks = 7 total. Heap Size is 31.38 MB / 966.69 MB (3%)
- Configured Capacity : 3.78 GB
- DFS Used : 52.01 KB
- Non DFS Used : 3.34 GB
- DFS Remaining : 442.38 MB
- DFS Used% : 0 %
- DFS Remaining% : 11.44 %
- Live Nodes : 1
- Dead Nodes : 0
- Decommissioning Nodes : 0
- Number of Under-Replicated Blocks : 0
- NameNode Storage:
- Storage Directory Type State
- /usr/local/hadoop/hdfs/name IMAGE_AND_EDITS Active
http://localhost:50030
- namenode Hadoop Map/Reduce Administration
- Quick Links
- * Scheduling Info
- * Running Jobs
- * Retired Jobs
- * Local Logs
- State: RUNNING
- Started: Thu Jun 23 01:07:30 PDT 2011
- Version: 0.20.203.0, r1099333
- Compiled: Wed May 4 07:57:50 PDT 2011 by oom
- Identifier: 201106230107
- Cluster Summary (Heap Size is 15.31 MB/966.69 MB)
- Running Map Tasks Running Reduce Tasks Total Submissions Nodes Occupied Map Slots Occupied Reduce Slots Reserved Map Slots Reserved Reduce Slots Map Task Capacity Reduce Task Capacity Avg. Tasks/Node Blacklisted Nodes Graylisted Nodes Excluded Nodes
- 0 0 0 1 0 0 0 0 2 2 4.00 0 0 0
- Scheduling Information
- Queue Name State Scheduling Information
- default running N/A
- Filter (Jobid, Priority, User, Name)
- Example: 'user:smith 3200' will filter by 'smith' only in the user field and '3200' in all fields
- Running Jobs
- none
- Retired Jobs
- none
- Local Logs
- Log directory, Job Tracker History This is Apache Hadoop release 0.20.203.0
Test:
- ########## create a directory ##########
- [root@localhost bin]# hadoop fs -mkdir testFolder
- ############### copy a file into the directory
- [root@localhost local]# ls
- bin etc games hadoop include lib libexec sbin share src SSH_key_file
- [root@localhost local]# hadoop fs -copyFromLocal SSH_key_file testFolder
- The file can now be browsed from the web UI.