I spent the last couple of days learning Hadoop and did a pseudo-distributed deployment on a virtual machine. For now I'm jotting down the installation and test steps as notes to myself (not yet cleaned up or formatted); I'll flesh them out when I have time. ^^ Also a small celebration: I'm finally using a blog, Weibo, and WeChat all at once (a bit behind the times, maybe?)
tar -vxf hadoop-1.1.2.tar.gz
cd hadoop-1.1.2/conf
vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_26
export HADOOP_HOME=/home/centos/soft/hadoop-1.1.2
export PATH=$PATH:/home/centos/soft/hadoop-1.1.2/bin
. ./conf/hadoop-env.sh
vi core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/centos/soft/hadooptmp/hadoop-${user.name}</value>
</property>
vi hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/home/centos/soft/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/centos/soft/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
vi mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
vi masters
127.0.0.1
vi slaves
127.0.0.1
vi /etc/hosts
127.0.0.1 localhost localhost.localdomain cxz.localdomain
127.0.0.1 master
127.0.0.1 slave
mkdir -p /home/centos/soft/hadoop/hdfs/name
mkdir -p /home/centos/soft/hadoop/hdfs/data
mkdir -p /home/centos/soft/hadooptmp
Format the NameNode:
./bin/hadoop namenode -format
13/08/11 16:16:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cxz.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.1.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
Re-format filesystem in /home/centos/soft/hadoop/hdfs/name ? (Y or N) Y
13/08/11 16:16:29 INFO util.GSet: VM type = 64-bit
13/08/11 16:16:29 INFO util.GSet: 2% max memory = 17.77875 MB
13/08/11 16:16:29 INFO util.GSet: capacity = 2^21 = 2097152 entries
13/08/11 16:16:29 INFO util.GSet: recommended=2097152, actual=2097152
13/08/11 16:16:29 INFO namenode.FSNamesystem: fsOwner=centos
13/08/11 16:16:29 INFO namenode.FSNamesystem: supergroup=supergroup
13/08/11 16:16:29 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/08/11 16:16:29 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/08/11 16:16:29 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/08/11 16:16:29 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/08/11 16:16:30 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/08/11 16:16:30 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/centos/soft/hadoop/hdfs/name/current/edits
13/08/11 16:16:30 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/centos/soft/hadoop/hdfs/name/current/edits
13/08/11 16:16:30 INFO common.Storage: Storage directory /home/centos/soft/hadoop/hdfs/name has been successfully formatted.
13/08/11 16:16:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cxz.localdomain/127.0.0.1
************************************************************/
Start Hadoop:
$ ./bin/start-all.sh
starting namenode, logging to /home/centos/soft/hadoop-1.1.2/logs/hadoop-centos-namenode-cxz.localdomain.out
127.0.0.1: Warning: $HADOOP_HOME is deprecated.
127.0.0.1:
127.0.0.1: starting datanode, logging to /home/centos/soft/hadoop-1.1.2/logs/hadoop-centos-datanode-cxz.localdomain.out
127.0.0.1: Warning: $HADOOP_HOME is deprecated.
127.0.0.1:
127.0.0.1: starting secondarynamenode, logging to /home/centos/soft/hadoop-1.1.2/logs/hadoop-centos-secondarynamenode-cxz.localdomain.out
starting jobtracker, logging to /home/centos/soft/hadoop-1.1.2/logs/hadoop-centos-jobtracker-cxz.localdomain.out
127.0.0.1: Warning: $HADOOP_HOME is deprecated.
127.0.0.1:
127.0.0.1: starting tasktracker, logging to /home/centos/soft/hadoop-1.1.2/logs/hadoop-centos-tasktracker-cxz.localdomain.out
$ jps
13121 NameNode
13581 TaskTracker
13461 JobTracker
19761 Jps
13378 SecondaryNameNode
$ ./bin/hadoop namenode -report
13/08/11 17:57:04 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cxz.localdomain/127.0.0.1
STARTUP_MSG: args = [-report]
STARTUP_MSG: version = 1.1.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
13/08/11 17:57:04 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cxz.localdomain/127.0.0.1
************************************************************/
Note: -report is not a valid NameNode argument, which is why the usage message appears above; the cluster report actually comes from ./bin/hadoop dfsadmin -report.
$ ./bin/hadoop jar hadoop-examples-1.1.2.jar pi 4 2
Number of Maps = 4
Samples per Map = 2
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
13/08/11 19:45:06 INFO mapred.FileInputFormat: Total input paths to process : 4
13/08/11 19:45:07 INFO mapred.JobClient: Running job: job_201308111944_0001
13/08/11 19:45:08 INFO mapred.JobClient: map 0% reduce 0%
13/08/11 19:45:33 INFO mapred.JobClient: map 25% reduce 0%
13/08/11 19:45:38 INFO mapred.JobClient: map 50% reduce 0%
13/08/11 19:46:15 INFO mapred.JobClient: map 50% reduce 16%
13/08/11 19:47:10 INFO mapred.JobClient: map 75% reduce 16%
13/08/11 19:47:18 INFO mapred.JobClient: map 100% reduce 16%
13/08/11 19:47:21 INFO mapred.JobClient: map 100% reduce 25%
13/08/11 19:47:25 INFO mapred.JobClient: map 100% reduce 100%
13/08/11 19:47:33 INFO mapred.JobClient: Job complete: job_201308111944_0001
13/08/11 19:47:33 INFO mapred.JobClient: Counters: 30
13/08/11 19:47:33 INFO mapred.JobClient: Job Counters
13/08/11 19:47:33 INFO mapred.JobClient: Launched reduce tasks=1
13/08/11 19:47:33 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=249989
13/08/11 19:47:33 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/11 19:47:33 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/11 19:47:33 INFO mapred.JobClient: Launched map tasks=4
13/08/11 19:47:33 INFO mapred.JobClient: Data-local map tasks=4
13/08/11 19:47:33 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=111499
13/08/11 19:47:33 INFO mapred.JobClient: File Input Format Counters
13/08/11 19:47:33 INFO mapred.JobClient: Bytes Read=472
13/08/11 19:47:33 INFO mapred.JobClient: File Output Format Counters
13/08/11 19:47:33 INFO mapred.JobClient: Bytes Written=97
13/08/11 19:47:33 INFO mapred.JobClient: FileSystemCounters
13/08/11 19:47:33 INFO mapred.JobClient: FILE_BYTES_READ=94
13/08/11 19:47:33 INFO mapred.JobClient: HDFS_BYTES_READ=964
13/08/11 19:47:33 INFO mapred.JobClient: FILE_BYTES_WRITTEN=290200
13/08/11 19:47:33 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
13/08/11 19:47:33 INFO mapred.JobClient: Map-Reduce Framework
13/08/11 19:47:33 INFO mapred.JobClient: Map output materialized bytes=112
13/08/11 19:47:33 INFO mapred.JobClient: Map input records=4
13/08/11 19:47:33 INFO mapred.JobClient: Reduce shuffle bytes=112
13/08/11 19:47:33 INFO mapred.JobClient: Spilled Records=16
13/08/11 19:47:33 INFO mapred.JobClient: Map output bytes=72
13/08/11 19:47:33 INFO mapred.JobClient: Total committed heap usage (bytes)=631963648
13/08/11 19:47:33 INFO mapred.JobClient: CPU time spent (ms)=34540
13/08/11 19:47:33 INFO mapred.JobClient: Map input bytes=96
13/08/11 19:47:33 INFO mapred.JobClient: SPLIT_RAW_BYTES=492
13/08/11 19:47:33 INFO mapred.JobClient: Combine input records=0
13/08/11 19:47:33 INFO mapred.JobClient: Reduce input records=8
13/08/11 19:47:33 INFO mapred.JobClient: Reduce input groups=8
13/08/11 19:47:33 INFO mapred.JobClient: Combine output records=0
13/08/11 19:47:33 INFO mapred.JobClient: Physical memory (bytes) snapshot=950931456
13/08/11 19:47:33 INFO mapred.JobClient: Reduce output records=0
13/08/11 19:47:33 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5287145472
13/08/11 19:47:33 INFO mapred.JobClient: Map output records=8
Job Finished in 146.894 seconds
Estimated value of Pi is 3.50000000000000000000
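The pi example throws quasi-random darts at a unit square, one slice of a Halton sequence per map task, and scales the fraction landing inside the inscribed circle by 4. That is why 4 maps × 2 samples = 8 points gives such a coarse 3.5. A minimal single-process sketch of the same idea (my own simplified reimplementation, not Hadoop's actual PiEstimator code):

```python
def halton(index, base):
    """index-th element (1-based) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result

def estimate_pi(num_samples):
    """Count quasi-random 2-D points falling inside the circle of radius 0.5
    centred at (0.5, 0.5); the area ratio circle/square is pi/4."""
    inside = 0
    for i in range(1, num_samples + 1):
        x, y = halton(i, 2), halton(i, 3)  # bases 2 and 3 for the two axes
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(8))  # 4 maps x 2 samples = 8 points -> prints 3.5
```

With only 8 samples the estimate is necessarily rough; rerunning the job as, say, pi 16 1000 would converge much closer to 3.14159.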
$ ./bin/hadoop fs -mkdir input
$ ./bin/hadoop fs -ls
Found 2 items
drwxr-xr-x - centos supergroup 0 2013-08-11 18:02 /user/centos/input
drwxr-xr-x - centos supergroup 0 2013-08-11 18:02 /user/centos/output
$ ./bin/hadoop fs -put /home/centos/soft/hadoopdemo/*.txt input
$ ./bin/hadoop fs -ls input
Found 2 items
-rw-r--r-- 1 centos supergroup 31 2013-08-11 19:58 /user/centos/input/demo1.txt
-rw-r--r-- 1 centos supergroup 34 2013-08-11 19:58 /user/centos/input/demo2.txt
$ ./bin/hadoop jar /home/centos/soft/hadoop-1.1.2/hadoop-examples-1.1.2.jar wordcount input output
Note: if the output directory already exists, delete it first with ./bin/hadoop fs -rmr output.
13/08/11 20:00:46 INFO input.FileInputFormat: Total input paths to process : 2
13/08/11 20:00:47 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/11 20:00:47 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/11 20:00:47 INFO mapred.JobClient: Running job: job_201308111944_0003
13/08/11 20:00:48 INFO mapred.JobClient: map 0% reduce 0%
13/08/11 20:01:11 INFO mapred.JobClient: map 50% reduce 0%
13/08/11 20:01:15 INFO mapred.JobClient: map 100% reduce 0%
13/08/11 20:01:40 INFO mapred.JobClient: map 100% reduce 100%
13/08/11 20:01:41 INFO mapred.JobClient: Job complete: job_201308111944_0003
13/08/11 20:01:41 INFO mapred.JobClient: Counters: 29
13/08/11 20:01:41 INFO mapred.JobClient: Job Counters
13/08/11 20:01:41 INFO mapred.JobClient: Launched reduce tasks=1
13/08/11 20:01:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=43019
13/08/11 20:01:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/11 20:01:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/11 20:01:41 INFO mapred.JobClient: Launched map tasks=2
13/08/11 20:01:41 INFO mapred.JobClient: Data-local map tasks=2
13/08/11 20:01:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=28852
13/08/11 20:01:41 INFO mapred.JobClient: File Output Format Counters
13/08/11 20:01:41 INFO mapred.JobClient: Bytes Written=58
13/08/11 20:01:41 INFO mapred.JobClient: FileSystemCounters
13/08/11 20:01:41 INFO mapred.JobClient: FILE_BYTES_READ=118
13/08/11 20:01:41 INFO mapred.JobClient: HDFS_BYTES_READ=293
13/08/11 20:01:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=174385
13/08/11 20:01:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=58
13/08/11 20:01:41 INFO mapred.JobClient: File Input Format Counters
13/08/11 20:01:41 INFO mapred.JobClient: Bytes Read=65
13/08/11 20:01:41 INFO mapred.JobClient: Map-Reduce Framework
13/08/11 20:01:41 INFO mapred.JobClient: Map output materialized bytes=124
13/08/11 20:01:41 INFO mapred.JobClient: Map input records=10
13/08/11 20:01:41 INFO mapred.JobClient: Reduce shuffle bytes=124
13/08/11 20:01:41 INFO mapred.JobClient: Spilled Records=18
13/08/11 20:01:41 INFO mapred.JobClient: Map output bytes=105
13/08/11 20:01:41 INFO mapred.JobClient: CPU time spent (ms)=10750
13/08/11 20:01:41 INFO mapred.JobClient: Total committed heap usage (bytes)=310378496
13/08/11 20:01:41 INFO mapred.JobClient: Combine input records=10
13/08/11 20:01:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=228
13/08/11 20:01:41 INFO mapred.JobClient: Reduce input records=9
13/08/11 20:01:41 INFO mapred.JobClient: Reduce input groups=7
13/08/11 20:01:41 INFO mapred.JobClient: Combine output records=9
13/08/11 20:01:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=465526784
13/08/11 20:01:41 INFO mapred.JobClient: Reduce output records=7
13/08/11 20:01:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3175239680
13/08/11 20:01:41 INFO mapred.JobClient: Map output records=10
$ ./bin/hadoop fs -ls output
Found 3 items
-rw-r--r-- 1 centos supergroup 0 2013-08-11 20:01 /user/centos/output/_SUCCESS
drwxr-xr-x - centos supergroup 0 2013-08-11 20:00 /user/centos/output/_logs
-rw-r--r-- 1 centos supergroup 58 2013-08-11 20:01 /user/centos/output/part-r-00000
$ ./bin/hadoop fs -cat /user/centos/output/part-r-00000
hadoop 3
java 1
mongdodb 1
office 2
redis 1
text 1
word 1
$ ./bin/hadoop fs -copyToLocal /user/centos/output/part-r-00000 ~/soft/test.txt
$ cat soft/test.txt
hadoop 3
java 1
mongdodb 1
office 2
redis 1
text 1
word 1
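Conceptually, the wordcount job above maps each input line to (word, 1) pairs, partially sums them in the combiner, then sums per word in the reducer. A single-process sketch of that logic (the sample lines below are made up for illustration, not the real contents of demo1.txt/demo2.txt):

```python
from collections import Counter

def wordcount(lines):
    """Map: split each line into words; reduce: sum the counts per word."""
    counts = Counter()
    for line in lines:               # one map input record per line
        counts.update(line.split())  # each word is one map output record
    return dict(sorted(counts.items()))

# Hypothetical input standing in for demo1.txt and demo2.txt.
demo_lines = ["hadoop java hadoop", "office redis hadoop office"]
print(wordcount(demo_lines))
# prints {'hadoop': 3, 'java': 1, 'office': 2, 'redis': 1}
```

The counters in the job log line up with this picture: 10 map input records (lines), 10 map output records (words), and 7 reduce output records (distinct words).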
Writing data
When the client program was started, the following problem appeared:
13/08/15 11:35:27 INFO ipc.Client: Retrying connect to server: 192.168.21.133/192.168.21.133:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Analysis
core-site.xml contained the default setting:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
With this setting, a program running outside the virtual machine cannot connect; changing it to the concrete IP as below fixes it. Note that the VM's IP address may change after a reboot:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.21.133:9000</value>
</property>
$ ./bin/hadoop fs -mkdir tmp
$ ./bin/hadoop fs -ls
Found 3 items
drwxr-xr-x - centos supergroup 0 2013-08-11 19:58 /user/centos/input
drwxr-xr-x - centos supergroup 0 2013-08-11 20:01 /user/centos/output
drwxr-xr-x - centos supergroup 0 2013-08-15 11:04 /user/centos/tmp
Running the client upload program then failed with:
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/user/centos/tmp/test1.txt. Name node is in safe mode.
The reported blocks is only 0 but the threshold is 0.9990 and the total blocks 8. Safe mode will be turned off automatically.
Note: safe mode has to be released first; run
$ ./bin/hadoop dfsadmin -safemode leave
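The NameNode stays in safe mode until the fraction of blocks reported by DataNodes reaches a threshold (dfs.safemode.threshold.pct, 0.999 by default). A sketch of that check, assuming this simplified rule:

```python
def in_safe_mode(reported_blocks, total_blocks, threshold=0.999):
    """The NameNode leaves safe mode once enough DataNode block reports
    have arrived, i.e. reported/total >= threshold."""
    if total_blocks == 0:
        return False  # nothing to wait for
    return reported_blocks / total_blocks < threshold

# The log above: 0 of 8 blocks reported, so safe mode stays on.
print(in_safe_mode(0, 8))  # prints True
```

Here no blocks were ever reported (the DataNode wasn't reporting in), so the NameNode would never leave safe mode on its own, hence the manual -safemode leave.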
Continuing, the next attempt failed with
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hy-cxz, access=WRITE, inode="tmp":centos:supergroup:rwxr-xr-x
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
…
Note: the directory's write permission needs to be opened up:
$ ./bin/hadoop fs -chmod 777 tmp
$ ./bin/hadoop fs -ls
Found 3 items
drwxr-xr-x - centos supergroup 0 2013-08-11 19:58 /user/centos/input
drwxr-xr-x - centos supergroup 0 2013-08-11 20:01 /user/centos/output
drwxrwxrwx - centos supergroup 0 2013-08-15 11:04 /user/centos/tmp
When you hit a problem like the following:
13/08/11 19:05:44 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/centos/PiEstimator_TMP_3_141592654/in/part0 could only be replicated to 0 nodes, instead of 1
check whether the DataNode actually started. Every run of bin/hadoop namenode -format generates a new namespaceID for the NameNode, but the DataNode's storage under the old temporary/data directories still holds the previous namespaceID; on startup the mismatch makes the DataNode refuse to start. The fix is to wipe those directories before re-formatting, i.e.:
1) Stop all services (./bin/stop-all.sh)
2) Delete the temporary directory (hadoop.tmp.dir) and the data directories (dfs.name.dir, dfs.data.dir)
3) Re-run ./bin/hadoop namenode -format, then start all services (./bin/start-all.sh)