This article explains how to verify a Hadoop installation and configuration after setup; for installing hadoop-1.2.1 itself, see the companion article on installing hadoop-1.2.1.
After installation, it is worth a quick check that everything works. The installation article already did a basic check with the jps command; here we go one step further and run a MapReduce job that counts words, which exercises the map and reduce configuration end to end. The verification steps are as follows:
1. To make the hadoop commands convenient to run, first configure Hadoop's environment variables.
Open the .bashrc file in the hadoop user's home directory and add the following lines:
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
After editing, remember to log in again or run source .bashrc so the changes take effect.
Once HADOOP_HOME is set, hadoop commands will print the warning "Warning: $HADOOP_HOME is deprecated."
For a fix, see the companion article on resolving the "Warning: $HADOOP_HOME is deprecated." message in hadoop 1.2.1.
2. Create test files locally. To make the upload convenient, first create an input directory, then create the test1.txt and test2.txt files inside it:
[hadoop@mdw temp]$ mkdir input
[hadoop@mdw temp]$ cd input/
[hadoop@mdw input]$ echo "hello world hello hadoop" > test1.txt
[hadoop@mdw input]$ echo "hello hadoop" > test2.txt
[hadoop@mdw input]$ ll
total 8
-rw-rw-r-- 1 hadoop hadoop 25 May 29 01:19 test1.txt
-rw-rw-r-- 1 hadoop hadoop 13 May 29 01:20 test2.txt
[hadoop@mdw input]$ cat test1.txt
hello world hello hadoop
[hadoop@mdw input]$ cat test2.txt
hello hadoop
The cat output above shows the contents of the two text files; we will now count their words with a MapReduce job.
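Before running the real job, it helps to see exactly what WordCount computes. Here is a minimal Python sketch (not Hadoop code, just the same map/shuffle/reduce idea applied to the two test files above):

```python
from collections import defaultdict

# The two test files from the session above.
files = {
    "test1.txt": "hello world hello hadoop",
    "test2.txt": "hello hadoop",
}

# Map phase: emit a (word, 1) pair for every word in every input line.
mapped = []
for contents in files.values():
    for line in contents.splitlines():
        for word in line.split():
            mapped.append((word, 1))

# Shuffle + reduce phase: group the pairs by word and sum the counts.
counts = defaultdict(int)
for word, one in mapped:
    counts[word] += one

for word in sorted(counts):
    print(word, counts[word])
# hadoop 2
# hello 3
# world 1
```

The printed totals are exactly what the Hadoop job should produce for these two files, so they give us a reference to check the job's output against.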
3. Upload the two text files to the HDFS file system (here they are uploaded into an in directory on HDFS):
[hadoop@mdw input]$ hadoop dfs -put ../input/ in
After uploading, list the HDFS file system:
[hadoop@mdw input]$ hadoop dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-05-29 01:31 /user/hadoop/in
As the listing shows, the /user/hadoop directory now contains an in directory; list its contents:
[hadoop@mdw input]$ hadoop dfs -ls ./in
Found 2 items
-rw-r--r-- 2 hadoop supergroup 25 2015-05-29 01:31 /user/hadoop/in/test1.txt
-rw-r--r-- 2 hadoop supergroup 13 2015-05-29 01:31 /user/hadoop/in/test2.txt
If the upload fails with the following error:
put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hadoop/in. Name node is in safe mode.
run this command to leave safe mode, then retry the upload:
[hadoop@mdw input]$ hadoop dfsadmin -safemode leave
4. Run Hadoop's bundled word-count example to count the words:
[hadoop@mdw input]$ hadoop jar ~/hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount in out
15/05/29 01:41:13 INFO input.FileInputFormat: Total input paths to process : 2
15/05/29 01:41:13 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/05/29 01:41:13 WARN snappy.LoadSnappy: Snappy native library not loaded
15/05/29 01:41:14 INFO mapred.JobClient: Running job: job_201505290130_0001
15/05/29 01:41:15 INFO mapred.JobClient: map 0% reduce 0%
15/05/29 01:41:19 INFO mapred.JobClient: map 50% reduce 0%
15/05/29 01:41:20 INFO mapred.JobClient: map 100% reduce 0%
15/05/29 01:41:26 INFO mapred.JobClient: map 100% reduce 33%
15/05/29 01:41:27 INFO mapred.JobClient: map 100% reduce 100%
15/05/29 01:41:27 INFO mapred.JobClient: Job complete: job_201505290130_0001
15/05/29 01:41:27 INFO mapred.JobClient: Counters: 29
15/05/29 01:41:27 INFO mapred.JobClient: Job Counters
15/05/29 01:41:27 INFO mapred.JobClient: Launched reduce tasks=1
15/05/29 01:41:27 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5374
15/05/29 01:41:27 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/05/29 01:41:27 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/05/29 01:41:27 INFO mapred.JobClient: Launched map tasks=2
15/05/29 01:41:27 INFO mapred.JobClient: Data-local map tasks=2
15/05/29 01:41:27 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8142
15/05/29 01:41:27 INFO mapred.JobClient: File Output Format Counters
15/05/29 01:41:27 INFO mapred.JobClient: Bytes Written=25
15/05/29 01:41:27 INFO mapred.JobClient: FileSystemCounters
15/05/29 01:41:27 INFO mapred.JobClient: FILE_BYTES_READ=68
15/05/29 01:41:27 INFO mapred.JobClient: HDFS_BYTES_READ=254
15/05/29 01:41:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=165604
15/05/29 01:41:27 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25
15/05/29 01:41:27 INFO mapred.JobClient: File Input Format Counters
15/05/29 01:41:27 INFO mapred.JobClient: Bytes Read=38
15/05/29 01:41:27 INFO mapred.JobClient: Map-Reduce Framework
15/05/29 01:41:27 INFO mapred.JobClient: Map output materialized bytes=74
15/05/29 01:41:27 INFO mapred.JobClient: Map input records=2
15/05/29 01:41:27 INFO mapred.JobClient: Reduce shuffle bytes=74
15/05/29 01:41:27 INFO mapred.JobClient: Spilled Records=10
15/05/29 01:41:27 INFO mapred.JobClient: Map output bytes=62
15/05/29 01:41:27 INFO mapred.JobClient: CPU time spent (ms)=1960
15/05/29 01:41:27 INFO mapred.JobClient: Total committed heap usage (bytes)=337780736
15/05/29 01:41:27 INFO mapred.JobClient: Combine input records=6
15/05/29 01:41:27 INFO mapred.JobClient: SPLIT_RAW_BYTES=216
15/05/29 01:41:27 INFO mapred.JobClient: Reduce input records=5
15/05/29 01:41:27 INFO mapred.JobClient: Reduce input groups=3
15/05/29 01:41:27 INFO mapred.JobClient: Combine output records=5
15/05/29 01:41:27 INFO mapred.JobClient: Physical memory (bytes) snapshot=324222976
15/05/29 01:41:27 INFO mapred.JobClient: Reduce output records=3
15/05/29 01:41:27 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1128259584
15/05/29 01:41:27 INFO mapred.JobClient: Map output records=6
Note: the bundled hadoop-examples-1.2.1.jar file is located in the Hadoop installation directory.
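The framework counters in the log can be cross-checked by hand. The following Python sketch (assuming one input split per file, consistent with the two map tasks in the log) reproduces the Map output records=6, Combine output records=5, and Reduce output records=3 figures:

```python
from collections import Counter

# Per-map-task word lists, matching the two input splits (one file each).
map_tasks = [
    "hello world hello hadoop".split(),  # test1.txt
    "hello hadoop".split(),              # test2.txt
]

# Each map task emits one (word, 1) pair per word.
map_output_records = sum(len(words) for words in map_tasks)

# The combiner pre-aggregates each map task's output locally,
# so each task emits one pair per distinct word it saw.
combined = [Counter(words) for words in map_tasks]
combine_output_records = sum(len(c) for c in combined)

# The single reducer merges the combined pairs and emits one
# record per distinct word overall.
totals = sum(combined, Counter())
reduce_output_records = len(totals)

print(map_output_records, combine_output_records, reduce_output_records)
# 6 5 3
```

This also shows why the combiner matters: without it, all 6 map output records would cross the network to the reducer instead of 5.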
The log above shows that the MapReduce job completed successfully. Listing the HDFS file system again, we find that it now contains an out directory as well:
[hadoop@mdw input]$ hadoop dfs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-05-29 01:31 /user/hadoop/in
drwxr-xr-x - hadoop supergroup 0 2015-05-29 01:41 /user/hadoop/out
Run the ls command to see what is inside the out directory:
[hadoop@mdw input]$ hadoop dfs -ls ./out
Found 3 items
-rw-r--r-- 2 hadoop supergroup 0 2015-05-29 01:41 /user/hadoop/out/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2015-05-29 01:41 /user/hadoop/out/_logs
-rw-r--r-- 2 hadoop supergroup 25 2015-05-29 01:41 /user/hadoop/out/part-r-00000
The job's output data is stored mainly in the part-r-00000 file; we can cat the files to see their contents:
[hadoop@mdw input]$ hadoop dfs -cat ./out/*
hadoop 2
hello 3
world 1
cat: File does not exist: /user/hadoop/out/_logs
The three word/count lines above are the MapReduce result: the words in the two text files were counted correctly. (The final cat error is harmless; _logs is a directory, not a file, so cat skips it.) This fully confirms that Hadoop has been installed and configured successfully.
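If you want to post-process the result, note that the job's default TextOutputFormat writes one key/value pair per line, separated by a tab. A small Python sketch of parsing it (the sample string stands in for the bytes fetched with hadoop dfs -cat ./out/part-r-00000):

```python
# Sample contents of part-r-00000: key<TAB>value, one pair per line.
sample = "hadoop\t2\nhello\t3\nworld\t1\n"

counts = {}
for line in sample.splitlines():
    word, count = line.split("\t")
    counts[word] = int(count)

print(counts)  # {'hadoop': 2, 'hello': 3, 'world': 1}
```

In a real script you would pipe the hadoop dfs -cat output into this parser instead of using an inline sample.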