Step1. Start Hadoop
Change into the Hadoop installation directory and start all Hadoop daemons:
$ bin/start-all.sh
Then check that all the daemon processes are running:
$ jps
6112 Jps
5412 JobTracker
6452 NameNode
6534 DataNode
7552 TaskTracker
8653 SecondaryNameNode
Step2. Prepare the input files
In the Hadoop installation directory, create a folder named input with mkdir, change into it, and write some content into two files, file0 and file1:
$ echo "hello Kitty hello Hadoop hello World" > file0
$ echo "hello Hadoop hello oLHHo
hello Leon" > file1
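Before running the job, you can predict what wordcount should produce by counting the words locally with standard shell tools. This is only a local sanity check, not part of the Hadoop workflow:

```shell
# Recreate the two input files with the same content as above.
mkdir -p input
echo "hello Kitty hello Hadoop hello World" > input/file0
echo "hello Hadoop hello oLHHo
hello Leon" > input/file1

# Split on whitespace, then count occurrences of each word --
# these are the same (word, count) pairs the MapReduce job emits.
cat input/file0 input/file1 | tr -s ' \n' '\n' | sort | uniq -c
```

The pipeline reports hello 6 times, Hadoop twice, and Kitty, World, oLHHo, and Leon once each, matching the job output shown in Step5.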
Step3. Copy the input folder to the root of HDFS
The wordcount example cannot run against the local filesystem; the input has to be copied onto the Hadoop distributed filesystem first:
$ bin/hadoop dfs -put input in //copy the input folder to HDFS, renaming it to in
$ bin/hadoop dfs -ls //list the uploaded files and their paths
Found 1 items
drwxr-xr-x   - olhho\administrator supergroup          0 2014-10-15 14:12 /user/olhho/administrator/in
Step4. Run the wordcount example
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out //the input directory is in, the output directory is out
14/10/15 14:14:40 INFO input.FileInputFormat: Total input paths to process : 2
14/10/15 14:14:41 INFO mapred.JobClient: Running job: job_201410151358_0001
14/10/15 14:14:42 INFO mapred.JobClient: map 0% reduce 0%
14/10/15 14:14:55 INFO mapred.JobClient: map 66% reduce 0%
14/10/15 14:14:58 INFO mapred.JobClient: map 100% reduce 0%
14/10/15 14:15:07 INFO mapred.JobClient: map 100% reduce 100%
14/10/15 14:15:09 INFO mapred.JobClient: Job complete: job_201410151358_0001
14/10/15 14:15:09 INFO mapred.JobClient: Counters: 17
14/10/15 14:15:09 INFO mapred.JobClient:   Map-Reduce Framework
14/10/15 14:15:09 INFO mapred.JobClient:     Combine output records=12
14/10/15 14:15:09 INFO mapred.JobClient:     Spilled Records=12
14/10/15 14:15:09 INFO mapred.JobClient:     Reduce input records=6
14/10/15 14:15:09 INFO mapred.JobClient:     Reduce output records=4
14/10/15 14:15:09 INFO mapred.JobClient:     Map input records=2
14/10/15 14:15:09 INFO mapred.JobClient:     Map output records=12
14/10/15 14:15:09 INFO mapred.JobClient:     Map output bytes=61
14/10/15 14:15:09 INFO mapred.JobClient:     Reduce shuffle bytes=91
14/10/15 14:15:09 INFO mapred.JobClient:     Combine input records=6
14/10/15 14:15:09 INFO mapred.JobClient:     Reduce input groups=4
14/10/15 14:15:09 INFO mapred.JobClient:   FileSystemCounters
14/10/15 14:15:09 INFO mapred.JobClient:     HDFS_BYTES_READ=37
14/10/15 14:15:09 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=347
14/10/15 14:15:09 INFO mapred.JobClient:     FILE_BYTES_READ=160
14/10/15 14:15:09 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=33
14/10/15 14:15:09 INFO mapred.JobClient:   Job Counters
14/10/15 14:15:09 INFO mapred.JobClient:     Launched map tasks=3
14/10/15 14:15:09 INFO mapred.JobClient:     Launched reduce tasks=1
14/10/15 14:15:09 INFO mapred.JobClient:     Data-local map tasks=3
$ bin/hadoop dfs -ls //check the path of out
Found 2 items
drwxr-xr-x   - olhho\administrator supergroup          0 2014-10-15 14:12 /user/olhho/administrator/in
drwxr-xr-x   - olhho\administrator supergroup          0 2014-10-15 14:15 /user/olhho/administrator/out
Step5. View the output
$ bin/hadoop dfs -cat /user/olhho/administrator/out/*
Kitty 1
Hadoop 2
hello 6
World 1
oLHHo 1
Leon 1
You can also copy the output files from HDFS to the local filesystem and view them there:
$ bin/hadoop dfs -get out output //copy out to the local filesystem, renaming it to output
$ cat output/*
Kitty 1
Hadoop 2
hello 6
World 1
oLHHo 1
Leon 1
Note: the wordcount example counts how many times each word occurs in the input.
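Conceptually, the job inside hadoop-0.20.2-examples.jar runs in three phases: map emits a (word, 1) pair per word, the framework sorts and groups the pairs by key, and reduce sums each group. A rough sketch of those phases as a shell pipeline (the function names here are illustrative, not part of Hadoop):

```shell
# map: emit one "word<TAB>1" pair per input word (Map output records)
map()     { tr -s ' \n' '\n' | sed 's/$/\t1/'; }
# shuffle: bring identical keys together, as the framework's sort does
shuffle() { sort; }
# reduce: sum the counts for each distinct word (Reduce output records)
reduce()  { awk -F'\t' '{sum[$1]+=$2} END {for (w in sum) print w, sum[w]}'; }

printf 'hello Kitty hello Hadoop hello World\n' | map | shuffle | reduce
```

Run on file0's contents as above, this prints hello 3 and Kitty, Hadoop, and World 1 each (in arbitrary order); the real job does the same work in parallel across the cluster.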