1 Enter the MapReduce directory (mkdir it first if it does not exist):
cd ~/MapReduce
2 Create a directory for the MapReduce application wordcount:
mkdir wordcount
3 Compile to generate the .class files (pwd is ~/MapReduce):
javac -classpath ~/hadoop-1.1.2/hadoop-core-1.1.2.jar -d wordcount WordCount.java
Or cd into the wordcount directory (pwd is ~/MapReduce/wordcount/) and run:
javac -classpath ~/hadoop-1.1.2/hadoop-core-1.1.2.jar WordCount.java
4 Build the jar (assuming the current directory is ~/MapReduce/wordcount/):
jar -cvf wordcount.jar -C ./ .    (the trailing dot must not be omitted)
Or:
jar -cvf wordcount.jar .
5 Prepare to upload the input files: first create an input directory (note that it lives in HDFS, somewhat like a virtual folder, not on the local filesystem):
~/hadoop-1.1.2/bin/hadoop dfs -mkdir input
6 Upload file01 and file02 into input (the current directory is wordcount):
~/hadoop-1.1.2/bin/hadoop dfs -put ./input/file0* input
Or:
~/hadoop-1.1.2/bin/hadoop dfs -copyFromLocal ./input/ input
7 Run the generated jar (pwd is ~/MapReduce/wordcount/):
~/hadoop-1.1.2/bin/hadoop jar wordcount.jar WordCount input output
List the input files:
~/hadoop-1.1.2/bin/hadoop fs -ls input
List the output files:
~/hadoop-1.1.2/bin/hadoop fs -ls output
The listing looks like this:
xxl@xxl-pc:~/MapReduce/wordcount$ ~/hadoop-1.1.2/bin/hadoop fs -ls output/
Found 3 items
-rw-r--r-- 1 xxl supergroup 0 2013-06-22 21:07 /user/xxl/output/_SUCCESS
drwxr-xr-x - xxl supergroup 0 2013-06-22 21:06 /user/xxl/output/_logs
-rw-r--r-- 1 xxl supergroup 31 2013-06-22 21:06 /user/xxl/output/part-00000
View the result with the cat command:
~/hadoop-1.1.2/bin/hadoop fs -cat output/part-00000
Output:
Bye 2
Hadoop 2
Hello 2
World 2
Delete the existing output directory:
~/hadoop-1.1.2/bin/hadoop fs -rmr output/
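The job above follows the classic map/shuffle/reduce pattern of WordCount. A minimal local Python sketch of that logic (not Hadoop code; the sample input lines are assumptions, chosen only to be consistent with the counts shown above):

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every token, like WordCount's map().
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Sum the counts for one key, like WordCount's reduce().
    return word, sum(counts)

def run_job(lines):
    # Shuffle: group all intermediate (word, 1) pairs by key.
    groups = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            groups[word].append(count)
    # Reduce each group, iterating keys in sorted order to
    # mirror the sorted part-00000 output.
    return dict(reducer(w, c) for w, c in sorted(groups.items()))

# Hypothetical input lines (the real file01/file02 contents are assumptions):
print(run_job(["Hello World Bye World", "Hello Hadoop Bye Hadoop"]))
```

Each mapper emits (word, 1) pairs, the shuffle groups them by word, and each reducer sums one group, which is what produces the counts seen in part-00000.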
***********************************************
****** Hadoop Streaming ***********************
***********************************************
If the input directory has not been created yet, create it:
~/hadoop-1.1.2/bin/hadoop dfs -mkdir input
Upload the input files into input:
~/hadoop-1.1.2/bin/hadoop dfs -put ./input/file0* input
If an output directory already exists, delete it first, or the job will fail:
~/hadoop-1.1.2/bin/hadoop fs -rmr output/
Run the streaming job (pwd is ~/hadoop-1.1.2/):
~/hadoop-1.1.2/bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -input input -output output -mapper /bin/cat -reducer /usr/bin/wc
The result looks like this:
packageJobJar: [/tmp/hadoop-xxl/hadoop-unjar2181143433612460840/] [] /tmp/streamjob6736477890325015350.jar tmpDir=null
13/06/22 22:45:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/22 22:45:55 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/22 22:45:55 INFO mapred.FileInputFormat: Total input paths to process : 2
13/06/22 22:45:55 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-xxl/mapred/local]
13/06/22 22:45:55 INFO streaming.StreamJob: Running job: job_201306221602_0031
13/06/22 22:45:55 INFO streaming.StreamJob: To kill this job, run:
13/06/22 22:45:55 INFO streaming.StreamJob: /home/xxl/hadoop-1.1.2/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201306221602_0031
13/06/22 22:45:55 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201306221602_0031
13/06/22 22:45:56 INFO streaming.StreamJob: map 0% reduce 0%
13/06/22 22:46:03 INFO streaming.StreamJob: map 100% reduce 0%
13/06/22 22:46:12 INFO streaming.StreamJob: map 100% reduce 33%
13/06/22 22:46:14 INFO streaming.StreamJob: map 100% reduce 100%
13/06/22 22:46:16 INFO streaming.StreamJob: Job complete: job_201306221602_0031
13/06/22 22:46:16 INFO streaming.StreamJob: Output: output
View the result file:
~/hadoop-1.1.2/bin/hadoop fs -cat output/p*
The result:
2 8 48
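The streaming job simply uses /bin/cat as the mapper and /usr/bin/wc as the reducer, so the single output line is wc's line/word/byte count over the mapper output. A minimal local Python sketch of that pipeline (the file contents here are assumptions, not the actual file01/file02):

```python
def wc(text):
    # Mimic `wc`: count lines, words, and bytes of its input stream.
    lines = text.count("\n")
    words = len(text.split())
    return lines, words, len(text.encode())

# /bin/cat as the mapper passes every input line through unchanged;
# /usr/bin/wc as the reducer then counts the concatenated stream.
# Hypothetical input files (contents are assumptions):
files = ["Hello World Bye World\n", "Hello Hadoop Bye Hadoop\n"]
mapper_output = "".join(files)   # what cat produces
print(wc(mapper_output))         # → (2, 8, 46)
```

The real job may report a slightly larger byte count than a plain `cat | wc` over the input files, since streaming frames each mapper output line as a tab-separated key/value record before it reaches the reducer.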
//******************* C++ Hadoop Pipes ************
Contents of the makefile:
HADOOP_INSTALL=/home/xxl/hadoop-1.1.2
PLATFORM=Linux-i386-32
SSL_INSTALL=/usr/local/ssl
CC=g++
CPPFLAGS=-m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include -I$(SSL_INSTALL)/include
wordcount: wordcount.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes -lhadooputils \
		-L$(SSL_INSTALL)/lib -lcrypto -lssl -ldl -lpthread -g -O2 -o $@
Upload the executable to a bin directory on HDFS:
~/hadoop-1.1.2/bin/hadoop fs -mkdir bin
~/hadoop-1.1.2/bin/hadoop dfs -put wordcount bin
Run this wordcount program:
~/hadoop-1.1.2/bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input input -output output -program bin/wordcount