Hadoop 0.20.2
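1. Using the streaming command (excerpted from the Hadoop developer documentation):
Besides plain-text output, you can also produce gzip-compressed output. Just pass the options '-jobconf mapred.output.compress=true -jobconf mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec' to the streaming job.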
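Output written with the gzip codec is an ordinary gzip stream, so a part file copied out of HDFS can be read with the JDK's java.util.zip classes alone. The following is only a minimal sketch of that idea, assuming Java 7 or later. The class name GzipPeek, its helper methods, and the two sample records are hypothetical, and the sample is compressed in memory as a stand-in for a real job output file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Hadoop: reads gzip-compressed text line by line.
public class GzipPeek {

    // Compress text into gzip bytes. This stands in for a compressed part file
    // produced by a job; the real file would be opened with a FileInputStream.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Decompress a gzip stream and return its lines.
    public static List<String> readLines(InputStream gzipped) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up tab-separated records, in the shape of text job output.
        byte[] sample = gzip("hello\t1\nworld\t2\n");
        for (String line : readLines(new ByteArrayInputStream(sample))) {
            System.out.println(line);
        }
    }
}
```

To inspect a real result, the ByteArrayInputStream would be replaced by a stream over the local copy of the output file.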
2. Using a program:
Input files:
$ bin/hadoop fs -ls /temp/in
Found 2 items
-rw-r--r-- 1 Administrator supergroup 52 2012-02-09 10:02 /temp/in/t1.txt
-rw-r--r-- 1 Administrator supergroup 35 2012-02-09 10:02 /temp/in/t2.txt
Debug code:
public class ZipFile {
    public static class Map extends MapReduceBase implements Mapper {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector output, Reporter reporter)
                throws IOException {
            output.collect((