MapReduce Lessons Learned

1. I wrote the loop over `values` in reduce() twice, but the second copy never executed. This is expected: the `Iterable<Text>` that Hadoop passes to reduce() is backed by a single one-shot iterator, so it can be traversed only once; a second for-each loop finds the iterator already exhausted. To make a second pass, cache the values during the first loop (copying each one, since Hadoop reuses the same `Text` object between iterations).

for (Text  val : values){……}
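The effect can be shown without Hadoop at all. In the sketch below, a lambda that always returns the same iterator stands in for the one-shot `Iterable` Hadoop hands to reduce(); the class and method names are illustrative, not from the original code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SinglePassDemo {
    // Count how many elements each of two successive for-each passes sees.
    static int[] twoPasses(Iterable<String> values) {
        int first = 0, second = 0;
        for (String v : values) first++;
        for (String v : values) second++;
        return new int[]{first, second};
    }

    public static void main(String[] args) {
        // A one-shot Iterable: every call to iterator() returns the SAME
        // iterator, mimicking what Hadoop passes to reduce().
        Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
        int[] counts = twoPasses(() -> it);
        System.out.println(counts[0] + " " + counts[1]);  // 3 0

        // Workaround: cache the values during the first pass, then loop over
        // the cache. (In a real reducer, copy each value, e.g. new Text(val),
        // because Hadoop reuses the object between iterations.)
        Iterator<String> it2 = Arrays.asList("a", "b", "c").iterator();
        List<String> cached = new ArrayList<>();
        for (String v : (Iterable<String>) () -> it2) cached.add(v);
        System.out.println(cached.size());  // 3 -- a second loop over 'cached' works
    }
}
```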

2. Setting input and output paths:
Multiple input files can be passed to setInputPaths as one comma-separated string:

    String pathBus = "hdfs://hadoop0:9000/SmartCard/OriginData/ACAA01_BUS_201503.csv";
    String pathAFC = "hdfs://hadoop0:9000/MeteroTrafficData/subwayData6162/CARD_AFC_201503.csv";
    String path = pathAFC + "," + pathBus;
    FileInputFormat.setInputPaths(job, path);
    FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoop0:9000/tanghe/subwayDeal/tradeLink_tmp_4"));

3. The input and output key/value types of Map and Reduce must be correct, and the driver must declare them to match. Note that setOutputKeyClass/setOutputValueClass describe the reduce output, and also the map output unless setMapOutputKeyClass/setMapOutputValueClass are set separately. For example:

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

4. Measuring job running time:

    long start = System.currentTimeMillis();
    job.waitForCompletion(true);
    long end = System.currentTimeMillis();
    long costTime = end - start;
    long milliSeconds = costTime%1000;
    long seconds = costTime/1000%60;
    long minutes = costTime/1000/60%60;
    long hours = costTime/1000/60/60%24;
    long days = costTime/1000/60/60/24;
    System.out.println("Time cost:      "+days+" "+hours+":"+minutes+":"+seconds+"."+milliSeconds);

5. A complete MapReduce driver program:

package org.bjut.traffic.tradelink;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

public class tradeLinkGenerate1503{
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException{

        Configuration conf = new Configuration();    
        Job job = Job.getInstance(conf, "Main");
        job.setJarByClass(tradeLinkGenerate1503.class);

        job.setMapperClass(tradeLinkMapper.class);
        job.setReducerClass(tradeLinkReducer.class);


        String pathBus = "hdfs://hadoop0:9000/SmartCard/OriginData/ACAA01_BUS_201503.csv";
        String pathAFC = "hdfs://hadoop0:9000/MeteroTrafficData/subwayData6162/CARD_AFC_201503.csv";

        String path = pathAFC + "," + pathBus;
        FileInputFormat.setInputPaths(job, path);
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoop0:9000/tanghe/subwayDeal/tradeLink_tmp_4"));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        long start = System.currentTimeMillis();
        job.waitForCompletion(true);
        long end = System.currentTimeMillis();
        long costTime = end - start;
        long milliSeconds = costTime%1000;
        long seconds = costTime/1000%60;
        long minutes = costTime/1000/60%60;
        long hours = costTime/1000/60/60%24;
        long days = costTime/1000/60/60/24;
        System.out.println("Time cost:      "+days+" "+hours+":"+minutes+":"+seconds+"."+milliSeconds);

    }
}
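
The driver above references tradeLinkMapper and tradeLinkReducer, which are not shown here. A hypothetical skeleton consistent with the Text/Text output types set in the driver might look like the following; the CSV field positions and the record-linking logic are placeholder assumptions, not the original classes:

```java
package org.bjut.traffic.tradelink;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical sketch: emits an assumed card ID as key, the full record as value.
class tradeLinkMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length == 0) return;
        outKey.set(fields[0]);  // assumption: card ID is the first CSV column
        context.write(outKey, value);
    }
}

// Hypothetical sketch: caches copies of one card's records first, because the
// values iterable is single-pass and Hadoop reuses the Text object (see note 1).
class tradeLinkReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<String> records = new ArrayList<>();
        for (Text val : values) {
            records.add(val.toString());  // copy: 'val' is reused by the framework
        }
        // ... link/sort the cached bus and AFC records here, then emit ...
        for (String rec : records) {
            context.write(key, new Text(rec));
        }
    }
}
```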