1. The loop in reduce was written twice, but the second copy never executes. In Hadoop, the Iterable<Text> values passed to reduce can only be traversed once: the framework streams the values and reuses the underlying Text object, so a second
for (Text val : values){……}
loop sees no elements. To make a second pass, cache deep copies of the values during the first pass, as in the sketch below.
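A minimal sketch of the caching workaround (the reducer name and pass-through logic are illustrative, not from the original program). Note the new Text(val) copy: Hadoop reuses the same Text instance across iterations, so storing the reference itself would leave the list full of identical values.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class TwoPassReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // First pass: cache deep copies, because Hadoop reuses the Text object.
        List<Text> cached = new ArrayList<Text>();
        for (Text val : values) {
            cached.add(new Text(val));
        }
        // Second pass: iterate over the cached copies as often as needed.
        for (Text val : cached) {
            context.write(key, val);
        }
    }
}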
2. Setting the input and output paths:
Multiple input files can be set, separated by commas:
String pathBus = "hdfs://hadoop0:9000/SmartCard/OriginData/ACAA01_BUS_201503.csv";
String pathAFC = "hdfs://hadoop0:9000/MeteroTrafficData/subwayData6162/CARD_AFC_201503.csv";
String path = pathAFC + "," + pathBus;
FileInputFormat.setInputPaths(job, path);
FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoop0:9000/tanghe/subwayDeal/tradeLink_tmp_4"));
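FileInputFormat also offers overloads that avoid building the comma-separated string by hand; a sketch using the same job and paths as above:
FileInputFormat.setInputPaths(job, new Path(pathAFC), new Path(pathBus));
// or accumulate them one at a time:
FileInputFormat.addInputPath(job, new Path(pathAFC));
FileInputFormat.addInputPath(job, new Path(pathBus));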
3. The input and output types of Map and Reduce must be set correctly, and they must also be declared correctly in the driver program, e.g.:
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
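setOutputKeyClass/setOutputValueClass describe the reducer's output, and by default the mapper's output is assumed to have the same types. If they differ, the map output types must be declared separately; a sketch assuming a mapper that emits Text keys and IntWritable values:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);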
4. Measuring the job's running time:
long start = System.currentTimeMillis();
job.waitForCompletion(true);   // blocks until the job finishes, so this times the whole job
long end = System.currentTimeMillis();
long costTime = end - start;
long milliSeconds = costTime % 1000;
long seconds = costTime / 1000 % 60;
long minutes = costTime / 1000 / 60 % 60;
long hours = costTime / 1000 / 60 / 60 % 24;
long days = costTime / 1000 / 60 / 60 / 24;
System.out.println("Time cost: " + days + " " + hours + ":" + minutes + ":" + seconds + "." + milliSeconds);
5. A complete MapReduce driver program:
package org.bjut.traffic.tradelink;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class tradeLinkGenerate1503 {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "Main");
        job.setJarByClass(tradeLinkGenerate1503.class);
        job.setMapperClass(tradeLinkMapper.class);
        job.setReducerClass(tradeLinkReducer.class);

        // Two comma-separated input files: the subway AFC and bus smart-card records.
        String pathBus = "hdfs://hadoop0:9000/SmartCard/OriginData/ACAA01_BUS_201503.csv";
        String pathAFC = "hdfs://hadoop0:9000/MeteroTrafficData/subwayData6162/CARD_AFC_201503.csv";
        String path = pathAFC + "," + pathBus;
        FileInputFormat.setInputPaths(job, path);
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoop0:9000/tanghe/subwayDeal/tradeLink_tmp_4"));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Time the job from submission to completion.
        long start = System.currentTimeMillis();
        boolean success = job.waitForCompletion(true);
        long end = System.currentTimeMillis();
        long costTime = end - start;
        long milliSeconds = costTime % 1000;
        long seconds = costTime / 1000 % 60;
        long minutes = costTime / 1000 / 60 % 60;
        long hours = costTime / 1000 / 60 / 60 % 24;
        long days = costTime / 1000 / 60 / 60 / 24;
        System.out.println("Time cost: " + days + " " + hours + ":" + minutes + ":" + seconds + "." + milliSeconds);
        System.exit(success ? 0 : 1);
    }
}
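The driver references tradeLinkMapper and tradeLinkReducer from the same package, whose bodies are not shown here. A minimal skeleton consistent with the Text/Text output types set above (one class per file; the CSV parsing and pass-through logic are placeholders, not the original program's):
package org.bjut.traffic.tradelink;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class tradeLinkMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Placeholder: key on the first CSV field (e.g. the card ID)
        // and pass the whole record through as the value.
        String[] fields = value.toString().split(",");
        context.write(new Text(fields[0]), value);
    }
}
package org.bjut.traffic.tradelink;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class tradeLinkReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Placeholder: emit each record unchanged.
        for (Text val : values) {
            context.write(key, val);
        }
    }
}
Package the classes into a jar and run the driver with, for example (jar name assumed):
hadoop jar tradelink.jar org.bjut.traffic.tradelink.tradeLinkGenerate1503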