Hadoop MapReduce Programming

The Map Phase

Problem Definition – The SELECT Clause

Source Code
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SelectClauseMRJob extends Configured implements Tool {

    public static class SelectClauseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Skip the CSV header row; project the selected columns for every data row.
            if (!AirlineDataUtils.isHeader(value)) {
                StringBuilder output = AirlineDataUtils.mergeStringArray(
                        AirlineDataUtils.getSelectResultsPerRow(value), ",");
                context.write(NullWritable.get(), new Text(output.toString()));
            }
        }
    }

    @Override
    public int run(String[] strings) throws Exception {
        Job job = Job.getInstance(getConf());
        job.setJarByClass(SelectClauseMRJob.class);
        job.setInputFormatClass(TextInputFormat.class);     // read the input line by line
        job.setOutputFormatClass(TextOutputFormat.class);   // write plain-text output

        job.setOutputKeyClass(NullWritable.class);          // the CSV line carries all the data, so the key is null
        job.setOutputValueClass(Text.class);                // defaults to the input type; override it if yours differs

        job.setMapperClass(SelectClauseMapper.class);
        job.setNumReduceTasks(0);                           // map-only job: no Reducer

        String[] args = new GenericOptionsParser(getConf(), strings).getRemainingArgs();
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean status = job.waitForCompletion(true);       // true: report progress while running
        return status ? 0 : 1;
    }

    public static void main(String... args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new SelectClauseMRJob(), args);
        System.exit(exitCode);
    }
}
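The `AirlineDataUtils` helpers come from the book's accompanying sample code and are not reproduced above. As a rough sketch of what two of them might do (the method names are taken from the calls in the mapper, but these bodies are assumptions, not the real implementations), header detection and field merging could look like:

```java
// Hypothetical sketch of two AirlineDataUtils helpers, inferred only from
// how SelectClauseMapper calls them; the real implementations may differ.
public class AirlineDataUtilsSketch {

    // The airline on-time CSV files begin with a header row whose first
    // column name is "Year", so a simple prefix test identifies it.
    public static boolean isHeader(String line) {
        return line.startsWith("Year");
    }

    // Join an array of fields with the given delimiter, as the call
    // mergeStringArray(fields, ",") in the mapper appears to do.
    public static StringBuilder mergeStringArray(String[] fields, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(delimiter);
            sb.append(fields[i]);
        }
        return sb;
    }

    public static void main(String[] args) {
        System.out.println(isHeader("Year,Month,DayofMonth"));            // true
        System.out.println(mergeStringArray(new String[]{"a", "b"}, ",")); // a,b
    }
}
```

Note the sketch works on `String` while the real helper accepts a `Text`; converting with `value.toString()` first would bridge the two.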
Running the Job on the Cluster
hadoop jar '/home/atalisas/桌面/hadoop_final.jar'  /user/atalisas/sampledata  /user/atalisas/output/c5_select


17/09/16 21:39:26 INFO input.FileInputFormat: Total input files to process : 2
17/09/16 21:39:26 INFO mapreduce.JobSubmitter: number of splits:2
17/09/16 21:39:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1082158209_0001
17/09/16 21:39:27 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/09/16 21:39:27 INFO mapreduce.Job: Running job: job_local1082158209_0001
17/09/16 21:39:27 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/09/16 21:39:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:39:27 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:39:27 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/09/16 21:39:27 INFO mapred.LocalJobRunner: Waiting for map tasks
17/09/16 21:39:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1082158209_0001_m_000000_0
17/09/16 21:39:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:39:27 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:39:27 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/09/16 21:39:27 INFO mapred.MapTask: Processing split: hdfs://0.0.0.0:9000/user/atalisas/sampledata/1988.csv.bz2:0+49499025
17/09/16 21:39:27 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
17/09/16 21:39:27 INFO compress.CodecPool: Got brand-new decompressor [.bz2]
17/09/16 21:39:28 INFO mapreduce.Job: Job job_local1082158209_0001 running in uber mode : false
17/09/16 21:39:28 INFO mapreduce.Job:  map 0% reduce 0%
17/09/16 21:39:39 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:40 INFO mapreduce.Job:  map 11% reduce 0%
17/09/16 21:39:45 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:46 INFO mapreduce.Job:  map 17% reduce 0%
17/09/16 21:39:51 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:52 INFO mapreduce.Job:  map 23% reduce 0%
17/09/16 21:39:57 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:58 INFO mapreduce.Job:  map 28% reduce 0%
17/09/16 21:40:03 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:04 INFO mapreduce.Job:  map 34% reduce 0%
17/09/16 21:40:09 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:10 INFO mapreduce.Job:  map 40% reduce 0%
17/09/16 21:40:15 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:16 INFO mapreduce.Job:  map 46% reduce 0%
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:19 INFO mapred.Task: Task:attempt_local1082158209_0001_m_000000_0 is done. And is in the process of committing
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:19 INFO mapred.Task: Task attempt_local1082158209_0001_m_000000_0 is allowed to commit now
17/09/16 21:40:19 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1082158209_0001_m_000000_0' to hdfs://0.0.0.0:9000/user/atalisas/output/c5_select/_temporary/0/task_local1082158209_0001_m_000000
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map
17/09/16 21:40:19 INFO mapred.Task: Task 'attempt_local1082158209_0001_m_000000_0' done.
17/09/16 21:40:19 INFO mapred.LocalJobRunner: Finishing task: attempt_local1082158209_0001_m_000000_0
17/09/16 21:40:19 INFO mapred.LocalJobRunner: Starting task: attempt_local1082158209_0001_m_000001_0
17/09/16 21:40:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:40:19 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:40:19 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/09/16 21:40:19 INFO mapred.MapTask: Processing split: hdfs://0.0.0.0:9000/user/atalisas/sampledata/1987.csv.bz2:0+12652442
17/09/16 21:40:20 INFO mapreduce.Job:  map 100% reduce 0%
17/09/16 21:40:31 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapreduce.Job:  map 96% reduce 0%
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapred.Task: Task:attempt_local1082158209_0001_m_000001_0 is done. And is in the process of committing
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapred.Task: Task attempt_local1082158209_0001_m_000001_0 is allowed to commit now
17/09/16 21:40:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1082158209_0001_m_000001_0' to hdfs://0.0.0.0:9000/user/atalisas/output/c5_select/_temporary/0/task_local1082158209_0001_m_000001
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map
17/09/16 21:40:32 INFO mapred.Task: Task 'attempt_local1082158209_0001_m_000001_0' done.
17/09/16 21:40:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1082158209_0001_m_000001_0
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map task executor complete.
17/09/16 21:40:33 INFO mapreduce.Job:  map 100% reduce 0%
17/09/16 21:40:33 INFO mapreduce.Job: Job job_local1082158209_0001 completed successfully
17/09/16 21:40:33 INFO mapreduce.Job: Counters: 20
    File System Counters
        FILE: Number of bytes read=77331343
        FILE: Number of bytes written=78582382
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=111678140
        HDFS: Number of bytes written=528452993
        HDFS: Number of read operations=18
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Map-Reduce Framework
        Map input records=6513924
        Map output records=6513922
        Input split bytes=244
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=422
        Total committed heap usage (bytes)=567279616
    File Input Format Counters 
        Bytes Read=62169899
    File Output Format Counters 
        Bytes Written=293774629
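The counters are worth a second look: Map input records is 6513924 but Map output records is 6513922, a difference of exactly 2. That matches one header row being filtered by `isHeader` in each of the two input files. The arithmetic in miniature:

```java
public class HeaderFilterCount {
    // Rows dropped by the mapper = input records minus output records.
    static long filtered(long inputRecords, long outputRecords) {
        return inputRecords - outputRecords;
    }

    public static void main(String[] args) {
        long in = 6_513_924L;   // "Map input records" from the job counters
        long out = 6_513_922L;  // "Map output records" from the job counters
        // Two input files (1987.csv.bz2 and 1988.csv.bz2), one header each.
        System.out.println(filtered(in, out)); // prints 2
    }
}
```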

# Inspect the last few lines of the output
hdfs dfs -tail output/c5_select/part-m-00000

12/10/1988,1325,2258,HNL,LAX,2556,453,309,0,144
12/11/1988,1325,2051,HNL,LAX,2556,326,309,0,17
12/12/1988,1325,2043,HNL,LAX,2556,318,309,0,9
12/13/1988,1325,2038,HNL,LAX,2556,313,309,0,4
12/14/1988,1325,2045,HNL,LAX,2556,320,309,0,11
12/01/1988,2027,2152,ATL,MCO,403,85,78,0,7
12/02/1988,2106,2229,ATL,MCO,403,83,78,39,44
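Each output record is a plain comma-separated line, so downstream tools can split it directly. A minimal consumer, assuming only what the sample lines above show (ten fields, with the origin and destination airport codes in positions 3 and 4; the exact column selection is defined by `AirlineDataUtils.getSelectResultsPerRow`, not shown here):

```java
public class SelectOutputParser {
    // Split one output line of the job into its fields.
    static String[] parse(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "12/10/1988,1325,2258,HNL,LAX,2556,453,309,0,144";
        String[] fields = parse(line);
        System.out.println(fields.length);                  // 10 fields per record
        System.out.println(fields[3] + "->" + fields[4]);   // HNL->LAX
    }
}
```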