Azkaban MapReduce Job Scheduling Demo

In earlier posts I simulated collecting logs into HDFS (see http://blog.csdn.net/qq_20641565/article/details/52807776) and walked through installing Azkaban (see http://blog.csdn.net/qq_20641565/article/details/52814048). This post writes MapReduce programs in Java and schedules them with Azkaban. Four simple MapReduce programs are used so that the dependency relationships show up clearly in Azkaban: PersonAvg, ZhenAvg, and XianAvg all depend on the FormatLog job.

PS: The input/output paths of these MapReduce programs, and the folder named after the scheduling date (e.g. the 20161013 folder in hdfs://lijie:9000/flume/20161013/), should really be passed in as parameters at scheduling time, but for simplicity I hard-coded them in the programs.
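
For reference, here is a minimal sketch of what a parameterized entry point could look like; the date argument and its fallback default are hypothetical and not part of the original code:

public static void main(String[] args) throws Exception {
    // hypothetical: take the date folder (e.g. 20161013) as the first argument,
    // falling back to a fixed value when run by hand
    String day = args.length > 0 ? args[0] : "20161013";
    String[] jobArgs = { "hdfs://lijie:9000/flume/" + day + "/*",
                         "hdfs://lijie:9000/flume/format/" + day };
    System.exit(ToolRunner.run(new Configuration(), new FormatLog(), jobArgs));
}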

Some of the screenshots were replaced later, so the run times shown in them may not line up.

FormatLog: formats the collected log data.
PersonAvg: computes the average savings across all people.
ZhenAvg: computes the average personal savings per town.
XianAvg: computes the average personal savings per county.
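
Judging from the parsing code below (an inference from the code, not a sample of the actual data), each raw line consists of a header part, a "|" separator, and four "####"-separated fields, where the second, third, and fourth fields are the county, the town, and the savings amount:

header|field0####county####town####savings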

1. The MapReduce programs are as follows:

FormatLog:

package com.lijie.demo4azkaban.avg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class FormatLog extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        String[] args1 = {  "hdfs://lijie:9000/flume/20161013/*",
                            "hdfs://lijie:9000/flume/format/20161013" };
        int run = ToolRunner.run(new Configuration(), new FormatLog(), args1);
        System.exit(run);
    }

    public static class FormatLogMap extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map( LongWritable key, Text value,
                            Mapper<LongWritable, Text, Text, Text>.Context context) throws IOException,
                                                                                    InterruptedException {
            String[] split = value.toString().split("\\|");
            if (split.length == 2) {
                Text valueNew = new Text(split[1].trim());
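                // the key is left empty; only the formatted value matters downstream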
                context.write(new Text(""), valueNew);
            }
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path dest = new Path(args[args.length - 1]);
        FileSystem fs = dest.getFileSystem(conf);
        if (fs.isDirectory(dest)) {
            fs.delete(dest, true);
        }

        Job job = new Job(conf, "formatLog");
        job.setJarByClass(FormatLog.class);
        job.setMapperClass(FormatLogMap.class);
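        // no reducer is set, so the default identity reducer simply passes the formatted records through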

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[args.length - 2]));
        FileOutputFormat.setOutputPath(job, dest);

        return job.waitForCompletion(true) ? 0 : 1;
    }

}

PersonAvg:

package com.lijie.demo4azkaban.avg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class PersonAvg extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        String[] args1 = {  "hdfs://lijie:9000/flume/format/20161013/*",
                            "hdfs://lijie:9000/flume/format/20161013/personout" };
        int run = ToolRunner.run(new Configuration(), new PersonAvg(), args1);
        System.exit(run);
    }

    @Override
    public int run(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Path dest = new Path(args[1]);
        FileSystem fs = dest.getFileSystem(conf);
        if (fs.isDirectory(dest)) {
            fs.delete(dest, true);
        }

        Job job = new Job(conf, "personAvg");
        job.setJarByClass(PersonAvg.class);
        job.setMapperClass(PersonAvgMap.class);
        job.setReducerClass(PersonAvgReduce.class);

        job.setMapOutputKeyClass(Text.class); // map output key type
        job.setMapOutputValueClass(Text.class); // map output value type

        job.setOutputKeyClass(Text.class); // final output key type
        job.setOutputValueClass(Text.class); // final output value type

        FileInputFormat.addInputPath(job, new Path(args[0])); // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path

        return job.waitForCompletion(true) ? 0 : 1; // submit the job and wait

    }

    public static class PersonAvgMap extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map( LongWritable key, Text value,
                            Mapper<LongWritable, Text, Text, Text>.Context context) throws IOException,
                                                                                    InterruptedException {
            String[] split = value.toString().split("####");
            if (split.length == 4) {
                if (null == split[3] || "".equals(split[3])) {
                    split[3] = "0";
                }
                context.write(new Text("1"), new Text(split[3]));
            }
        }
    }

    public static class PersonAvgReduce extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(  Text key, Iterable<Text> values,
                                Reducer<Text, Text, Text, Text>.Context context)    throws IOException,
                                                                                    InterruptedException {
            // totals kept local to the reduce() call (all records share the single key "1")
            long count = 0;
            double sum = 0;
            for (Text text : values) {
                count = count + 1;
                sum = sum + Double.parseDouble(text.toString().trim());
            }
            double avg = sum / count;

            context.write(key, new Text(avg + ""));
        }
    }

}

ZhenAvg:

package com.lijie.demo4azkaban.avg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ZhenAvg extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        String[] args1 = {  "hdfs://lijie:9000/flume/format/20161013/*",
                            "hdfs://lijie:9000/flume/format/20161013/zhenout" };
        int run = ToolRunner.run(new Configuration(), new ZhenAvg(), args1);
        System.exit(run);
    }

    @Override
    public int run(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Path dest = new Path(args[1]);
        FileSystem fs = dest.getFileSystem(conf);
        if (fs.isDirectory(dest)) {
            fs.delete(dest, true);
        }

        Job job = new Job(conf, "ZhenAvg");
        job.setJarByClass(ZhenAvg.class);
        job.setMapperClass(ZhenAvgMap.class);
        job.setReducerClass(ZhenAvgReduce.class);

        job.setMapOutputKeyClass(Text.class); // map output key type
        job.setMapOutputValueClass(Text.class); // map output value type

        job.setOutputKeyClass(Text.class); // final output key type
        job.setOutputValueClass(Text.class); // final output value type

        FileInputFormat.addInputPath(job, new Path(args[0])); // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path

        return job.waitForCompletion(true) ? 0 : 1; // submit the job and wait

    }

    public static class ZhenAvgMap extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map( LongWritable key, Text value,
                            Mapper<LongWritable, Text, Text, Text>.Context context) throws IOException,
                                                                                    InterruptedException {
            String[] split = value.toString().split("####");
            if (split.length == 4) {
                if (null == split[3] || "".equals(split[3])) {
                    split[3] = "0";
                }
                context.write(new Text(split[1] + "-" + split[2]), new Text(split[3]));
            }
        }
    }

    public static class ZhenAvgReduce extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(  Text key, Iterable<Text> values,
                                Reducer<Text, Text, Text, Text>.Context context)    throws IOException,
                                                                                    InterruptedException {
            // keep the totals as local variables so they do not carry over from one
            // town to the next (as instance fields they would keep accumulating
            // across reduce() calls and skew every later average)
            long count = 0;
            double sum = 0;
            for (Text text : values) {
                count = count + 1;
                sum = sum + Double.parseDouble(text.toString().trim());
            }
            double avg = sum / count;

            context.write(key, new Text(avg + ""));
        }
    }

}

XianAvg:

package com.lijie.demo4azkaban.avg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class XianAvg extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        String[] args1 = {  "hdfs://lijie:9000/flume/format/20161013/*",
                            "hdfs://lijie:9000/flume/format/20161013/xianout" };
        int run = ToolRunner.run(new Configuration(), new XianAvg(), args1);
        System.exit(run);
    }

    @Override
    public int run(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Path dest = new Path(args[1]);
        FileSystem fs = dest.getFileSystem(conf);
        if (fs.isDirectory(dest)) {
            fs.delete(dest, true);
        }

        Job job = new Job(conf, "XianAvg");
        job.setJarByClass(XianAvg.class);
        job.setMapperClass(XianAvgMap.class);
        job.setReducerClass(XianAvgReduce.class);

        job.setMapOutputKeyClass(Text.class); // map output key type
        job.setMapOutputValueClass(Text.class); // map output value type

        job.setOutputKeyClass(Text.class); // final output key type
        job.setOutputValueClass(Text.class); // final output value type

        FileInputFormat.addInputPath(job, new Path(args[0])); // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path

        return job.waitForCompletion(true) ? 0 : 1; // submit the job and wait

    }

    public static class XianAvgMap extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map( LongWritable key, Text value,
                            Mapper<LongWritable, Text, Text, Text>.Context context) throws IOException,
                                                                                    InterruptedException {
            String[] split = value.toString().split("####");
            if (split.length == 4) {
                if (null == split[3] || "".equals(split[3])) {
                    split[3] = "0";
                }
                context.write(new Text(split[1]), new Text(split[3]));
            }
        }
    }

    public static class XianAvgReduce extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(  Text key, Iterable<Text> values,
                                Reducer<Text, Text, Text, Text>.Context context)    throws IOException,
                                                                                    InterruptedException {
            // keep the totals as local variables so they do not carry over from one
            // county to the next (as instance fields they would keep accumulating
            // across reduce() calls and skew every later average)
            long count = 0;
            double sum = 0;
            for (Text text : values) {
                count = count + 1;
                sum = sum + Double.parseDouble(text.toString().trim());
            }
            double avg = sum / count;

            context.write(key, new Text(avg + ""));
        }
    }

}
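
Before these classes can be scheduled they need to be packaged into the AzkabanDemo.jar referenced below. How exactly depends on the build setup; with a Maven project it would be roughly the following (an assumption, not from the original post; the wildcard assumes a single jar artifact in target/):

mvn clean package
cp target/*.jar AzkabanDemo.jar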

2. Create a project in Azkaban:
(screenshot: creating the project)

3. Write the Azkaban workflow scripts (the files must use Unix line endings).
(screenshot)
An example (AzkabanDemo.jar is the jar built from the MapReduce programs above, and com.lijie.demo4azkaban.avg.FormatLog is the fully qualified class name):
FormatLog.sh:

#!/bin/bash
hadoop jar AzkabanDemo.jar com.lijie.demo4azkaban.avg.FormatLog

FormatLog.job:

type=command
command=sh ./FormatLog.sh
dependencies=begin
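
The remaining job files follow the same pattern. A sketch (begin.job is only implied by the dependencies=begin line above, and the PersonAvg.sh/ZhenAvg.sh/XianAvg.sh wrappers are assumed to be analogous to FormatLog.sh):

begin.job:

type=command
command=echo "flow start"

PersonAvg.job:

type=command
command=sh ./PersonAvg.sh
dependencies=FormatLog

ZhenAvg.job:

type=command
command=sh ./ZhenAvg.sh
dependencies=FormatLog

XianAvg.job:

type=command
command=sh ./XianAvg.sh
dependencies=FormatLog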

4. Package the Azkaban script files above into a zip and upload it to the project (be careful not to misspell the dependency names, including case; dependencies are validated at upload time).
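
A sketch of the packaging step (assuming the jar is included in the same zip so that the relative path used in the .sh scripts resolves inside the extracted project):

zip AzkabanDemo.zip *.job *.sh AzkabanDemo.jar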

5. Check the dependencies of the uploaded flow:
(screenshot: dependency graph)

6. Execute the flow through Azkaban (click Execute in the screenshot below):
(screenshot)

The execution flow can be monitored (blue = running, green = succeeded, red = failed):
(screenshot)

The job list and the detailed logs of each job can also be viewed:
(screenshots)

7. After the jobs finish, check the results in HDFS through the browser:
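
The output can also be checked from the command line, for example (part-r-00000 is the usual default output file name; adjust to the actual files):

hdfs dfs -cat hdfs://lijie:9000/flume/format/20161013/personout/part-r-00000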

After FormatLog finishes (the formatted collected data):
(screenshot)

After PersonAvg finishes (average savings across all people):
(screenshot)

After ZhenAvg finishes (average personal savings per town):
(screenshot)

After XianAvg finishes (average personal savings per county):
(screenshot)

Note:

During testing I hit the error ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried ... It turned out the JobHistory service was not running, so the client kept trying the default 0.0.0.0:10020 address and the connection failed. The following needs to be added to mapred-site.xml:

<property>  
        <name>mapreduce.jobhistory.address</name>  
        <value>lijie:10020</value>  
</property>

Restart the cluster, and then also start the JobHistory service on the NameNode node:

[root@lijie sbin]# ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/java/hadoop/logs/mapred-root-historyserver-djt.out
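
As a quick sanity check (not from the original post), jps on that node should now list a JobHistoryServer process:

[root@lijie sbin]# jps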