MapReduce: Principles and Programming
Hadoop Architecture
- HDFS: distributed file system
- MapReduce: distributed computing framework
- YARN: distributed resource management system
- Common
MapReduce
What is MapReduce?
MapReduce is a distributed computing framework
- It decomposes a large data-processing job into individual tasks that run in parallel across a cluster of servers
Suited to large-scale data processing
- Each node processes the data stored on that node
Every job has a Map part and a Reduce part
MapReduce design ideas
Divide and conquer
- A programming model that simplifies parallel computation
Two abstractions: Map and Reduce
- Developers focus on implementing the Mapper and Reducer functions
System-level details are hidden
- Developers focus on the business logic
MapReduce characteristics
Strengths
- Easy to program
- Scalable
- Highly fault tolerant
- High throughput
Poor fits
- Real-time computation is difficult
- Not suited to stream processing
Implementing WordCount with MapReduce
MapReduce execution flow
Data type signatures
- map: (K1, V1) -> list(K2, V2)
- reduce: (K2, list(V2)) -> list(K3, V3)
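The two signatures above can be traced on a toy input without Hadoop at all. The sketch below is a minimal pure-Java simulation of the map, shuffle/group, and reduce steps for WordCount; the class and method names are illustrative, not Hadoop API:

```java
import java.util.*;

public class WordCountFlow {
    // map: (K1 = line offset, V1 = line) -> list(K2 = word, V2 = 1)
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // reduce: (K2 = word, list(V2) = counts) -> (K3 = word, V3 = total)
    static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return new AbstractMap.SimpleEntry<>(word, sum);
    }

    public static void main(String[] args) {
        String[] lines = {"hello world", "hello mapreduce"};
        // shuffle: group the map output by key (TreeMap also sorts the keys)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            offset += line.length() + 1;
        }
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            Map.Entry<String, Integer> r = reduce(e.getKey(), e.getValue());
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

The grouping step in main stands in for what the framework's shuffle and sort do between the two phases.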
MapReduce execution stages
- Mapper
- Combiner
- Partitioner
- Shuffle and Sort
- Reducer
The Hadoop V1 MapReduce engine
JobTracker
- Runs on the NameNode host
- Accepts job requests from clients
- Dispatches tasks to TaskTrackers
TaskTracker
- Receives task assignments from the JobTracker
- Executes the map and reduce operations
- Sends heartbeats back to the JobTracker
Hadoop V2: YARN
What YARN changes
- Supports more compute engines while staying compatible with MapReduce
- Better resource management; shrinks the load the JobTracker used to carry
- The JobTracker's resource management moves into the ResourceManager
- The JobTracker's per-job scheduling moves into the ApplicationMaster
- The NodeManager becomes the per-node resource and task manager
Hadoop and YARN architecture
How a Hadoop 2 MapReduce job runs on YARN
InputSplit
InputSplits are created from the input files before the Map phase
- Each InputSplit is processed by one Mapper task
- An InputSplit stores the split length and an array of the locations of the records
block vs. split
- A block is the physical representation of the data
- A split is a logical representation of the data in blocks
- Splits are cut at record boundaries
- The number of splits should be no greater than the number of blocks (they are usually equal)
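A rough sketch of why splits usually equal blocks: FileInputFormat sizes each split as max(minSize, min(maxSize, blockSize)), so with the default minimum (1) and maximum (Long.MAX_VALUE) the split size is exactly the block size. The constants below are illustrative:

```java
public class SplitSize {
    // mirrors the shape of FileInputFormat's split-size computation
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // a 128 MB HDFS block
        // with the default minSize = 1 and maxSize = Long.MAX_VALUE, split == block
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE) == blockSize);
        // raising minSize above the block size forces larger splits (fewer mappers)
        System.out.println(computeSplitSize(blockSize, 256L * 1024 * 1024, Long.MAX_VALUE));
    }
}
```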
Shuffle phase
The process that moves data from the Map output to the Reduce input
Key & Value types
Must be serializable
- Needed for network transfer and persistent storage
- IntWritable, LongWritable, FloatWritable, Text, DoubleWritable, BooleanWritable, NullWritable, etc.
All implement the Writable interface
- which requires the write() and readFields() methods
Keys must implement the WritableComparable interface
- The Reduce phase sorts by key
- so keys must be comparable
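Because the framework sorts keys before they reach the Reducer, a custom key must define its ordering. A real Hadoop key would implement WritableComparable; stripped of the Writable half, the comparison contract looks like the hypothetical composite key below (ascending by year, then descending by temperature, the classic secondary-sort shape):

```java
import java.util.*;

public class YearTempKey implements Comparable<YearTempKey> {
    final int year;
    final int temperature;

    YearTempKey(int year, int temperature) {
        this.year = year;
        this.temperature = temperature;
    }

    // sort ascending by year, then descending by temperature
    @Override
    public int compareTo(YearTempKey o) {
        int cmp = Integer.compare(year, o.year);
        return cmp != 0 ? cmp : Integer.compare(o.temperature, temperature);
    }

    public static void main(String[] args) {
        List<YearTempKey> keys = new ArrayList<>(Arrays.asList(
                new YearTempKey(2020, 31), new YearTempKey(2019, 12), new YearTempKey(2020, 35)));
        Collections.sort(keys);  // what the shuffle's sort phase would do with these keys
        for (YearTempKey k : keys) System.out.println(k.year + " " + k.temperature);
    }
}
```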
The MapReduce programming model
The InputFormat interface
Defines how data is read into the Mapper
- InputSplit[] getSplits
An InputSplit is the unit of data handled by a single Mapper; getSplits logically partitions a large dataset into InputSplits
Common InputFormat implementations
- TextInputFormat
- FileInputFormat
- KeyValueTextInputFormat
The Mapper class
Key Mapper methods
- void setup(Context context)
org.apache.hadoop.mapreduce.Mapper.Context
- void map(KEY key, VALUE value, Context context)
Called once for each key/value pair in the input split
- void cleanup(Context context)
- void run(Context context)
Override this method for complete control over how the Mapper executes
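The default run() is just setup, then map over every record in the split, then cleanup. A simplified non-Hadoop rendering of that loop (the Record type here is a stand-in for what Context delivers; names are illustrative):

```java
import java.util.*;

public class MapperRunSketch {
    // stand-in for one key/value pair the Context would deliver
    static class Record {
        final long key; final String value;
        Record(long key, String value) { this.key = key; this.value = value; }
    }

    static List<String> log = new ArrayList<>();

    static void setup() { log.add("setup"); }
    static void map(long key, String value) { log.add("map:" + value); }
    static void cleanup() { log.add("cleanup"); }

    // mirrors the shape of Mapper.run(Context)
    static void run(Iterator<Record> split) {
        setup();
        while (split.hasNext()) {     // Context.nextKeyValue()
            Record r = split.next();  // Context.getCurrentKey()/getCurrentValue()
            map(r.key, r.value);
        }
        cleanup();
    }

    public static void main(String[] args) {
        run(Arrays.asList(new Record(0, "a"), new Record(2, "b")).iterator());
        System.out.println(log);
    }
}
```

Overriding run() is how you would, for example, process records in batches instead of one call per record.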
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context ctx)
throws IOException, InterruptedException
{
StringTokenizer itr = new StringTokenizer( value.toString() );
while ( itr.hasMoreTokens() )
{
word.set( itr.nextToken() );
ctx.write( word, one);
}
}
}
The Combiner class
A Combiner is essentially a local Reduce
- Performs local aggregation before the shuffle
- Optional; a performance optimization
- Its input and output types must be identical
When a Reducer can also serve as the Combiner
- Its operation must be commutative and associative
Wiring in a Combiner
- job.setCombinerClass(WCReducer.class)
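A quick sanity check of the commutative/associative rule: summing partial sums gives the same answer as one big sum, but averaging partial averages does not, which is why a sum Reducer can double as a Combiner while a mean Reducer cannot. A minimal illustration:

```java
import java.util.*;

public class CombinerRule {
    static int sum(List<Integer> xs) {
        int s = 0;
        for (int x : xs) s += x;
        return s;
    }

    static double avg(List<Integer> xs) {
        return (double) sum(xs) / xs.size();
    }

    public static void main(String[] args) {
        // pretend a and b are the value lists two different mappers produced for one key
        List<Integer> a = Arrays.asList(1, 2), b = Arrays.asList(3, 4, 5);
        List<Integer> all = new ArrayList<>(a);
        all.addAll(b);

        // sum is associative and commutative: combining per-mapper sums is safe
        System.out.println(sum(all) == sum(Arrays.asList(sum(a), sum(b))));  // true

        // average is not: averaging the two partial averages gives the wrong mean
        double naive = (avg(a) + avg(b)) / 2;   // 2.75, but the true mean of all five is 3.0
        System.out.println(avg(all) == naive);  // false
    }
}
```

(The standard workaround for averages is to have the combiner emit (sum, count) pairs, which do combine associatively.)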
The Partitioner class
Partitions keys on the Map side
- The default is HashPartitioner: take the key's hash value and mod it by the number of Reduce tasks
- Decides which Reducer each record is sent to
Custom Partitioner
- Extend the abstract class Partitioner and override getPartition
- job.setPartitionerClass(MyPartitioner.class)
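The HashPartitioner rule from the bullets above fits in one line; the sign-bit mask keeps the result non-negative even for keys whose hashCode is negative. A standalone sketch over plain Strings (class name is illustrative):

```java
public class HashPartitionSketch {
    // the same arithmetic HashPartitioner applies to the key's hash
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // every occurrence of the same word lands on the same reducer
        for (String word : new String[]{"wish", "the", "you"}) {
            System.out.println(word + " -> reducer " + getPartition(word, 4));
        }
    }
}
```

Because the mapping is deterministic, all values for one key meet at a single Reducer, which is exactly what the reduce contract requires.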
The Reducer class
Key Reducer methods
- void setup(Context context)
org.apache.hadoop.mapreduce.Reducer.Context
- void reduce(KEY key, Iterable values, Context context)
Called once per key
- void cleanup(Context context)
- void run(Context context)
Override this method to control how the reduce task works
public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
throws IOException, InterruptedException
{
int sum = 0;
for ( IntWritable value : values )
{
sum += value.get();
}
result.set( sum );
ctx.write( key, result );
}
}
The OutputFormat interface
Defines how data is written out from the Reducer
- RecordWriter<K,V> getRecordWriter
Writes the Reducer's <key, value> pairs to the target file
- checkOutputSpecs
Validates the output spec, e.g. that the output directory does not already exist
Common OutputFormat implementations
- TextOutputFormat
- SequenceFileOutputFormat
- MapFileOutputFormat
Writing an M/R Job
Job job = Job.getInstance(getConf(), "WordCountMR" );
//InputFormat
job.setJarByClass(getClass());
FileInputFormat.addInputPath(job, new Path(args[0]) );
job.setInputFormatClass(TextInputFormat.class);
//OutputFormat
FileOutputFormat.setOutputPath( job, new Path(args[1]) );
job.setOutputFormatClass(TextOutputFormat.class);
//Mapper
job.setMapperClass( WCMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
//Reducer
job.setReducerClass(WCReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//Submit the job and wait for it to finish
System.exit( job.waitForCompletion(true) ? 0 : 1 );
Implementing WordCount with MapReduce
Installing and configuring Hadoop on a local Windows machine
- Unpack hadoop-2.6.0-cdh5.14.2.tar.gz
- Extract the contents of hadoopBin.rar into the bin directory of the unpacked hadoop-2.6.0-cdh5.14.2 (the extracted contents themselves, not the enclosing folder)
- Copy hadoop.dll from the extracted hadoopBin into C:/windows/system32/
- Set the Hadoop environment variables
Writing the Java code
- Mapper
- Reducer
- Job
Running the M/R job
hadoop jar WCMR.jar cn.kgc.WCDriver /user/data /user/out
//WCMR.jar is the jar package
//WCDriver is the driver (job) class
Setting M/R parameters
Implementing WordCount in Java
- Pull in the Hadoop Maven dependencies
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.6.0</version>
</dependency>
<!--<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.0</version>
</dependency>-->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
<version>2.6.0</version>
</dependency>
a.txt, at D:/test/a.txt:
i wish to wish the wish you wish to wish,but
if you wish the wish the wish wishes,i won't
wish the wish you wish to wish
WCMapper
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
//Mapper for word count
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] words = line.split(" ");
for (String word : words) {
context.write(new Text(word), new IntWritable(1));
}
}
}
WCReducer
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int total=0;
for (IntWritable value : values) {
total += value.get();
}
context.write(key,new IntWritable(total));
}
}
WCPartitioner
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
public class WCPartitioner extends Partitioner<Text, IntWritable> {
@Override
public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
//mask the sign bit: Math.abs(hashCode) is still negative when hashCode == Integer.MIN_VALUE
return (text.hashCode() & Integer.MAX_VALUE) % numPartitions;
}
}
WCDriver
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WCDriver {
public static void main(String[] args)throws Exception {
//1. Create the job
Configuration cfg = new Configuration();
Job job = Job.getInstance(cfg, "job_wc");
job.setJarByClass(WCDriver.class);
//2. Set the Mapper and Reducer
job.setMapperClass(WCMapper.class);
job.setReducerClass(WCReducer.class);
//Map output types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
//Partitioner and number of reduce tasks
job.setNumReduceTasks(4);
job.setPartitionerClass(WCPartitioner.class);
//Reduce output types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//Input and output paths
FileInputFormat.setInputPaths(job, new Path("D:/test/a.txt"));
FileOutputFormat.setOutputPath(job, new Path("D:/test/wcResult"));//this directory must not already exist
//3. Run
boolean result = job.waitForCompletion(true);
System.out.println(result ? "success" : "failure");
System.exit(result ? 0 : 1);
}
}
The output:
part-r-00000
part-r-00001
part-r-00002
part-r-00003
Although the run reports success, a warning appears. It is not an error; it only means log4j is not configured, so the logs cannot be viewed. A simple setup step removes the warning.
- Create a package named resources under the project.
- Mark resources as a resources root:
Step 1: open Project Structure (top left of IDEA).
Step 2: click Modules.
Step 3: click Sources.
Step 4: select the resources package you created.
Step 5: mark it with Resources at the top.
Step 6: click OK.
- Put a log4j.properties file into the resources root.
log4j.properties
### Root logger ###
log4j.rootLogger = debug,stdout,D,E
### Console appender ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n
### DEBUG-and-above log output to D://logs/log.log ###
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = D://logs/log.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = DEBUG
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
### ERROR-and-above log output to D://logs/error.log ###
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File =D://logs/error.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n
- Run it again (delete the previous wcResult directory first): the warning is gone and plenty of log output appears.
WordCount against HDFS
- Modify the WCDriver class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WCDriver {
public static void main(String[] args)throws Exception {
//1. Create the job
Configuration cfg = new Configuration();
Job job = Job.getInstance(cfg, "job_wc");
job.setJarByClass(WCDriver.class);
//2. Set the Mapper and Reducer
job.setMapperClass(WCMapper.class);
job.setReducerClass(WCReducer.class);
//Map output types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
//Partitioner and number of reduce tasks
job.setNumReduceTasks(4);
job.setPartitionerClass(WCPartitioner.class);
//Reduce output types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//Input and output paths
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//3. Run
boolean result = job.waitForCompletion(true);
System.out.println(result ? "success" : "failure");
System.exit(result ? 0 : 1);
}
}
- Build the jar:
- open Project Structure (top left of IDEA)
- select WCDriver
- build the jar
- copy the jar into your home directory on the Linux host
- create a test directory on HDFS:
hdfs dfs -mkdir /test/
- create a file a.txt:
vi a.txt
- with these contents:
i wish to wish the wish you wish to wish,but
if you wish the wish the wish wishes,i won't
wish the wish you wish to wish
i love java
i love mysql
i love linux
i love python
i love hadoop
hadoop hdfs mapreduce yarn hbase hive
these things are very
- upload a.txt to the test directory on HDFS:
hdfs dfs -put a.txt /test/a.txt
- run wordcount on a.txt:
hadoop jar testhdfs.jar cn.kgc.kb09.mr.WCDriver /test/a.txt /test/result
- inspect the output files:
hdfs dfs -cat /test/result/part-r-00000
hdfs dfs -cat /test/result/part-r-00001
hdfs dfs -cat /test/result/part-r-00002
hdfs dfs -cat /test/result/part-r-00003
Implementing joins with MapReduce
Map-side join
- a large file joined with a small file
Example:
COJoinMapper
import cn.kgc.kb09.join.CustomOrder;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
//Map-side join of the two files
public class COJoinMapper extends Mapper<LongWritable, Text, Text, CustomOrder> {
Map<String,String> map = new HashMap();
@Override
protected void setup(Context context) throws IOException, InterruptedException {
URI[] cacheFiles = context.getCacheFiles();
if (cacheFiles != null) {
String filePath = cacheFiles[0].getPath();
FileReader fr = new FileReader(filePath);
BufferedReader br = new BufferedReader(fr);
String line;
while ((line = br.readLine()) != null && !"".equals(line)) {
String[] columns = line.split(" ");
map.put(columns[0],columns[1]);
}
}
}
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] columns = line.split(" ");
CustomOrder co = new CustomOrder();
String orderId = columns[0];
String orderStatus = columns[2];
String custId = columns[3];
co.setCustomId(custId);
String custName = map.get(custId);
co.setCustomName(custName);
co.setOrderId(orderId);
co.setOrderStatus(orderStatus);
//To collect the customers that never matched an order, remove each matched id:
//map.remove(custId);
context.write(new Text(custId),co);
}
/* @Override
protected void cleanup(Context context) throws IOException, InterruptedException {
Set<String> keys = map.keySet();
for (String key : keys) {
CustomOrder co = new CustomOrder();
co.setCustomId(key);
co.setCustomName(map.get(key));
context.write(new Text(key),co);
}
}*/
}
COJoinDriver
import cn.kgc.kb09.join.CustomOrder;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.net.URI;
//Driver for the map-side join
public class COJoinDriver {
public static void main(String[] args)throws Exception {
Job job = Job.getInstance(new Configuration(), "mapjoinJob");
job.setJarByClass(COJoinDriver.class);
job.setMapperClass(COJoinMapper.class);
job.setOutputKeyClass(Text.class);
job.setMapOutputValueClass(CustomOrder.class);
String inPath = "file:///D:/ideashuju/testhdfs/data/order.csv";
String outPath = "file:///D:/test/b";
String cachePath = "file:///D:/ideashuju/testhdfs/data/customers.csv";
job.addCacheFile(new URI(cachePath));
FileInputFormat.setInputPaths(job, new Path(inPath));
FileOutputFormat.setOutputPath(job, new Path(outPath));
boolean result = job.waitForCompletion(true);
System.out.println(result ? "success" : "failure");
System.exit(result?0:1);
}
}
Reduce-side join
Output:
Example:
CustomOrder
import org.apache.hadoop.io.Writable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
public class CustomOrder implements Writable {
private String customId;
private String customName;
private String orderId;
private String orderStatus;
private String tableFlag;//"0" marks a row from the customers table, "1" a row from the orders table
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(customId);
out.writeUTF(customName);
out.writeUTF(orderId);
out.writeUTF(orderStatus);
out.writeUTF(tableFlag);
}
@Override
public void readFields(DataInput in) throws IOException {
this.customId = in.readUTF();
this.customName = in.readUTF();
this.orderId = in.readUTF();
this.orderStatus = in.readUTF();
this.tableFlag = in.readUTF();
}
public String getCustomId() {
return customId;
}
public void setCustomId(String customId) {
this.customId = customId;
}
public String getCustomName() {
return customName;
}
public void setCustomName(String customName) {
this.customName = customName;
}
public String getOrderId() {
return orderId;
}
public void setOrderId(String orderId) {
this.orderId = orderId;
}
public String getOrderStatus() {
return orderStatus;
}
public void setOrderStatus(String orderStatus) {
this.orderStatus = orderStatus;
}
public String getTableFlag() {
return tableFlag;
}
public void setTableFlag(String tableFlag) {
this.tableFlag = tableFlag;
}
@Override
public String toString() {
return "customId='" + customId + '\'' +
", customName='" + customName + '\'' +
", orderId='" + orderId + '\'' +
", orderStatus='" + orderStatus + '\'';
}
}
COMapperJoin
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class COMapperJoin extends Mapper<LongWritable, Text,Text, CustomOrder> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] columns = line.split(",");
for (int i = 0; i < columns.length; i++) {
columns[i]=columns[i].split("\"")[1];
}
CustomOrder co = new CustomOrder();
if (columns.length == 4) {//orders row
co.setCustomId(columns[2]);
co.setCustomName("");
co.setOrderId(columns[0]);
co.setOrderStatus(columns[3]);
co.setTableFlag("1");
} else if (columns.length == 9) {//customers row
co.setCustomId(columns[0]);
co.setCustomName(columns[1] + "." + columns[2]);
co.setOrderId("");
co.setOrderStatus("");
co.setTableFlag("0");
}
context.write(new Text(co.getCustomId()), co);
//{1,{CustomOrder(1,xxx,,,0),CustomOrder(1,,20,closed,1)}}
}
}
COReducerJoin
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class COReducerJoin extends Reducer<Text, CustomOrder, CustomOrder, NullWritable> {
// List<CustomOrder> coList = new ArrayList<>();
@Override
protected void reduce(Text key, Iterable<CustomOrder> values, Context context) throws IOException, InterruptedException {
StringBuffer orderIds = new StringBuffer();
StringBuffer statuses = new StringBuffer();
CustomOrder customOrder = new CustomOrder();
for (CustomOrder co : values) {
if (co.getCustomName().equals("")) {
orderIds.append(co.getOrderId() + "|");
statuses.append(co.getOrderStatus() + "|");
} else {
customOrder.setCustomId(co.getCustomId());
customOrder.setCustomName(co.getCustomName());
}
}
String orderId = "";
String status = "";
if(orderIds.length()>0) {
orderId = orderIds.substring(0, orderIds.length() - 1);
}
if(statuses.length()>0) {
status = statuses.substring(0, statuses.length() - 1);
}
customOrder.setOrderId(orderId);
customOrder.setOrderStatus(status);
context.write(customOrder, NullWritable.get());
}
}
CODriver
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class CODriver {
public static void main(String[] args) throws Exception {
Configuration cfg = new Configuration();
Job job = Job.getInstance(cfg, "co_job");
job.setJarByClass(CODriver.class);
job.setMapperClass(COMapperJoin.class);
job.setReducerClass(COReducerJoin.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(CustomOrder.class);
job.setOutputKeyClass(CustomOrder.class);
job.setOutputValueClass(NullWritable.class);
FileInputFormat.setInputPaths(job, new Path("file:///D:/ideashuju/testhdfs/data"));
FileOutputFormat.setOutputPath(job, new Path("file:///D:/test/coResult"));
boolean result = job.waitForCompletion(true);
System.out.println(result ? "success" : "failure");
System.exit(result?0:1);
}
}
Output: