MapReduce Custom Input and Output (Very Useful)

帅气的程序员

Category: Big Data - Hadoop. Tags: mapreduce

Article link: https://blog.csdn.net/hr787753/article/details/78599861

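As we know, the type parameters are Mapper<input key, input value, output key, output value> and Reducer<input key, input value, output key, output value>.
The mapper's output key and value types must match the reducer's input key and value types.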
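
The type rule above can be illustrated without Hadoop at all. The following is a minimal plain-Java sketch, not the real API: SimpleMapper, SimpleReducer and TypeContractDemo are hypothetical names invented for this illustration, and the TreeMap only stands in for the shuffle-and-sort phase. Because the same type variables are used for the mapper output and the reducer input, a mismatch fails at compile time.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-ins for Hadoop's Mapper and Reducer.
interface SimpleMapper<KI, VI, KO, VO> {
    // Emits zero or more (KO, VO) pairs for one input record.
    List<Map.Entry<KO, VO>> map(KI key, VI value);
}

interface SimpleReducer<KI, VI, KO, VO> {
    // Receives one key together with all values grouped under it.
    Map.Entry<KO, VO> reduce(KI key, List<VI> values);
}

class TypeContractDemo {
    // MK and MV are both the mapper's output types and the reducer's input types.
    static <KI, VI, MK extends Comparable<MK>, MV, KO, VO> List<Map.Entry<KO, VO>> run(
            List<Map.Entry<KI, VI>> input,
            SimpleMapper<KI, VI, MK, MV> mapper,
            SimpleReducer<MK, MV, KO, VO> reducer) {
        Map<MK, List<MV>> grouped = new TreeMap<>(); // stands in for shuffle and sort
        for (Map.Entry<KI, VI> rec : input) {
            for (Map.Entry<MK, MV> kv : mapper.map(rec.getKey(), rec.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        List<Map.Entry<KO, VO>> output = new ArrayList<>();
        for (Map.Entry<MK, List<MV>> e : grouped.entrySet()) {
            output.add(reducer.reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    // Word count: map input is (line offset, line), map output is (word, 1).
    static List<Map.Entry<String, Integer>> wordCount(List<String> lines) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            input.add(new AbstractMap.SimpleEntry<Long, String>(offset, line));
            offset += line.length() + 1; // +1 for the newline
        }
        SimpleMapper<Long, String, String, Integer> mapper = (k, v) -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String w : v.split("\\s+")) {
                if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<String, Integer>(w, 1));
            }
            return out;
        };
        SimpleReducer<String, Integer, String, Integer> reducer = (k, vs) -> {
            int sum = 0;
            for (int x : vs) sum += x;
            return new AbstractMap.SimpleEntry<String, Integer>(k, sum);
        };
        return run(input, mapper, reducer);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("a b a", "b c")));
    }
}
```

Changing the reducer to, say, SimpleReducer<String, String, String, Integer> makes the call to run fail to compile, which is the same constraint the real job enforces at runtime.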

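Custom input:

In the code below, reader.readLine(tmp) reads the next line into tmp.
By default, the map input key is the byte offset of the line within the file and the value is the content of that line.
The map input key and value, and which files are read at all, can be controlled freely.

The input format is governed by FileInputFormat, while the actual parsing of records is done by a RecordReader. To control the input format, first override createRecordReader in a FileInputFormat subclass, and have it return an instance of a new class that extends RecordReader and implements its six abstract methods (initialize, nextKeyValue, getCurrentKey, getCurrentValue, getProgress, close).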

The code:

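// 1. Extend FileInputFormat and override createRecordReader; the type parameters are the map input key/value types.
public class AuthReader extends FileInputFormat<Text, Text> {
    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
        return new InputFormat(); // the custom RecordReader defined below
    }

    // The reader below always starts at the beginning of the file, so keep each file in a single split.
    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        return false;
    }
}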

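import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

// 2. Create a new class that extends RecordReader; the type parameters are the map input key/value types.
public class InputFormat extends RecordReader<Text, Text> {
    private FileSplit fs;
    private Text key;
    private Text value;
    private LineReader reader;
    private String fileName;

    // Initialization: called once per split before any record is read.
    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
        fs = (FileSplit) split;
        Path path = fs.getPath();
        fileName = path.getName();
        // Use the job's configuration rather than a fresh one, so job-level settings apply.
        Configuration conf = context.getConfiguration();
        // Get the file system that owns this path.
        FileSystem system = path.getFileSystem(conf);
        FSDataInputStream in = system.open(path);
        // Reads from the start of the file, so this assumes one split per file.
        reader = new LineReader(in);
    }

    // Note 1: the framework calls this method repeatedly; each time it returns true it is called again, and reading stops once it returns false.
    // Note 2: after each call that returns true, getCurrentKey and getCurrentValue are each called once.
    // Note 3: getCurrentKey and getCurrentValue hand the key and value to map.
    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        // Files can be excluded from processing here.
        if (!fileName.startsWith("wo")) return false;
        Text tmp = new Text();
        int length = reader.readLine(tmp); // bytes consumed, including the newline; 0 means end of file
        if (length == 0) {
            return false;
        }
        value = new Text(tmp + "何睿");   // append a demo suffix (a name) to every line
        key = new Text("我是雷神托尔");    // fixed demo key ("I am Thor")
        return true;
    }

    @Override
    public Text getCurrentKey() throws IOException, InterruptedException {
        return key;
    }

    @Override
    public Text getCurrentValue() throws IOException, InterruptedException {
        return value;
    }

    // Progress is not tracked in this example.
    @Override
    public float getProgress() throws IOException, InterruptedException {
        return 0;
    }

    @Override
    public void close() throws IOException {
        if (reader != null) {
            reader.close();
        }
    }
}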

Finally, in the Driver:

        // custom input
        job.setInputFormatClass(AuthReader.class);
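
The three notes on nextKeyValue are easier to see once the calling loop is written out. Below is a minimal plain-Java sketch, assuming a simplified reader over an in-memory string instead of an HDFS file. PrefixFilterReader and ReaderLoopDemo are hypothetical names for this illustration, and the key and suffix strings are placeholders. The drive method mirrors the shape of the loop the framework runs for each split: call nextKeyValue until it returns false, and fetch the current key and value after each successful call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a RecordReader with String keys and values.
class PrefixFilterReader {
    private final String fileName;
    private final BufferedReader reader;
    private String key;
    private String value;

    PrefixFilterReader(String fileName, String contents) {
        this.fileName = fileName;
        this.reader = new BufferedReader(new StringReader(contents));
    }

    // Advances to the next record; returns false when there is nothing left to read.
    boolean nextKeyValue() throws IOException {
        if (!fileName.startsWith("wo")) return false; // skip files we do not want
        String line = reader.readLine();
        if (line == null) return false;               // end of input
        key = "fixed-key";                            // placeholder key
        value = line + "-suffix";                     // placeholder transformation
        return true;
    }

    String getCurrentKey() { return key; }

    String getCurrentValue() { return value; }
}

class ReaderLoopDemo {
    // Same shape as the framework's loop: one getCurrentKey/getCurrentValue pair
    // per successful nextKeyValue call, stopping at the first false.
    static List<String> drive(PrefixFilterReader r) {
        List<String> seen = new ArrayList<>();
        try {
            while (r.nextKeyValue()) {
                seen.add(r.getCurrentKey() + "=" + r.getCurrentValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(drive(new PrefixFilterReader("words.txt", "x\ny\n")));
        System.out.println(drive(new PrefixFilterReader("other.txt", "x\ny\n")));
    }
}
```

A file whose name does not start with "wo" yields no records at all, because the very first nextKeyValue call returns false.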

Custom output:

// writer
public class AuthWriter<K, V> extends FileOutputFormat<K, V> {
    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
        // Path of this task's output file in the work directory (no file extension).
        Path path = getDefaultWorkFile(job, "");
        Configuration conf = job.getConfiguration();
        FileSystem fs = path.getFileSystem(conf);
        FSDataOutputStream out = fs.create(path);
        // Pass the key/value separator and the line separator to the new class.
        return new NOutputFormat<K, V>(out, "#|#", "\r\n");
    }
}

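import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// implementation class
public class NOutputFormat<K, V> extends RecordWriter<K, V> {
    private FSDataOutputStream out;
    private String keyValueSeparator; // separator between key and value
    private String lineSeparator;     // separator between lines

    public NOutputFormat(FSDataOutputStream out, String keyValueSeparator, String lineSeparator) {
        this.out = out;
        this.keyValueSeparator = keyValueSeparator;
        this.lineSeparator = lineSeparator;
    }

    // Writes one record as key + keyValueSeparator + value + lineSeparator.
    // UTF-8 is set explicitly so the output does not depend on the platform default charset.
    @Override
    public void write(K key, V value) throws IOException, InterruptedException {
        out.write(key.toString().getBytes(StandardCharsets.UTF_8));    // key
        out.write(keyValueSeparator.getBytes(StandardCharsets.UTF_8)); // key/value separator
        out.write(value.toString().getBytes(StandardCharsets.UTF_8));  // value
        out.write(lineSeparator.getBytes(StandardCharsets.UTF_8));     // line separator
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        if (out != null) out.close();
    }
}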


// In the Driver
        // custom output
        job.setOutputFormatClass(AuthWriter.class);
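
To see exactly which bytes such a writer produces, the same write logic can be tried against an in-memory stream. This is a rough plain-Java sketch: SeparatorWriter and SeparatorWriterDemo are hypothetical names, a ByteArrayOutputStream stands in for the HDFS output stream, and the sample records are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the RecordWriter: same write logic, any OutputStream.
class SeparatorWriter<K, V> implements AutoCloseable {
    private final OutputStream out;
    private final String keyValueSeparator;
    private final String lineSeparator;

    SeparatorWriter(OutputStream out, String keyValueSeparator, String lineSeparator) {
        this.out = out;
        this.keyValueSeparator = keyValueSeparator;
        this.lineSeparator = lineSeparator;
    }

    // One record becomes: key, key/value separator, value, line separator.
    void write(K key, V value) throws IOException {
        out.write(key.toString().getBytes(StandardCharsets.UTF_8));
        out.write(keyValueSeparator.getBytes(StandardCharsets.UTF_8));
        out.write(value.toString().getBytes(StandardCharsets.UTF_8));
        out.write(lineSeparator.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}

class SeparatorWriterDemo {
    // Writes each {key, value} pair and returns everything that was written as text.
    static String render(String[][] records) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (SeparatorWriter<String, String> w = new SeparatorWriter<>(buf, "#|#", "\r\n")) {
            for (String[] r : records) {
                w.write(r[0], r[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(render(new String[][] {{"tom", "90"}, {"amy", "85"}}));
    }
}
```

With the separators used in the Driver example, each record ends up on its own line in the form key#|#value.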

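Multiple input sources in one job

In the Driver, each input path gets its own input format and mapper. The job still has a single reducer, so both mappers must emit the same output key and value types.

// Read path A with the custom input format and process it with ScoreMapper.
// AuthInputFormat stands for the custom FileInputFormat subclass (named AuthReader in the listing above).
MultipleInputs.addInputPath(job, new Path("hdfs://xxx:9000/formatscore/formatscore.txt"), AuthInputFormat.class, ScoreMapper.class);

// Read path B with TextInputFormat and process it with ScoreMapper2.
MultipleInputs.addInputPath(job, new Path("hdfs://xxx:9000/formatscore/formatscore-1.txt"), TextInputFormat.class, ScoreMapper2.class);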
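
The idea behind per-path mappers can be sketched in plain Java, under the assumption that the two sources store the same kind of data in different layouts. Everything here is hypothetical: the Source and MultiInputDemo names, the space-separated and comma-separated line layouts, and the sample scores. Each source brings its own parsing function, both emit (name, score) pairs, and one shared reduce step sums the scores per name.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// One input source: its lines plus the mapper that knows how to parse them.
class Source {
    final List<String> lines;
    final Function<String, Map.Entry<String, Integer>> mapper;

    Source(List<String> lines, Function<String, Map.Entry<String, Integer>> mapper) {
        this.lines = lines;
        this.mapper = mapper;
    }
}

class MultiInputDemo {
    // Runs every source through its own mapper, then reduces all pairs together.
    static Map<String, Integer> run(List<Source> sources) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // stands in for the shuffle
        for (Source s : sources) {
            for (String line : s.lines) {
                Map.Entry<String, Integer> kv = s.mapper.apply(line);
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int score : e.getValue()) sum += score; // the shared reduce step
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    static Map<String, Integer> demo() {
        // Mapper for lines such as "tom 90" (assumed layout of source A).
        Function<String, Map.Entry<String, Integer>> spaceMapper = line -> {
            String[] p = line.split(" ");
            return new AbstractMap.SimpleEntry<String, Integer>(p[0], Integer.parseInt(p[1]));
        };
        // Mapper for lines such as "tom,85" (assumed layout of source B).
        Function<String, Map.Entry<String, Integer>> commaMapper = line -> {
            String[] p = line.split(",");
            return new AbstractMap.SimpleEntry<String, Integer>(p[0], Integer.parseInt(p[1]));
        };
        return run(Arrays.asList(
                new Source(Arrays.asList("tom 90", "amy 70"), spaceMapper),
                new Source(Arrays.asList("tom,85"), commaMapper)));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because both mappers return the same entry type, the reduce step never needs to know which source a pair came from.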

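Three more points:

Multiple outputs are not needed for now; consult the documentation when they are.

Uber mode can be enabled for jobs that process small files.

Small files can also be combined into a single split; try this out when a real need arises.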
