1. MapReduce

花开的那一天

Last modified: 2023-12-27 10:23:54

Tags: mapreduce

First published: 2023-12-27 10:18:19

Article link: https://blog.csdn.net/xyzzzH/article/details/135238655

1. What are the basic data types?

Hadoop's basic types implement the Writable interface, which handles serialization.

Hadoop type	Java type
Text	String
IntWritable	int
LongWritable	long
ByteWritable	byte
BytesWritable	byte[]
DoubleWritable	double
FloatWritable	float
NullWritable	null
ShortWritable	short

2. Custom data types

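import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Implements Writable, the interface Hadoop uses to serialize and deserialize data
public class CombineBean implements Writable {
    private int id;
    private String name;

    // Serialize the object's fields to a byte stream
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        out.writeUTF(this.name);
    }

    // Deserialize the object's fields from a byte stream, in the order they were written
    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.name = in.readUTF();
    }

    // No-argument constructor; Hadoop creates instances by reflection before calling readFields
    public CombineBean() {
    }
}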

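The map side writes the object and the reduce side reads it, so the fields must be read in exactly the same order in which they were written.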
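
The ordering rule can be checked without a cluster. The following is a minimal sketch, not a Hadoop program: `UserBean` and `roundTrip` are hypothetical names, and the class deliberately leaves out `implements Writable` so that it compiles with the JDK alone. It writes the fields to a byte buffer and reads them back in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical bean with the same write/readFields shape as a Hadoop Writable.
// "implements Writable" is omitted so the sketch compiles without Hadoop.
class UserBean {
    int id;
    String name;

    UserBean() {
    }

    UserBean(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // Fields are written as: id, then name.
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
    }

    // Fields are read back in exactly the same order: id, then name.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
    }

    // Serialize a bean to bytes, then rebuild a fresh bean from those bytes.
    static UserBean roundTrip(UserBean source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            UserBean copy = new UserBean();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If `readFields` called `readUTF()` before `readInt()`, it would interpret the bytes of the int as a string length and either fail or return garbage, which is why the two methods have to mirror each other.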

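When a MapReduce job has no reduce function, call job.setNumReduceTasks(0); so that it runs as a map-only job. Otherwise Hadoop still runs the shuffle and a default identity reducer.

MapReduce join approaches
- Map join: when the smaller dataset fits in memory, the join can be done entirely on the map side.
- Reduce join: when both datasets are large, the join has to be done on the reduce side.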
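
The two strategies can be sketched in plain Java, with no Hadoop classes involved. Everything here is illustrative: the class name `JoinSketch`, the record layouts ("deptId,deptName" and "empName,deptId") and the "D:"/"E:" source tags are assumptions made for the example. In a real job the small table would typically be loaded in the mapper's setup, and the grouping in the second method would be done by the shuffle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the two join strategies; no Hadoop classes are used.
// Hypothetical data: departments are "deptId,deptName", employees are "empName,deptId".
class JoinSketch {

    // Map-side join: load the small table into memory, then stream the large one.
    static List<String> mapSideJoin(List<String> smallDepartments, List<String> largeEmployees) {
        Map<String, String> deptNameById = new HashMap<>();
        for (String line : smallDepartments) {
            String[] fields = line.split(",");
            deptNameById.put(fields[0], fields[1]);
        }
        List<String> joined = new ArrayList<>();
        for (String line : largeEmployees) {
            String[] fields = line.split(",");
            String deptName = deptNameById.get(fields[1]);
            if (deptName != null) { // inner join: skip rows without a match
                joined.add(fields[0] + "," + deptName);
            }
        }
        return joined;
    }

    // Reduce-side join: tag each record with its source, group by join key,
    // then pair the two sides inside each group.
    static List<String> reduceSideJoin(List<String> departments, List<String> employees) {
        Map<String, List<String>> groupedByDeptId = new TreeMap<>();
        for (String line : departments) {
            String[] fields = line.split(",");
            groupedByDeptId.computeIfAbsent(fields[0], k -> new ArrayList<>()).add("D:" + fields[1]);
        }
        for (String line : employees) {
            String[] fields = line.split(",");
            groupedByDeptId.computeIfAbsent(fields[1], k -> new ArrayList<>()).add("E:" + fields[0]);
        }
        List<String> joined = new ArrayList<>();
        for (List<String> tagged : groupedByDeptId.values()) {
            List<String> deptNames = new ArrayList<>();
            List<String> empNames = new ArrayList<>();
            for (String value : tagged) {
                if (value.startsWith("D:")) {
                    deptNames.add(value.substring(2));
                } else {
                    empNames.add(value.substring(2));
                }
            }
            for (String emp : empNames) {
                for (String dept : deptNames) {
                    joined.add(emp + "," + dept);
                }
            }
        }
        return joined;
    }
}
```

The map-side version never groups the large dataset, which is what makes it cheaper; the reduce-side version needs every record for a key to arrive in one place before it can pair them.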

The map function

// A Mapper has three parts:
// setup performs initialization
// map is the core: it processes input records and emits intermediate key-value pairs
// cleanup releases resources


// setup runs once per map task, before the first call to map. It is typically used for initialization, such as reading configuration or preparing resources.
protected void setup(Context context) throws IOException, InterruptedException

// map is called once for each input key-value pair. Apply your processing logic to the record here
// and write the results to the context as intermediate key-value pairs.
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException

// cleanup runs once per map task, after map has been called for every input record. It is typically used to release resources, such as closing files.
protected void cleanup(Context context) throws IOException, InterruptedException
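
The call order can be illustrated with a small stand-in written in plain Java. This is only a sketch of the lifecycle, not Hadoop's Mapper: the class `WordCountLifecycleSketch`, its `run` method and the `emitted` list (standing in for context.write) are hypothetical. It counts words in memory during map and emits the totals in cleanup, which shows why state created in setup is visible to every later call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a Hadoop Mapper; only the call order is modeled.
class WordCountLifecycleSketch {
    private Map<String, Integer> counts;                    // per-task state
    private final List<String> emitted = new ArrayList<>(); // stands in for context.write

    // Called once, before the first record.
    void setup() {
        counts = new TreeMap<>();
    }

    // Called once per input record (here: one line of text).
    void map(long offset, String line) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    // Called once, after the last record: emit the totals accumulated in map.
    void cleanup() {
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            emitted.add(entry.getKey() + "\t" + entry.getValue());
        }
    }

    // Drives the three phases in order: setup, map for each record, cleanup.
    List<String> run(List<String> lines) {
        setup();
        try {
            long offset = 0;
            for (String line : lines) {
                map(offset, line);
                offset += line.length() + 1; // rough position of the next line
            }
        } finally {
            cleanup();
        }
        return emitted;
    }
}
```

For example, `new WordCountLifecycleSketch().run(Arrays.asList("a b a", "b c"))` calls setup once, map twice and cleanup once.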

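The reduce function

setup() and cleanup() work the same way as in the map function.

// All values that share a key are collected into one group, and reduce is called once per group, much like a GROUP BY.
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context) throws IOException, InterruptedException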
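
The grouping behavior can be imitated in plain Java. In this sketch the names `ReduceGroupingSketch` and `groupAndReduce` and the "key=value" input format are assumptions made for illustration; a sorted map stands in for the shuffle, and the reduce logic simply sums the values of each key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of grouping by key followed by one reduce call per key.
class ReduceGroupingSketch {

    // Hypothetical reduce logic: sum all values that share a key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Groups "key=value" pairs by key in sorted order, then reduces each group once.
    static List<String> groupAndReduce(List<String> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String pair : mapOutput) {
            String[] parts = pair.split("=");
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(Integer.parseInt(parts[1]));
        }
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            results.add(group.getKey() + "=" + reduce(group.getKey(), group.getValue()));
        }
        return results;
    }
}
```

Given the pairs b=1, a=2, b=3, a=4, c=5, reduce is called three times, once each for a, b and c.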
