Learning the MapReduce Framework

mambasmile

Category: hadoop cluster configuration

Article link: https://blog.csdn.net/qq_26890109/article/details/91973999

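HDFS: Hadoop's distributed file system

MapReduce: distributed computing framework

YARN: Hadoop's resource scheduling system

When writing an application with the MapReduce framework, you only need to focus on developing the business logic.

Developing the wordcount business logic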

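import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Mapper phase: Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
 * With the default InputFormat, KEYIN is the starting offset of a line of text (LongWritable);
 * VALUEIN is the line of text itself (Text); KEYOUT is the emitted word (Text);
 * VALUEOUT is its count (IntWritable).
 *
 * To make serialization more efficient (disk reads/writes and network transfer),
 * Hadoop defines its own serialization framework:
 * Long   -> LongWritable
 * Int    -> IntWritable
 * String -> Text
 * Null   -> NullWritable
 */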

private static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words on single spaces
        String[] words = value.toString().split(" ");
        for (String word : words) {
            // Emit (word, 1) for every word in the line
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
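
To make the serialization remark in the Mapper comment more concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It compares the raw 4 bytes written by `DataOutputStream.writeInt`, which is how `IntWritable` writes its value, with the size of the same number under standard Java serialization. The class name `WritableSizeDemo` and its two methods are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class WritableSizeDemo {
    // Writes the int as a raw 4-byte value, the way IntWritable.write(DataOutput) does.
    static int compactSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Writes the same number with standard Java serialization, which adds
    // a stream header and class metadata on top of the value itself.
    static int javaSerializedSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("Writable-style bytes: " + compactSize(1));
        System.out.println("Java-serialized bytes: " + javaSerializedSize(1));
    }
}
```

The gap matters because every key and value emitted by a map task is serialized at least once on its way to disk and across the network.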

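/**
 * Reduce phase: each ReduceTask calls the Reducer
 * By default (HashPartitioner): (key.hashCode() & Integer.MAX_VALUE) % number of ReduceTasks = ReduceTask index
 * The Reducer's input is the Mapper's output
 */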
private static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted for this word
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

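/**
 * The main method is the entry point of the MapReduce program; a Job object manages
 * the parameters the program needs at run time
 */
public static void main(String[] args) throws Exception {
    // Specify the HDFS-related parameters
    Configuration conf = new Configuration();
    // fs.defaultFS is the default file system URI (no value is given here)
    conf.set("fs.defaultFS", "");
    // ...... omitted ......
}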
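
To see what the mapper and reducer compute without a cluster, the following is a minimal plain-Java sketch of the same word count. It is not Hadoop code: the class `LocalWordCount` and its `map`, `reduce`, and `run` methods are hypothetical stand-ins, and `run` plays the role of the framework by grouping the map output by key before reducing. Unlike the mapper shown earlier, this sketch skips empty tokens.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {
    // Map step: turn one line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce step: sum all the counts collected for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: group map output by key (sorted), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
    }
}
```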

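How to run: local mode is convenient for testing; cluster mode is what is actually used in production.

Local mode: the mapreduce.framework.name and yarn.resourcemanager.hostname settings determine where the job runs. When mapreduce.framework.name is local (the default), the MapReduce program is submitted to LocalJobRunner and runs as a single local process. Hadoop also has to be installed locally, which in practice means having the Hadoop jars available.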
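
The mode selection described above can be modeled with a small toy in plain Java. This is not Hadoop's own code: `RunModeDemo` and `chooseRunner` are hypothetical names, the configuration is a plain `Map`, and `rm-host` is a made-up hostname. The sketch assumes the stock defaults, `local` for `mapreduce.framework.name` and `0.0.0.0` for `yarn.resourcemanager.hostname`.

```java
import java.util.HashMap;
import java.util.Map;

public class RunModeDemo {
    // Toy model of where a job would run, based on two configuration keys.
    static String chooseRunner(Map<String, String> conf) {
        String framework = conf.getOrDefault("mapreduce.framework.name", "local");
        if ("local".equals(framework)) {
            // Local mode: the job runs inside a single local process.
            return "LocalJobRunner";
        }
        if ("yarn".equals(framework)) {
            // Cluster mode: the client must know where the ResourceManager is.
            String rm = conf.getOrDefault("yarn.resourcemanager.hostname", "0.0.0.0");
            return "YARN cluster via ResourceManager at " + rm;
        }
        throw new IllegalArgumentException("Framework not modeled in this sketch: " + framework);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseRunner(conf)); // nothing set, so local mode

        conf.put("mapreduce.framework.name", "yarn");
        conf.put("yarn.resourcemanager.hostname", "rm-host"); // hypothetical host
        System.out.println(chooseRunner(conf));
    }
}
```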

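Core runtime mechanism of a MapReduce program

A distributed MapReduce program has two kinds of instance processes at run time:

MRAppMaster: responsible for scheduling the job; runs on a node of the YARN cluster

YarnChild: separate YarnChild processes carry out the data processing for the map phase and for the reduce phase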

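MapReduce execution flow: when a MapReduce program is submitted, MRAppMaster is started. Based on the description of the job, MRAppMaster works out the number of maptask instances and requests that many maptask processes from the machines in the cluster.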
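
The number of maptask instances follows from the number of input splits, with one maptask per split. As a rough sketch of that arithmetic for a single splittable file, the code below mirrors the sizing rule used by `FileInputFormat`: the split size is the block size clamped between a minimum and a maximum, and a final chunk up to 10% larger than the split size is not split again. `SplitCountDemo` and its method names are hypothetical, and the 128 MB block size is only an assumed example value.

```java
public class SplitCountDemo {
    // A last chunk up to 10% larger than the split size stays in one split.
    static final double SPLIT_SLOP = 1.1;

    // Clamp the block size between the configured minimum and maximum split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits, and so of map tasks, for one non-empty splittable file.
    static int countSplits(long fileLength, long splitSize) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength and splitSize must be positive");
        }
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        System.out.println(countSplits(300 * mb, splitSize)); // prints 3
        System.out.println(countSplits(130 * mb, splitSize)); // prints 1
    }
}
```

The second call shows the effect of the slop factor: a 130 MB file is only slightly larger than one 128 MB split, so it is handled by a single maptask.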

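The maptask process uses the InputFormat specified by the user to obtain a RecordReader, which reads the data and forms the input KV pairs; the maptask calls the map method to do the computation, emits KV pairs, and writes them to disk files partitioned and sorted by K.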
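
The "partitioned and sorted by K" step can be sketched in plain Java as follows. The partition formula is the one used by Hadoop's default `HashPartitioner`; everything else, including the class `PartitionSortDemo`, the method names, and the in-memory lists standing in for the disk files, is a simplification assumed for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionSortDemo {
    // Masking with Integer.MAX_VALUE clears the sign bit, so the result is never negative.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Puts each emitted key into its partition, then sorts every partition by key.
    static List<List<String>> partitionAndSort(List<String> keys, int numReduceTasks) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numReduceTasks)).add(key);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("hello", "world", "hello", "hadoop");
        List<List<String>> partitions = partitionAndSort(emitted, 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Because the partition depends only on the key, every occurrence of the same word ends up at the same reducetask, which is what lets the reducer see all counts for a word together.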

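Once MRAppMaster detects that the maptask processes have finished, it starts the corresponding number of reducetask processes; each reducetask process calls the reduce method.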
