MapReduce Programming

Author: 不再简简单单

Article link: https://blog.csdn.net/simple_start/article/details/94484818

JDK installation

https://blog.csdn.net/simple_start/article/details/94218246

Installing and deploying Hadoop: the Common, HDFS, YARN, and MapReduce modules

https://blog.csdn.net/simple_start/article/details/94412372

MapReduce Programming

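Importing the Maven project

Extract everything into a directory whose path contains no Chinese characters.

1. Download and extract the native library files

Link: https://pan.baidu.com/s/1qgJZp4V55QrbsToidRS_lg

Extraction code: mh6c

2. Download the project archive

Link: https://pan.baidu.com/s/1c5pcyJk68814reqQ7eQqHA

Extraction code: 3jrz

You can also create the Maven project yourself and use this project's pom.xml as a reference.

3. Import the project (a Maven project) into IntelliJ IDEA

Next, configure Maven's local repository.

For the remaining steps, just click Next.

4. After the project is imported, copy the following two XML files from the Linux machine into the project's resources folder in IDEA

I used Notepad++ for this.

Preparation is complete.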

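How it works, using word count as the example

How MapReduce processes data

Throughout a MapReduce program, all data flows as key-value pairs:

Input -> Map -> Shuffle -> Reduce -> Output

(1) Input and Output normally need no code of your own; you only specify the corresponding directories.

(2) The core parts to focus on are map, shuffle, and reduce.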

The MapReduce execution process, for this sample input file:

hadoop java spring springMvc

java spring java

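Input phase:

Input: read the data from HDFS

Output: key-value pairs, where the key is the byte offset at which each line starts and the value is the line's content

0 hadoop java spring springMvc

29 java spring java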
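The keys above are byte offsets, not line numbers. The first line, "hadoop java spring springMvc", has 28 characters, and its newline adds one more byte, so the second line starts at byte 29. The sketch below computes these offsets in plain Java. It assumes UTF-8 text with '\n' line endings; with '\r\n' endings every later offset shifts. The class LineOffsets and its method lineStartOffsets are hypothetical illustrations, not Hadoop's LineRecordReader.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Hadoop's LineRecordReader) showing which key the
// input phase pairs with each line: the byte offset where that line starts.
public class LineOffsets {

    // Byte offset at which each line starts, assuming UTF-8 text and '\n' line endings.
    static long[] lineStartOffsets(String fileContent) {
        if (fileContent.isEmpty()) {
            return new long[0];
        }
        String[] lines = fileContent.split("\n", -1);
        // A trailing '\n' ends the last line; it does not start a new one.
        int count = fileContent.endsWith("\n") ? lines.length - 1 : lines.length;
        long[] offsets = new long[count];
        long offset = 0;
        for (int i = 0; i < count; i++) {
            offsets[i] = offset;
            // Skip the line's bytes plus its one-byte '\n' terminator.
            offset += lines[i].getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        String file = "hadoop java spring springMvc\njava spring java\n";
        String[] lines = file.split("\n");
        long[] offsets = lineStartOffsets(file);
        for (int i = 0; i < offsets.length; i++) {
            // Prints "0 hadoop java spring springMvc", then "29 java spring java"
            System.out.println(offsets[i] + " " + lines[i]);
        }
    }
}
```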

Map phase

class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

<input key, input value, output key, output value>

<line offset, line content, XX, YY>

protected void map(KEYIN key, VALUEIN value, Context context)

What map does:

Split each line on spaces and take out the individual words

Output: key value

hadoop 1

java 1

spring 1

springMvc 1

java 1

….
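To see what map does without any Hadoop classes, the sketch below turns one line into (word, 1) pairs. It is a plain-Java stand-in for WordCountMapper.map(). The class MapPhaseSketch is hypothetical, and java.util Map.Entry objects take the place of Hadoop's Text and IntWritable. It also splits on runs of whitespace, so doubled spaces do not produce empty "words".

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java stand-in for WordCountMapper.map(): emits (word, 1) per word.
public class MapPhaseSketch {

    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // The key (line offset) is not needed for word count, so it is ignored.
        for (String word : line.trim().split("\\s+")) {
            // A blank line splits into a single empty token; skip it.
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints hadoop=1, java=1, spring=1, springMvc=1, one pair per line
        for (Map.Entry<String, Integer> pair : map(0L, "hadoop java spring springMvc")) {
            System.out.println(pair);
        }
    }
}
```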

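Shuffle phase:

What it does:

Partitioning: decides which reducer each key is sent to (by default, based on a hash of the key)

Grouping: puts all values that share the same key into one collection

Sorting: sorts the keys in dictionary (lexicographic) order

Output: key, collection of values

hadoop {1}

java {1,1}

spring {1}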
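The three shuffle steps can be pictured in a single process as below. This is a simplification: the real framework sorts, spills, and merges map output across machines. The class ShuffleSketch and its methods partition and groupAndSort are hypothetical. partition follows the formula of Hadoop's default HashPartitioner, (hash & Integer.MAX_VALUE) % numReduceTasks. However, it uses String.hashCode(), while Hadoop hashes the Text key's bytes, so real partition numbers differ. A TreeMap supplies both grouping and sorted keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process picture of the shuffle: partition, group, sort.
public class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner, but with String.hashCode()
    // instead of Text's byte hash, so the numbers will not match a real job.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Grouping + sorting: the TreeMap keeps keys sorted; each key collects its values.
    static TreeMap<String, List<Integer>> groupAndSort(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The five map output pairs listed in the Map phase above
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hadoop", 1), Map.entry("java", 1), Map.entry("spring", 1),
                Map.entry("springMvc", 1), Map.entry("java", 1));
        // Prints {hadoop=[1], java=[1, 1], spring=[1], springMvc=[1]}
        System.out.println(groupAndSort(mapOutput));
        System.out.println("java goes to reducer " + partition("java", 2) + " of 2");
    }
}
```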

Reduce phase:

class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

<word, 1, word, frequency>

void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)

Processing: take the values out of the collection and add them up

Output: key value

word frequency (count)

hadoop 1

java 2
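Reduce receives one key together with all of its grouped values and adds them up. The sketch below is a hypothetical plain-Java stand-in for WordCountReducer.reduce(). It uses int and Iterable<Integer> in place of IntWritable, and it returns the sum instead of writing it through a Context.

```java
import java.util.List;

// Hypothetical plain-Java stand-in for WordCountReducer.reduce().
public class ReducePhaseSketch {

    // 'word' is the grouped key; it is kept only to mirror reduce(key, values, context).
    static int reduce(String word, Iterable<Integer> values) {
        int sum = 0;
        // Add up every 1 collected for this word during the shuffle
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Prints "java 2" for the grouped values java {1,1}
        System.out.println("java " + reduce("java", List.of(1, 1)));
    }
}
```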

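Output phase:

Input: key value

word frequency (count)

hadoop 1

java 2

…

Output: write the results to files on HDFS (with the default TextOutputFormat, one key, a tab, and the value per line)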
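The sketch below shows the text layout that the default TextOutputFormat produces: key, a tab, the value, then a newline, with keys in sorted order as they leave the reducer. It only builds a String and does not touch HDFS. The class OutputPhaseSketch and its method format are hypothetical. The counts in main are for the full two-line sample input (hadoop 1, java 3, spring 2, springMvc 1).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the default output line layout: key, TAB, value, newline.
public class OutputPhaseSketch {

    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        // Iterates in the map's own order; pass a TreeMap to get sorted keys
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Word counts for the whole sample file, sorted by key
        Map<String, Integer> counts = new TreeMap<>(
                Map.of("hadoop", 1, "java", 3, "spring", 2, "springMvc", 1));
        System.out.print(format(counts));
    }
}
```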

The code

package com.huadian.bigdata.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountMapReduce {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Expect two arguments: the input path and the output path
        if (args.length < 2) {
            System.err.println("Usage: WordCountMapReduce <input path> <output path>");
            System.exit(2);
        }

        // (1) Load the configuration (core-site.xml etc. found on the classpath)
        Configuration configuration = new Configuration();
        // (2) Create the job: Job.getInstance(Configuration conf, String jobName)
        Job job = Job.getInstance(configuration, "WordCountMapReduce");
        // Locate the job's jar through its main class
        job.setJarByClass(WordCountMapReduce.class);

        // (3) Configure the job
        // (3.1) input
        Path inputPath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, inputPath);
        // (3.2) map
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // (3.3) shuffle: the defaults (hash partitioning, sorting, grouping) are used
        // (3.4) reduce
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // (3.5) output: this directory must not exist yet
        Path outputPath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outputPath);

        // (4) Submit the job and wait for it; true = print progress to the console
        boolean isSuccess = job.waitForCompletion(true);
        System.exit(isSuccess ? 0 : 1);
    }

    /**
     * Mapper type parameters:
     * KEYIN:    input key type, the byte offset of the line (LongWritable)
     * VALUEIN:  input value type, the content of one line (Text)
     * KEYOUT:   output key type, a word (Text)
     * VALUEOUT: output value type, the count for that word, always 1 here (IntWritable)
     */
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Reused across map() calls instead of allocating new objects per word
        private final Text mapOutKey = new Text();
        private static final IntWritable mapOutValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Turn the line, e.g. "hadoop java spring springMvc", into words
            String row = value.toString();
            // Split on runs of whitespace so repeated spaces do not yield empty "words"
            String[] strs = row.trim().split("\\s+");
            for (String str : strs) {
                if (str.isEmpty()) {
                    continue; // a blank line yields a single empty token
                }
                mapOutKey.set(str);
                // Emit (word, 1) through the context
                context.write(mapOutKey, mapOutValue);
            }
        }
    }

    /**
     * Reducer type parameters: <word, 1, word, total count>
     */
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable outputValue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            // Add up all the values collected for this word
            for (IntWritable value : values) {
                sum += value.get();
            }
            outputValue.set(sum);
            context.write(key, outputValue);
        }
    }
}

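In the console, run mvn clean to delete the directories and files produced by earlier builds.

Run mvn package to package the project as a jar.

Upload the jar to the Hadoop directory on the Linux machine, using rz or Notepad++.

Start the HDFS and YARN services: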

sbin/hadoop-daemon.sh start namenode

sbin/hadoop-daemon.sh start datanode

sbin/yarn-daemon.sh start resourcemanager

sbin/yarn-daemon.sh start nodemanager

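Run the job with:

bin/yarn jar <uploaded jar> <main class, i.e. com.huadian.bigdata.mapreduce.WordCountMapReduce> <input file(s) to process> <output directory for the results>

The output directory must not exist yet; otherwise the job fails at submission.

The main class to run:

Result of a successful run: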
