The "Hello World" of Hadoop: Code with Detailed Explanation

橙子

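Category: Hadoop. Tags: hadoop

Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. Reposts must include a link to the original source and this notice.

Article link: https://blog.csdn.net/jgoodLucky/article/details/78237649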

package com.zhiyou.bd17.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Define the mapper. Of the four type parameters of Mapper, LongWritable is the
    // input key (the offset of the record in the file), the first Text is the input
    // value (one record, here one line), the second Text is the output key, and
    // IntWritable is the output value.
    public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final Text oKey = new Text();
        private final IntWritable oValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Parse one line of data into an array of words. Splitting on "\\s+"
            // treats a run of whitespace as a single separator.
            String[] infos = value.toString().split("\\s+");
            for (String i : infos) {
                // Leading whitespace leaves an empty first token; skip it.
                if (i.isEmpty()) {
                    continue;
                }
                // Send each word to the reducer as a key-value pair (word, 1).
                oKey.set(i);
                context.write(oKey, oValue);
            }
        }
    }

    // Define the reducer. Of the four type parameters of Reducer, the first Text is
    // the key passed from the mapper, the first IntWritable is the value passed from
    // the mapper, the second Text is the reducer's output key, and the second
    // IntWritable is the reducer's output value.
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable oValue = new IntWritable(0);

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            // Output the key-value pair (word, count of the word).
            oValue.set(sum);
            context.write(key, oValue);
        }
    }

    // Assemble a job and run it on the MapReduce engine.
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Build a Configuration, used to configure the HDFS location and the MapReduce parameters.
        Configuration configuration = new Configuration();
        // Create the Job object.
        Job job = Job.getInstance(configuration);
        job.setJarByClass(WordCount.class);
        job.setJobName("First MR job: wordCount");
        // Configure the mapper and reducer classes.
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReducer.class);
        // Set the output types of the job. If the mapper and reducer output types are
        // the same, setting the mapper output types can be omitted.
        // job.setMapOutputKeyClass(Text.class);
        // job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the data source (the data waiting to be processed).
        // The path can point to a file or a directory; for a directory, all files in it are processed.
        Path inputPath = new Path("/test/README.ext");
        // This method can be called several times to give the job multiple input paths.
        FileInputFormat.addInputPath(job, inputPath);
        // Set where the result is stored. It is a directory, not a file, and it must not
        // already exist on HDFS, so any previous output is deleted recursively first.
        Path outputPath = new Path("/bd17/output/wordcount");
        outputPath.getFileSystem(configuration).delete(outputPath, true);
        // Set the final output location of the job; a job can have only one output directory.
        FileOutputFormat.setOutputPath(job, outputPath);
        // Start the job and hand the distributed computation to the MapReduce engine.
        // Passing true prints the progress of the job.
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
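
To make the data flow of the job above easier to follow without a cluster, here is a minimal plain-Java sketch of the three stages: map emits (word, 1) pairs, the framework groups the values by key, and reduce sums each group. It does not use Hadoop at all. The class `WordCountSimulation` and its methods `map`, `shuffle`, `reduce` and `countWords` are hypothetical names invented for this illustration, and the in-memory `TreeMap` only approximates the sorted grouping that the real engine performs across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the word count data flow.
final class WordCountSimulation {

    // Map stage: emit one (word, 1) pair per non-empty token of the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle stage (assumed stand-in for the framework): group values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce stage: sum all values that arrived for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run the three stages in order over all input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(countWords(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

In the real job the reducer never sees the individual pairs in emission order; it receives each key once together with all of its values, which is what the `shuffle` step imitates here.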

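
The split pattern decides which tokens the mapper emits, so it is worth checking in isolation. The sketch below, which assumes nothing beyond the standard `String.split`, compares the pattern `\\s` (one whitespace character per separator) with `\\s+` (a run of whitespace per separator). `TokenizeDemo` and `tokenize` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing how the split pattern affects the tokens.
final class TokenizeDemo {

    // Split on runs of whitespace and drop the empty token that
    // leading whitespace leaves at the start of the array.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Two spaces between the words: "\\s" yields an empty token in the middle.
        System.out.println(Arrays.toString("a  b".split("\\s")));   // prints [a, , b]
        // "\\s+" treats the two spaces as one separator.
        System.out.println(Arrays.toString("a  b".split("\\s+")));  // prints [a, b]
        // Leading and trailing whitespace are handled by the empty-token check.
        System.out.println(tokenize("  hello   hadoop "));          // prints [hello, hadoop]
    }
}
```

With the pattern `\\s`, a mapper would emit the empty string as a word whenever a line contains consecutive whitespace, and the output would then include a count for that empty key.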