Hadoop Example: WordCount


liushahe2012


Categories: Big Data, hadoop    Tags: hadoop, mapreduce

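Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.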

Article link: https://blog.csdn.net/liushahe2012/article/details/53890486


The code is as follows:

package hadopp_wordCount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // map: split each input line into tokens and emit (word, 1) for every token
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer iter = new StringTokenizer(value.toString());
            while (iter.hasMoreTokens()) {
                word.set(iter.nextToken());
                context.write(word, one);
            }
        }
    }

    // reduce: sum all counts for the same word (also used as the combiner)
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // main: configure and submit the job; the last argument is the output directory
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }

        // Job.getInstance replaces the deprecated new Job(conf, name) constructor
        Job job = Job.getInstance(conf, "wordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // every argument except the last one is an input path
        for (int i = 0; i < otherArgs.length - 1; i++) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The code is fairly simple and is covered widely online, so this article does not walk through it in detail.
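For readers who want to see the data flow without a cluster, here is a minimal plain-Java sketch. It is not the Hadoop runtime and has no Hadoop dependency: the names WordCountSim, map, reduce and countWords are hypothetical, each string in the input list stands in for one mapper's input split, and the shuffle is approximated by a sorted TreeMap. It also illustrates why reusing the reducer as the combiner is safe for WordCount: summing per-split partial sums gives the same totals. It assumes Java 9 or later (List.of, Map.entry).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the WordCount map -> shuffle -> reduce flow.
public class WordCountSim {

    // "map": emit (word, 1) for every whitespace-separated token, as the Map class does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer iter = new StringTokenizer(line);
        while (iter.hasMoreTokens()) {
            out.add(Map.entry(iter.nextToken(), 1));
        }
        return out;
    }

    // "shuffle" + "reduce": group values by key (sorted, like Hadoop's sorted keys) and sum them.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }

    // Each element of splits plays the role of one mapper's input.
    // With useCombiner, each mapper's output is pre-summed locally before the shuffle.
    static Map<String, Integer> countWords(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = map(split);
            if (useCombiner) {
                shuffled.addAll(reduce(mapped).entrySet());
            } else {
                shuffled.addAll(mapped);
            }
        }
        return reduce(shuffled);
    }

    public static void main(String[] args) {
        List<String> splits = List.of("hello hadoop", "hello world hello");
        System.out.println(countWords(splits, false)); // {hadoop=1, hello=3, world=1}
        System.out.println(countWords(splits, true));  // same totals with the combiner
    }
}
```

The combiner only changes how many (word, count) pairs cross the shuffle, not the final result, because integer addition is associative and commutative.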

One thing to watch out for is the namespace (package) issue.

If you run WordCount as follows, you get an error:

root@node1:/usr/local/hadoop/hadoop-2.5.2/myJar# hadoop jar WordCount.jar WordCount /usr/local/hadooptempdata/input/wc /usr/local/hadooptempdata/output/wc

Exception in thread "main" java.lang.ClassNotFoundException: WordCount
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

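The cause is the namespace: the code in this article declares package hadopp_wordCount;, so the class's fully qualified name is hadopp_wordCount.WordCount, and the bare name WordCount cannot be found.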
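The stack trace shows RunJar passing the class-name argument to Class.forName, which resolves only fully qualified names. The small sketch below illustrates the same rule with a JDK class instead of the WordCount jar; ForNameDemo and canLoad are hypothetical names used only for this illustration.

```java
// Hypothetical demo: Class.forName needs the package-qualified class name.
public class ForNameDemo {

    // Returns true if the JVM can load a class under exactly this name.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // String lives in package java.lang, so the simple name alone is not found,
        // just as "WordCount" is not found when the class is in hadopp_wordCount.
        System.out.println(canLoad("String"));           // false
        System.out.println(canLoad("java.lang.String")); // true
    }
}
```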

Running it with the fully qualified class name works fine:

hadoop jar WordCount.jar hadopp_wordCount.WordCount /usr/local/hadooptempdata/input/wc /usr/local/hadooptempdata/output/wc
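If you are unsure which fully qualified name to pass, one option is to list the .class entries inside the jar (the JDK's "jar tf WordCount.jar" shows the same entries as paths). The helper below is a hypothetical sketch built only on java.util.jar.JarFile; JarClassNames, toClassName and listClasses are illustrative names, not part of Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: print the class names a jar contains, in the form
// that the class-name argument of "hadoop jar" expects.
public class JarClassNames {

    // Converts an entry path such as "hadopp_wordCount/WordCount.class"
    // into a class name such as "hadopp_wordCount.WordCount".
    static String toClassName(String entryName) {
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Returns the class names of all .class entries in the given jar file.
    static List<String> listClasses(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(toClassName(name));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarClassNames WordCount.jar
        for (String className : listClasses(args[0])) {
            System.out.println(className);
        }
    }
}
```

For the jar built from this article, the listing would be expected to include hadopp_wordCount.WordCount along with nested classes such as hadopp_wordCount.WordCount$Map; the top-level name without "$" is the one to pass to hadoop jar.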
