Hadoop WordCount: A Detailed Word-Count Walkthrough

土豆拍死马铃薯

Category: Big Data. Tags: Hadoop, WordCount, word-count statistics

Article link: https://blog.csdn.net/csj941227/article/details/71427855

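1. Install Eclipse

Create a new project and add a Java class to it.

The JAR files the project depends on can be found in the Hadoop installation directory.

The code is as follows: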

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
	// Mapper: emits (word, 1) for every whitespace-separated token in a line.
	public static class WordCountMap extends
			Mapper<LongWritable, Text, Text, IntWritable> {
		private final IntWritable one = new IntWritable(1);
		private final Text word = new Text();

		@Override
		public void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			// key is the byte offset of the line; value is the line itself.
			String line = value.toString();
			StringTokenizer token = new StringTokenizer(line);
			while (token.hasMoreTokens()) {
				word.set(token.nextToken());
				context.write(word, one);
			}
		}
	}

	// Reducer: sums all the 1s emitted for a word and writes (word, total).
	public static class WordCountReduce extends
			Reducer<Text, IntWritable, Text, IntWritable> {
		@Override
		public void reduce(Text key, Iterable<IntWritable> values,
				Context context) throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable val : values) {
				sum += val.get();
			}
			context.write(key, new IntWritable(sum));
		}
	}

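	public static void main(String[] args) throws Exception {
		// args[0] is the input path, args[1] is the output path.
		if (args.length != 2) {
			System.err.println("Usage: WordCount <input path> <output path>");
			System.exit(2);
		}
		Configuration conf = new Configuration();
		// On Hadoop 2 and later, Job.getInstance replaces the deprecated
		// new Job(conf) constructor.
		Job job = Job.getInstance(conf, "wordcount");
		job.setJarByClass(WordCount.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		job.setMapperClass(WordCountMap.class);
		job.setReducerClass(WordCountReduce.class);
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		// Exit with a non-zero status if the job fails.
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}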
}
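
To see what the job computes without a cluster, the map, shuffle, and reduce phases can be imitated in plain Java. This is only a sketch: the class LocalWordCount and its methods are made up for illustration and are not part of Hadoop. It assumes the same tokenization as the mapper above (StringTokenizer with its default whitespace delimiters), so punctuation stays attached to words and matching is case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: imitates WordCount in memory.
class LocalWordCount {

	// "Map" step: split one line into whitespace-separated tokens.
	static List<String> map(String line) {
		List<String> tokens = new ArrayList<>();
		StringTokenizer tokenizer = new StringTokenizer(line);
		while (tokenizer.hasMoreTokens()) {
			tokens.add(tokenizer.nextToken());
		}
		return tokens;
	}

	// "Shuffle" and "reduce" steps: group identical words and add 1 per occurrence.
	// A TreeMap keeps the words sorted, much like reducer keys arrive in sorted order.
	static Map<String, Integer> count(List<String> lines) {
		Map<String, Integer> counts = new TreeMap<>();
		for (String line : lines) {
			for (String word : map(line)) {
				counts.merge(word, 1, Integer::sum);
			}
		}
		return counts;
	}

	public static void main(String[] args) {
		List<String> lines = Arrays.asList("hello hadoop", "hello world, hello");
		// Print each word and its count separated by a tab.
		count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
	}
}
```

In this sample input, "world," keeps its trailing comma and is counted as its own word, which is also how the Hadoop mapper above would treat it.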

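Next, run the program once, then export the project as a JAR file.

The exported JAR can then be found at the export location.

For convenience, move the JAR to the current user's home directory.

In the home directory, create the input text files and type a few English sentences into them.

The input files are now ready.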
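
As a sketch of this step, the input files could also be generated from Java instead of being typed by hand. The file names file1.txt and file2.txt and their contents are placeholders chosen for illustration; the original post does not give them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes two small English text files to use as job input.
class SampleInput {

	// Creates file1.txt and file2.txt (placeholder names) under dir and returns their paths.
	static List<Path> write(Path dir) {
		try {
			Files.createDirectories(dir);
			Path first = dir.resolve("file1.txt");
			Path second = dir.resolve("file2.txt");
			Files.write(first, Arrays.asList("hello hadoop", "hello world"),
					StandardCharsets.UTF_8);
			Files.write(second, Arrays.asList("hadoop counts words"),
					StandardCharsets.UTF_8);
			return Arrays.asList(first, second);
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
	}

	public static void main(String[] args) {
		// Defaults to the user's home directory, matching the step described above.
		Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("user.home"));
		write(dir).forEach(path -> System.out.println("wrote " + path));
	}
}
```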

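If this is the first run, format the NameNode with hadoop namenode -format (on Hadoop 2 and later, hdfs namenode -format is the preferred form).

Otherwise, skip formatting and enter the start command to run Hadoop. Then enter jps to check that the Hadoop processes are running.

Enter the command to create a directory in HDFS. Note that its name is testin.

Enter the command to upload the two files prepared earlier to Hadoop, then check that they arrived.

Then start the job.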
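
The exact launch command is not preserved in the text, so the following is only a sketch of how such a command is typically assembled with hadoop jar. The JAR name WordCount.jar is an assumption, and the helper class WordCountCommand is hypothetical. The last two arguments become args[0] (input path) and args[1] (output path) in main.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles a "hadoop jar" command line for the WordCount job.
class WordCountCommand {

	// jar and mainClass identify the program; input and output are passed to main as args.
	static List<String> build(String jar, String mainClass, String input, String output) {
		return Arrays.asList("hadoop", "jar", jar, mainClass, input, output);
	}

	public static void main(String[] args) {
		// WordCount.jar is an assumed file name; testin and testout come from the article.
		List<String> command = build("WordCount.jar", "WordCount", "testin", "testout");
		System.out.println(String.join(" ", command));
	}
}
```

If the exported JAR already names its main class in the manifest, the class name should be left out of the command. The output directory must not exist before the job starts, otherwise the job fails when it checks the output path.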

The results are saved in testout.

The result is as shown in the figure.
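
With TextOutputFormat, each output line holds a word and its count separated by a tab. The sketch below parses such lines once they have been read from the output files; the class WordCountOutputParser is hypothetical and the sample lines are illustrative, not the article's actual result.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "word<TAB>count" lines as written by TextOutputFormat.
class WordCountOutputParser {

	// Returns the counts in the same order as the lines appear in the output.
	static Map<String, Integer> parse(List<String> lines) {
		Map<String, Integer> counts = new LinkedHashMap<>();
		for (String line : lines) {
			if (line.isEmpty()) {
				continue; // skip blank lines
			}
			int tab = line.lastIndexOf('\t');
			if (tab < 0) {
				throw new IllegalArgumentException("Missing tab separator: " + line);
			}
			String word = line.substring(0, tab);
			int count = Integer.parseInt(line.substring(tab + 1));
			counts.put(word, count);
		}
		return counts;
	}

	public static void main(String[] args) {
		// Illustrative lines only; real lines come from the files in testout.
		List<String> sample = Arrays.asList("hadoop\t1", "hello\t3");
		parse(sample).forEach((word, n) -> System.out.println(word + " -> " + n));
	}
}
```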
