Big Data: Hadoop WordCount

Latest recommended article published 2022-03-10 11:26:34

FRESHET


Category: Big Data. Tags: big data, hadoop, mapreduce, log4j

Article link: https://blog.csdn.net/FRESHET/article/details/107505338


The code is split into three classes so that each part is easy to follow:

1. Mapper class

package com.lu.map; // matches the import used by the main class

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    // Reused across map() calls so no new object is allocated per token.
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input value is one line of text; split it on whitespace.
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Emit (word, 1) for every token on the line.
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}
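To see what the mapper emits without starting a cluster, the tokenizing step can be tried in plain Java. This is only a sketch: the class name TokenizeSketch and the sample line are made up for illustration and are not part of the job. It shows that StringTokenizer splits on whitespace only, so punctuation stays attached to a word and case is preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-alone class, not part of the Hadoop job.
final class TokenizeSketch {

    // Returns the tokens the mapper would emit as keys for one input line.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print each token with a count of 1, the way the mapper writes (word, 1).
        for (String token : tokenize("Apache License,  Version 2.0")) {
            System.out.println(token + "\t1");
        }
    }
}
```

In this sample, "License," is emitted with its trailing comma, so it would be counted separately from "License". If that matters, the line would need to be normalized before tokenizing.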

2. Reducer class

package com.lu.red; // matches the import used by the main class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused across reduce() calls to hold the total for the current word.
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> nums, Context context)
            throws IOException, InterruptedException {
        // Add up every count received for this word.
        int sum = 0;
        for (IntWritable num : nums) {
            sum += num.get();
        }
        result.set(sum);
        // Emit (word, total).
        context.write(word, result);
    }
}
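The reducer only ever sees one word together with all the counts emitted for it. A rough in-memory sketch of that grouping and summing step is below. The class ReduceSketch and its method names are hypothetical, and the TreeMap only approximates how the framework groups and sorts keys between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone class, not part of the Hadoop job.
final class ReduceSketch {

    // Mirrors the reduce() body: add up every count received for one word.
    static int sum(Iterable<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Groups a 1 under each word (a stand-in for the shuffle), then sums each group.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : words) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            totals.put(entry.getKey(), sum(entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(countWords(List.of("b", "a", "b"))); // prints {a=1, b=2}
    }
}
```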

3. Main class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.lu.map.WordCountMapper;
import com.lu.red.WordCountReduce;

public class WordCount {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        // new Job(conf, "wordcount") is deprecated; use the factory method instead.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);

        job.setMapperClass(WordCountMapper.class);  // standard boilerplate
        job.setReducerClass(WordCountReduce.class); // standard boilerplate

        // If the mapper and reducer emit the same key/value types, these two calls are enough.
        // Otherwise also call setMapOutputKeyClass and setMapOutputValueClass.
        job.setOutputKeyClass(Text.class);          // standard boilerplate
        job.setOutputValueClass(IntWritable.class); // standard boilerplate

        // Concrete input and output paths on HDFS
        Path in = new Path("hdfs://192.168.1.27:9000/test/wordcnt/in/LICENSE.txt");
        Path out = new Path("hdfs://192.168.1.27:9000/test/wordcnt/out/1");
        FileInputFormat.addInputPath(job, in);    // standard boilerplate
        FileOutputFormat.setOutputPath(job, out); // standard boilerplate
        System.exit(job.waitForCompletion(true) ? 0 : 1); // more or less standard boilerplate
    }
}

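If the job fails, an error is reported; if it runs normally, nothing is printed. To see log messages, add a log4j.properties file to the src directory: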

log4j.rootLogger=WARN,CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=[frame] %d{yyyy-MM-dd HH:mm:ss,SSS} - %-4r %-5p [%t] %C:%L %x - %m%n
