Your First Hadoop Program: WordCount

First, let's set up a Hadoop development environment in Eclipse.

  1. Prepare the tools: Eclipse and the Hadoop Eclipse plugin. Make sure their versions match.
    Eclipse can be downloaded from the official site.
    I have uploaded the plugin I used: hadoop-eclipse-plugin-2.7.2.jar

  2. Copy the plugin into eclipse/plugins.
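
    For example, on Linux this is a single copy command; the plugin and Eclipse paths below are placeholders for wherever yours actually live:

```bash
# Placeholder paths -- substitute your real download location and Eclipse install directory
cp ~/Downloads/hadoop-eclipse-plugin-2.7.2.jar /opt/eclipse/plugins/
```

    Restart Eclipse afterwards so it loads the plugin.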

  3. Window -> Open Perspective -> Other, then select Map/Reduce.
  4. At the bottom of Eclipse, a new tab named "Map/Reduce Locations" appears next to the Console. Right-click the empty area in that tab, choose "New Hadoop location…", and fill in the following fields in the dialog that pops up:

    Location name (pick any name)
    Map/Reduce Master (the JobTracker's IP and port, taken from the mapred.job.tracker property in mapred-site.xml)
    DFS Master (the NameNode's IP and port, taken from the fs.default.name property in core-site.xml; see the config sketch after this list)
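
    For reference, the relevant entries in those two config files look roughly like the sketch below. The host names and ports are placeholders, not values from this post; use whatever your cluster actually configures.

```xml
<!-- mapred-site.xml: mapred.job.tracker -> the "Map/Reduce Master" field (placeholder host:port) -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

<!-- core-site.xml: fs.default.name -> the "DFS Master" field (placeholder host:port) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
```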

  5. Create a MapReduce project

    5.1 Configure the Hadoop path

    Window -> Preferences, select "Hadoop Map/Reduce", click "Browse…", and choose the path to your Hadoop folder.
    This step has nothing to do with the runtime environment; it only lets the plugin automatically import all the jars under the Hadoop root and lib directories when you create a new project.
    5.2 Create the project

    File -> New -> Project, select "Map/Reduce Project", enter a project name, and create the project. The plugin automatically imports all the jars under the Hadoop root and lib directories.
    5.3 Create a Mapper or Reducer

    File -> New -> Mapper creates a Mapper that automatically extends MapReduceBase and implements the Mapper interface from the mapred package.
    Note: the plugin generates code against the old classes and interfaces in the mapred package; if you want the new-API Mapper you have to write it yourself (a new-API version appears at the end of this post).

    The same applies to the Reducer.
    Below is the WordCount source:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    // Old-API mapper: for every input line, emit (word, 1) for each token.
    public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer stringTokenizer = new StringTokenizer(line);
            while (stringTokenizer.hasMoreTokens()) {
                word.set(stringTokenizer.nextToken());
                output.collect(word, one);
            }
        }

    }
    // Old-API reducer: sum the counts for each word and emit (word, total).
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        JobConf job = new JobConf(conf, WordCount.class);
        // args[0] is the input path; args[1] is the output path (must not exist yet).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setJobName("wordcount");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        JobClient.runJob(job);
    }

}
```
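
As a quick sanity check, here is a hypothetical run of this class. The jar name, HDFS paths, and input text are all made up for illustration; with the old API the result file is named part-00000:

```bash
# Placeholder jar name and paths -- adjust to your environment
hdfs dfs -mkdir -p input
echo "hello hadoop hello world" | hdfs dfs -put - input/words.txt
hadoop jar wordcount.jar WordCount input output
hdfs dfs -cat output/part-00000
# Expected result (word, tab, count):
#   hadoop  1
#   hello   2
#   world   1
```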

For comparison, here is the same program written against the new API in the org.apache.hadoop.mapreduce package. Its purpose is identical: count how many times each word appears in a text file. A basic new-API WordCount looks like this:

  1. Create a Java project and import the Hadoop libraries.

  2. Create Java classes implementing the following Mapper and Reducer.

    The Mapper class:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private final static LongWritable one = new LongWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split each input line on spaces and emit (word, 1) for every word.
        String line = value.toString();
        String[] words = line.split(" ");
        for (String w : words) {
            word.set(w);
            context.write(word, one);
        }
    }
}
```

    The Reducer class:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Sum all the counts for one word and emit (word, total).
        long sum = 0;
        for (LongWritable val : values) {
            sum += val.get();
        }
        context.write(key, new LongWritable(sum));
    }
}
```

  3. In the application's main() method, create a Job and set the Mapper and Reducer classes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

  4. Run the program from the command line, specifying the input and output paths:

```bash
hadoop jar WordCount.jar WordCount /input /output
```

    Here /input is the input path and /output is the output path.

This is the most basic WordCount program; you can modify and extend it as needed.
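
One design note on step 3: reusing WordCountReducer as the combiner is safe because word counting is plain addition, which is associative and commutative, so pre-aggregating the (word, 1) pairs on the map side before the shuffle cannot change the final totals. Two smaller caveats: line.split(" ") splits on a single space only, so tabs or runs of spaces produce empty tokens (the StringTokenizer used in the old-API version above is more forgiving), and with the new API the output files are named part-r-00000 and so on rather than part-00000.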