hadoop3: Writing a simple MapReduce program

weixin_30852367

Original link: http://www.cnblogs.com/ivywenyuan/p/4579365.html

Screenshots of the run results are attached.

This lesson covers the basic principles of MapReduce and how to set up a MapReduce programming environment.
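As a rough illustration of the principle, the map, shuffle, and reduce phases can be imitated in plain Java without Hadoop. This is only a sketch under stated assumptions: the class name `LetterCountSketch` and its methods are hypothetical, everything runs in one process, and a real cluster distributes these phases across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of map -> shuffle -> reduce (hypothetical names, no Hadoop).
class LetterCountSketch {

    // "Map" phase: emit one (letter, 1) pair for every ASCII letter in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (char c : line.toCharArray()) {
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                pairs.add(new AbstractMap.SimpleEntry<>(String.valueOf(c), 1));
            }
        }
        return pairs;
    }

    // "Shuffle" phase: group the emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reduce" phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the three phases in order over a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input, for illustration only.
        System.out.println(run(Arrays.asList("Hello, Hadoop 3!", "map reduce")));
    }
}
```

Upper-case and lower-case letters are counted as separate keys here, which matches the ASCII-range test used in the article's mapper.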

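Lab task: write a MapReduce program in Eclipse that counts the letters in a text. Once it has been debugged successfully on the local machine, package the Java project as a jar and run it on the Hadoop cluster.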

Install Eclipse on Linux, create a new Java project, create a user library in that project, and add all the jars from the downloaded Hadoop folder to that library.

Write the Java code that implements the letter count.
The main part of the code is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class LetterCount {
    // Extends Mapper: map input type is <Object,Text>, output type is <Text,IntWritable>
    public static class Map extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1); // each letter occurrence counts as 1
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // value is one line of text: split it into characters and emit (letter, 1) for each ASCII letter
            char[] ch = value.toString().toCharArray();
            for (int i = 0; i < ch.length; i++) {
                if ((ch[i] >= 'A' && ch[i] <= 'Z') || (ch[i] >= 'a' && ch[i] <= 'z')) {
                    word.set(String.valueOf(ch[i]));
                    context.write(word, one);
                }
            }
        }
    }

    // Extends Reducer: reduce input type is <Text,IntWritable>, output type is <Text,IntWritable>
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable(); // holds the count for the current letter

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);            // store the count in result
            context.write(key, result); // emit the count for this key
        }
    }

    public static void main(String[] args) throws Exception {
        // GenericOptionsParser applies Hadoop's generic command-line options (file system, configuration resources, etc.) to conf
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: LetterCount <in> <out>");
            System.exit(2);
        }
        // Create the job (Job.getInstance replaces the deprecated Job constructor)
        Job job = Job.getInstance(conf, "letterCount");
        // Configure the job's classes
        job.setJarByClass(LetterCount.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); // output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
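The driver registers the same `Reduce` class as both combiner and reducer. That works because the operation is a sum, so adding partial sums gives the same total as adding every value at once. The plain-Java sketch below illustrates this without Hadoop. The class `CombinerSketch` and its method names are hypothetical, and each inner list stands in for the values one map task emits for a single key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Hadoop): a summing reducer gives the same answer
// whether or not its inputs were pre-summed by a combiner.
class CombinerSketch {

    // Same logic as the body of the article's Reduce.reduce: add up all counts.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // No combiner: the reducer receives every value emitted by every map task.
    static int withoutCombiner(List<List<Integer>> perMapTaskValues) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> values : perMapTaskValues) {
            all.addAll(values);
        }
        return sum(all);
    }

    // With combiner: each map task pre-sums its own values, and the reducer
    // then sums those partial results.
    static int withCombiner(List<List<Integer>> perMapTaskValues) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> values : perMapTaskValues) {
            partials.add(sum(values));
        }
        return sum(partials);
    }

    public static void main(String[] args) {
        // Made-up example: two map tasks emitting counts for the same letter.
        List<List<Integer>> emitted = Arrays.asList(
                Arrays.asList(1, 1, 1),
                Arrays.asList(1, 1));
        System.out.println(withoutCombiner(emitted) == withCombiner(emitted));
    }
}
```

A reducer that computed, say, an average could not be reused as a combiner this way, because an average of averages is generally not the overall average.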

Result of compiling and running the program

Input text

Output
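Build the jar on the local machine and copy it to the master host in Docker

Use the Linux scp command
Use hadoop to run this jar on the wordcount file, which performs the letter count
The results are as follows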
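Because the driver does not set an output format, Hadoop's default `TextOutputFormat` applies and each output line has the form key, tab, value. Assuming that default, the sketch below parses such lines back into a map so a result can be checked in code. The class name `OutputParserSketch` is hypothetical, and the path passed to `main` is whatever output file you copied locally (typically named like `part-r-00000`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: read "letter<TAB>count" lines, as written by the default TextOutputFormat.
class OutputParserSketch {

    // Parse each non-empty line into (letter, count), keeping the file's order.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            counts.put(parts[0], Integer.parseInt(parts[1].trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local copy of one output file (the path is supplied by the caller)
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(parse(lines));
    }
}
```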

Problems, lessons learned, and notes

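Setting up an MR programming environment on Windows requires a series of steps such as setting environment variables and copying files into the OS system directory, which is fairly tedious. On Linux you can program directly in Eclipse, as long as the user library in the Java project references the Hadoop jars.
There are several ways to copy the local jar file to the host in Docker:
- Use the scp command: scp username@ip:dir1 dir2
  
  dir1 is the source path of the file to copy and dir2 is the destination path (prerequisite: the SSH service is running).
- Use HUE: because the browser runs on the local machine, the HUE web page can access local files, which lets you upload a local file to HDFS.
  
  You can then run hadoop fs -get inside Docker to download the file onto master.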

Reposted from: https://www.cnblogs.com/ivywenyuan/p/4579365.html