Several Ways to Submit a Hadoop MapReduce Job

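We ran into many problems when submitting Hadoop jobs. Searching online turned up plenty of articles analyzing how job submission works internally, but none that walked through the concrete steps. As a result, the jobs we submitted from Eclipse always ended up running in the local environment that Eclipse simulates. Through trial and error, we worked out the submission methods below one at a time.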

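Options:

1. Submit the job from the command line.

2. Submit the job from Eclipse, reading its input from the local file system. In this case the job is submitted through the local job runner (LocalJobRunner).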

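Launch the main method of the Runner class directly from Eclipse on Linux. With this approach the job can run either locally or on the YARN cluster.

Which of the two happens depends on one configuration parameter, mapreduce.framework.name: the value yarn runs the job on the cluster, and the value local runs it locally.
If you really need to submit to YARN from Eclipse, two settings are required:
a. Package the MapReduce project as a jar (wc.jar) and place it in the project directory.
b. In the project's main method, add the configuration parameter   conf.set("mapreduce.job.jar","wc.jar");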
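The rule above can be sketched in plain Java. This is only an illustration, not Hadoop code: SubmitTargetSketch and describe are hypothetical names, and a java.util.Map stands in for org.apache.hadoop.conf.Configuration so the example runs without Hadoop on the classpath. The only Hadoop facts it relies on are the two property names from the text and the shipped default of mapreduce.framework.name, which is local.

```java
import java.util.HashMap;
import java.util.Map;

public class SubmitTargetSketch {

    // Hypothetical helper: reports where a job would run for the given settings.
    // The Map is a stand-in for Hadoop's Configuration object.
    static String describe(Map<String, String> conf) {
        // When the key is absent, Hadoop's default value "local" applies.
        String framework = conf.getOrDefault("mapreduce.framework.name", "local");
        if (!"yarn".equals(framework)) {
            return "local";
        }
        // Submitting to YARN from the IDE also needs the packaged job jar.
        String jar = conf.get("mapreduce.job.jar");
        if (jar == null) {
            return "yarn, but mapreduce.job.jar is not set";
        }
        return "yarn, shipping " + jar;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(describe(conf));

        conf.put("mapreduce.framework.name", "yarn");
        System.out.println(describe(conf));

        conf.put("mapreduce.job.jar", "wc.jar");
        System.out.println(describe(conf));
    }
}
```

In the real Runner, the equivalent of the two put calls would be conf.set calls on the Configuration object before Job.getInstance(conf) is called.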


Explanation of option 1:

Explanation of option 2:

The Mapper class is as follows:

package com.npf.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void setup(Mapper<LongWritable, Text, Text, LongWritable>.Context context)throws IOException, InterruptedException {
                System.out.println("WordCountMapper.setup()");
        }

        @Override
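        protected void map(LongWritable key, Text value,Context context) throws IOException, InterruptedException {
                // key is the byte offset of the line in the input file; value is the line itself
                String[] words = StringUtils.split(value.toString(),' ');
                for (String word : words) {
                        // emit (word, 1) for every space-separated token
                        context.write(new Text(word), new LongWritable(1L));
                }
        }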

        @Override
        protected void cleanup(Mapper<LongWritable, Text, Text, LongWritable>.Context context)throws IOException, InterruptedException {
                System.out.println("WordCountMapper.cleanup()");
        }

}


The Reducer class is as follows:

package com.npf.hadoop;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable>{

        @Override
        protected void setup(Reducer<Text, LongWritable, Text, LongWritable>.Context context)throws IOException, InterruptedException {
                System.out.println("WordCountReducer.setup()");
        }

        @Override
        protected void reduce(Text word, Iterable<LongWritable> counts,Context context)throws IOException, InterruptedException {
                // counts holds every 1 the mappers emitted for this word; add them up
                Iterator<LongWritable> iterator = counts.iterator();
                long count = 0L;
                while (iterator.hasNext()) {
                        LongWritable element = iterator.next();
                        count = count + element.get();
                }
                // emit (word, total occurrences)
                context.write(word, new LongWritable(count));
        }

        @Override
        protected void cleanup(Reducer<Text, LongWritable, Text, LongWritable>.Context context)throws IOException, InterruptedException {
                System.out.println("WordCountReducer.cleanup()");
        }
}
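To make the data flow of the two classes above easier to follow, here is a minimal plain-Java sketch of the same logic without any Hadoop dependency. WordCountSketch, mapLine, reduce and count are hypothetical names introduced only for this illustration. String.split stands in for Hadoop's StringUtils.split, and a TreeMap stands in for the shuffle step that groups values by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase stand-in: one emitted word per space-separated token,
    // corresponding to the (word, 1) pairs written by the mapper.
    static List<String> mapLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String word : line.split(" ")) {
            emitted.add(word);
        }
        return emitted;
    }

    // Shuffle and reduce stand-in: group identical words and add 1 per occurrence.
    static Map<String, Long> reduce(List<String> emitted) {
        Map<String, Long> counts = new TreeMap<>(); // keys kept in sorted order
        for (String word : emitted) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    // Runs the map step on every line, then reduces all emitted words together.
    public static Map<String, Long> count(List<String> lines) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(mapLine(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(count(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

In the real job the grouping is done by the framework between the map and reduce phases, so the reducer only ever sees one word together with all of its counts.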

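The Runner class is as follows:

/wordcount/srcdata: the srcdata directory inside the wordcount directory under the Linux root directory.

/wordcount/outputdata: the outputdata directory inside the wordcount directory under the Linux root directory.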

package com.npf.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 *
 * @author root
 *
 */
public class WordCountRunner {

        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

                Configuration conf = new Configuration();

                Job job = Job.getInstance(conf);
                job.setJarByClass(WordCountRunner.class);

                // mapper
                job.setMapperClass(WordCountMapper.class);
                job.setMapOutputKeyClass(Text.class);
                job.setMapOutputValueClass(LongWritable.class);

                //reducer
                job.setReducerClass(WordCountReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(LongWritable.class);


                FileInputFormat.setInputPaths(job, "/wordcount/srcdata");
                FileOutputFormat.setOutputPath(job, new Path("/wordcount/outputdata"));

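                // block until the job finishes and report success or failure through the exit code
                System.exit(job.waitForCompletion(true) ? 0 : 1);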
        }

}

Running it produces the following result:




Explanation of option 3:


