Java multithreading and MapReduce: does a MapReduce job use multiple threads?

I am curious about whether a MapReduce job uses multiple threads on a single machine. For example, I have 10 servers in a Hadoop cluster; by default, if the input file is large enough, there will be 10 mappers. Does a single mapper use multiple threads on a single machine?

Solution

Does a single mapper use multiple threads on a single machine?

Yes. A MapReduce job can use a multithreaded mapper (multiple threads, or a thread pool, running the map method).

I have used this for better CPU utilization in map-only HBase jobs.

MultithreadedMapper is a good fit if your operation is highly CPU intensive; it can increase the speed.

Your job's mapper should be org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper (or a subclass of it) instead of the regular org.apache.hadoop.mapreduce.Mapper; your own map logic is then plugged in as the delegate mapper class, as the full example below shows.

MultithreadedMapper has a different implementation of the run() method:

run(org.apache.hadoop.mapreduce.Mapper.Context context)

which runs the application's maps using a thread pool, roughly along the lines of the sketch below.
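The following is only a minimal conceptual sketch of that idea, not the actual Hadoop source: several worker threads share one record source, so reads are serialized while the (potentially CPU-heavy) per-record work runs in parallel. The class name ThreadPoolRunSketch and the processRecord method are made up for illustration.

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    public class ThreadPoolRunSketch {

        static void run(List<String> records, int numThreads) throws InterruptedException {
            Iterator<String> input = records.iterator();
            Thread[] runners = new Thread[numThreads];
            for (int i = 0; i < numThreads; i++) {
                runners[i] = new Thread(() -> {
                    while (true) {
                        String record;
                        synchronized (input) {        // only the read of the next record is serialized
                            if (!input.hasNext()) return;
                            record = input.next();
                        }
                        processRecord(record);        // the "map" work itself runs in parallel
                    }
                });
                runners[i].start();
            }
            for (Thread t : runners) {
                t.join();                             // wait for all worker threads to finish
            }
        }

        static void processRecord(String record) {
            // stand-in for the user's map() logic
            System.out.println(Thread.currentThread().getName() + " -> " + record.toUpperCase());
        }

        public static void main(String[] args) throws InterruptedException {
            run(Arrays.asList("alpha", "beta", "gamma", "delta"), 3);
        }
    }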

You can set the number of threads within a MultithreadedMapper either with MultithreadedMapper.setNumberOfThreads(job, n), or by setting the configuration property mapred.map.multithreadedrunner.threads = n (for example, when loading settings from a property file). The setter works on a per-job basis, so you can use it to control jobs which are less CPU intensive; both options are sketched right after this paragraph.
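A small sketch of both options, assuming a Job object is already available (as in the full example further down). ThreadCountConfig is a hypothetical helper name; note that the property key shown is the older mapred.* name used in the answer, while newer Hadoop releases use mapreduce.mapper.multithreadedmapper.threads.

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    class ThreadCountConfig {
        // Hypothetical helper showing the two ways to set the thread count.
        static void configure(Job job, int n) {
            // Option 1: per-job setter (also used in the full example below).
            MultithreadedMapper.setNumberOfThreads(job, n);

            // Option 2: set the configuration property directly, e.g. when the
            // value comes from a property file. This is the older mapred.* key;
            // newer Hadoop releases use mapreduce.mapper.multithreadedmapper.threads.
            job.getConfiguration().setInt("mapred.map.multithreadedrunner.threads", n);
        }
    }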

You can see the effect of doing this in the MapReduce counters, especially the CPU-related ones; a small sketch of reading them follows.
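As a sketch (assuming the job has already completed, e.g. after waitForCompletion(true) returns), you can read the built-in TaskCounter.CPU_MILLISECONDS counter to compare runs with different thread counts. CpuCounterCheck is a hypothetical helper name.

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.TaskCounter;

    class CpuCounterCheck {
        // Hypothetical helper: print the total CPU time consumed by the job's tasks.
        static void printCpuTime(Job job) throws Exception {
            long cpuMillis = job.getCounters()
                                .findCounter(TaskCounter.CPU_MILLISECONDS)
                                .getValue();
            System.out.println("Total CPU time (ms): " + cpuMillis);
        }
    }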

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    import java.io.IOException;

    public class MultithreadedWordCount {

        // The mapper runs on several threads at once, so it must be thread safe.
        public static class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

            public static enum PREPOST { SETUP, CLEANUP }

            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                // Called several times: once per mapper thread.
                context.getCounter(PREPOST.SETUP).increment(1);
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Split the line on blanks and punctuation and emit (word, 1) pairs.
                String[] words = value.toString().toLowerCase().split("[\\p{Blank}[\\p{Punct}]]+");
                for (String word : words) {
                    context.write(new Text(word), new LongWritable(1));
                }
            }

            @Override
            protected void cleanup(Context context) throws IOException, InterruptedException {
                // Called several times: once per mapper thread.
                context.getCounter(PREPOST.CLEANUP).increment(1);
            }
        }

        public static class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Sum the counts for each word.
                long sum = 0;
                for (LongWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Job job = Job.getInstance(new Configuration());
            job.setJarByClass(MultithreadedWordCount.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // MultithreadedMapper is the job's mapper; WordCountMapper is the
            // delegate it runs on a pool of 10 threads.
            MultithreadedMapper.setMapperClass(job, MultithreadedWordCount.WordCountMapper.class);
            MultithreadedMapper.setNumberOfThreads(job, 10);
            job.setMapperClass(MultithreadedMapper.class);

            job.setCombinerClass(MultithreadedWordCount.WordCountReducer.class);
            job.setReducerClass(MultithreadedWordCount.WordCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            /* begin defaults */
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            /* end defaults */

            job.waitForCompletion(true);
        }
    }
