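A couple of years ago, when I got an 8-core machine, I built a one-off for myself, but I was never very happy with it. I never made it as simple to use as I had hoped, and memory-intensive tasks didn't scale well.
If you don't get any real answers I can share more, but the core of it is: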
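import java.util.*;
import java.util.concurrent.*;

// Mapper and Reducer are application interfaces not shown here; makeWorker is assumed to return a Callable<TMapOutput>.
public class LocalMapReduce<TMapInput, TMapOutput, TOutput> {
    private int m_threads;
    private Mapper<TMapInput, TMapOutput> m_mapper;
    private Reducer<TMapOutput, TOutput> m_reducer;
    // ...
    public TOutput mapReduce(Iterator<TMapInput> inputIterator)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(m_threads);
        Set<Future<TMapOutput>> futureSet = new HashSet<Future<TMapOutput>>();
        // Submit one map task per input element.
        while (inputIterator.hasNext()) {
            TMapInput m = inputIterator.next();
            Future<TMapOutput> f = pool.submit(m_mapper.makeWorker(m));
            futureSet.add(f);
            Thread.sleep(10);
        }
        pool.shutdown(); // accept no new tasks; already-submitted ones still run
        // Poll until every future is done, reducing each result once it is ready.
        while (!futureSet.isEmpty()) {
            Thread.sleep(5);
            for (Iterator<Future<TMapOutput>> fit = futureSet.iterator(); fit.hasNext();) {
                Future<TMapOutput> f = fit.next();
                if (f.isDone()) {
                    fit.remove();
                    TMapOutput x = f.get();
                    m_reducer.reduce(x);
                }
            }
        }
        return m_reducer.getResult();
    }
}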
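Edit: Based on a comment, here is a version without the sleeps. The trick is to use a CompletionService, which essentially gives you a blocking queue of completed Futures.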
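import java.util.*;
import java.util.concurrent.*;

public class LocalMapReduce<TMapInput, TMapOutput, TOutput> {
    private int m_threads;
    private Mapper<TMapInput, TMapOutput> m_mapper;
    private Reducer<TMapOutput, TOutput> m_reducer;
    // ...
    public TOutput mapReduce(Collection<TMapInput> input)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(m_threads);
        CompletionService<TMapOutput> futurePool =
            new ExecutorCompletionService<TMapOutput>(pool);
        Set<Future<TMapOutput>> futureSet = new HashSet<Future<TMapOutput>>();
        for (TMapInput m : input) {
            futureSet.add(futurePool.submit(m_mapper.makeWorker(m)));
        }
        pool.shutdown(); // accept no new tasks; already-submitted ones still run
        int n = futureSet.size();
        // take() blocks until the next task finishes, so results arrive in completion order.
        for (int i = 0; i < n; i++) {
            m_reducer.reduce(futurePool.take().get());
        }
        return m_reducer.getResult();
    }
}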
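The CompletionService version can be exercised end to end with a small, self-contained sketch. Since `Mapper` and `Reducer` are not shown above, the interfaces here, along with `LengthMapper` (maps a string to its length) and `SumReducer` (adds the lengths up), are hypothetical stand-ins, and `makeWorker` is assumed to return a `Callable`.

```java
import java.util.*;
import java.util.concurrent.*;

public class CompletionServiceDemo {
    // Hypothetical minimal interfaces standing in for the unshown Mapper and Reducer.
    interface Mapper<I, O> { Callable<O> makeWorker(I input); }
    interface Reducer<O, R> { void reduce(O partial); R getResult(); }

    // Hypothetical mapper: each worker returns the length of one string.
    static class LengthMapper implements Mapper<String, Integer> {
        public Callable<Integer> makeWorker(String s) { return () -> s.length(); }
    }

    // Hypothetical reducer: sums partial results; only called from the draining thread.
    static class SumReducer implements Reducer<Integer, Integer> {
        private int total = 0;
        public void reduce(Integer partial) { total += partial; }
        public Integer getResult() { return total; }
    }

    static <I, O, R> R mapReduce(Collection<I> input, Mapper<I, O> mapper,
                                 Reducer<O, R> reducer, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<O> completion = new ExecutorCompletionService<>(pool);
        try {
            int submitted = 0;
            for (I item : input) {
                completion.submit(mapper.makeWorker(item));
                submitted++;
            }
            // take() blocks until some task finishes, so no polling or sleeping is needed.
            for (int i = 0; i < submitted; i++) {
                reducer.reduce(completion.take().get());
            }
            return reducer.getResult();
        } finally {
            pool.shutdown(); // let the worker threads exit once queued tasks are done
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> words = Arrays.asList("map", "reduce", "future", "queue");
        int total = mapReduce(words, new LengthMapper(), new SumReducer(), 4);
        System.out.println(total); // 20
    }
}
```

Two small design choices differ from the code above: the pool is shut down in a `finally` block so its threads can exit even if a map task throws and `get()` rethrows it as an `ExecutionException`, and a plain counter replaces `futureSet`, since only its size was used.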
I'll also note that this is a very stripped-down map-reduce algorithm, with a single reduce worker that performs both the reduce and the merge operations.
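Because there is only one reduce worker, it both combines values that share a key and merges the partial results coming from different mappers. A hypothetical word-count reducer shows what that looks like; `WordCountReducer` and `countWords` are illustrative names that only mirror the `reduce`/`getResult` shape used above.

```java
import java.util.*;

public class WordCountReduceDemo {
    // Hypothetical single reducer: each call folds one mapper's partial counts into the
    // running totals, so per-key reduction and cross-worker merging happen in one step.
    static class WordCountReducer {
        private final Map<String, Integer> totals = new HashMap<>();

        void reduce(Map<String, Integer> partial) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }

        Map<String, Integer> getResult() { return totals; }
    }

    // Hypothetical map step: counts the words in one line of text.
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        WordCountReducer reducer = new WordCountReducer();
        reducer.reduce(countWords("to be or not to be"));
        reducer.reduce(countWords("be quick"));
        System.out.println(reducer.getResult().get("be")); // 3
        System.out.println(reducer.getResult().get("to")); // 2
    }
}
```

The reducer needs no synchronization here because, in the code above, `reduce` is only ever called from the thread that drains the futures, never from the map workers.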