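MapReduce is a method invented by Google for processing massive data sets. In essence, MapReduce breaks the processing of massive data in a distributed environment into many small computation tasks. It consists of two main parts: 1. the Map operation; 2. the Reduce operation.
The original text follows: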
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
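As a rough illustration of the model described above, here is a minimal single-process sketch in Python that counts words. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical and chosen for this example only; they are not taken from Google's implementation. The sketch runs sequentially on one machine, so it shows only the programming model, not the parallel execution across a cluster.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take one (document name, text) pair and emit (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share one intermediate key."""
    return sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical sequential driver; a real system distributes this work."""
    # Group intermediate values by intermediate key.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    # Apply the reduce function once per distinct intermediate key.
    return {ikey: reduce_fn(ikey, ivalues) for ikey, ivalues in groups.items()}


docs = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

Here `counts` maps each word to the number of times it appears across both documents. The user writes only `map_fn` and `reduce_fn`; grouping by key is left to the driver.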
Programs written in this functional style are automatically parallelized and e