How Google Processes Massive Amounts of Data


taito



Link to this article: https://blog.csdn.net/taito/article/details/1619951


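MapReduce is a programming model that Google designed for processing massive amounts of data. It splits a data-processing job into two phases, Map and Reduce: the Map operation processes key/value pairs to generate intermediate key/value pairs, and the Reduce operation merges all intermediate values that share the same intermediate key. Because the model lends itself to parallel execution, the runtime system automatically handles data partitioning, task scheduling, machine failures, and communication between machines, so programmers with no experience in distributed systems can still use the resources of a large distributed system. Google's MapReduce implementation processes terabytes of data on thousands of machines, and upwards of a thousand MapReduce jobs are executed every day.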

(Summary generated automatically by CSDN.)

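MapReduce is a method Google invented for processing massive amounts of data. In essence, MapReduce breaks the processing of a huge data set into many small computational tasks that run in a distributed environment. It consists of two main parts: (1) the Map operation and (2) the Reduce operation.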
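To make the idea of "many small tasks" concrete, here is a minimal single-machine Python sketch of two steps the runtime performs: splitting the input into M pieces (one per map task) and assigning each intermediate key to one of R reduce tasks. The function names split_input and partition are hypothetical. The hash(key) mod R rule follows the default partitioning function described in the MapReduce paper; using CRC32 as the hash is an assumption made here so that the result is stable across processes (Python's built-in hash() of a string is randomized per process).

```python
import zlib
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def split_input(records: Sequence[T], m: int) -> List[List[T]]:
    """Split the input into at most m roughly equal pieces, one per map task."""
    if m <= 0:
        raise ValueError("m must be positive")
    if not records:
        return []
    size = -(-len(records) // m)  # ceiling division
    return [list(records[i:i + size]) for i in range(0, len(records), size)]

def partition(key: str, r: int) -> int:
    """Pick the reduce task (0..r-1) for an intermediate key: hash(key) mod R.

    CRC32 is an assumed, deterministic stand-in for the paper's hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % r

# Five input documents split across two map tasks.
chunks = split_input(["doc1", "doc2", "doc3", "doc4", "doc5"], 2)
print(chunks)  # [['doc1', 'doc2', 'doc3'], ['doc4', 'doc5']]

# Every occurrence of the same key lands on the same reduce task.
print(partition("the", 4) == partition("the", 4))  # True
```

Because partition depends only on the key, all intermediate values for one key end up at a single reduce task, which is what allows the Reduce operation to merge them.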

The original paper follows:

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat

Abstract

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
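The paper's own introductory example is counting word occurrences across documents. Below is a minimal single-process Python sketch of that example: map_fn emits an intermediate (word, 1) pair per word, and reduce_fn merges all values for one word by summing them. The names map_fn, reduce_fn, and run_mapreduce are hypothetical, and the in-memory dictionary that groups values by key is only a stand-in for the distributed shuffle; parallelism and fault tolerance are not modeled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in a document."""
    for word in contents.split():
        yield word, 1

def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce: merge all intermediate values for one key by summing them."""
    return sum(counts)

def run_mapreduce(inputs: Dict[str, str]) -> Dict[str, int]:
    """Sequential stand-in for the runtime: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name, contents in inputs.items():
        for key, value in map_fn(name, contents):
            groups[key].append(value)  # group intermediate values by key
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}

result = run_mapreduce({"a.txt": "the quick fox", "b.txt": "the lazy dog the end"})
print(result)
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

The user supplies only map_fn and reduce_fn; everything in run_mapreduce is the part a real MapReduce runtime would take over and distribute.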

Programs written in this functional style are automatically parallelized and e