1. Failure is the number one concern in distributed system design
- Hardware failure
- Software failure
- The Eight Fallacies of Distributed Computing (assumptions that do not hold in practice):
  - The network is reliable.
  - Latency is zero.
  - Bandwidth is infinite.
  - The network is secure.
  - Topology doesn't change.
  - There is one administrator.
  - Transport cost is zero.
  - The network is homogeneous.
2. MapReduce OSDI'04 Paper
- MapReduce
A programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
- Programming Model Example
- Count of URL frequency
- Reverse Web-Link Graph
- Inverted Index
- Distributed Sort
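The programming model from the paper can be sketched with the word-count example. The function names and the yield-based emitter are illustrative; a real implementation would emit pairs through a framework-provided interface.

```python
def map_fn(doc_name, contents):
    """Map: emit an intermediate (word, 1) pair for every word in the document."""
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce: merge all intermediate values for one key by summing them."""
    return (word, sum(counts))
```

The other examples (URL frequency, reverse web-link graph, inverted index) follow the same shape: map emits intermediate pairs, reduce merges all values that share a key.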
- Execution
- Map invocations are distributed across multiple machines by automatically partitioning the input data into M splits
- Reduce invocations are distributed by partitioning the intermediate key space into R pieces
- At startup, copies of the program are started on a cluster of machines; one of them is the master, which assigns map and reduce tasks to the other workers
- A worker assigned a map task reads the corresponding input split and calls the user-defined Map function. Results are buffered in memory, then periodically flushed to R partitioned files on local disk; their locations are sent to the master, which forwards them to the reduce workers.
- A reduce worker reads this data via RPC and sorts it once all intermediate data has been read. The sorted keys are then passed to the Reduce function. (Must the Reduce function be associative?)
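The execution flow above can be sketched as a sequential, single-process simulation. This is a toy model, not the distributed implementation: `hash(key) % R` stands in for the paper's partitioning function, and the per-bucket sort mirrors the reduce worker's sort of intermediate keys.

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn, R=2):
    """Sequential sketch: map each input split, partition intermediate
    keys into R buckets, sort each bucket, then reduce per key."""
    # Map phase: each split's output is partitioned into R buckets,
    # as the map workers' R local files would be.
    buckets = [defaultdict(list) for _ in range(R)]
    for key, value in inputs:
        for ik, iv in map_fn(key, value):
            buckets[hash(ik) % R][ik].append(iv)
    # Reduce phase: each "reduce worker" sorts its keys, then merges
    # all values associated with one intermediate key.
    output = {}
    for bucket in buckets:
        for ik in sorted(bucket):
            output[ik] = reduce_fn(ik, bucket[ik])
    return output
```

With word-count map/reduce functions, `run_mapreduce([("d1", "a b a"), ("d2", "b")], ...)` yields the same counts regardless of R, which is the point of partitioning by key.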
- Fault Tolerance
- Completed map tasks are re-executed on failure because their output is stored on the local disk of the failed machine and is therefore inaccessible
- Completed reduce tasks do not need to be re-executed since their output is stored in a global file system
- If the master fails, the client has to check for this condition and retry the MapReduce operation
- Failure Semantics
- With deterministic functions, produces the same output as produced by a non-faulting sequential execution of the entire program
- Rely on atomic commits of map and reduce task outputs to achieve this
- Redundant Map results are ignored by Master.
- Reduce tasks write results to private temp files and rename them to the final output file. Rename is atomic.
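The atomic-commit step can be sketched as follows. Function and path names are illustrative; the key fact is that `os.rename` within one filesystem is atomic on POSIX, so racing backup executions of the same task leave exactly one complete output file.

```python
import os
import tempfile

def write_reduce_output(final_path, records):
    """Sketch of the paper's commit scheme: write reduce output to a
    private temp file in the same directory, then atomically rename
    it to the final output name."""
    dir_name = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as f:
        for key, value in records:
            f.write(f"{key}\t{value}\n")
    # Atomic on POSIX within one filesystem: even if several backup
    # executions of the same reduce task race, readers never observe
    # a partially written final file.
    os.rename(tmp_path, final_path)
```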
- Locality
GFS divides each file into 64 MB blocks, and stores several copies of each block (typically 3 copies) on different machines. The MapReduce master takes the location information of the input files into account and attempts to schedule a map task on a machine that contains a replica of the corresponding input data.
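A minimal sketch of that locality preference, assuming the master knows which workers hold replicas of each split (function and worker names are illustrative, not from the paper):

```python
def pick_worker(split_replicas, idle_workers):
    """Prefer an idle worker that holds a replica of the input split,
    so the map task reads its input from local disk; otherwise fall
    back to any idle worker and read the split over the network."""
    for worker in idle_workers:
        if worker in split_replicas:
            return worker      # local read: no network transfer
    return idle_workers[0]     # remote read as a fallback
```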
- Straggler Problem
To alleviate the problem, when a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks
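The backup-task policy can be sketched as below. The completion threshold and task states are illustrative assumptions, not values from the paper; whichever copy of a task finishes first is the one whose output counts.

```python
def schedule_backups(tasks, done_fraction, threshold=0.9):
    """Straggler mitigation sketch: once the operation is close to
    completion, return the in-progress tasks that should get redundant
    backup executions on other workers."""
    if done_fraction < threshold:
        return []  # too early: backups would waste resources
    return [t for t, state in tasks.items() if state == "in-progress"]
```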
- Combiner function
- does partial merging of map task output before it is sent over the network
- typically the same code as the reduce function
- executed on the same machine that runs the map task
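For word count, the combiner idea can be sketched as pre-aggregating counts on the map side, so each word crosses the network once per map task rather than once per occurrence (function name is illustrative):

```python
from collections import Counter

def map_with_combiner(doc):
    """Map plus combiner for word count: emit (word, partial_count)
    pairs instead of one (word, 1) pair per occurrence. The combining
    step here is the same summing logic as the reduce function."""
    counts = Counter(doc.split())   # partial merge of this task's output
    return list(counts.items())
```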