Distributed Systems (1)

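This blog post introduces the definition of a distributed system and the advantages it brings: greater parallel computing capacity, high availability, and the ability to overcome geographic distance. It also covers the challenges of security, concurrency control, and partial failure. On the infrastructure side, it touches on storage, communication, and computation. Distributed systems are implemented with RPC and threads. Expected performance focuses on scalability and fault tolerance, for example using non-volatile storage to ensure recovery after a failure. MapReduce is a programming model that simplifies parallel computation over large data sets, illustrated here by the WordCount application.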

Lecture 1 Introduction

Definition
A distributed system is a group of independent computers that presents itself to the user as a single, unified system. The system has a variety of physical and logical resources to which tasks can be allocated dynamically, and these scattered resources exchange information over a computer network.

1. Advantages and disadvantages of Distributed Systems

Advantages:

  1. Parallelism: Many CPUs working at once increase computing capacity.
  2. High reliability: If one of the nodes fails, the others can continue (for example, through replication with the Raft algorithm).
  3. Overcoming physical distance: Nodes in different locations are interconnected through a communication network.

Disadvantages:

  1. Security: Easy data sharing also means that confidential data can be stolen more easily.
  2. Concurrency: Concurrent programs involve complex interactions that are hard to reason about.
  3. Partial failure: With multiple machines plus a network, some parts can fail unexpectedly while the rest keep running.

2. Infrastructure and Implementation

Infrastructure:
The following kinds of infrastructure need to be considered when building a distributed system:

  1. Storage
  2. Communication
  3. Computation

Implementation:

  1. RPC (Remote Procedure Call): a message-passing mechanism for communication between processes on different machines, in which a remote call looks like a local function call.
  2. Threads: a way of structuring concurrent operations that, ideally, simplifies the programmer's view of those operations.
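The two mechanisms above can be sketched together in a few lines. This is a minimal illustration rather than a real RPC library: the names `add`, `HANDLERS`, `server_dispatch`, `rpc_call`, and `call_concurrently` are all hypothetical, and the "network" is replaced by a direct function call that only passes encoded bytes, which is the part that mimics message transfer.

```python
import json
import threading


# Hypothetical server-side procedure; the name is illustrative only.
def add(a, b):
    return a + b


# Table mapping procedure names to the functions the "server" exposes.
HANDLERS = {"add": add}


def server_dispatch(request_bytes):
    """Decode a request message, run the named procedure, encode the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = HANDLERS[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def rpc_call(method, *args):
    """Client stub: looks like a local call, but only bytes cross the boundary."""
    request_bytes = json.dumps({"method": method, "args": list(args)}).encode("utf-8")
    # Assumption: this direct call stands in for a network round trip.
    reply_bytes = server_dispatch(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]


def call_concurrently(pairs):
    """Issue one RPC per thread and collect the replies in input order."""
    results = [None] * len(pairs)

    def worker(index, a, b):
        # Each thread writes to its own slot, so no lock is needed here.
        results[index] = rpc_call("add", a, b)

    threads = [
        threading.Thread(target=worker, args=(i, a, b))
        for i, (a, b) in enumerate(pairs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `call_concurrently([(1, 2), (3, 4)])` sends both requests from separate threads and waits for both replies. A real system would also have to handle lost messages and timeouts, which this sketch ignores.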

3. Expected Performance

  1. Scalability: Improve throughput by increasing the number of computers, with little refactoring.
  2. Fault Tolerance:
    - Availability: Under certain kinds of failures, the system keeps operating.
    - Recoverability: After the failure is repaired, the system can continue as if nothing had gone wrong, without any loss of correctness. (One solution: use non-volatile storage such as hard drives, flash, or solid-state drives.)
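The non-volatile storage idea can be sketched as follows. This is one common pattern, assumed here for illustration and not taken from the lecture: write the state to a temporary file, force it to disk, then atomically rename it over the old copy so a crash leaves either the old state or the new one. The function names `save_state` and `load_state` and the JSON file format are hypothetical.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state durably: temp file, fsync, then atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to the storage device
    os.replace(tmp_path, path)  # atomically swap in the new state


def load_state(path, default):
    """Recover the last saved state after a restart, or default if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

After a restart, a node would call `load_state` first and resume from whatever was last saved. Syncing on every update is slow, so real systems often batch writes; that trade-off is outside this sketch.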

4. MapReduce

Definition
MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). It lets programmers run their programs on a distributed system without writing the distributed, parallel code themselves. The programmer specifies a Map function that maps a set of key-value pairs to a new set of intermediate key-value pairs, and a Reduce function that merges all the intermediate values sharing the same key. Many Map and Reduce tasks can run concurrently on different machines.
Advice: Read the MapReduce paper.
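Example: WordCount
In this case, the input key k is a document name and the value v is its contents:
  Map(k, v)
    split v into words
    for each word w
      emit(w, "1")
  Reduce(k, v)    (here k is a word and v is the list of "1"s emitted for it)
    emit(len(v))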
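The WordCount pseudocode can be turned into a small runnable sketch. It runs in a single process and only imitates the model: the driver `run_mapreduce` is a hypothetical stand-in for the framework, and it does the grouping by key (the "shuffle") that a real implementation performs across machines.

```python
from collections import defaultdict


def map_fn(filename, contents):
    """Map: emit (word, "1") for every word in the document."""
    return [(word, "1") for word in contents.split()]


def reduce_fn(word, values):
    """Reduce: the count is the number of "1"s emitted for this word."""
    return str(len(values))


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Sequential stand-in for the framework: map, group by key, then reduce."""
    intermediate = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)  # group values by key
    return {k: reduce_fn(k, vs) for k, vs in sorted(intermediate.items())}
```

For example, `run_mapreduce({"a.txt": "the cat the dog"}, map_fn, reduce_fn)` gives a count of "2" for "the" and "1" for each of the other words. Because each Map call touches only its own document and each Reduce call only its own key, the framework is free to run them in parallel.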