What is a distributed system?
  multiple cooperating computers
  storage for big web sites, MapReduce, peer-to-peer sharing, &c
  lots of critical infrastructure is distributed
How does P2P relate to distributed systems? Is it that all nodes are equal peers?
Why do people build distributed systems?
  to increase capacity via parallelism
  to tolerate faults via replication
  to place computing physically close to external entities
  to achieve security via isolation
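A distributed system can handle more requests, so capacity goes up?
Fault tolerance comes from replication.
Because the system is distributed, some computing resources can be placed near particular entities?
Security and isolation? How do they relate to distribution?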
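To make "tolerate faults via replication" concrete, here is a minimal sketch in Python. The `Replica` class, the `ReplicaDown` exception, and the `read_with_failover` helper are hypothetical names invented for illustration; they do not come from the course material. The sketch assumes every replica holds an identical, read-only copy of the data and that a crash is reported immediately as an exception.

```python
class ReplicaDown(Exception):
    """Raised when a (simulated) replica has crashed."""


class Replica:
    """Hypothetical in-memory replica holding a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # each replica keeps its own copy
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise ReplicaDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in order and return the first successful answer."""
    for replica in replicas:
        try:
            return replica.get(key)
        except ReplicaDown:
            continue  # this copy is unavailable, try the next one
    raise ReplicaDown("all replicas are down")


data = {"x": 1}
replicas = [Replica("a", data), Replica("b", data), Replica("c", data)]
replicas[0].alive = False  # simulate one crash
value = read_with_failover(replicas, "x")
print(value)  # prints 1: the read is served by replica "b"
```

This only covers reads of data that never changes. Keeping the copies consistent when clients write is where replication becomes hard.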
But:
  many concurrent parts, complex interactions
  must cope with partial failure
  tricky to realize performance potential
Nodes have to communicate and interact with each other.
Partial failure is possible.
Getting the system to scale well is hard.
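One reason partial failure is awkward can be sketched in a few lines of Python. The `State` enum and the `observe` function below are hypothetical, and the model is deliberately simplified: it assumes the client's only evidence about a server is whether a reply arrives before a timeout.

```python
from enum import Enum


class State(Enum):
    """Hypothetical server states for this simulation."""
    UP = "up"
    CRASHED = "crashed"
    UNREACHABLE = "unreachable"  # still running, but the network drops its messages


def observe(state):
    """What the client sees: a reply, or a timeout with no further detail."""
    return "reply" if state is State.UP else "timeout"


observations = {s.name: observe(s) for s in State}
print(observations)
# prints {'UP': 'reply', 'CRASHED': 'timeout', 'UNREACHABLE': 'timeout'}
```

Under this model a crashed server and an unreachable one look identical to the client, so it cannot safely assume that a silent server has stopped doing work.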
Why take this course?
  interesting -- hard problems, powerful solutions
  used by real systems -- driven by the rise of big Web sites
  active research area -- important unsolved problems
  hands-on -- you'll build real systems in the labs
Course components:
  lectures
  papers
  two exams
  labs
  final project (optional)
Lectures:
  big ideas, paper discussion, and labs
  will be video-taped, available online
Papers:
  research papers, some classic, some new
  problems, ideas, implementation details, evaluation
  many lectures focus on papers
  please read papers before class!
  each paper has a short question for you to answer
  and we ask you to send us a question you have about the paper
  submit question&answer before start of lecture
Labs:
  goal: deeper understanding of some important techniques
  goal: experience with distributed programming
  first lab is due a week from Friday
  one per week after that for a while
Lab 1: MapReduce
Lab 2: replication for fault-tolerance using Raft
Lab 3: fault-tolerant key/value store
Lab 4: sharded key/value store
This is a course about infrastructure for applications.
  * Storage.
  * Communication.
  * Computation.
The big goal: abstractions that hide the complexity of distribution.
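Infrastructure! Storage, communication, computation.
The goal is to hide the technical details of distribution (so applications can focus on business logic; most frameworks exist for much the same reason).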
Topic: fault tolerance
  1000s of servers, big network -> always something broken
    We'd like to hide these failures from the application.
  We often want:
    Availability -- app can make progress despite failures
    Recoverability -- app will come back to life when failures are repaired
  Big idea: replicated servers.
    If one server crashes, can proceed using the other(s).
    Very hard to get right
      server may not have crashed, but just unreach