Dynamo: Amazon's Highly Available Key-value Store

Techniques used in Dynamo:

1. Distributed Hash Tables (DHTs)
2. Consistent Hashing
3. Versioning
4. Vector Clocks
5. Quorum
6. Anti-Entropy Based Recovery

Amazon Simple Storage Service (Amazon S3)

Dynamo

Key points about Dynamo:
1. Provides a simple primary-key-only interface.
2. Data is partitioned and replicated using consistent hashing.
3. Consistency is facilitated by object versioning.
4. Consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol.
5. Employs a gossip-based distributed failure detection and membership protocol.
6. Eventually consistent.
7. Has a simple key/value interface and is highly available with a clearly defined consistency window.

System Assumptions and Requirements
Query Model:
1. Simple read and write operations to a data item that is uniquely identified by a key.
2. State is stored as binary objects identified by unique keys (usually less than 1 MB in size).
3. No operation spans multiple data items.
4. There is no need for a relational schema.

ACID Properties:
1. Data stores that provide ACID guarantees tend to have poor availability.
2. Dynamo targets applications that can operate with weaker consistency if this results in high availability.

3. Dynamo does not provide any isolation guarantees and permits only single-key updates.

Efficiency:

1. Services have stringent latency requirements, which are generally measured at the 99.9th percentile of the distribution.

2. The tradeoffs are between performance, cost efficiency, availability, and durability guarantees.
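Why the 99.9th percentile and not the mean? A handful of slow requests barely moves the average but is exactly what a tail percentile exposes. The sketch below uses the nearest-rank percentile definition (one of several in use); the latency numbers are made up for illustration.

```python
import math
from fractions import Fraction

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Fraction(str(p)) avoids float rounding when computing the 1-based rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical request latencies in milliseconds: 998 fast, 2 slow.
latencies = [10.0] * 998 + [250.0, 900.0]
mean = sum(latencies) / len(latencies)
print(f"mean={mean:.2f} ms, p50={percentile(latencies, 50)} ms, "
      f"p99.9={percentile(latencies, 99.9)} ms")
# prints: mean=11.13 ms, p50=10.0 ms, p99.9=250.0 ms
```

The mean (11.13 ms) looks healthy, while the p99.9 value shows that one request in a thousand takes 250 ms or more.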

Other Assumptions:

1. Dynamo is used only by Amazon's internal services.

2. Its operating environment is assumed to be non-hostile.

3. There are no security-related requirements such as authentication and authorization.

4. Each service uses its own distinct instance of Dynamo.

Service Level Agreements (SLA)

1. Give services control over their system properties, such as durability and consistency.

2. Let services make their own tradeoffs between functionality, performance, and cost-effectiveness.

Design Considerations

1. Commercial systems traditionally perform synchronous replica coordination in order to provide a strongly consistent data access interface; the price is reduced availability under certain failure scenarios.

2. Optimistic replication

1) Suitable for systems prone to server and network failures.

2) Changes are allowed to propagate to replicas in the background.

3) Concurrent, disconnected work is tolerated.

4) Challenges:

a. It can lead to conflicting changes, which must be detected and resolved.

b. This introduces two problems:

i) when to resolve them

ii) who resolves them

3. Dynamo is designed to be an eventually consistent data store.

4. When to resolve update conflicts

1) Many traditional data stores resolve conflicts during writes and keep reads simple, so writes may be rejected if the data store cannot reach all the replicas at a given time (i.e. W = N).

2) Dynamo is "always writeable": it pushes the complexity of conflict resolution to reads so that writes are never rejected.

5. Who resolves the conflicts

1) The application is best suited to resolve conflicts, because it knows the semantics of its data.

2) The data store can only apply simple policies to resolve conflicts, such as "last write wins".
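To see why the application is the better judge, the sketch below compares a store-side "last write wins" policy with an application-side merge on two concurrent versions. The shopping-cart data, the `Version` type, and the timestamps are hypothetical and only serve to illustrate the difference.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    items: frozenset   # contents of a hypothetical shopping cart
    timestamp: float   # wall-clock time of the write (illustrative only)

def last_write_wins(versions):
    """Store-side policy: keep the version with the newest timestamp, discard the rest."""
    return max(versions, key=lambda v: v.timestamp).items

def merge_carts(versions):
    """Application-side policy: the app knows carts can be merged by set union."""
    merged = frozenset()
    for v in versions:
        merged |= v.items
    return merged

# Two replicas accepted concurrent writes while partitioned ("always writeable").
a = Version(frozenset({"book", "pen"}), timestamp=100.0)
b = Version(frozenset({"book", "lamp"}), timestamp=101.0)
print(sorted(last_write_wins([a, b])))  # ['book', 'lamp']  ("pen" is silently lost)
print(sorted(merge_carts([a, b])))      # ['book', 'lamp', 'pen']
```

A union merge keeps every addition, at the cost that an item deleted in one version can reappear after the merge.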

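Other key principles embraced in the design

Incremental scalability: Dynamo should be able to scale out one storage host (node) at a time, with minimal impact on both the operators of the system and the system itself.

Symmetry: every node in Dynamo should have the same set of responsibilities as its peers.

Decentralization: the design should favor decentralized peer-to-peer techniques over centralized control.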

Heterogeneity: the system needs to be able to exploit heterogeneity in the infrastructure it runs on.

Related Work

P2P Systems

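Unstructured P2P networks (overlay links are established arbitrarily and search queries are usually flooded through the network):

Freenet

Gnutella

Structured P2P networks, which use a globally consistent protocol so that any node can efficiently route a search query to a peer that has the desired data; systems using such routing mechanisms include:

Pastry

Chord

Storage systems built on top of routing overlays:

OceanStore

PAST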

Distributed File Systems and Databases

Replicate files for high availability at the expense of consistency:

Ficus

Coda

Distributed file systems:

Farsite: achieves high availability and scalability using replication.

GFS

Bayou: a distributed relational database system that allows disconnected operations and provides eventual consistency.

Antiquity: a wide-area distributed storage system designed to handle multiple server failures.

It uses a secure log to preserve data integrity and replicates each log on multiple servers for durability.

It uses Byzantine fault tolerance protocols to ensure data consistency.

Bigtable: a distributed storage system for managing structured data. It maintains a sparse, multi-dimensional sorted map

and allows applications to access their data using multiple attributes.

Dynamo target requirements

1. Dynamo is targeted mainly at applications that need an "always writeable" data store.

2. Dynamo is built for an infrastructure within a single administrative domain where all nodes are assumed to be trusted.

3. Applications that use Dynamo do not require support for hierarchical namespaces or complex relational schemas.

4. Dynamo is built for latency-sensitive applications that require at least 99.9% of read and write operations to be performed within a few hundred milliseconds.

System Architecture

core distributed systems techniques used in Dynamo:

partitioning

replication

versioning

membership

failure handling

scaling

Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates
Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available
Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background
Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information


Partitioning Algorithm

1. Dynamo's partitioning scheme relies on consistent hashing to distribute the load across multiple storage hosts.

2. It uses a variant of consistent hashing: each position on the ring is a "virtual node",

and one physical node can be responsible for more than one virtual node.
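A minimal sketch of consistent hashing with virtual nodes, assuming MD5 as the hash (the Dynamo paper hashes keys with MD5) and a hypothetical labelling scheme (`"A#vnode3"`) for placing each physical node's virtual nodes. A key belongs to the first virtual node whose position is larger than the key's own position, wrapping around the ring.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(label: str) -> int:
    """Map a string to a 128-bit position on the ring using MD5."""
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")

class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node owns several virtual nodes (tokens) on the ring.
        # Labels like "A#vnode3" are a hypothetical way to place them.
        self._tokens = sorted(
            (ring_position(f"{node}#vnode{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._tokens]

    def owner(self, key: str) -> str:
        """Walk clockwise to the first virtual node whose position is larger
        than the key's position, wrapping around the end of the ring."""
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._tokens[idx % len(self._tokens)][1]

ring = Ring(["A", "B", "C"])
load = Counter(ring.owner(f"user:{i}") for i in range(10_000))
print(load)  # roughly even split; exact counts depend on the hashes
```

More virtual nodes per physical node smooth out the load; giving a more powerful machine more virtual nodes is one way to account for heterogeneous hardware.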

Replication

1. Each data item is replicated at N hosts, where N is configured per instance.

2. Each key has a responsible (coordinator) node, which replicates the key at the N-1 clockwise successor nodes in the ring.
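The sketch below builds the list of N nodes that store a key: the coordinator followed by its clockwise successors. Because one physical node may own several virtual nodes, it skips virtual nodes whose physical node is already on the list, so the N replicas land on distinct machines. The ring layout, vnode labels, and MD5 hashing are assumptions for illustration.

```python
import bisect
import hashlib

def ring_position(label: str) -> int:
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")

def build_ring(nodes, vnodes_per_node=4):
    """Sorted (position, physical_node) pairs; vnode labels are hypothetical."""
    return sorted((ring_position(f"{n}#vnode{i}"), n)
                  for n in nodes for i in range(vnodes_per_node))

def preference_list(key, ring, n):
    """The coordinator (first node clockwise from the key) followed by its
    clockwise successors, skipping virtual nodes of physical nodes already chosen."""
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, ring_position(key))
    chosen = []
    for step in range(len(ring)):
        _, node = ring[(start + step) % len(ring)]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen

ring = build_ring(["A", "B", "C", "D", "E"])
print(preference_list("cart:42", ring, n=3))  # coordinator first, then 2 more nodes
```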

Data Versioning

1.Dynamo uses vector clocks in order to capture causality between different versions of the same object.

2.A vector clock is effectively a list of (node, counter) pairs


3. The size of the list is limited by removing the oldest (node, counter) pair once the number of pairs reaches a threshold.
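A sketch of vector clocks as a dict from node name to counter. Clock `a` descends from `b` when every counter in `a` is at least the matching counter in `b`; if neither descends from the other, the versions are concurrent and must be reconciled. The node names Sx, Sy, Sz, the `last_updated` timestamps, and the threshold of 10 are illustrative assumptions.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped (node handled a write)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a, b):
    """True if clock a has seen every event clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a descends from b"   # b can be discarded
    if b_ge:
        return "b descends from a"
    return "concurrent"              # conflict: the client must reconcile

def merge(a, b):
    """Component-wise maximum, used when a client reconciles divergent versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def truncate(clock, last_updated, threshold=10):
    """Keep only the `threshold` most recently updated pairs; oldest pairs are dropped."""
    newest = sorted(clock, key=lambda n: last_updated[n], reverse=True)[:threshold]
    return {n: clock[n] for n in newest}

# Hypothetical nodes Sx, Sy, Sz handling writes to one key.
d1 = increment({}, "Sx")               # {'Sx': 1}
d2 = increment(d1, "Sx")               # {'Sx': 2}
d3 = increment(d2, "Sy")               # {'Sx': 2, 'Sy': 1}
d4 = increment(d2, "Sz")               # {'Sx': 2, 'Sz': 1}
print(compare(d3, d4))                 # concurrent
d5 = increment(merge(d3, d4), "Sx")    # reconciled write coordinated by Sx
print(compare(d5, d3))                 # a descends from b
```

Truncation keeps clocks small, but a dropped pair means the store may later fail to recognize that one version descends from another.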

Execution of get() and put() operations

Two strategies that a client can use to select a node:

1) Route its request through a generic load balancer that will select a node based on load information.

2) Use a partition-aware client library that routes requests directly to the appropriate coordinator nodes.

Maintaining Consistency

1) Dynamo uses a consistency protocol similar to those used in quorum systems.

2) Two key configurable values: R and W.

R: the minimum number of nodes that must participate in a successful read operation.

W: the minimum number of nodes that must participate in a successful write operation.

Setting R + W > N yields a quorum-like system.
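Why R + W > N? If any R replicas must overlap any W replicas out of N, then every read contacts at least one node that saw the latest successful write. The brute-force check below confirms this pigeonhole argument for small N. It models a strict quorum; Dynamo's quorum is only "quorum-like" because it uses the first N healthy nodes, so the overlap can fail during failures.

```python
from itertools import combinations

def read_always_overlaps_write(n, r, w):
    """Brute force: does every set of R replicas share at least one member
    with every set of W replicas, out of N replicas?"""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: R+W>N is {r + w > n}, "
          f"overlap guaranteed: {read_always_overlaps_write(n, r, w)}")
```

Lowering R or W reduces latency for that operation type but gives up the overlap guarantee once R + W is no longer greater than N.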

Handling Failures: Hinted Handoff

1. All read and write operations are performed on the first N healthy nodes from the preference list, which are not always the first N nodes on the ring.

2. If a target node (A) is down, the replica is sent to the next node (D) in the preference list, together with a hint

indicating that it was intended for A. When D detects that A has recovered, it sends the replica to A and may then delete its local copy.
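A small simulation of the hinted-handoff scenario above, with A down and D standing in for it. The `Node` class, the in-memory dicts, and the function names are hypothetical; real nodes would persist hinted replicas separately and detect recovery asynchronously.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)    # replicas this node owns
    hinted: dict = field(default_factory=dict)   # intended node -> {key: value}

def put(key, value, preference_list, nodes, n):
    """Write to the first N healthy nodes in the preference list. A node that
    stands in for a down node keeps the replica with a hint naming that node."""
    intended = preference_list[:n]
    skipped = [p for p in intended if not nodes[p].up]
    targets = [p for p in preference_list if nodes[p].up][:n]
    for t in targets:
        if t in intended:
            nodes[t].store[key] = value
        else:
            owner = skipped.pop(0)
            nodes[t].hinted.setdefault(owner, {})[key] = value
    return targets

def deliver_hints(holder, nodes):
    """Once an intended node is back up, hand its replicas over and drop the local copy."""
    for owner in list(holder.hinted):
        if nodes[owner].up:
            nodes[owner].store.update(holder.hinted.pop(owner))

nodes = {name: Node(name) for name in "ABCDE"}
pref = ["A", "B", "C", "D", "E"]            # hypothetical preference list, N = 3
nodes["A"].up = False
print(put("k1", "v1", pref, nodes, n=3))    # ['B', 'C', 'D']
print(nodes["D"].hinted)                    # {'A': {'k1': 'v1'}}
nodes["A"].up = True
deliver_hints(nodes["D"], nodes)
print(nodes["A"].store, nodes["D"].hinted)  # {'k1': 'v1'} {}
```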

Handling Permanent failures: Replica Synchronization

Merkle tree

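1) A Merkle tree is a hash tree whose leaves are hashes of the values of individual keys.

2) Parent nodes higher in the tree are hashes of their respective children.

3) It minimizes the amount of data that needs to be transferred for synchronization: if the root hashes of two replicas match, they are in sync; otherwise only the subtrees whose hashes differ need to be compared further.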
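A minimal Merkle-tree sketch: build a tree over a key-sorted list of (key, value) pairs, then walk two trees together and descend only where hashes differ. SHA-256, the leaf encoding, and the assumption that both replicas hold the same set of keys (so the trees have the same shape) are simplifications for illustration.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build(items):
    """Build a Merkle tree over a non-empty, key-sorted list of (key, value)."""
    if len(items) == 1:
        key, value = items[0]
        return {"hash": _h(repr((key, value)).encode()), "key": key}
    mid = len(items) // 2
    left, right = build(items[:mid]), build(items[mid:])
    return {"hash": _h(left["hash"] + right["hash"]), "children": (left, right)}

def diff(a, b):
    """Keys whose values differ. Subtrees with equal hashes are skipped entirely.
    Assumes both trees cover the same keys, so they have the same shape."""
    if a["hash"] == b["hash"]:
        return []
    if "key" in a:
        return [a["key"]]
    return [k for x, y in zip(a["children"], b["children"]) for k in diff(x, y)]

replica_1 = [(f"k{i}", f"v{i}") for i in range(8)]
replica_2 = list(replica_1)
replica_2[5] = ("k5", "v5-stale")
print(diff(build(replica_1), build(replica_2)))  # ['k5']
```

Only the hashes along the path to `k5` are compared in depth; identical sibling subtrees are ruled out with a single comparison each.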

Membership and Failure Detection

1. A gossip-based protocol propagates membership changes and maintains an eventually consistent view of membership.

2. Every second, each node contacts a peer chosen at random,

and the two nodes efficiently reconcile their persisted membership change histories.
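A toy simulation of gossip-based membership, assuming each node's view maps a member to a `(version, status)` entry and that the higher version wins during reconciliation. This versioning scheme is a simplification chosen for illustration; it is not the exact format of Dynamo's persisted change histories.

```python
import random

def reconcile(a, b):
    """Merge two membership views in place. Each entry is (version, status);
    the entry with the higher version wins, and both views end up identical."""
    for member in a.keys() | b.keys():
        ea = a.get(member, (0, None))
        eb = b.get(member, (0, None))
        a[member] = b[member] = ea if ea[0] >= eb[0] else eb

def gossip_round(views, rng):
    """Each node picks one random peer and reconciles with it."""
    names = list(views)
    for name in names:
        peer = rng.choice([p for p in names if p != name])
        reconcile(views[name], views[peer])

def rounds_until_converged(views, rng, max_rounds=100):
    for r in range(1, max_rounds + 1):
        gossip_round(views, rng)
        first = next(iter(views.values()))
        if all(v == first for v in views.values()):
            return r
    return None

# Initially every node only knows about its own join (version 1).
views = {n: {n: (1, "joined")} for n in "ABCDEF"}
print(rounds_until_converged(views, random.Random(7)))  # rounds needed; varies with the seed
print(sorted(views["A"]))
```

No node is special here: every node runs the same loop, which is how the gossip design preserves symmetry.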

External Discovery

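Seeds: to prevent logical partitions, some Dynamo nodes play the role of seeds. Seeds are nodes that are discovered via an external mechanism and are known to all nodes; typically they are fully functional nodes in the Dynamo ring.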

Failure Detection

1. A purely local notion of failure detection is entirely sufficient:

node A may consider node B failed if B does not respond to A's messages, even if B responds to other nodes.

2. Node A then uses alternate nodes to serve requests that map to B's partitions, and periodically retries B to check for its recovery.
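A sketch of a purely local failure detector from node A's point of view: a timeout marks a peer as failed, a response clears it, and after a retry interval A tries the peer again. The class name, method names, and the 5-second interval in the usage are hypothetical; the injectable clock just makes the example deterministic.

```python
import time

class LocalFailureDetector:
    """Node A's private opinion of its peers; no global agreement is involved."""

    def __init__(self, retry_interval=1.0, clock=time.monotonic):
        self.retry_interval = retry_interval   # seconds between retries (hypothetical default)
        self.clock = clock
        self.failed_at = {}                    # peer -> time it was last marked failed

    def on_timeout(self, peer):
        self.failed_at[peer] = self.clock()

    def on_response(self, peer):
        self.failed_at.pop(peer, None)

    def should_contact(self, peer):
        """True if the peer is considered healthy or it is time to retry it."""
        t = self.failed_at.get(peer)
        return t is None or self.clock() - t >= self.retry_interval

# Usage with a fake clock so the example is deterministic.
now = [0.0]
fd = LocalFailureDetector(retry_interval=5.0, clock=lambda: now[0])
fd.on_timeout("B")              # B did not answer A's message
print(fd.should_contact("B"))   # False: serve B's requests via alternate nodes
now[0] = 5.0
print(fd.should_contact("B"))   # True: time to retry B
fd.on_response("B")             # B answered, so A considers it healthy again
```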

Adding/Removing Storage Nodes

1. When node X is added to the system, it gets assigned a number of tokens that are randomly scattered on the ring.
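Which key ranges does the new node take over? Assuming a key belongs to the first token strictly greater than its position, each new token t claims the range from its predecessor up to t, and the previous owner of that range is the old token that followed t. The sketch uses fixed token values (hypothetical) instead of random ones so the output is reproducible, and it ignores replication, where N nodes would each hand off keys.

```python
import bisect

def ranges_taken_over(tokens, new_tokens):
    """tokens: {token: node} before the join; new_tokens: tokens given to the new node.
    Returns (start, end, previous_owner) triples: keys at positions in [start, end)
    move from previous_owner to the new node. A range whose start is greater than
    its end wraps around the top of the ring."""
    old = sorted(tokens)
    merged = sorted(set(old) | set(new_tokens))
    moves = []
    for t in sorted(new_tokens):
        start = merged[merged.index(t) - 1]                      # predecessor on the new ring
        successor = old[bisect.bisect_right(old, t) % len(old)]  # old owner of the range
        moves.append((start, t, tokens[successor]))
    return moves

ring = {100: "A", 200: "B", 300: "C"}             # hypothetical tokens
for start, end, prev in ranges_taken_over(ring, [150, 250, 350]):
    print(f"[{start}, {end}) moves from {prev} to X")
# [100, 150) moves from B to X
# [200, 250) moves from C to X
# [300, 350) moves from A to X
```

Because X's tokens are scattered, it takes small ranges from several existing nodes instead of half the load of a single neighbor.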
