Book reading: Distributed Systems (Principles and Paradigms), Chapter 8

Latest recommended article published on 2018-11-21 22:47:26

tcwwh

Category: Distributed system. Tags: paradigms, system, transition, server, crash

Article link: https://blog.csdn.net/tcwwh/article/details/7024411

1. Fault Tolerance

Being fault tolerant is strongly related to what are called dependable systems. A dependable system has to meet the following requirements:

a) Availability

b) Reliability

c) Safety

d) Maintainability

Availability

It is defined as the property that a system is ready to be used immediately. In general, it refers to the probability that the system is operating correctly at any given moment and is available to perform its functions on behalf of its users. In other words, a highly available system is one that will most likely be working at a given instant in time.

Reliability

Reliability refers to the property that a system can run continuously without failure. In contrast to availability, reliability is defined in terms of a time interval instead of an instant in time. A highly reliable system is one that will most likely continue to work without interruption during a relatively long period of time.

If a system goes down for one millisecond every hour, it has an availability of over 99.9999%, but it is still highly unreliable, because it never runs for more than an hour without interruption.
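The arithmetic behind this example can be checked with a short sketch. The function names and the choice to count interruptions per day are illustrative assumptions, not taken from the book; only the figures (one millisecond of downtime per hour) come from the example above.

```python
# Minimal sketch of the availability arithmetic in the example above.
# Helper names are hypothetical; only the 1 ms/hour figure is from the text.

MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 milliseconds in one hour


def availability(downtime_ms: float, period_ms: float) -> float:
    """Fraction of the period during which the system is up."""
    return (period_ms - downtime_ms) / period_ms


def interruptions_per_day(interruptions_per_hour: int) -> int:
    """Number of times per day that continuous operation is broken."""
    return interruptions_per_hour * 24


a = availability(downtime_ms=1, period_ms=MS_PER_HOUR)
print(f"availability: {a * 100:.5f}%")  # above 99.9999%
print(f"interruptions per day: {interruptions_per_day(1)}")  # 24
```

The availability figure is very high because the total downtime is tiny, while the 24 interruptions per day show why the same system scores poorly on reliability, which is measured over an interval.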

Safety refers to the situation in which, when a system temporarily fails to operate correctly, nothing catastrophic happens.

Maintainability refers to how easily a failed system can be repaired. A highly maintainable system may also show a high degree of availability, especially if failures can be detected and repaired automatically.

2. Failure models

a) Crash failure

b) Omission failure

   a) Receive omission

   b) Send omission

c) Timing failure

d) Response failure

   a) Value failure

   b) State transition failure

e) Arbitrary failure

Crash failure occurs when a server prematurely halts, but was working correctly until it stopped. An important aspect of crash failures is that once the server has halted, nothing is heard from it anymore.

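Omission failure occurs when a server fails to respond to a request. With a receive omission the server fails to receive incoming messages; with a send omission it fails to send messages.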

Timing failures occur when the response lies outside a specified real-time interval.

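Response failure refers to the situation in which the server's response is simply incorrect. There are two kinds: a value failure, in which the value of the response is wrong, and a state transition failure, in which the server deviates from the correct flow of control.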
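As a rough illustration of how these categories look from a client's point of view, the sketch below classifies a single request/reply observation. Everything in it (the `Failure` enum, the `classify` function, the parameters and the order of the checks) is a hypothetical construction for this note, not something defined in the book. It also has clear limits: from one missing reply a client cannot tell a crash from an omission, and state transition failures and arbitrary failures cannot be detected by a check this simple.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    """Failure categories as observable by a client (hypothetical names)."""
    NONE = "no failure observed"
    OMISSION_OR_CRASH = "omission or crash failure"
    TIMING = "timing failure"
    VALUE = "response failure (value)"


def classify(reply: Optional[int], expected: int, elapsed_s: float,
             min_s: float, max_s: float) -> Failure:
    """Classify one request/reply observation.

    reply      the value returned by the server, or None if nothing arrived
    expected   the value a correct server would return (assumed known here)
    elapsed_s  how long the reply took, in seconds
    min_s, max_s  the specified real-time interval for a valid reply
    """
    if reply is None:
        # No reply at all: the server may have crashed or dropped a message.
        return Failure.OMISSION_OR_CRASH
    if not (min_s <= elapsed_s <= max_s):
        # A reply arrived, but outside the specified interval.
        return Failure.TIMING
    if reply != expected:
        # A timely reply carrying the wrong value.
        return Failure.VALUE
    return Failure.NONE


print(classify(None, 42, 0.2, 0.0, 1.0).value)
print(classify(42, 42, 2.5, 0.0, 1.0).value)
print(classify(41, 42, 0.2, 0.0, 1.0).value)
print(classify(42, 42, 0.2, 0.0, 1.0).value)
```

Checking timing before the value is an arbitrary choice in this sketch; a reply that is both late and wrong is reported only as a timing failure.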

3. Failure masking by redundancy