Book reading: Distributed Systems (Principles and Paradigms), Chapter 8

1.      Fault Tolerance

Being fault tolerant is strongly related to what are called dependable systems. A dependable system has to meet the following requirements:

a)      Availability

b)      Reliability

c)      Safety

d)      Maintainability

 

Availability

It is defined as the property that a system is ready to be used immediately. In general, it refers to the probability that the system is operating correctly at any given moment and is available to perform its functions on behalf of its users. In other words, a highly available system is one that will most likely be working at a given instant in time.

 

Reliability

Refers to the property that a system can run continuously without failure. In contrast to availability, reliability is defined in terms of a time interval instead of an instant in time. A highly reliable system is one that will most likely continue to work without interruption during a relatively long period of time.

If a system goes down for one millisecond every hour, it has an availability of over 99.9999%, but it is still highly unreliable.
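The arithmetic behind this example can be checked with a short sketch. The helper name `availability` and the constants below are illustrative assumptions, not something taken from the book; availability is modeled simply as the fraction of time the system is up.

```python
def availability(uptime_s: float, downtime_s: float) -> float:
    """Fraction of total time during which the system is up."""
    return uptime_s / (uptime_s + downtime_s)


HOUR_S = 3600.0      # length of one hour in seconds
DOWNTIME_S = 0.001   # assumed outage: one millisecond in every hour

# Availability: probability of being up at a randomly chosen instant.
a = availability(HOUR_S - DOWNTIME_S, DOWNTIME_S)

# Reliability is about intervals: with one outage per hour, the system
# can never run for a full hour without interruption.
longest_run_s = HOUR_S - DOWNTIME_S

print(f"availability: {a:.7%}")
print(f"longest failure-free run: {longest_run_s} s")
```

The availability comes out above 99.9999%, yet no failure-free run can last a full hour, which is why the system counts as highly unreliable.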

 

Safety refers to the situation that when a system temporarily fails to operate correctly, nothing catastrophic happens.

Maintainability refers to how easily a failed system can be repaired. A highly maintainable system may also show a high degree of availability, especially if failures can be detected and repaired automatically.

 

 

2.      Failure models

a)      Crash failure

b)      Omission failure

a)        Receive omission

b)        Send omission

c)      Timing failure

d)      Response failure

a)        Value failure

b)        State transition failure

e)      Arbitrary failure

 

A crash failure occurs when a server prematurely halts, but was working correctly until it stopped. An important aspect of crash failures is that once the server has halted, nothing is heard from it anymore.

 

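An omission failure occurs when a server fails to respond to a request. With a receive omission the server fails to receive incoming messages; with a send omission it fails to send messages.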

Timing failures occur when the response lies outside a specified real-time interval.

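A response failure refers to the situation in which the server's response is simply incorrect. There are two kinds: a value failure, in which the value of the response is wrong, and a state transition failure, in which the server deviates from the correct flow of control.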
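As a rough illustration of how these models look from the client side, the sketch below classifies the outcome of a single request. All names (`Outcome`, `classify`, and the parameters) are hypothetical and chosen only for this example. It also assumes the client knows the expected value, which is rarely true in practice. Note that a missing reply alone cannot tell a crash from an omission, and that state transition and arbitrary failures are not observable from one reply, so they are left out.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "ok"
    OMISSION = "omission failure (or crash; indistinguishable from one request)"
    TIMING = "timing failure"
    VALUE = "response failure (value)"


def classify(response: Optional[int], elapsed_s: float, expected: int,
             earliest_s: float, latest_s: float) -> Outcome:
    """Classify one request/response exchange as seen by the client."""
    if response is None:
        # No reply arrived at all.
        return Outcome.OMISSION
    if not (earliest_s <= elapsed_s <= latest_s):
        # A reply arrived, but outside the specified time interval
        # (too early counts as well as too late).
        return Outcome.TIMING
    if response != expected:
        # Reply arrived on time but carries the wrong value.
        return Outcome.VALUE
    return Outcome.OK


print(classify(None, 0.0, expected=42, earliest_s=0.0, latest_s=1.0).value)
print(classify(42, 2.5, expected=42, earliest_s=0.0, latest_s=1.0).value)
print(classify(41, 0.2, expected=42, earliest_s=0.0, latest_s=1.0).value)
```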

 

3.      Failure masking by redundancy
