1. Fault Tolerance
Being fault tolerant is strongly related to what are called dependable systems. A dependable system has the following requirements:
a) Availability
b) Reliability
c) Safety
d) Maintainability
Availability
It is defined as the property that a system is ready to be used immediately. In general, it refers to the probability that the system is operating correctly at any given moment and is available to perform its functions on behalf of its users. In other words, a highly available system is one that will most likely be working at a given instant in time.
Reliability
It refers to the property that a system can run continuously without failure. In contrast to availability, reliability is defined in terms of a time interval instead of an instant in time. A highly reliable system is one that will most likely continue to work without interruption during a relatively long period of time.
If a system goes down for one millisecond every hour, it has an availability of over 99.9999%, but it is still highly unreliable.
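The arithmetic behind this example can be checked with a short sketch. The function name `availability` and its parameters are assumptions made for illustration; they do not come from the notes.

```python
def availability(downtime_s: float, period_s: float) -> float:
    """Fraction of each period during which the system is up."""
    return 1.0 - downtime_s / period_s


# One millisecond of downtime in every hour (3600 seconds).
period_s = 3600.0
downtime_s = 0.001

a = availability(downtime_s, period_s)

# Reliability is about uninterrupted operation over an interval:
# here the system never runs for a full hour without failing.
longest_uninterrupted_run_s = period_s - downtime_s
interruptions_per_day = 24

print(f"availability: {a * 100:.5f}%")
print(f"longest uninterrupted run: {longest_uninterrupted_run_s} s")
print(f"interruptions per day: {interruptions_per_day}")
```

The availability comes out at roughly 99.99997%, yet the system is interrupted 24 times a day, which is why it counts as highly available but unreliable.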
Safety refers to the situation in which a system temporarily fails to operate correctly, yet nothing catastrophic happens.
Maintainability refers to how easily a failed system can be repaired. A highly maintainable system may also show a high degree of availability, especially if failures can be detected and repaired automatically.
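One common way to see why fast repair improves availability is the steady-state formula availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR the mean time to repair. The notes do not state this formula; it is brought in here as an assumption, and the numbers below are purely illustrative.

```python
def steady_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Long-run fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_h / (mttf_h + mttr_h)


# Same failure rate in both cases (hypothetical: one failure per 1000 hours).
# Only the repair time differs.
manual_repair = steady_state_availability(mttf_h=1000.0, mttr_h=4.0)
automatic_repair = steady_state_availability(mttf_h=1000.0, mttr_h=0.01)

print(f"manual repair (4 h):       {manual_repair:.6f}")
print(f"automatic repair (0.01 h): {automatic_repair:.6f}")
```

With the failure rate held fixed, shortening the repair time is enough to raise availability, which matches the remark about automatic detection and repair.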
2. Failure models
a) Crash failure
b) Omission failure
   a) Receive omission
   b) Send omission
c) Timing failure
d) Response failure
   a) Value failure
   b) State transition failure
e) Arbitrary failure
A crash failure occurs when a server prematurely halts, but was working correctly until it stopped. An important aspect of crash failures is that once the server has halted, nothing is heard from it anymore.
An omission failure occurs when a server fails to respond to a request.
A timing failure occurs when the response lies outside a specified real-time interval.
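A response failure refers to the situation in which the server's response is simply incorrect. There are two kinds: a value failure, in which the value of the response is wrong, and a state transition failure, in which the server deviates from the correct flow of control.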
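The definitions above can be illustrated from the point of view of a client observing a single request. This is only a sketch under strong assumptions: the names `Failure` and `classify` are hypothetical, the client is assumed to know the expected reply (which is rarely true in practice), and a single missing reply cannot tell an omission failure from a crash.

```python
from enum import Enum
from typing import Optional, Tuple


class Failure(Enum):
    NONE = "no failure observed"
    OMISSION = "omission failure (or crash)"
    TIMING = "timing failure"
    RESPONSE = "response failure"


def classify(reply: Optional[int],
             elapsed_s: float,
             window_s: Tuple[float, float],
             expected: int) -> Failure:
    """Classify one request/response observation.

    reply     -- the value returned, or None if nothing arrived
    elapsed_s -- time taken for the reply to arrive
    window_s  -- (earliest, latest) acceptable response time
    expected  -- the correct reply (assumed known, for illustration only)
    """
    if reply is None:
        # Nothing came back: omission, or the server has crashed.
        return Failure.OMISSION
    earliest, latest = window_s
    if elapsed_s < earliest or elapsed_s > latest:
        # A reply arrived, but outside the specified time interval.
        return Failure.TIMING
    if reply != expected:
        # A reply arrived on time, but it is incorrect.
        return Failure.RESPONSE
    return Failure.NONE


print(classify(None, 0.0, (0.0, 1.0), expected=42).value)
print(classify(42, 2.5, (0.0, 1.0), expected=42).value)
print(classify(41, 0.2, (0.0, 1.0), expected=42).value)
print(classify(42, 0.2, (0.0, 1.0), expected=42).value)
```

Arbitrary failures are deliberately left out of the sketch, since they can produce any of these observations at any time.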
3. Failure masking by redundancy