An Introduction to the Reliability and Availability Software Quality Attributes

I always used to get the Reliability and Availability quality attributes mixed up. After looking through some references I found a very good explanation, including the formula for calculating the Availability percentage, which finally made things clear, so I am saving the whole page here for everyone to consult.

 

Since the images cannot be displayed here, the original URL is:

http://www.eventhelix.com/RealtimeMantra/FaultHandling/reliability_availability_basics.htm

 

Reliability and Availability Basics

Realtime and embedded systems are now a central part of our lives, and their reliable functioning is of paramount concern to the millions of users that depend on them every day. Unfortunately, most embedded systems still fall short of users' expectations of reliability.

In this article we will discuss basic techniques for measuring and improving reliability of computer systems. The following topics are discussed:

Failure Characteristics

Hardware Failures

Hardware failures are typically characterized by a bathtub curve; an example curve is shown below. The chance of a hardware failure is high during the initial life of the module. The failure rate during the rated useful life of the product is fairly low. Once the end of life is reached, the failure rate of modules increases again.

[Figure: Bathtub curve characterizing hardware failures]

Hardware failures during a product's life can be attributed to the following causes:

  • Design failures: This class of failures takes place due to inherent design flaws in the system. In a well-designed system this class of failures should make a very small contribution to the total number of failures.
  • Infant mortality: This class of failures causes newly manufactured hardware to fail. Such failures can be attributed to manufacturing problems like poor soldering, leaking capacitors, etc. These failures should not be present in systems leaving the factory, as they will show up in factory burn-in tests.
  • Random failures: Random failures can occur during the entire life of a hardware module. These failures can lead to system failures. Redundancy is provided to recover from this class of failures.
  • Wear out: Once a hardware module has reached the end of its useful life, degradation of component characteristics will cause it to fail. This type of fault can be weeded out by preventive maintenance and rotation of hardware.

Software Failures

Software failures can be characterized by keeping track of the software defect density in the system. This number can be obtained from historical defect data. Defect density will depend on the following factors:

  • Software process used to develop the design and code (use of peer level design/code reviews, unit testing)
  • Complexity of the software
  • Size of the software
  • Experience of the team developing the software
  • Percentage of code reused from a previous stable project
  • Rigor and depth of testing before the product is shipped.

Defect density is typically measured in number of defects per thousand lines of code (defects/KLOC).
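As a quick sketch of this metric (the defect count and code size below are made-up numbers for illustration), defect density is just:

```python
def defect_density(defects_found: int, lines_of_code: int) -> float:
    """Defect density in defects per thousand lines of code (defects/KLOC)."""
    return defects_found / (lines_of_code / 1000.0)

# Hypothetical example: 45 defects found in a 30,000-line code base.
density = defect_density(45, 30_000)  # 1.5 defects/KLOC
```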

Reliability Parameters

MTBF

Mean Time Between Failures (MTBF), as the name suggests, is the average time between failures of hardware modules. It is the manufacturer's estimate of the average time before a failure occurs in a hardware module.

MTBF for off-the-shelf hardware modules can be obtained from the vendor. MTBF for in-house developed hardware modules is calculated by the hardware team developing the board.

MTBF for software can be estimated from the defect density: multiplying the defect density (defects/KLOC) by the amount of code executed per second (KLOC/s) gives a failure rate, and the MTBF is the reciprocal of that rate.
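To make the arithmetic concrete: the product of defect density and KLOC executed per second is a failure rate (failures per second), and MTBF is its reciprocal. A minimal sketch with purely illustrative numbers:

```python
def software_mtbf_seconds(defects_per_kloc: float, kloc_per_second: float) -> float:
    """MTBF = 1 / failure rate, where the failure rate is
    defect density (defects/KLOC) times code executed per second (KLOC/s)."""
    failure_rate = defects_per_kloc * kloc_per_second  # failures per second
    return 1.0 / failure_rate

# Illustrative: 0.001 residual defects/KLOC, executing 5 KLOC per second
mtbf = software_mtbf_seconds(0.001, 5)  # roughly 200 seconds between failures
```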

FITS

FITS is a more intuitive way of representing MTBF. FITS is simply the expected number of failures of the module in a billion (1,000,000,000) hours of operation.
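Assuming MTBF is expressed in hours, the MTBF/FITS relationship can be sketched as:

```python
BILLION_HOURS = 1_000_000_000  # 10**9 hours

def mtbf_to_fits(mtbf_hours: float) -> float:
    """FITS = expected number of failures per billion hours of operation."""
    return BILLION_HOURS / mtbf_hours

def fits_to_mtbf(fits: float) -> float:
    """Inverse conversion: MTBF in hours for a given FITS rate."""
    return BILLION_HOURS / fits

# A module with an MTBF of 1,000,000 hours fails at a rate of 1000 FITS.
rate = mtbf_to_fits(1_000_000)  # 1000.0
```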

MTTR

Mean Time To Repair (MTTR) is the time taken to repair a failed hardware module. In an operational system, repair generally means replacing the hardware module, so hardware MTTR can be viewed as the mean time to replace a failed hardware module. It should be a goal of system designers to meet the system reliability goals while allowing for a high MTTR value: as the table below shows, a low MTTR requirement means high operational cost for the system.

Estimating the Hardware MTTR

| Where are hardware spares kept? | How is the site manned? | Estimated MTTR |
| --- | --- | --- |
| Onsite | 24 hours a day | 30 minutes |
| Onsite | Operator is on call 24 hours a day | 2 hours |
| Onsite | Regular working hours on weekdays as well as weekends and holidays | 14 hours |
| Onsite | Regular working hours on weekdays only | 3 days |
| Offsite; shipped by courier when a fault condition is encountered | Operator is paged by the system when a fault is detected | 1 week |
| Offsite; maintained in an operator-controlled warehouse | System is remotely located; an operator needs to be flown in to replace the hardware | 2 weeks |

MTTR for a software module can be computed as the time taken to reboot after a software fault is detected. Thus software MTTR could be viewed as the mean time to reboot after a software fault has been detected. The goal of system designers should be to keep the software MTTR as low as possible. MTTR for software depends on several factors:

Estimating Software MTTR

| Software fault recovery mechanism | Software reboot mechanism on fault detection | Estimated MTTR |
| --- | --- | --- |
| Software failure is detected by watchdog and/or health messages | Processor automatically reboots from a ROM-resident image | 30 seconds |
| Software failure is detected by watchdog and/or health messages | Processor automatically restarts the offending tasks, without needing an operating system reboot | 30 seconds |
| Software failure is detected by watchdog and/or health messages | Processor automatically reboots; the operating system reboots from a disk image and restarts applications | 3 minutes |
| Software failure is detected by watchdog and/or health messages | Processor automatically reboots; the operating system and application images have to be downloaded from another machine | 10 minutes |
| Software failure detection is not supported | Manual operator reboot is required | 30 minutes to 2 weeks (software MTTR is the same as hardware MTTR) |

Availability

Availability of a module is the percentage of time that the module is operational. Availability of a hardware/software module can be obtained from its MTBF and MTTR by the formula given below.

Availability = MTBF / (MTBF + MTTR)
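The formula can be sketched directly (the MTBF/MTTR values below are made-up, and both must be in the same time unit):

```python
def availability(mtbf: float, mttr: float) -> float:
    """Fraction of time the module is operational: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

# Illustrative: MTBF of 9,999 hours and MTTR of 1 hour -> 99.99% (4-nines)
a = availability(9_999, 1)  # 0.9999
```

Note that availability improves either by making failures rarer (raising MTBF) or by recovering faster (lowering MTTR).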

Availability is typically specified in "nines" notation. For example, 3-nines availability corresponds to 99.9% availability, and 5-nines availability corresponds to 99.999% availability.

Downtime

Downtime per year is a more intuitive way of understanding availability. The table below maps each availability level to the corresponding downtime.

| Availability | Downtime |
| --- | --- |
| 90% (1-nine) | 36.5 days/year |
| 99% (2-nines) | 3.65 days/year |
| 99.9% (3-nines) | 8.76 hours/year |
| 99.99% (4-nines) | 52 minutes/year |
| 99.999% (5-nines) | 5 minutes/year |
| 99.9999% (6-nines) | 31 seconds/year |
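The downtime figures above follow directly from the availability percentage; assuming a 365-day year, the conversion can be sketched as:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year for a given availability percentage."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR

# 99% (2-nines)     -> 5256 minutes, i.e. 3.65 days per year
# 99.999% (5-nines) -> about 5.26 minutes per year
```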

 

 

Karen
