Unit Test Coverage Analysis Metrics

Introduction

Code coverage analysis is the process of:

  • Finding areas of a program not exercised by a set of test cases,
  • Creating additional test cases to increase coverage, and
  • Determining a quantitative measure of code coverage, which is an indirect measure of quality.

An optional aspect of code coverage analysis is:

  • Identifying redundant test cases that do not increase coverage.

A code coverage analyzer automates this process.

You use coverage analysis to assure the quality of your set of tests, not the quality of the actual product. You do not generally use a coverage analyzer when running your set of tests through your release candidate. Coverage analysis requires access to test program source code and often requires recompiling it with a special command.

This paper discusses the details you should consider when planning to add coverage analysis to your test plan. Coverage analysis has certain strengths and weaknesses. You must choose from a range of measurement methods. You should establish a minimum percentage of coverage, to determine when to stop analyzing coverage. Coverage analysis is one of many testing techniques; you should not rely on it alone.


Code coverage analysis is sometimes called test coverage analysis. The two terms are synonymous. The academic world more often uses the term "test coverage" while practitioners more often use "code coverage". Likewise, a coverage analyzer is sometimes called a coverage monitor. I prefer the practitioner terms.

Structural Testing and Functional Testing

Code coverage analysis is a structural testing technique (AKA glass box testing and white box testing). Structural testing compares test program behavior against the apparent intention of the source code. This contrasts with functional testing (AKA black-box testing), which compares test program behavior against a requirements specification. Structural testing examines how the program works, taking into account possible pitfalls in the structure and logic. Functional testing examines what the program accomplishes, without regard to how it works internally.


Structural testing is also called path testing since you choose test cases that cause paths to be taken through the structure of the program. Do not confuse path testing with the path coverage metric, explained later.

At first glance, structural testing seems unsafe. Structural testing cannot find errors of omission. However, requirements specifications sometimes do not exist, and are rarely complete. This is especially true near the end of the product development time line when the requirements specification is updated less frequently and the product itself begins to take over the role of the specification. The difference between functional and structural testing blurs near release time.

The Premise

The basic assumptions behind coverage analysis tell us about the strengths and limitations of this testing technique. Some fundamental assumptions are listed below.

  • Bugs relate to control flow and you can expose bugs by varying the control flow [Beizer1990 p.60]. For example, a programmer wrote "if (c)" rather than "if (!c)".
  • You can look for failures without knowing what failures might occur and all tests are reliable, in that successful test runs imply program correctness [Morell1990]. The tester understands what a correct version of the program would do and can identify differences from the correct behavior.
  • Other assumptions include achievable specifications, no errors of omission, and no unreachable code.

Clearly, these assumptions do not always hold. Coverage analysis exposes some plausible bugs but does not come close to exposing all classes of bugs. Coverage analysis provides more benefit when applied to an application that makes a lot of decisions rather than data-centric applications, such as a database application.

Basic Metrics

A large variety of coverage metrics exist. This section contains a summary of some fundamental metrics and their strengths, weaknesses and issues.

The U.S. Department of Transportation Federal Aviation Administration (FAA) has formal requirements for structural coverage in the certification of safety-critical airborne systems [DO-178B]. Few other organizations have such requirements, so the FAA is influential in the definitions of these metrics.

Statement Coverage

This metric reports whether each executable statement is encountered. Declarative statements that generate executable code are considered executable statements. Control-flow statements, such as if, for, and switch are covered if the expression controlling the flow is covered as well as all the contained statements. Implicit statements, such as an omitted return, are not subject to statement coverage.

Also known as: line coverage, segment coverage [Ntafos1988], C1 [Beizer1990 p.75] and basic block coverage. Basic block coverage is the same as statement coverage except the unit of code measured is each sequence of non-branching statements.

I highly discourage using the non-descriptive name C1. People sometimes incorrectly use the name C1 to identify decision coverage. Therefore this term has become ambiguous.

The chief advantage of this metric is that it can be applied directly to object code and does not require processing source code. Performance profilers commonly implement this metric.

The chief disadvantage of statement coverage is that it is insensitive to some control structures. For example, consider the following C/C++ code fragment:

int* p = NULL;
if (condition)
    p = &variable;
*p = 123;

Without a test case that causes condition to evaluate false, statement coverage rates this code fully covered. In fact, if condition ever evaluates false, this code fails. This is the most serious shortcoming of statement coverage. If-statements are very common.
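
The gap can be made concrete with a small sketch. The C++ below is a hand-instrumented copy of the fragment; the Probes struct and the function names are hypothetical stand-ins for the bookkeeping a real analyzer inserts automatically, and the null check is added only so the sketch itself stays well-defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hand-written probes (not any real analyzer's API).
struct Probes {
    bool stmt_hit[3] = {false, false, false};  // the three executable statements
    bool decision_true = false;                // "if (condition)" evaluated true
    bool decision_false = false;               // "if (condition)" evaluated false
};

// Instrumented copy of the fragment. Returns false where the original code
// would dereference a null pointer.
bool run_fragment(bool condition, Probes& probes) {
    int variable = 0;
    int* p = NULL;
    probes.stmt_hit[0] = true;
    if (condition) {
        probes.decision_true = true;
        p = &variable;
        probes.stmt_hit[1] = true;
    } else {
        probes.decision_false = true;
    }
    probes.stmt_hit[2] = true;        // the statement "*p = 123;" is reached
    if (p == NULL) return false;      // guard added for the sketch only
    *p = 123;
    return true;
}

bool all_statements_covered(const Probes& p) {
    return p.stmt_hit[0] && p.stmt_hit[1] && p.stmt_hit[2];
}

bool decision_covered(const Probes& p) {
    return p.decision_true && p.decision_false;
}
```

Calling run_fragment(true, probes) once marks every statement as hit while decision_covered(probes) stays false; only a second call with condition false reaches the failing path.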

Statement coverage does not report whether loops reach their termination condition, only whether the loop body was executed. With C, C++, and Java, this limitation affects loops that contain break statements.

Since do-while loops always execute at least once, statement coverage considers them the same rank as non-branching statements.

Statement coverage is completely insensitive to the logical operators (|| and &&).

Statement coverage cannot distinguish consecutive switch labels.

Test cases generally correlate more to decisions than to statements. You probably would not have 10 separate test cases for a sequence of 10 non-branching statements; you would have only one test case. For example, consider an if-else statement containing one statement in the then-clause and 99 statements in the else-clause. After exercising one of the two possible paths, statement coverage gives extreme results: either 1% or 99% coverage. Basic block coverage eliminates this problem.
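
The arithmetic behind the 1% and 99% figures can be sketched as follows. This assumes, as the numbers in the text do, that only the 100 statements inside the two clauses are counted and that each clause forms exactly one basic block; the function names are illustrative.

```cpp
#include <cassert>

// Percentage of units covered; a "unit" is a statement or a basic block.
double percent_covered(int covered_units, int total_units) {
    assert(total_units > 0);
    return 100.0 * covered_units / total_units;
}

// The if-else from the text: 1 statement in the then-clause, 99 in the else-clause.
const int kThenStatements = 1;
const int kElseStatements = 99;

// Statement coverage after exercising exactly one of the two paths.
double statement_coverage(bool took_then) {
    int covered = took_then ? kThenStatements : kElseStatements;
    return percent_covered(covered, kThenStatements + kElseStatements);
}

// Basic block coverage after exercising exactly one of the two paths:
// one of the two clause blocks is covered regardless of which one.
double basic_block_coverage() {
    return percent_covered(1, 2);
}
```

Either path gives 50% basic block coverage, which is why the extreme 1% and 99% results disappear.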

One argument in favor of statement coverage over other metrics is that bugs are evenly distributed through code; therefore the percentage of executable statements covered reflects the percentage of faults discovered. However, one of our fundamental assumptions is that faults are related to control flow, not computations. Additionally, we could reasonably expect that programmers strive for a relatively constant ratio of branches to statements.

In summary, this metric is affected more by computational statements than by decisions.

Decision Coverage

This metric reports whether Boolean expressions tested in control structures (such as the if-statement and while-statement) evaluated to both true and false. The entire Boolean expression is considered one true-or-false predicate regardless of whether it contains logical-and or logical-or operators. Additionally, this metric includes coverage of switch-statement cases, exception handlers, and all points of entry and exit. Constant expressions controlling the flow are ignored.

Also known as: branch coverage, all-edges coverage [Roper1994 p.58], C2 [Beizer1990 p.75], decision-decision-path testing [Roper1994 p.39]. I discourage using the non-descriptive name C2 because of the confusion with the term C1.

The FAA makes a distinction between branch coverage and decision coverage, with branch coverage weaker than decision coverage [SVTAS2007]. The FAA definition of a decision is, in part, "A Boolean expression composed of conditions and zero or more Boolean operators." So the FAA definition of decision coverage requires all Boolean expressions to evaluate to both true and false, even those that do not affect control flow. There is no precise definition of "Boolean expression." Some languages, especially C, allow mixing integer and Boolean expressions and do not require Boolean variables be declared as Boolean. The FAA suggests using context to identify Boolean expressions, including whether expressions are used as operands to Boolean operators or tested to control flow. The suggested definition of "Boolean operator" is a built-in (not user-defined) operator with operands and result of Boolean type. The logical-not operator is exempted due to its simplicity. The C conditional operator (?:) is considered a Boolean operator if all three operands are Boolean expressions.

This metric has the advantage of simplicity without the problems of statement coverage.

A disadvantage is that this metric ignores branches within Boolean expressions which occur due to short-circuit operators. For example, consider the following C/C++/Java code fragment:

if (condition1 && (condition2 || function1()))
    statement1;
else
    statement2;

This metric could consider the control structure completely exercised without a call to function1. The test expression is true when condition1 is true and condition2 is true, and the test expression is false when condition1 is false. In this instance, the short-circuit operators preclude a call to function1.
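
A minimal sketch of this situation, assuming a hand-written decision probe and a call counter (DecisionProbe, function1_calls and branch are names invented for the illustration):

```cpp
#include <cassert>

// Counts how often function1 is actually called.
static int function1_calls = 0;

bool function1() {
    ++function1_calls;
    return true;
}

// Hypothetical probe recording both outcomes of the whole decision.
struct DecisionProbe {
    bool seen_true = false;
    bool seen_false = false;
    bool full() const { return seen_true && seen_false; }
};

// The fragment from the text; returns 1 for statement1 and 2 for statement2.
int branch(bool condition1, bool condition2, DecisionProbe& probe) {
    if (condition1 && (condition2 || function1())) {
        probe.seen_true = true;
        return 1;  // statement1
    } else {
        probe.seen_false = true;
        return 2;  // statement2
    }
}
```

After branch(true, true, probe) and branch(false, false, probe), the probe reports full decision coverage while function1_calls is still zero.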

The FAA suggests that for the purposes of measuring decision coverage, the operands of short-circuit operators (including the C conditional operator) be interpreted as decisions [SVTAS2007].

Condition Coverage

Condition coverage reports the true or false outcome of each condition. A condition is an operand of a logical operator that does not contain logical operators. Condition coverage measures the conditions independently of each other.

This metric is similar to decision coverage but has better sensitivity to the control flow.

However, full condition coverage does not guarantee full decision coverage. For example, consider the following C++/Java fragment.

bool f(bool e) { return false; }
bool a[2] = { false, false };

if (f(a && b)) ...
if (a[int(a && b)]) ...
if ((a && b) ? false : false) ...

All three of the if-statements above branch false regardless of the values of a and b. However if you exercise this code with a and b having all possible combinations of values, condition coverage reports full coverage.
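
The first of these if-statements can be instrumented by hand to show the effect. This is a sketch only: OutcomeProbe, record and exercise are hypothetical helpers, and f mirrors the function in the fragment.

```cpp
#include <cassert>

// Records which outcomes a condition or decision has produced.
struct OutcomeProbe {
    bool seen_true = false;
    bool seen_false = false;
    bool full() const { return seen_true && seen_false; }
};

// Notes the value in the probe and passes it through unchanged.
bool record(bool value, OutcomeProbe& probe) {
    if (value) probe.seen_true = true;
    else       probe.seen_false = true;
    return value;
}

bool f(bool) { return false; }

struct Probes {
    OutcomeProbe a;         // condition a
    OutcomeProbe b;         // condition b (evaluated only when a is true)
    OutcomeProbe decision;  // the whole if-expression
};

// Instrumented form of "if (f(a && b)) ...".
void exercise(bool a, bool b, Probes& probes) {
    if (record(f(record(a, probes.a) && record(b, probes.b)), probes.decision)) {
        // never reached: f always returns false
    }
}
```

Running exercise with all four combinations of a and b fills both condition probes, yet the decision probe only ever sees false.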

Multiple Condition Coverage

Multiple condition coverage reports whether every possible combination of conditions occurs. The test cases required for full multiple condition coverage of a decision are given by the logical operator truth table for the decision.

For languages with short circuit operators such as C, C++, and Java, an advantage of multiple condition coverage is that it requires very thorough testing. For these languages, multiple condition coverage is very similar to condition coverage.

A disadvantage of this metric is that it can be tedious to determine the minimum set of test cases required, especially for very complex Boolean expressions. An additional disadvantage of this metric is that the number of test cases required could vary substantially among conditions that have similar complexity. For example, consider the following two C/C++/Java conditions.

a && b && (c || (d && e))

((a || b) && (c || d)) && e

To achieve full multiple condition coverage, the first condition requires 6 test cases while the second requires 11. Both conditions have the same number of operands and operators. The test cases are listed below. A dash marks a condition that is not evaluated because of short-circuiting.

    a && b && (c || (d && e))
1.  F    -     -     -    -
2.  T    F     -     -    -
3.  T    T     F     F    -
4.  T    T     F     T    F
5.  T    T     F     T    T
6.  T    T     T     -    -

    ((a || b) && (c || d)) && e
 1.   F    F      -    -      -
 2.   F    T      F    F      -
 3.   F    T      F    T      F
 4.   F    T      F    T      T
 5.   F    T      T    -      F
 6.   F    T      T    -      T
 7.   T    -      F    F      -
 8.   T    -      F    T      F
 9.   T    -      F    T      T
10.   T    -      T    -      F
11.   T    -      T    -      T
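
The counts of 6 and 11 can be reproduced mechanically. The sketch below is one possible way to do it, not a description of how any coverage tool works: it runs each decision over all 32 assignments of a to e, records which conditions were actually evaluated under short-circuiting, and counts the distinct patterns. Recorder, first, second and count_test_cases are illustrative names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Supplies condition values from a bit pattern and records the outcome of
// every condition that is actually evaluated ('-' means never evaluated).
class Recorder {
public:
    explicit Recorder(unsigned bits) : bits_(bits), seen_(5, '-') {}

    // Returns the value of condition i (0 = a ... 4 = e) and notes that it ran.
    bool operator()(int i) {
        bool value = ((bits_ >> i) & 1u) != 0;
        seen_[i] = value ? 'T' : 'F';
        return value;
    }

    const std::string& signature() const { return seen_; }

private:
    unsigned bits_;
    std::string seen_;
};

using Decision = bool (*)(Recorder&);

// a && b && (c || (d && e))
bool first(Recorder& c) { return c(0) && c(1) && (c(2) || (c(3) && c(4))); }

// ((a || b) && (c || d)) && e
bool second(Recorder& c) { return ((c(0) || c(1)) && (c(2) || c(3))) && c(4); }

// Counts the distinct evaluation patterns, i.e. the rows of the tables above.
int count_test_cases(Decision decision) {
    std::set<std::string> distinct;
    for (unsigned bits = 0; bits < 32; ++bits) {
        Recorder recorder(bits);
        decision(recorder);
        distinct.insert(recorder.signature());
    }
    return static_cast<int>(distinct.size());
}
```

Each distinct signature such as "TTF--" corresponds to one row in the listings, so count_test_cases(first) and count_test_cases(second) give the two totals.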

As with condition coverage, multiple condition coverage does not include decision coverage.

For languages without short circuit operators such as Visual Basic and Pascal, multiple condition coverage is effectively path coverage (described below) for logical expressions, with the same advantages and disadvantages. Consider the following Visual Basic code fragment.

If a And b Then
...

Multiple condition coverage requires four test cases, for each of the combinations of a and b both true and false. As with path coverage each additional logical operator doubles the number of test cases required.

Condition/Decision Coverage

Condition/Decision Coverage is a hybrid metric composed by the union of condition coverage and decision coverage.

It has the advantage of simplicity but without the shortcomings of its component metrics.

BullseyeCoverage measures condition/decision coverage.

Modified Condition/Decision Coverage

The formal definition of modified condition/decision coverage is:

Every point of entry and exit in the program has been invoked at least once, every condition in a decision has taken all possible outcomes at least once, every decision in the program has taken all possible outcomes at least once, and each condition in a decision has been shown to independently affect that decision's outcome. A condition is shown to independently affect a decision's outcome by varying just that condition while holding fixed all other possible conditions [DO-178B].

Also known as MC/DC and MCDC. This metric is stronger than condition/decision coverage, requiring more test cases for full coverage.

This metric is specified for safety critical aviation software by RTCA/DO-178B and has been the subject of much study, debate and clarification for many years. Two difficult issues with MCDC are:

  • short circuit operators
  • multiple occurrences of a condition

There are two competing ideas of how to handle short-circuit operators. One idea is to relax the requirement that conditions be held constant if those conditions are not evaluated due to a short-circuit operator [Chilenski1994]. The other is to consider the condition operands of short-circuit operators as separate decisions [DO-248B].

A condition may occur more than once in a decision. In the expression "A or (not A and B)", the conditions "A" and "not A" are coupled - they cannot be varied independently as required by the definition of MCDC. One approach to this dilemma, called Unique Cause MCDC, is to interpret the term "condition" to mean "uncoupled condition." Another approach, called Masking MCDC, is to permit more than one condition to vary at once, using an analysis of the logic of the decision to ensure that only the condition of interest influences the outcome.
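
The independence requirement can be checked mechanically for a small decision. The sketch below covers only that part of the definition, in its unique-cause form, and ignores short-circuit evaluation; the decision a && (b || c) and all names are assumptions made for the illustration. Each test vector packs one bit per condition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A test vector packs one bit per condition: bit 0 = a, bit 1 = b, bit 2 = c.
typedef bool (*Decision)(unsigned);

// Hypothetical decision used only for illustration: a && (b || c).
bool example_decision(unsigned v) {
    bool a = (v & 1u) != 0;
    bool b = (v & 2u) != 0;
    bool c = (v & 4u) != 0;
    return a && (b || c);
}

// True if two tests differ in exactly the given condition and the decision
// outcome differs between them (a unique-cause independence pair).
bool shows_independence(Decision decision, int condition,
                        const std::vector<unsigned>& tests) {
    const unsigned mask = 1u << condition;
    for (std::size_t i = 0; i < tests.size(); ++i) {
        for (std::size_t j = i + 1; j < tests.size(); ++j) {
            if ((tests[i] ^ tests[j]) == mask &&
                decision(tests[i]) != decision(tests[j])) {
                return true;
            }
        }
    }
    return false;
}

// True if every condition has an independence pair in the test set.
bool all_conditions_independent(Decision decision, int condition_count,
                                const std::vector<unsigned>& tests) {
    for (int c = 0; c < condition_count; ++c) {
        if (!shows_independence(decision, c, tests)) return false;
    }
    return true;
}
```

For this decision the four vectors 3, 2, 1 and 5 (a and b true; only b true; only a true; a and c true) supply a pair for each condition, while dropping the last vector leaves c without one.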

Path Coverage

This metric reports whether each of the possible paths in each function have been followed. A path is a unique sequence of branches from the function entry to the exit.

Also known as predicate coverage. Predicate coverage views paths as possible combinations of logical conditions [Beizer1990 p.98].

Since loops introduce an unbounded number of paths, this metric considers only a limited number of looping possibilities. A large number of variations of this metric exist to cope with loops. Boundary-interior path testing considers two possibilities for loops: zero repetitions and more than zero repetitions [Ntafos1988]. For do-while loops, the two possibilities are one iteration and more than one iteration.

Path coverage has the advantage of requiring very thorough testing. Path coverage has two severe disadvantages. The first is that the number of paths is exponential to the number of branches. For example, a function containing 10 if-statements has 1024 paths to test. Adding just one more if-statement doubles the count to 2048. The second disadvantage is that many paths are impossible to exercise due to relationships of data. For example, consider the following C/C++ code fragment:

if (success)
    statement1;
statement2;
if (success)
    statement3;

Path coverage considers this fragment to contain 4 paths. In fact, only two are feasible: success=false and success=true.
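
Both points, the doubling of paths per if-statement and the infeasible paths in this fragment, can be sketched in a few lines. The helper names are illustrative, and syntactic_paths assumes a plain sequence of independent if-statements with no loops.

```cpp
#include <cassert>
#include <set>
#include <string>

// Number of syntactic paths through a sequence of if-statements:
// each if-statement doubles the count.
long syntactic_paths(int if_statements) {
    return 1L << if_statements;
}

// Runs the fragment from the text and returns the outcomes of its two
// decisions, e.g. "TT" when both if-statements take the true branch.
std::string trace(bool success) {
    std::string path;
    if (success) path += 'T'; else path += 'F';  // first  "if (success)"
    // statement2 runs on every path
    if (success) path += 'T'; else path += 'F';  // second "if (success)"
    return path;
}

// Counts the distinct paths that some input can actually drive.
int feasible_paths() {
    std::set<std::string> seen;
    seen.insert(trace(false));
    seen.insert(trace(true));
    return static_cast<int>(seen.size());
}
```

The paths "TF" and "FT" never appear because both decisions test the same unchanged variable.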

Researchers have invented many variations of path coverage to deal with the large number of paths. For example, n-length sub-path coverage reports whether you exercised each path of length n branches. Basis path testing selects paths that achieve decision coverage, with each path containing at least one decision outcome differing from the other paths [Roper1994 p.48]. Other variations include linear code sequence and jump (LCSAJ) coverage and data flow coverage.

Other Metrics

Here is a description of some variations of the fundamental metrics and some less commonly used metrics.

Function Coverage

This metric reports whether you invoked each function or procedure. It is useful during preliminary testing to assure at least some coverage in all areas of the software. Broad, shallow testing finds gross deficiencies in a test suite quickly.

BullseyeCoverage measures function coverage.

Call Coverage

This metric reports whether you executed each function call. The hypothesis is that bugs commonly occur in interfaces between modules.

Also known as call pair coverage.

Linear Code Sequence and Jump (LCSAJ) Coverage

This variation of path coverage considers only sub-paths that can easily be represented in the program source code, without requiring a flow graph [Woodward1980]. An LCSAJ is a sequence of source code lines executed in sequence. This "linear" sequence can contain decisions as long as the control flow actually continues from one line to the next at run-time. Sub-paths are constructed by concatenating LCSAJs. Researchers refer to the coverage ratio of paths of length n LCSAJs as the test effectiveness ratio (TER) n+2.

The advantage of this metric is that it is more thorough than decision coverage yet avoids the exponential difficulty of path coverage. The disadvantage is that it does not avoid infeasible paths.

Data Flow Coverage

This variation of path coverage considers only the sub-paths from variable assignments to subsequent references of the variables.

The advantage of this metric is the paths reported have direct relevance to the way the program handles data. One disadvantage is that this metric does not include decision coverage. Another disadvantage is complexity. Researchers have proposed numerous variations, all of which increase the complexity of this metric. For example, variations distinguish between the use of a variable in a computation versus a use in a decision, and between local and global variables. As with data flow analysis for code optimization, pointers also present problems.

Object Code Branch Coverage

This metric reports whether each machine language conditional branch instruction both took the branch and fell through.

This metric gives results that depend on the compiler rather than on the program structure since compiler code generation and optimization techniques can create object code that bears little similarity to the original source code structure.

Since branches disrupt the instruction pipeline, compilers sometimes avoid generating a branch and instead generate an equivalent sequence of non-branching instructions. Compilers often expand the body of a function inline to save the cost of a function call. If such functions contain branches, the number of machine language branches increases dramatically relative to the original source code.

You are better off testing the original source code since it relates to program requirements better than the object code.

Loop Coverage

This metric reports whether you executed each loop body zero times, exactly once, and more than once (consecutively). For do-while loops, loop coverage reports whether you executed the body exactly once, and more than once.
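
A minimal sketch of the bookkeeping for one for-loop follows. LoopProbe and sum_first are hypothetical names, and a real tool would insert the counting itself rather than rely on the programmer.

```cpp
#include <cassert>

// Hypothetical probe for a single while-loop or for-loop.
struct LoopProbe {
    bool zero = false;  // body skipped entirely
    bool once = false;  // body executed exactly once
    bool many = false;  // body executed more than once consecutively

    // Call once per loop execution with the number of iterations performed.
    void record(int iterations) {
        if (iterations == 0)      zero = true;
        else if (iterations == 1) once = true;
        else                      many = true;
    }

    bool full() const { return zero && once && many; }
};

// Example loop under test: sums the first n values.
int sum_first(const int* values, int n, LoopProbe& probe) {
    int sum = 0;
    int iterations = 0;
    for (int i = 0; i < n; ++i) {
        sum += values[i];
        ++iterations;
    }
    probe.record(iterations);
    return sum;
}
```

Full loop coverage of sum_first needs at least three calls, with n equal to 0, 1, and some larger value.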

The valuable aspect of this metric is determining whether while-loops and for-loops execute more than once, information not reported by other metrics.

As far as I know, only GCT implements this metric.

Race Coverage

This metric reports whether multiple threads execute the same code at the same time. It helps detect failure to synchronize access to resources. It is useful for testing multi-threaded programs such as in an operating system.

As far as I know, only GCT implements this metric.

Relational Operator Coverage

This metric reports whether boundary situations occur with relational operators (<, <=, >, >=). The hypothesis is that boundary test cases find off-by-one mistakes and uses of the wrong relational operators such as < instead of <=. For example, consider the following C/C++ code fragment:

if (a < b)
    statement;

Relational operator coverage reports whether the situation a==b occurs. If a==b occurs and the program behaves correctly, you can assume the relational operator is not supposed to be <=.
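
As a sketch, the probe for this fragment only has to notice the boundary case. RelationalProbe and less_than are names made up for the illustration.

```cpp
#include <cassert>

// Hypothetical probe for one relational operator.
struct RelationalProbe {
    bool boundary_seen = false;  // true once a == b has occurred
};

// Instrumented form of the test "a < b": the result is unchanged, but the
// probe remembers whether the operands were ever equal.
bool less_than(int a, int b, RelationalProbe& probe) {
    if (a == b) probe.boundary_seen = true;
    return a < b;
}
```

Test inputs such as (1, 2) and (5, 3) exercise both outcomes of the comparison yet leave boundary_seen false; only an input with equal operands distinguishes < from <=.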

As far as I know, only GCT implements this metric.

Weak Mutation Coverage

This metric is similar to relational operator coverage but much more general [Howden1982]. It reports whether test cases occur which would expose the use of wrong operators and also wrong operands. It works by reporting coverage of conditions derived by substituting (mutating) the program's expressions with alternate operators, such as "-" substituted for "+", and with alternate variables substituted.
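
For the "+" versus "-" substitution, the derived condition is simply that the two expressions produce different values on some test input. The sketch below illustrates that one substitution under the assumption of small integer operands; it is not a description of how any tool derives its conditions, and the function names are invented.

```cpp
#include <cassert>

// The original expression and one mutant with "-" substituted for "+".
int original(int a, int b) { return a + b; }
int mutant(int a, int b)   { return a - b; }

// A test input would expose the substitution only if the two expressions
// differ on it. For "+" versus "-" that happens exactly when b is nonzero.
bool exposes_mutant(int a, int b) {
    return original(a, b) != mutant(a, b);
}
```

A test suite in which b is always zero at this expression leaves the mutation uncovered, however many such tests it contains.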

This metric interests the academic world mainly. Caveats are many; programs must meet special requirements to enable measurement.

As far as I know, only GCT implements this metric.

Table Coverage

This metric indicates whether each entry in a particular array has been referenced. This is useful for programs that are controlled by a finite state machine.
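
A small sketch of the idea, assuming a made-up three-state, two-input transition table; kNextState, TableProbe and step are all hypothetical.

```cpp
#include <cassert>

enum { kStates = 3, kInputs = 2 };

// Hypothetical transition table for a small finite state machine.
const int kNextState[kStates][kInputs] = {
    {1, 0},
    {2, 0},
    {2, 2},
};

// Remembers which table entries have been referenced.
struct TableProbe {
    bool referenced[kStates][kInputs] = {};

    int covered() const {
        int count = 0;
        for (int s = 0; s < kStates; ++s)
            for (int i = 0; i < kInputs; ++i)
                if (referenced[s][i]) ++count;
        return count;
    }
};

// One step of the state machine, marking the entry it looks up.
int step(int state, int input, TableProbe& probe) {
    assert(state >= 0 && state < kStates && input >= 0 && input < kInputs);
    probe.referenced[state][input] = true;
    return kNextState[state][input];
}
```

Dividing covered() by kStates * kInputs gives the fraction of table entries the tests have referenced.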

Comparing Metrics

You can compare relative strengths when a stronger metric includes a weaker metric.

Academia says the stronger metric subsumes the weaker metric.

Coverage metrics cannot be compared quantitatively.

Coverage Goal for Release

Each project must choose a minimum percent coverage for release criteria based on available testing resources and the importance of preventing post-release failures. Clearly, safety-critical software should have a high goal. You might set a higher coverage goal for unit testing than for system testing since a failure in lower-level code may affect multiple high-level callers.

Using statement coverage, decision coverage, or condition/decision coverage you generally want to attain 80%-90% coverage or more before releasing. Some people feel that setting any goal less than 100% coverage does not assure quality. However, you expend a lot of effort attaining coverage approaching 100%. The same effort might find more bugs in a different testing activity, such as formal technical review. Avoid setting a goal lower than 80%.

Intermediate Coverage Goals

Choosing good intermediate coverage goals can greatly increase testing productivity.

Your highest level of testing productivity occurs when you find the most failures with the least effort. Effort is measured by the time required to create test cases, add them to your test suite and run them. It follows that you should use a coverage analysis strategy that increases coverage as fast as possible. This gives you the greatest probability of finding failures sooner rather than later. Figure 1 illustrates the coverage rates for high and low productivity. Figure 2 shows the corresponding failure discovery rates.

One strategy that usually increases coverage quickly is to first attain some coverage throughout the entire test program before striving for high coverage in any particular area. By briefly visiting each of the test program features, you are likely to find obvious or gross failures early. For example, suppose your application prints several types of documents, and a bug exists which completely prevents printing one (and only one) of the document types. If you first try printing one document of each type, you probably find this bug sooner than if you thoroughly test each document type one at a time by printing many documents of that type before moving on to the next type. The idea is to first look for failures that are easily found by minimal testing.

The sequence of coverage goals listed below illustrates a possible implementation of this strategy.

  1. Invoke at least one function in 90% of the source files (or classes).
  2. Invoke 90% of the functions.
  3. Attain 90% condition/decision coverage in each function.
  4. Attain 100% condition/decision coverage.
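
The goal sequence above could be tracked with a check like the following sketch. The CoverageSnapshot fields are assumptions about what a coverage report would provide, expressed as fractions between 0 and 1; they are not the output format of any particular tool.

```cpp
#include <cassert>

// Hypothetical summary of one coverage report; all values are fractions (0..1).
struct CoverageSnapshot {
    double files_with_a_call;         // source files (or classes) with at least one function invoked
    double functions_invoked;         // functions invoked at least once
    double min_function_cd_coverage;  // lowest condition/decision coverage of any function
    double overall_cd_coverage;       // condition/decision coverage of the whole program
};

// Returns the number (1-4) of the first goal not yet met, or 0 when all are met.
int next_goal(const CoverageSnapshot& s) {
    if (s.files_with_a_call < 0.90)        return 1;
    if (s.functions_invoked < 0.90)        return 2;
    if (s.min_function_cd_coverage < 0.90) return 3;
    if (s.overall_cd_coverage < 1.00)      return 4;
    return 0;
}
```

Checking the goals in order keeps attention on breadth first: a report with 95% of files touched but only 80% of functions invoked yields goal 2, whatever its condition/decision figures are.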

Notice we do not require 100% coverage in any of the initial goals. This allows you to defer testing the most difficult areas. This is crucial to maintaining high testing productivity; achieve maximum results with minimum effort.

Avoid using a weaker metric for an intermediate goal combined with a stronger metric for your release goal. Effectively, this allows the weaknesses in the weaker metric to decide which test cases to defer. Instead, use the stronger metric for all goals and allow the difficulty of the individual test cases to help you decide whether to defer them.

Summary

Coverage analysis is a structural testing technique that helps eliminate gaps in a test suite. It helps most in the absence of a detailed, up-to-date requirements specification. Condition/decision coverage is the best general-purpose metric for C, C++, and Java. Setting an intermediate goal of 100% coverage (of any type) can impede testing productivity. Before releasing, strive for 80%-90% or more coverage of statements, branches, or conditions.

References

Beizer1990 Beizer, Boris, "Software Testing Techniques", 2nd edition, New York: Van Nostrand Reinhold, 1990.

Chilenski1994 Chilenski, John Joseph and Miller, Steven P., "Applicability of Modified Condition/Decision Coverage to Software Testing", Software Engineering Journal, Vol. 9, No. 5, September 1994, pp. 193-200.

DO-178B "Software Considerations in Airborne Systems and Equipment Certification", RTCA, December 1992, pp. 31, 74.

DO-248B "Final Annual Report For Clarification Of DO-178B Software Considerations In Airborne Systems And Equipment Certification", October 2001.

SVTAS2007 "Software Verification Tools Assessment Study", FAA, June 2007.

Howden1982 "Weak Mutation Testing and Completeness of Test Sets", IEEE Trans. Software Eng., Vol. SE-8, No. 4, July 1982, pp. 371-379.

McCabe1976 McCabe, Tom, "A Software Complexity Measure", IEEE Trans. Software Eng., Vol. 2, No. 6, December 1976, pp. 308-320.

Morell1990 Morell, Larry, "A Theory of Fault-Based Testing", IEEE Trans. Software Eng., Vol. 16, No. 8, August 1990, pp. 844-857.

Ntafos1988 Ntafos, Simeon, "A Comparison of Some Structural Testing Strategies", IEEE Trans. Software Eng., Vol. 14, No. 6, June 1988, pp. 868-874.

Roper1994 Roper, Marc, "Software Testing", London: McGraw-Hill Book Company, 1994.

Woodward1980 Woodward, M.R., Hedley, D. and Hennell, M.A., "Experience with Path Analysis and Testing of Programs", IEEE Trans. Software Eng., Vol. SE-6, No. 3, May 1980, pp. 278-286.
