A Summary of Jon Louis Bentley's ``Writing Efficient Programs''-CSDN博客

A Summary of Jon Louis Bentley's
``Writing Efficient Programs''

Lawrence A. Crowl
29 August 1990

This summary of Jon Louis Bentley's book ``Writing Efficient Programs''consists of selected text from that book with editorial changes,primarily removing references to specific examples.

Methodology

The following steps are essential parts of a methodologyof building efficient computing systems.

The most important properties of a large system area clean design and implementation, useful documentation,and a maintainable modularity.The first steps in the programming processshould therefore be the design of a solid systemand the clean implementation of that design.
If the overall system performance is not satisfactory,then the programmer should monitor the programto identify where the scarce resources are being consumed.This usually reveals that most of the timeis used by a few percent of the code.
Proper data structure selection and algorithm designare often the key to large reductions in the running timeof the expensive parts of the program.The programmer should therefore tryto revise the data structures and algorithmsin the critical modules of the system.
If the performance of the critical parts is still unsatisfactory,then use the techniques below to recode them.The original code should usually be left in the program as documentation.
If additional speed is still needed,then the programmer should work at lower design levels,including hand-written assembly code, operating system modifications,microcode, and special-purpose hardware design.

Applying the Rules

Once the hot spots of a system have been isolated,there are four critical steps in applying the following rules.

Identify the code to be changed.

We should identify the code to be changed by monitoring the program,and then concentrate on the parts that are taking the majority of the time.

Choose a rule and apply it.

Once we know what code to change,we see if any of our rules can be used to change it.The rules are presented in groupsas they relate to different parts of programs;we should therefore identify whether the expense of our programis going to data structures, loops, logic, procedures or expressions,and then search in the appropriate list for a candidate rule.When we apply a rule, we should make surethat the application preserves the correctness of the program;this is usually done by applying the spirit,if not the actual formulas, of program verification.

Measure the effect of the modification.

Removing a common subexpression in a traveling salesman tour programis typical of many changes we make:it appears that it would increase the program's speed by a factor of twobut in fact it gives less than a three precent improvement.Even if we believe that we understand the effect of a transformationby reasoning alone,it is absolutely necessary to support that analysis with observation;we often find that we are quite mistaken.

Document the resulting program.

The final modified program should include a description of the clean codeand of the modification that was incorporated for efficiency.That description can range from a brief commentto including a copy of the original code enclosed within comment charactersand a thorough discussion of the rule used to modify it(with an appropriate reference to this book, of course).

Fundamental Rules

These fundamental rules underlie any applicationof the more detailed rules below.

Code Simplification:

Most fast programs are simple.Therefore, keep code simple to make it faster.

Problem Simplification:

To increase efficiency of a program, simplify the problem it solves.

Relentless Suspicion:

Question the necessity of each instruction in a time-critical piece of codeand each field in a space-critical data structure.

Early Binding:

Move work forward in time.Specifically, do work now just oncein hope of avoiding doing it many times later.

Space for Time Rules

Introducing redundant informationcan decrease run time at the cost of increasing space used.(The complements of these rules may apply.)

Data Structure Augmentation:

The time required for common operations on data can often be reducedby augmenting the structure with extra informationor by changing the information within the structureso that it can be accessed more easily.Examples are reference counters and hints.

Store Precomputed Results:

The cost of recomputing an expensive functioncan be reduced by computing the function only once and storing the results.Subsequent requests for the function are then handledby table lookup rather than by computing the function.

Caching:

Data that is accessed most often should be the cheapest to access.(Caching can ``backfire'' and increase the run time of a programif locality is not present in the underlying data.)

Lazy Evaluation:

The strategy of never evaluating an item until it is neededavoids evaluations of unnecessary items.

Time for Space Rules

Recomputing information can decrease a program's space requirements,at the cost of increased execution time.(The complements of these rules may apply.)

Packing:

Dense storage representations can decrease storage costsby increasing the time required to store and retrieve data.(Decreasing data space may increase code space.)

Overlaying:

Storing data items that are never simultaneously activein the same memory space reduces data space.Code overlaying reduces code spaceby using the same storage for routines that are never simultaneously needed.Many operating systems provide this service automaticallyin their virtual memory systems.

Interpreters:

The space required to represent a program can often be decreasedby the use of interpretersin which common sequences of operations are represented compactly.(Finite state machines are simple, compact interpreters.)

Loop Rules

The efficiency rules used most frequently deal with loopsfor the simple reason that the hot spots in most programs involve loops.

Code Motion Out of Loops:

Instead of performing a certain computation in each iteration of a loop,it is better to perform it only once, outside the loop.(Code cannot be moved out of loops if it has side effectsthat are desired on every iteration.)

Combining Tests:

An efficient inner loop should contain as few tests as possible,and preferably only one.The programmer should therefore try to simulatesome of the exit conditions of the loop by other exit conditions. Sentinels are a common application of this rule:we place a sentinel at the boundary of a data structureto reduce the cost of testing whether our search has exhausted the structure.

Loop Unrolling:

A large cost of some short loops is in modifying the loop indices.That cost can often be reduced by unrolling the loop.

Transfer-Driven Loop Unrolling:

If a large cost of an inner loop is devoted to trivial assignments,then those assignments can often be removedby repeating the code and changing the use of variables.Specifically, to remove the assignment I:=J,the subsequent code must treat J as though it were I.

Unconditional Branch Removal:

A fast loop should contain no unconditional branches.An unconditional branch at the end of a loop can be removedby ``rotating'' the loop to have a conditional branch at the bottom.

Loop Fusion:

If two nearby loops operate on the same set of elements,then combine their operational partsand use only one set of loop control operations.

Logic Rules

These rules deal with efficiency problemsthat arise when evaluating the program state by making various kinds of tests.They sacrifice clarity and robustness for speed of execution.

Exploit Algebraic Identities:

If the evaluation of a logical expression is costly,replace it by an algebraically equivalent expressionthat is cheaper to evaluate.

Short-Circuiting Monotone Functions:

If we wish to test whethersome monotone nondecreasing function of several variablesis over a certain threshold,then we need not evaluate any of the variablesonce the threshold has been reached.Short-circuit evaluation of boolean expressions is an example of this rule.(A more complex application of this ruleexits from a loop as soon as the purpose of the loop has been accomplished.)

Reordering Tests:

Logical tests should be arrangedsuch that inexpensive and often successful testsprecede expensive and rarely successful tests.

Precompute Logical Functions:

A logical function over a small finite domain can be replacedby a lookup in a table that represents the domain.

Boolean Variable Elimination:

We can remove boolean variables from a programby replacing the assignment to a boolean variable Vby an if-then-else statementin which one branch represents the case that V is trueand the other represents the case that V is false.(This generalizes to case statementsand other logical control structures.)This rule usually decreases time slightly (say, less than 25 percent),but greatly increases code space.More complex applications of this ruleremove boolean variables from data structuresby keeping separate structures for the true and false records.

Procedure Rules

Procedures form good abstraction boundaries,but a given boundary may have an unacceptable execution cost.

Collapsing Procedure Hierarchies:

The run times of the elements of a set of proceduresthat (nonrecursively) call themselvescan often be reducedby rewriting procedures in line and binding the passed variables.

Exploit Common Cases:

Procedures should be organized to handle all cases correctlyand common cases efficiently.

Coroutines:

A multiple-pass algorithm can often be turned into a single-pass algorithmby use of coroutines.

Transformations on Recursive Procedures:

The run time of recursive procedures can often be reducedby applying the following transformations:

Code the recursion explicitly by use of a program stack.
If the final action of a procedure P is to call itself recursively,replace that call by a goto to its first statement;this is usually known as removing tail recursion.That goto can often be transformed into a loop.
If a procedure contains only one recursive call on itself,then it is not necessary to store the return address on the stack.
It is often more efficient to solve small subproblemsby use of an auxiliary procedure,rather than by recurring down to problems of size zero or one.

Parallelism:

A program should be structured to exploitas much of the parallelism as possible in the underlying hardware.

Expression Rules

Be careful that applying these rules does not work against your compiler.

Compile-Time Initialization:

As many variables as possible should be initialized before program execution.

Exploit Algebraic Identities:

If the evaluation of an expression is costly,replace it by an algebraically equivalent expressionthat is cheaper to evaluate.

An algebraic identity yields a fast range testthat compiler writers can use on two's complement architectures.
We can often multiply or divide by powers of two by shifting left or right.
Strength reduction on a loop that iterates through the elements of an arrayreplaces a multiplication by an addition.This technique generalizes to a large class of incremental algorithms.

Common Subexpression Elimination:

If the same expression is evaluated twicewith none of its variables altered between evaluations,then the second evaluation can be avoidedby storing the result of the first and using that in place of the second.(We cannot eliminate the common evaluationof an expression with important side-effects.)

Pairing Computation:

If two similar expressions are frequently evaluated together,then we should make a new procedure that evaluates them as a pair.Examples include sine/cosine and minimum/maximum.

Exploit Word Parallelism:

Use the full word width of the underlying computer architectureto evaluate expensive expressions.The bitwise representation of sets exploits this rule.

System Dependent Optimization

Some optimizations require an understanding of their effecton the underlying system.

The High-Level Language:

What are the relative costs of the various constructs in the language?What are the cheapest possible ways to accomplish common operations(i.e., input and output, or iterating through the elements of a vector)?

The Compiler:

What optimizations does the compiler perform,and what optimization should the programmer perform?Which language constructs encourage the compiler to perform optimizations,and which constructs block optimizations?Current optimizing compilers canmove code out of loops,unroll loops,unroll transfer-driven loops,remove unconditional branches,exploit algebraic identities,short-circuit simple monotone functions,collapse procedure hierarchies,eliminate tail recursion,and eliminate common subexpressions.

The Machine Architecture:

Which efficient operations in the underlying hardware are easy to access,and how can the compiler give access to them?Is there an instruction buffering schemethat makes certain instruction organizations more efficient than others?What paging or caching schemes does the machine support,and how can the program's locality structure be organizedto exploit the schemes?Issues include the following.

Instruction Costs:

Different instructions may have different relative costsfor different implementations of an architecture.

Storing a variable in a register increases a program's efficiency in two ways.The running time is usually decreasedbecause registers are usually realized by a faster technology than main memoryand are architecturally (and usually physically)closer to the arithmetic and logical unit.A variable in a register also requires fewer bits to address;this means that the instruction is faster to accessand that the overall program will require less space.

Instruction Caching:

Many high-speed computers have a cache for the recently accessed instructions;if a tight loop fits in that cache then it will run more quicklythan if the instructions must be fetched repeatedly from main memory.Unrolling a loop such that it exceeds the size of the cachemay reduce performance.

Storage Layout:

The layout of a program's storage in memorycan have a significant effect on the execution speed of the program.Placing records of a length that is a power of two in arrays enables the multiplication of the array index by a left shift(which is often cheaper than multiplication by an arbitrary integer).However, banked memories are often implemented in powers of two,which interferes badly with this approach.

Virtual Memory Caching:

Knowledge of the interaction of the paging structure and the storage layoutcan have a profound effect on efficiency.Traverse row major matrices by row rather than column.