An MSDN article on compiler optimization: PGO in VS 2005

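It is genuinely well written. The full text follows. The figures could not be copied over, so please follow the link to the original article to view them.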


Profile-Guided Optimization with Microsoft Visual C++ 2005

 

Kang Su Gatlin
Microsoft Corporation

March 2004

Applies to:
   Microsoft® Visual C++® 2005

Summary: Discusses profile-guided optimization in Microsoft Visual C++ 2005 (formerly known as Visual C++ "Whidbey"), a powerful new feature that will allow applications to be tuned for actual customer scenarios. Real-world performance gains of over 20% are not uncommon. (13 printed pages)

Contents

Introduction
How Traditional C++ Compilers Work
Whole Program Optimization with Link-Time Code Generation
Profile Guided Optimization
More PGO Tools
PGO and the Visual Studio IDE
Some Tips for PGO Use
Conclusion

Introduction

There are several reasons to program in C++, and one of the most important is the incredible performance that one can obtain. With the release of Microsoft® Visual C++® 2005, you will get great performance from all of the traditional methods of optimization. We have also added a new technique that lets you get even more out of your application. In this article we show you how to use profile-guided optimization (PGO) to achieve this performance.

How Traditional C++ Compilers Work

To get a full appreciation of profile-guided optimization, let's start with a discussion of how traditional compilers decide which optimizations to do.

Traditional compilers perform optimizations based on static source files. That is, they analyze the text of the source file, but use no knowledge about potential user input that is not directly obtainable from the source code. For example, consider the following function in a .cpp file:

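    // x is declared before the loop so that it is still in scope at the
    // return statement (standard C++ scopes a for-init variable to the loop).
    int setArray(int a, int *array)
    {
        int x;
        for (x = 0; x < a; ++x)
            array[x] = 0;
        return x;
    }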

From this file the compiler knows nothing about the potential values of "a" (besides that they must be of type int), nor does it know anything about the typical alignment of array.

This compiler/linker model is not particularly bad, but it misses two major opportunities for optimization. First, it doesn't exploit information that it could gain from analyzing all source files together. Second, it does not make any optimizations based upon the expected or profiled behavior of the application. With whole program optimization (WPO) and PGO, we'll do both of these things.

Whole Program Optimization with Link-Time Code Generation

With Visual C++ 7.0 and beyond, including all recent versions of the Itanium® compiler, Visual C++ has supported a mechanism known as link-time code generation (LTCG). I won't spend too much time in this section, as Matt Pietrek wrote a good article on LTCG in his Under the Hood column (May 2002), which is freely available from MSDN. But here are the basics...

LTCG is a technology that allows the compiler to effectively compile all the source files as a single translation unit. This is done in a two-step process:

1. The compiler compiles the source file and emits the result as an intermediate language (IL) into the generated .obj file, rather than a standard object file. It's worth noting that this IL is not the same thing as MSIL (which is used by the Microsoft® .NET Framework).

2. When the linker is invoked with /LTCG, it actually invokes the backend to compile all the code compiled with WPO. All of the IL from the WPO .obj files is aggregated, and a call graph of the complete program can be generated. From this, the compiler backend and linker compile the whole program and link it into an executable image.

With WPO the compiler now has more information about the structure of the entire program, so it can be more effective at certain types of optimizations. For example, with traditional compilation and linking the compiler could not inline a function from source file foo.cpp into source file bar.cpp, because when compiling bar.cpp it has no information about foo.cpp. With WPO the compiler has both bar.cpp and foo.cpp (in IL form) available to it. It can therefore make optimizations that ordinarily would not be possible, such as cross-translation-unit inlining.

How do you compile an application to make use of LTCG? There are two steps:

1. First compile the source code with the Whole Program Optimization compiler switch (/GL):

    cl.exe /GL /O2 /c test.cpp foo.cpp

2. Then link all of the object files in the program with the /LTCG switch:

    link /LTCG test.obj foo.obj /out:test.exe

That's it. You can now run the generated executable, and usually it will be faster. We've seen a lot of benefit with LTCG, but it does not come for free. Memory requirements increase at compile/link time because all of the IL must be addressable, and that IL can come from tens or hundreds of compilation units. This can increase the memory needed to build a project and can also increase the total build time.

Profile Guided Optimization

LTCG can obviously give you some performance benefit, but we have only just begun improving the performance of your application. Another new technology, used in conjunction with LTCG, can give an additional performance boost, and in many cases this boost can be very significant. This technology is called profile-guided optimization (PGO).

The idea behind PGO is simple. Generate profiles by running the executable/DLL on real-world inputs, then use those profiles to help the compiler generate optimized code for that particular executable. (Note that PGO can be applied to unmanaged executables or DLLs, but not to .NET/managed images. For the rest of the article, I'll simply refer to the optimized image as an executable or application, although the information applies equally to DLLs.) Really, that's about all there is to it, but there are details worth investigating.

There are three general phases to creating a PGO application:

1. Compile into instrumented code.

2. Train instrumented code.

3. Re-compile into optimized code.

We explain each of these three phases in more depth below. See Figure 1 for a graphical representation of the process.

Figure 1. The PGO build process

Compile Instrumented Code

The first phase is instrumenting the executable. To do this, first compile the source files with WPO (/GL). Then link all of the application's object files with the /LTCG:PGINSTRUMENT switch (this can be abbreviated as /LTCG:PGI). Not every file needs to be compiled with /GL for PGO to work on the application as a whole. PGO instruments the files compiled with /GL and leaves the others uninstrumented.

The instrumentation is done by strategically placing different types of probes in the code. Very roughly, the probes fall into two types: those that collect flow information and those that collect value information. I won't go into detail about how we decide which probes to use and where, but we go to painstaking lengths to use them efficiently. The instrumented code may also not be as well optimized with /O2 as the same uninstrumented /O2 code. (We do as many optimizations as we can without interfering with the probes we place in the instrumented code.) Because of the instrumentation and the less-optimized code, expect your application to run slower at this stage. (Of course, the final optimized code is generated without the probes.)
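To make the two kinds of probes concrete, here is a hand-written C++ sketch of what instrumentation conceptually adds to a function. It is not the code the compiler actually emits. The `FlowProbes` and `ValueProbe` types and the `setArrayInstrumented` function are hypothetical names used only for illustration. Flow probes count how often each control-flow edge is taken. A value probe records which values a variable takes at run time, here the loop bound `a` from the earlier `setArray` example.

```cpp
#include <cassert>
#include <map>

// Hypothetical flow probes: one counter per control-flow edge of interest.
struct FlowProbes {
    unsigned long long loopBodyTaken = 0;   // times the loop body ran
    unsigned long long loopExitTaken = 0;   // times the loop was exited
};

// Hypothetical value probe: a histogram of the values observed for a variable.
struct ValueProbe {
    std::map<int, unsigned long long> histogram;
    void record(int value) { ++histogram[value]; }
};

FlowProbes g_flow;      // stands in for counters stored in the instrumented image
ValueProbe g_aValues;

// setArray with probes inserted by hand to mimic instrumentation.
int setArrayInstrumented(int a, int *array)
{
    assert(a <= 0 || array != nullptr);   // precondition of the original function
    g_aValues.record(a);                  // value probe: which "a" did we see?
    int x;
    for (x = 0; x < a; ++x) {
        ++g_flow.loopBodyTaken;           // flow probe on the loop-body edge
        array[x] = 0;
    }
    ++g_flow.loopExitTaken;               // flow probe on the exit edge
    return x;
}
```

After training runs, counters like these would tell the optimizer which paths are hot and which values of `a` dominate. A static compiler cannot know either.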

The result of linking with /LTCG:PGI is an executable or DLL and a PGO database file (.PGD). By default the PGD file takes the name of the generated executable, but the user can specify the name of the PGD file when linking with the /PGD:filename linker option.

Table 1 below lists the files that will be generated after each step given in the left column. Note that no files are removed.

Table 1. Generated files after each step

Step | Files generated
At the start of compilation | MyApp.cpp, foo.cpp
After compiling with /c /GL | MyApp.obj, foo.obj
After linking with /LTCG:PGI /PGD:MyApp.pgd /out:MyApp.inst.exe | MyApp.inst.exe, MyApp.pgd
After training the instrumented application with three scenarios | MyApp1.pgc, MyApp2.pgc, MyApp3.pgc
After relinking with /LTCG:PGO /PGD:MyApp.pgd /out:MyApp.opt.exe | MyApp.opt.exe

Train Instrumented Code

After creating the instrumented executable, the next step is to train it. You do this by running the executable with scenarios that reflect how it will be used in real life. The output of each scenario run is a PGO count file (.PGC). These files take the same name as the .PGD file with a number appended at the end, starting with "1" and increasing with subsequent runs. A given .PGC file can be removed if the user decides that the particular scenario wasn't useful.

Compile Optimized Code

The last step is to relink the executable with the profile information collected from running the scenarios. This time you link the application with the linker switch /LTCG:PGOPTIMIZE (or /LTCG:PGO), which uses the generated profile data to create an optimized executable. Before optimizing, the linker automatically invokes pgomgr. By default, pgomgr merges into the .PGD file every .PGC file in the current directory whose name matches the .PGD file.
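The naming and matching rules just described can be sketched in C++. The sketch assumes the pattern exactly as the article describes it: the PGD base name, then a run number, then ".pgc". The shipped tools may format the name slightly differently, for example with a separator such as `!` before the number. All function names here are hypothetical, and file-name comparison is case-insensitive to match Windows behavior.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-case a copy of s (Windows file names are case-insensitive).
std::string toLower(std::string s)
{
    for (char &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// "MyApp.pgd" -> "MyApp" (assumes the name ends in ".pgd").
std::string pgdBase(const std::string &pgdName)
{
    assert(pgdName.size() > 4 &&
           toLower(pgdName.substr(pgdName.size() - 4)) == ".pgd");
    return pgdName.substr(0, pgdName.size() - 4);
}

// Count-file name for training run n: PGD base name + run number + ".pgc".
std::string pgcNameForRun(const std::string &pgdName, int n)
{
    return pgdBase(pgdName) + std::to_string(n) + ".pgc";
}

// True if fileName looks like "<base><digits>.pgc" for this PGD.
bool matchesPgd(const std::string &fileName, const std::string &pgdName)
{
    const std::string base = toLower(pgdBase(pgdName));
    const std::string f = toLower(fileName);
    const std::string ext = ".pgc";
    if (f.size() <= base.size() + ext.size()) return false;   // no run number
    if (f.compare(0, base.size(), base) != 0) return false;
    if (f.compare(f.size() - ext.size(), ext.size(), ext) != 0) return false;
    for (std::size_t i = base.size(); i < f.size() - ext.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(f[i]))) return false;
    return true;
}

// The files from a directory listing that the optimize link would merge.
std::vector<std::string> selectPgcFiles(const std::vector<std::string> &dir,
                                        const std::string &pgdName)
{
    std::vector<std::string> out;
    for (const std::string &f : dir)
        if (matchesPgd(f, pgdName)) out.push_back(f);
    return out;
}
```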

· Updating source code. If the source code of the compiled application changed after the .PGD files were generated, /LTCG:PGO reverts to a plain /LTCG build and does not use any of the profile information. Suppose you've spent considerable time generating profiles from your instrumented code and then realize you need to make a small change to the code. How do you reuse the profiles you've already generated? In this case you can specify /LTCG:PGUPDATE (or /LTCG:PGU). PGUPDATE allows the linker to compile the modified source code while still using the original .PGD file.

What PGO Can Do

We now understand how to generate PGO applications, so the question is: what does PGO do for us? Which optimizations does it enable? Here we give a partial list. We expect the set of optimizations to expand as we find new ones and learn better heuristics.

· Inlining. As described earlier, WPO gives the compiler the ability to find more inlining opportunities. PGO supplements this with additional information to help make the inlining decision. For example, examine the call graphs in Figures 2, 3, and 4 below.

In Figure 2 we see that a, foo, and bat all call bar, which in turn calls baz.

Figure 2. The original call graph of a program

Figure 3. The measured call frequencies, obtained with PGO

Figure 4. The optimized call graph based on the profile obtained in Figure 3
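The figures are not included here, so the counts in this sketch are invented for illustration and are not the values shown in Figure 3. The sketch shows the core idea in C++: with call-edge counts available, inlining can be decided per call site rather than per function. The `CallEdge` type, `chooseInlineSites`, and the "share of all calls to the callee" rule are hypothetical. The real compiler weighs many more factors, such as code size.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One call edge in the program's call graph, with its profiled count.
struct CallEdge {
    std::string caller;
    std::string callee;
    unsigned long long count;   // times this call site executed during training
};

// Hypothetical heuristic: inline a call site only if it accounts for at
// least minShare of all profiled calls to the same callee.
std::vector<CallEdge> chooseInlineSites(const std::vector<CallEdge> &edges,
                                        double minShare)
{
    assert(minShare >= 0.0 && minShare <= 1.0);
    std::vector<CallEdge> chosen;
    for (const CallEdge &e : edges) {
        unsigned long long totalToCallee = 0;
        for (const CallEdge &other : edges)
            if (other.callee == e.callee) totalToCallee += other.count;
        if (totalToCallee > 0 &&
            static_cast<double>(e.count) / totalToCallee >= minShare)
            chosen.push_back(e);
    }
    return chosen;
}
```

With hypothetical counts a→bar 10, foo→bar 75, bat→bar 15 and bar→baz 100, a 0.5 threshold selects only foo→bar and bar→baz. A static compiler would have to make one decision for bar as a whole.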

· Partial Inlining. Next is an optimization that is at least partially familiar to most programmers. Many hot functions contain paths that are not so hot, and some are downright cold. In Figure 5 below, we inline the purple sections of code, but not the blue.

Figure 5. A control flow graph, where the purple nodes get inlined, while the blue node does not
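PGO performs this split automatically from profile data. The C++ sketch below shows the shape of the result written by hand. It assumes a profile in which the hypothetical function `describe` is almost always called with a single-digit value. The hot path stays small enough to inline into callers, and the cold path is moved into a separate out-of-line function.

```cpp
#include <cassert>
#include <string>

// Cold path: rarely executed according to the assumed profile, so it is
// kept out of line and reached only through a call.
std::string describeRareValue(int v)
{
    std::string s = "unusual value: ";
    s += std::to_string(v);
    return s;
}

// Hot path: small enough to inline at every call site. Only the common
// case is handled inline; the cold case becomes a call.
inline std::string describe(int v)
{
    if (v >= 0 && v < 10)                                   // hot: almost every call
        return std::string(1, static_cast<char>('0' + v));
    return describeRareValue(v);                            // cold: kept out of line
}
```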

· Cold Code Separation. Code blocks that are never executed during profiling (cold code) are moved to the end of the set of sections. As a result, pages in the working set usually consist of instructions that, according to the profile information, will be executed.

Figure 6. Control flow graph showing how the optimized layout moves frequently used basic blocks together and moves cold basic blocks further away.
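Here is a greatly simplified C++ sketch of the ordering idea behind cold code separation. The `BasicBlock` type and `layoutBlocks` function are hypothetical, and the block names and counts in the usage example are invented. Real block layout builds hot paths along control-flow edges, as the Block Layout item describes. This sketch only sorts blocks by execution count, so blocks never executed during training end up at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct BasicBlock {
    std::string name;
    unsigned long long count;   // executions seen during training
};

// Hottest blocks first; blocks with a zero count (cold code) sink to the end.
// stable_sort keeps the original relative order among blocks with equal counts.
std::vector<BasicBlock> layoutBlocks(std::vector<BasicBlock> blocks)
{
    std::stable_sort(blocks.begin(), blocks.end(),
                     [](const BasicBlock &x, const BasicBlock &y) {
                         return x.count > y.count;
                     });
    return blocks;
}
```

For example, blocks entry(100), errorHandler(0), loop(900), exit(100) and logRare(0) are laid out as loop, entry, exit, errorHandler, logRare.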

· Size/Speed Optimization. Functions that are called more often can be optimized for speed, while those that are called less frequently are optimized for size. This tends to be the right tradeoff.

· Block Layout. In this optimization, we form the hottest paths through a function and lay them out so that hot paths are spatially close together. This can increase utilization of the instruction cache and decrease the working set size and the number of pages used.

· Virtual Call Speculation. Virtual calls can be expensive because they jump through the vtable to invoke the method. With PGO, the compiler can speculate on the object type at a virtual call site and inline that type's method there. The data for this decision is gathered by the instrumented application. In the optimized code, the inlined method is guarded by a check that the object's actual type matches the speculated derived type.

The following code shows a base class, two derived classes, and a function invoking a virtual function.

    class Base {
        // ...
    public:
        virtual ~Base() {}
        virtual void call();
    };

    class Foo : public Base {
        // ...
    public:
        void call();
    };

    class Bar : public Base {
        // ...
    public:
        void call();
    };

    // This is the Func function before PGO has optimized it.
    void Func(Base *A) {
        // ...
        while (true) {
            // ...
            A->call();   // virtual dispatch through the vtable
            // ...
        }
    }

The code below shows the result of optimizing the above code, given that the dynamic type of "A" is almost always Foo.

    #include <typeinfo>

    // This is the Func function after PGO has optimized it.
    void Func(Base *A) {
        // ...
        while (true) {
            // ...
            if (typeid(*A) == typeid(Foo)) {   // guard: is A really a Foo?
                // inlined body of Foo::call()
            }
            else
                A->call();                     // fallback: ordinary virtual call
            // ...
        }
    }
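To check that the guarded form behaves like the original virtual call, here is a small self-contained C++ version of the transformation written by hand. The `typeid` comparison stands in for the compiler's own, cheaper type check. A qualified call to `Foo::call` stands in for the inlined body. The return values and the `g_fastPath`/`g_slowPath` counters are hypothetical additions for the demonstration.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int call() { return 0; }
};

struct Foo : Base {
    int call() override { return 1; }
};

struct Bar : Base {
    int call() override { return 2; }
};

int g_fastPath = 0;   // times the speculated (Foo) path was taken
int g_slowPath = 0;   // times we fell back to virtual dispatch

// Guarded devirtualization by hand: an exact-type check protects a direct,
// non-virtual (and therefore inlinable) call to Foo::call.
int FuncSpeculated(Base *a)
{
    if (typeid(*a) == typeid(Foo)) {
        ++g_fastPath;
        return static_cast<Foo *>(a)->Foo::call();   // qualified call: no vtable lookup
    }
    ++g_slowPath;
    return a->call();                                // any other type: virtual call
}
```

Because the check tests the exact dynamic type, an object of a class derived from Foo safely takes the virtual-call path.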

DLL Use

A short note on PGO and DLLs: you train/profile a DLL by running an executable that links to the DLL on a set of representative scenarios. You can also use different executables for different scenarios and merge all of the scenarios into a single .PGD file. Note that PGO technology currently does not support static libraries.

General Effectiveness

The current implementation of PGO has proven extremely effective at improving real-world performance. For example, we've seen 30%+ improvement on large real-world applications such as Microsoft® SQL Server, and 4-15% gains on the SPEC benchmarks, depending on the architecture. See Figure 7 for the speedup of PGO over the best static compilation setting (link-time code generation) on the SPEC benchmarks.

Figure 7. SPEC performance improvement with PGO over that of link-time code generation, for all three platforms

More PGO Tools

PGO support in Microsoft® Visual C++ comes with a couple of tools to help you do precisely what you need to do. This section describes each of these tools.

· pgomgr. pgomgr is a tool for post-processing the .PGD files generated by PGO. (For PSDK Itanium compiler users: pgomgr in Whidbey replaces pgmerge and part of pgopt from the old PSDK compiler. These two tools are no longer available, because pgomgr now covers their functionality.) The .PGC files must be merged into the .PGD file before it can be used in the optimize phase of PGO, and pgomgr does this merging. Its syntax is:

    pgomgr [options] [Profile-Count paths] <Profile-Database>

By default, if /LTCG:PGO is run in a directory with .PGC files, those .PGC files are merged if they match the .PGD file for the program being linked, so you don't always need to run pgomgr yourself. The options are:

· /? Get help

· /help Same as /?

· /clear Remove all merge data from the specified .PGD file

· /detail Display verbose program statistics

· /merge[:n] Merge the given PGC file(s), with an optional integer weight n

· /summary Display program statistics

· /unique Display decorated function names

· pgosweep. The pgosweep tool interrupts a running program that was built with PGO instrumentation, writes the current counts to a new .PGC file, and then clears the counts from the runtime data structures. It has two main intended uses. The first is applying PGO to code that never ends, such as an OS kernel. The second is obtaining precise profile information about a certain part of the program. For example, you may not want to profile an application's night-time scenarios. In that case you run pgosweep around those scenarios and then delete the .PGC files from that part of the run.

The usage for pgosweep is:

    pgosweep <instrumented image> <.PGC file to be created>
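The effect of a sweep, as the article describes it, can be modeled in C++. The sketch copies the live counts into a new snapshot (standing in for the new .PGC file) and then resets the counters, so the next interval starts from zero. `RuntimeCounters`, `PgcSnapshot` and `sweep` are hypothetical names. The sketch does not model how pgosweep actually communicates with the running process.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory profile counters of a long-running instrumented image.
struct RuntimeCounters {
    std::vector<unsigned long long> counts;
};

// One sweep's output (stands in for a newly written .PGC file).
struct PgcSnapshot {
    std::string fileName;
    std::vector<unsigned long long> counts;
};

// Copy the current counts into a snapshot, then clear the live counters.
PgcSnapshot sweep(RuntimeCounters &live, const std::string &pgcName)
{
    PgcSnapshot snap{pgcName, live.counts};
    for (unsigned long long &c : live.counts)
        c = 0;
    return snap;
}
```

Sweeping once at the end of the day and again at the end of the night splits the counts into two files. Deleting the second file drops the night-time interval from the profile.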

PGO and the Visual Studio IDE

Command-line tools are great, but if you work within the Microsoft® Visual Studio® Integrated Development Environment (IDE), you may want to use features such as PGO from inside the IDE. Visual Studio 2005 (formerly known as Visual Studio "Whidbey") supports PGO through a set of menu items. These let the programmer do an instrumented build, run scenarios, do an optimized build, or do an update build. The instrumented build, optimized build, and update build all produce an .exe or .dll file as output. The optimized build and the update build require a .pgd file to be available. This .pgd file can be generated by running the Run Profiling Scenario menu item.

Figure 8. A screen shot of PGO support in the Visual Studio 2005 IDE

Some Tips for PGO Use

Here are some basic tips that can improve your PGO experience.

1. The scenarios used to generate the profile data should resemble the real-world scenarios the application will see when deployed. The scenarios are NOT an attempt at code coverage.

2. Training with scenarios that are not representative of real-world use can produce code that performs worse than if PGO had not been used.

3. Give the optimized code a different name from the instrumented code, for example app.opt.exe and app.inst.exe. This way you can rerun the instrumented application to add to your set of scenario profiles without rebuilding everything.

4. To tweak results, use the /clear option of pgomgr to clear out a .PGD file.

5. If two scenarios run for different amounts of time but you want them weighted equally, you can use the weight switch (/merge:weight in pgomgr) on the .PGC files to adjust them.

6. You can use the speed switch to change the speed/size thresholds.

7. Use the inline threshold switch with great caution. The values from 0 to 100 aren't linear.
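Tip 5 and the /merge[:n] option can be illustrated with a C++ sketch of weighted merging. The article does not spell out pgomgr's internal arithmetic. This sketch assumes one plausible interpretation: each run's counts are multiplied by the run's integer weight and then added into the database totals. `WeightedRun` and `mergeRuns` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One training run's counts (stands in for a .PGC file) and its merge weight.
struct WeightedRun {
    std::vector<unsigned long long> counts;
    unsigned weight;
};

// Merge runs into database totals (stands in for the .PGD): each count is
// scaled by its run's weight and summed. All runs must cover the same probes.
std::vector<unsigned long long> mergeRuns(const std::vector<WeightedRun> &runs)
{
    std::vector<unsigned long long> total;
    for (const WeightedRun &r : runs) {
        if (total.empty())
            total.assign(r.counts.size(), 0);
        assert(r.counts.size() == total.size());
        for (std::size_t i = 0; i < r.counts.size(); ++i)
            total[i] += r.counts[i] * r.weight;
    }
    return total;
}
```

For example, suppose a short scenario produced counts {10, 2} and a scenario ten times longer produced {100, 20}. Giving the short run weight 10 and the long run weight 1 makes both contribute equally, and the merge yields {200, 40}.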

Conclusion

In closing, the basic steps for generating a PGO application are:

1. Compile the files that you want PGO to work on with Whole Program Optimization (/GL). To exclude a file from whole program optimization and PGO, simply don't compile it with /GL.

2. Link the application with link-time code generation using /LTCG:PGINSTRUMENT. This generates an instrumented executable.

3. Train the application with scenarios, generating .pgc files.

4. Re-compile the application (although you're invoking the linker) with /LTCG:PGOPTIMIZE. This optimizes the executable based on the profile data.

The end result is a program or library optimized for the real-world situations it will run under.

About the Author

Kang Su Gatlin is a Program Manager at Microsoft in the Visual C++ group. He received his PhD from UC San Diego. His focus is on high-performance computation and optimization; essentially, he enjoys making code run fast.



Source: http://msdn.microsoft.com/zh-cn/aa289170(VS.90).aspx#profileguidedoptimization_topic1
