fdpr Command Reference

fdpr Command

Purpose

       A performance tuning utility for improving execution time and real memory utilization of user-level post-link application programs.

Syntax

Most Common Usage:

       fdpr -p ProgramFile -x WorkloadCommand

Detailed Usage:

       fdpr -p ProgramFile [ -M Segnum ] [ -fd Fdesc ] [ -o OutputFile ] [ -armember ArchiveMemberList ] [ OptimizationFlags ] [ -map ] [ -disasm ]
       [ -disasm_data ] [ -disasm_bss ] [ -profcount ] [ -quiet ] [ -v ] [ -1 | -2 | -3 | -12 | -23 | -123 ] [ -x WorkloadCommand ]

Optimization Flags

       [ -tb ] [ -pc ] [ -pp ] [ -O ] [ -O2 ] [ -O3 ] [ -O4 ] [ -selective_inline ] [ -sid_fac percent ] [ -inline_small_funcs size ]
       [ -inline_hot_funcs percent ] [ -hco_resched ] [ -killed_regs ] [ -lr_opt ] [ -align bytes ] [ -RD ] [ -dpnf factor ] [ -dpht threshold ]
       [ -build_dcg ] [ -tocload ] [ -ptrgl_opt ] [ -no_ptrgl_r11 ] [ -dcbt_opt ] [ -ignore_info ] [ -dead_code_removal ]
       [ -bt_csect_anchor_removal ] [ -strip ] [ -analyse_asm_csects ] [ -extra_safe_analysis ] [ -inline ] [ -reduce_toc removal_factor ]

Description

       The fdpr command (Feedback Directed Program Restructuring) is a performance-tuning utility that may help improve the execution time and the real
       memory utilization of user-level application programs. The fdpr program optimizes the executable image of a program by collecting information on the
       behavior of the program while the program is used for some typical workload, and then creating a new version of the program that is optimized for
       that workload. The new program generated by fdpr typically runs faster and uses less real memory.

       Attention: The fdpr command applies advanced optimization techniques to a program, which may result in programs that do not behave as expected.
       Programs that are optimized using this tool should be used with due caution and should be rigorously retested with, at a minimum, the same test
       suite used to test the original program in order to verify expected functionality. The optimized program is not supported.

       The fdpr command builds an optimized executable program in 3 distinct phases:
       *    Phase 1 (-1 flag): Creates an instrumented executable program and an empty template profile file.
       *    Phase 2 (-2 flag): Runs the instrumented program and updates the profile data.
       *    Phase 3 (-3 flag): Generates the optimized executable program file.

       These phases can be run separately or in partial or full combination, but must be run in order (for example, -1, then -2, then -3; or -12, then
       -3). The default is to run all three phases.

       Note: The instrumented executable, created in phase 1 and run in phase 2, typically runs several times slower than the original program. Because
       of this increased execution time, invoke the executable in a way that minimizes execution duration while still fully exercising the desired
       code areas. Also attempt to eliminate, where feasible, any time-dependent aspects of the program.
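
       The phase ordering described above can be sketched as a small shell helper that runs one phase at a time and stops at the first failure. This is
       a minimal sketch, not part of fdpr: the run_phases function and the FDPR override variable are hypothetical names, and the sketch assumes that
       fdpr returns a nonzero exit status when a phase fails.

```shell
# FDPR is a hypothetical override hook: set FDPR="echo fdpr" to print the
# commands instead of running them.
FDPR=${FDPR:-fdpr}

# run_phases PROGRAM WORKLOAD [ARGS...]
run_phases() {
    prog=$1
    shift                                     # the rest is the workload command
    $FDPR -1 -p "$prog" || return 1           # phase 1: instrument the program
    $FDPR -2 -p "$prog" -x "$@" || return 1   # phase 2: run workload, collect profile
    $FDPR -3 -p "$prog"                       # phase 3: build the optimized program
}
```

       For example, run_phases test1 test2 would issue the same three commands as running fdpr -1, fdpr -2, and fdpr -3 by hand with test2 as the
       workload.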

Flags

       -1, -2, -3
            Specifies the phase to run. The default is all 3 phases (-123). The -s flag must be used when running separate phases so that the succeeding
            phases can access the required intermediate files. The phases must be run in order (for example, -1, then -2, then -3, or -1, then -23). The -2
            flag must be used along with the invocation flag -x.
       -M Segnum
            Specifies where to map shared memory for profiling. The default is 0x30000000. Specify an alternate shared memory address if the program to be
            optimized or any of the workload command strings invoked with the -x flag use conflicting shared-memory addresses. Typical alternative values
            are 0x40000000, 0x50000000, and so on, up to 0xC0000000.
       -fd Fdesc
            Specifies which file descriptor number is to be used for the profile file that is mapped to the above shared memory area. The default value
            of Fdesc is 1999.
       -o OutputFile
            Specifies the name of the output file from the optimizer. The default is program.fdpr.
       -p ProgramFile
            Specifies the name of the executable program file, shared object file, or shared library containing shared objects or executables to
            optimize. This program must be an unstripped executable.
       -armember ArchiveMemberList
            List of archive members to be optimized, within a shared archive file specified by the -p flag. If -armember is not specified, all members of
            the archive file are optimized.
       -map
            Prints a map of basic blocks and static variables with their respective old -> new addresses into a suffixed .mapper file.
       -disasm
            Prints the disassembled text section of the output optimized and instrumented program into a suffixed .dis_text file.
       -disasm_data
            Prints the disassembled data section of the output optimized and instrumented program into a suffixed .dis_data file.
       -disasm_bss
            Prints the disassembled bss section of the output optimized and instrumented program into a suffixed .dis_bss file.
       -profcount
            Prints the profiling counters into a suffixed .ncounts file.
       -quiet
            Quiet output mode.
       -v
            Verbose output.
       -x WorkloadCommand
            Specifies the command used for invoking the instrumented program. All the arguments after the -x flag are used for the invocation. Therefore,
            the -x flag must appear last in the command line. The -x flag is required when the -2 flag is used.
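
       The rule that everything after -x belongs to the workload command can be illustrated with a short shell function. This is an illustration of
       the rule only, not fdpr's actual argument parser; split_at_x is a hypothetical name.

```shell
# Illustration only: label each argument by the side of -x it falls on.
split_at_x() {
    side=fdpr
    for arg in "$@"; do
        if [ "$side" = fdpr ] && [ "$arg" = "-x" ]; then
            side=workload       # every later argument goes to the workload
            continue
        fi
        printf '%s: %s\n' "$side" "$arg"
    done
}

# A -v placed after -x is handed to the workload (test2), not to fdpr.
split_at_x -p test1 -x test2 -v
```

       To make fdpr itself verbose in this case, -v would have to appear before -x, as in fdpr -v -p test1 -x test2.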

Optimization Flags
       -analyse_asm_csects
            Analyze csects written in assembly (when used, must be specified at both the -1 and -3 phases).
       -extra_safe_analysis
            Do not attempt to analyze unconventional csects containing hand-written assembly code (when used, must be specified at both the -1 and -3
            phases).
       -ignore_info
            Ignore .info sections produced with the -qfdpr option during compile time (when used, must be specified at both -1 and -3 phases).
       -align bytes
            Align frequently executed code to the given number of bytes in order to improve the code prefetch buffer ratio. If this option is omitted,
            the fdpr command aligns the code using a variable default number of bytes.
       -lr_opt
            Eliminate stores and restores of the link register in frequently executed procedures.
       -bt_csect_anchor_removal
            Eliminate load instructions related to the usage of branch tables in the code.
       -dead_code_removal
            Remove unreachable code.
       -selective_inline
            Perform selective inlining for functions that are frequently called from a single dominant call site.
       -sid_fac percent
            Set a dominant factor percentage for selective inline optimization. The allowed range is between 50 - 100 (applicable only with the
            -selective_inline flag).
       -inline_small_funcs size
            Inline all functions that are smaller or equal to the given size in bytes.
       -inline_hot_funcs percent
            Inline all functions with an execution frequency equal to or greater than the given percentage. The allowed range is between 0 - 100.
       -inline
            Perform -inline_small_funcs 12 with -selective_inline.
       -hco_resched
            Relocate instructions from frequently executed code to rarely executed code areas, when possible.
       -dcbt_opt
            Insert dcbt instructions to improve data-cache performance.
       -killed_regs
            Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls.
       -tb
            Force the restructuring of traceback tables in reordered code. If the -tb option is omitted, traceback tables are automatically restored
            for C++ applications that use the Try & Catch mechanism.
       -pc
            Preserve csects' boundaries in reordered code.
       -pp
            Preserve functions' boundaries in reordered code.
       -RD
            Perform static data reordering.
       -dpnf factor
            Set the Data Placement Normalization Factor, between 0 - 1, where 0 causes static variables to be reordered regardless of their size, and 1
            places only small variables first (applicable only with the -RD flag).
       -dpht threshold
            Set the Data Placement Hotness Threshold, between 0 - 1, where 0 reorders the static variables in large groups based on the control flow,
            and 1 reorders the variables in very small groups based on their access frequency (applicable only with the -RD flag).
       -build_dcg
            Build DCG (Data Connectivity Graph) for enhanced data reordering (applicable only with the -RD flag).
       -tocload
            Perform tocload optimization.
       -reduce_toc removal_factor
            Remove TOC entries according to a removal factor between 0 - 1, where 0 removes only non-accessed TOC entries and 1 removes all
            non-exported TOC entries.
       -strip
            Strip the output file (if any is produced).
       -ptrgl_opt
            Optimize indirect call instructions made by way of registers by replacing them with direct jumps.
       -no_ptrgl_r11
            Do not perform removal of R11 load instruction in _ptrgl csect (the -ptrgl_r11 optimization is applied by default).
       -O
            Perform code reordering with branch prediction bit setting, branch folding and NOOP instructions removal. The -O flag is applied by default.
       -O2
            Switch on all less aggressive optimization flags.
       -O3
            Switch on all aggressive optimization flags.
       -O4
            Switch on all aggressive optimization flags.
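
       Several of the optimization flags above take a numeric argument with a documented range. A wrapper script could check those ranges before
       calling fdpr, as in the following minimal sketch. The in_range and check_flag_value functions are hypothetical helpers, not part of fdpr, and
       the sketch only compares values; it does not verify that the argument is a well-formed number.

```shell
# in_range VALUE MIN MAX: succeed when MIN <= VALUE <= MAX (awk handles decimals).
in_range() {
    awk -v v="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# check_flag_value FLAG VALUE: apply the range documented for FLAG, if any.
check_flag_value() {
    case $1 in
        -sid_fac)                 in_range "$2" 50 100 ;;
        -inline_hot_funcs)        in_range "$2" 0 100 ;;
        -dpnf|-dpht|-reduce_toc)  in_range "$2" 0 1 ;;
        *)                        return 0 ;;   # no documented range to check
    esac
}
```

       For example, check_flag_value -sid_fac 40 fails because the documented range for -sid_fac is 50 - 100, while check_flag_value -dpnf 0.5
       succeeds.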

Optimization

       The fdpr command performs, by default, the highest possible level of code reordering optimization together with the optimizations of branch
       prediction bit setting, branch folding, code alignment and removal of redundant NOOP instructions. The -pc flag reorders the entire code while
       preserving csects' boundaries and therefore, may result in less performance improvement than the default code reordering. Similarly, the -pp flag
       reorders the entire code while preserving procedures' boundaries.

       Additional optimizations performed on the entire executable program file are available by the optimization flags above.

       Executables built with the -qfdpr IBM XL compiler flag contain information to assist fdpr in producing reordered programs. Modules that are not
       compiled with the -qfdpr option are reordered based on the compiler signatures in the symbol table.

       Additional performance enhancements may be realized by using static linking when building the program to be reordered. Since the fdpr program only
       reorders the instructions within the executable program specified, any dynamically linked shared library routines called by the program are not
       optimized. Statically linking these library routines to the executable allows for optimizing both the instructions in the program and all library
       routines used by the program. There are other advantages as well as disadvantages to building a statically linked program. See Performance
       management for further information.

Output Files

       All files created by the fdpr command are stored in the current directory with the exception of any files which may be created by running the
       workload command specified in the -x flag. During the optimization process, the original program is saved by renaming the program, and is only
       restored to the original program name upon successful completion of the final phase.

       The profile file created by the fdpr command explicitly uses the full name of the current directory since scripts used to run the program may change
       the working directory before executing the program.

       The files created and/or used by the fdpr command are:
       program
            Name of the unstripped executable to be optimized.
       program.save
            Saved version of the original executable program.
       program.nprof
            Name of the profile file.
       program.instr
            Name of the instrumented version of program.
       program.fdpr
            Default name of optimized executable output file.
       program.instr.dis_text
            Default disassembly file in ASCII format produced by -disasm flag after instrumentation phase.
       program.fdpr.dis_text
            Default disassembly file in ASCII format produced by -disasm flag after optimization phase.
       program.instr.dis_data
            Default disassembly file in ASCII format produced by -disasm_data flag after instrumentation phase.
       program.fdpr.dis_data
            Default disassembly file in ASCII format produced by -disasm_data flag after optimization phase.
       program.instr.dis_bss
            Default disassembly file in ASCII format produced by -disasm_bss flag after instrumentation phase.
       program.fdpr.dis_bss
            Default disassembly file in ASCII format produced by -disasm_bss flag after optimization phase.
       program.instr.mapper
            Default mapping file in ASCII format produced by -map flag after instrumentation phase.
       program.fdpr.mapper
            Default mapping file in ASCII format produced by -map flag after optimization phase.
       program.ncounts
            Default profile counters file in ASCII format produced by -profcount flag.
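
       Because these files all land in the current directory, it can be handy to check which of them exist between phases. The following is a minimal
       sketch that derives the main file names from the list above; fdpr_files is a hypothetical helper name, and test1 is the program name used in
       the examples later in this reference.

```shell
# Print the main file names fdpr associates with a program (per the list above).
fdpr_files() {
    for suffix in save nprof instr fdpr; do
        printf '%s.%s\n' "$1" "$suffix"
    done
}

# Report which of those files currently exist in the working directory.
fdpr_files test1 | while read -r f; do
    if [ -e "$f" ]; then
        echo "present: $f"
    else
        echo "missing: $f"
    fi
done
```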

Enhanced Debugging Capabilities

       In order to enable a certain degree of debugging capability for optimized programs, FDPR updates the Symbol Table to reflect the changes that were
       made in the .text section.

       Entry fields in the Symbol Table that specify addresses of symbols that were relocated during FDPR reordering are modified to point to their
       new addresses in the .text section.

       In addition, in the case where functions or files are split during reordering, FDPR creates new entries in the Symbol Table for each new part of the
       split function/file. These new parts of the same function are given new symbol names in the Symbol Table according to the following naming
       convention:

       <original function name>__fdpr_<function's part number>

       After code reordering all the new entries are suffixed with the __fdpr_ string.

       Example: Originally, function "main" had the following entry in the Symbol Table:

         [Index] m   Value       Scn     Aux   Sclass    Type    Name
          [456]  m  0x00000230    2       1     0x02    0x0000   .main

       If, after code reordering, function main was split into 3 parts, it would have 3 entries in the Symbol Table, one for each part, as follows:

         [Index] m   Value       Scn     Aux   Sclass    Type    Name
          [456]  m  0x00000304    2       1     0x02    0x0000   .main
         [1447]  m  0x00003328    2       1     0x02    0x0000   .main__fdpr_1
         [1453]  m  0x000033b4    2       1     0x02    0x0000   .main__fdpr_2
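
       The naming convention makes it straightforward to collect all parts of a split function from a symbol listing. The following is a minimal
       sketch that reads a listing in the column layout shown above on standard input; split_parts is a hypothetical helper name, and how the listing
       is obtained (for example, from a symbol table dump of program.fdpr) is left as an assumption.

```shell
# split_parts FUNCTION: read a symbol listing on stdin and print the name and
# address (Value column) of FUNCTION and of each of its __fdpr_ parts.
split_parts() {
    awk -v fn=".$1" '
        # Match the exact entry or an entry named <fn>__fdpr_<part number>.
        $NF == fn || index($NF, fn "__fdpr_") == 1 { print $NF, $3 }
    '
}
```

       Given the three-entry listing above as input, split_parts main prints .main, .main__fdpr_1, and .main__fdpr_2 with their new addresses, and
       skips unrelated symbols whose names merely begin with main.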

Examples

       The following are typical usage examples of the fdpr command.
       1    This example allows the user to run all three phases. In this example, test1 is the unstripped executable and test2 is a shell script that
            invokes test1. The current working directory is /tmp/fdpr.

            test2 script file:
            # code to exercise test1
            test1 -expand 100 -root $PATH file.jpg -quit
            # the end of test2

            Execute the fdpr command (using the default optimization):

            fdpr -p test1 -x test2

            This results in the new reordered executable test1.fdpr.
       2    To run one phase at a time, execute phase one of fdpr.

            fdpr -1 -p test1

            This command string creates an instrumented version with the name test1.instr and the empty template profile file test1.nprof.

            To execute phase two:

            fdpr -2 -p test1 -x test2

            This command string executes the script file test2 that runs the instrumented version of test1 to collect the profile data.

            To execute phase three:

            fdpr -3 -p test1

            Again, this results in the new reordered executable test1.fdpr.
       3    To run the first two phases followed by phase three, execute phase one and two.

            fdpr -12 -p test1 -x test2

            Execute phase three using optimization level three.

            fdpr -3 -O3 -p test1
       4    If an error occurs while running an fdpr optimized program, the dbx command can be used to determine what procedure the error occurred in as
            follows:

            dbx program.fdpr

            which produces output similar to the following:

            Type 'help' for help.
            reading symbolic information ...warning: no source compiled with -g

            [using memory image in core]

            Segmentation fault in proc_d at 0x10000634
            0x10000634 (???) 98640000        stb   r3,0x0(r4)
            (dbx)

            A stack traceback, which is used to determine how the program arrived at the current location, is produced as follows:

            (dbx) where

            which produces the following output:

            proc_d(0x0) at 0x10000634
            proc_c(0x0) at 0x10000604
            proc_b(0x0) at 0x100005d0
            proc_a(0x0) at 0x1000059c
            main(0x2, 0x2ff7fba4) at 0x1000055c
            (dbx)
       5    The dbx subcommand stepi may also be used to single step through the instructions of a reordered executable program as follows:

            (dbx) stepi

            which produces the following output:

            stopped in proc_d at 0x1000061c
            0x1000061c (???) 9421ffc0       stwu   r1,-64(r1)
            (dbx)

            In this example, dbx indicates that the program stopped in routine proc_d at address 0x1000061c in the reordered text section.

Implementation Specifics

       Software Product/Option: AIX Performance Aide/ Local Performance Analysis & Control Commands.

       Standards Compliance: None.

Files

       /usr/bin/fdpr
            Contains the fdpr command.
       program
            Name of the unstripped executable to be optimized.
       program.save
            Saved version of the original executable program.
       program.nprof
            Name of the profile file.
       program.instr
            Name of the instrumented version of program.
       program.fdpr
            Default name of optimized executable output file.
       program.instr.dis_text
            Default disassembly file in ASCII format produced by -disasm flag after instrumentation phase.
       program.fdpr.dis_text
            Default disassembly file in ASCII format produced by -disasm flag after optimization phase.
       program.instr.dis_data
            Default disassembly file in ASCII format produced by -disasm_data flag after instrumentation phase.
       program.fdpr.dis_data
            Default disassembly file in ASCII format produced by -disasm_data flag after optimization phase.
       program.instr.dis_bss
            Default disassembly file in ASCII format produced by -disasm_bss flag after instrumentation phase.
       program.fdpr.dis_bss
            Default disassembly file in ASCII format produced by -disasm_bss flag after optimization phase.
       program.instr.mapper
            Default mapping file in ASCII format produced by -map flag after instrumentation phase.
       program.fdpr.mapper
            Default mapping file in ASCII format produced by -map flag after optimization phase.
       program.ncounts
            Default profile counters file in ASCII format produced by -profcount flag.

Related Information

       The dbx command.

       Restructuring executable programs with the fdpr program in Performance management.

       The xlC compiler.
