The Intel C++ command line in Parallel Studio 2011

G_Spider

Category: C/C++  Tags: parallel, c++, compiler, profiling, optimization, generation

Article link: https://blog.csdn.net/G_Spider/article/details/6207376


@echo off
rem call "D:/Microsoft Visual Studio 10.0/VC/bin/VCVARS32.BAT"
call "C:/Program Files/Intel/Parallel Studio 2011/ips-vars.cmd"
icl /c /O3 fftsg_h.c currTime.c

icl /c /O3 test_speedFFT.cpp

xilink /subsystem:console test_speedFFT.obj fftsg_h.obj currTime.obj

rem icl /help

del *.obj

pause

test_speedFFT.exe

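The Intel C++ compiler gives a measurable speedup here: for FFT lengths of 256 and above, the icl build runs roughly 1.4 to 1.6 times faster than the VS2010 cl build in the timings below.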

FFT results when compiled with Intel C++:

    len    fft time  time/(n*log2(n))
     64  0.00000057  0.000000001481
    128  0.00000138  0.000000001544
    256  0.00000269  0.000000001313
    512  0.00000778  0.000000001688
   1024  0.00001426  0.000000001393
   2048  0.00003525  0.000000001565
   4096  0.00007433  0.000000001512
   8192  0.00016798  0.000000001577
  16384  0.00033520  0.000000001461
  32768  0.00079489  0.000000001617
  65536  0.00160501  0.000000001531
 131072  0.00372449  0.000000001672
 262144  0.00771416  0.000000001635
 524288  0.01943485  0.000000001951
1048576  0.05234874  0.000000002496
2097152  0.10692816  0.000000002428

Results when compiled with cl from VS2010:

    len    fft time  time/(n*log2(n))
     64  0.00000295  0.000000007686
    128  0.00000363  0.000000004052
    256  0.00000418  0.000000002039
    512  0.00001109  0.000000002407
   1024  0.00002207  0.000000002155
   2048  0.00005774  0.000000002563
   4096  0.00011116  0.000000002262
   8192  0.00027207  0.000000002555
  16384  0.00053441  0.000000002330
  32768  0.00125360  0.000000002550
  65536  0.00258079  0.000000002461
 131072  0.00585856  0.000000002629
 262144  0.01220177  0.000000002586
 524288  0.03064302  0.000000003076
1048576  0.07136638  0.000000003403
2097152  0.14600080  0.000000003315

Commands:

Run icl /help to get the command-line reference:

Intel(R) C++ Compiler Help
==========================

usage: icl [options] file1 [file2 ...] [/link linker_options]

where options represents zero or more compiler options

     fileN is a C/C++ source (.c .cc .cpp .cxx .i), assembly (.asm),
     object (.obj), static library (.lib), or other linkable file
     linker_options represents zero or more linker options

Notes
-----
1. Most Microsoft* Visual C++* compiler options are supported; a warning is
printed for most unsupported options. The precise behavior of performance
options does not always match that of the Microsoft Visual C++ compiler.

2. Intel C++ compiler options may be placed in your icl.cfg file.

3. Most options beginning with /Q are specific to the Intel C++ compiler:
(*I) indicates other options specific to the Intel C++ compiler
(*M) indicates /Q options supported by the Microsoft Visual C++ compiler

   Some options listed are only available on a specific system
   i32    indicates the feature is available on systems based on IA-32
          architecture
   i64em indicates the feature is available on systems using Intel(R) 64
          architecture

Compiler Option List
--------------------

Optimization
------------

/O1       optimize for maximum speed, but disable some optimizations which
          increase code size for a small speed benefit
/O2       optimize for maximum speed (DEFAULT)
/O3       optimize for maximum speed and enable more aggressive optimizations
          that may not improve performance on some programs
/Ox       enable maximum optimizations (same as /O2)
/Os       enable speed optimizations, but disable some optimizations which
          increase code size for small speed benefit (overrides /Ot)
/Ot       enable speed optimizations (overrides /Os)
/Od       disable optimizations
/Oi[-]    enable/disable inline expansion of intrinsic functions
/Oy[-]    enable/disable using EBP as a general purpose register (no frame
          pointer) (i32 only)
/fast     enable /QxHOST /O3 /Qipo /Qprec-div-
          options set by /fast cannot be overridden with the exception of
          /QxHOST, list options separately to change behavior
/Oa[-]    assume no aliasing in program
/Ow[-]    assume no aliasing within functions, but assume aliasing across calls

Code Generation
---------------

/Qx<code>
          generate specialized code to run exclusively on processors
          indicated by <code> as described below
            Host generate instructions for the highest instruction set and
                 processor available on the compilation host machine
            SSE2 Intel Pentium 4 and compatible Intel processors. Enables new
                 optimizations in addition to Intel processor-specific
                 optimizations
            SSE3    Intel(R) Core(TM) processor family with Streaming SIMD
                    Extensions 3 (Intel(R) SSE3) instruction support
            SSSE3   Intel(R) Core(TM)2 processor family with Supplemental
                    Streaming SIMD Extensions 3 (SSSE3)
            SSE4.1 Intel(R) 45nm Hi-k next generation Intel Core(TM)
                    microarchitecture with support for Streaming SIMD
                    Extensions 4 (Intel(R) SSE4) Vectorizing
                    Compiler and Media Accelerator instructions
            SSE4.2 Can generate Intel(R) SSE4 Efficient Accelerated String
                    and Text Processing instructions supported by Intel(R)
                    Core(TM) i7 processors. Can generate Intel(R) SSE4
                    Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3,
                    SSE3, SSE2, and SSE instructions and it can optimize for
                    the Intel(R) Core(TM) processor family.
            AVX     Enable Intel(R) Advanced Vector Extensions instructions
            SSE3_ATOM Can generate MOVBE instructions for Intel processors and
                      can optimize for the Intel(R) Atom(TM) processor.
/Qax<code1>[,<code2>,...]
          generate code specialized for processors specified by <codes>
          while also generating generic IA-32 instructions.
          <codes> includes one or more of the following:
            SSE2 Intel Pentium 4 and compatible Intel processors. Enables new
                 optimizations in addition to Intel processor-specific
                 optimizations
            SSE3    Intel(R) Core(TM) processor family with Streaming SIMD
                    Extensions 3 (Intel(R) SSE3) instruction support
            SSSE3   Intel(R) Core(TM)2 processor family with Supplemental
                    Streaming SIMD Extensions 3 (SSSE3)
            SSE4.1 Intel(R) 45nm Hi-k next generation Intel Core(TM)
                    microarchitecture with support for Streaming SIMD
                    Extensions 4 (Intel(R) SSE4) Vectorizing
                    Compiler and Media Accelerator instructions
            SSE4.2 Can generate Intel(R) SSE4 Efficient Accelerated String
                    and Text Processing instructions supported by Intel(R)
                    Core(TM) i7 processors. Can generate Intel(R) SSE4
                    Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3,
                    SSE3, SSE2, and SSE instructions and it can optimize for
                    the Intel(R) Core(TM) processor family.
            AVX     Enable Intel(R) Advanced Vector Extensions instructions
/arch:<code>
          generate specialized code to optimize for processors indicated by
          <code> as described below
            SSE Intel Pentium III and compatible Intel processors
            SSE2 Intel Pentium 4 and compatible Intel processors. Enables new
                 optimizations in addition to Intel processor-specific
                 optimizations
            SSE3 Intel(R) Core(TM) processor family. Code is expected to run
                 properly on any processor that supports SSE3, SSE2 and SSE
                 instruction sets
            SSSE3   Intel(R) Core(TM)2 processor family with Supplemental
                    Streaming SIMD Extensions 3 (SSSE3)
            SSE4.1 Intel(R) 45nm Hi-k next generation Intel Core(TM)
                    microarchitecture with support for Streaming SIMD
                    Extensions 4 (Intel(R) SSE4) Vectorizing
                    Compiler and Media Accelerator instructions
            IA32    generate generic IA-32 architecture code for Intel Pentium
                    III and compatible Intel processors. Disables any default
                    or previously set extended instruction setting
/Qinstruction:<keyword>
          Refine instruction set output for the selected target processor
            [no]movbe - Do/do not generate MOVBE instructions with SSE3_ATOM
                        (requires /QxSSE3_ATOM)

/GR[-]    enable/disable C++ RTTI
/Qcxx-features
          enable standard C++ features (/GX /GR)
/EHa      enable asynchronous C++ exception handling model
/EHs      enable synchronous C++ exception handling model
/EHc      assume extern "C" functions do not throw exceptions
/Qsafeseh[-]
          Registers exceptions for safe exception handling (DEFAULT)
/Gd       make __cdecl the default calling convention
/Gr       make __fastcall the default calling convention
/Gz       make __stdcall the default calling convention
/Qregcall
          make __regcall the default calling convention
/hotpatch[:n]
          generate padding bytes for function entries to enable image
          hotpatching. If specified, use 'n' as the padding.

Interprocedural Optimization (IPO)
----------------------------------

/Qip[-]   enable(DEFAULT)/disable single-file IP optimization
          within files
/Qipo[n] enable multi-file IP optimization between files
/Qipo-c   generate a multi-file object file (ipo_out.obj)
/Qipo-S   generate a multi-file assembly file (ipo_out.asm)
/Qip-no-inlining
          disable full and partial inlining
/Qip-no-pinlining
          disable partial inlining
/Qipo-separate
          create one object file for every source file (overrides /Qipo[n])
/Qipo-jobs<n>
          specify the number of jobs to be executed simultaneously during the
          IPO link phase

Advanced Optimizations
----------------------

/Qunroll[n]
          set maximum number of times to unroll loops. Omit n to use default
          heuristics. Use n=0 to disable the loop unroller
/Qunroll-aggressive[-]
          enables more aggressive unrolling heuristics
/Qscalar-rep[-]
          enable(DEFAULT)/disable scalar replacement (requires /O3)
/Qansi-alias[-]
          enable/disable(DEFAULT) use of ANSI aliasing rules optimizations;
          user asserts that the program adheres to these rules
/Qansi-alias-check[-]
          enable(DEFAULT)/disable ANSI alias checking when using /Qansi-alias
/Qcomplex-limited-range[-]
          enable/disable(DEFAULT) the use of the basic algebraic expansions of
          some complex arithmetic operations. This can allow for some
          performance improvement in programs which use a lot of complex
          arithmetic at the loss of some exponent range.
/Qalias-const[-]
          enable/disable(DEFAULT) a heuristic stating that if two arguments to
          a function have pointer type, a pointer to const does not alias a
          pointer to non-const. Also known as the input/output buffer rule, it
          assumes that input and output buffer arguments do not overlap.
/Qalias-args[-]
          enable(DEFAULT)/disable C/C++ rule that function arguments may be
          aliased; when disabling the rule, the user asserts that this is safe
/Qopt-multi-version-aggressive[-]
          enables more aggressive multi-versioning to check for pointer
          aliasing and scalar replacement
/Qopt-ra-region-strategy[:<keyword>]
          select the method that the register allocator uses to partition each
          routine into regions
            routine - one region per routine
            block   - one region per block
            trace   - one region per trace
            loop    - one region per loop
            default - compiler selects best option
/Qvec[-] enables(DEFAULT)/disables vectorization
/Qvec-guard-write[-]
          enables cache/bandwidth optimization for stores under conditionals
          within vector loops
/Qvec-threshold[n]
          sets a threshold for the vectorization of loops based on the
          probability of profitable execution of the vectorized loop in
          parallel
/Qopt-malloc-options:{0|1|2|3|4}
          specify malloc configuration parameters. Specifying a non-zero <n>
          value will cause alternate configuration parameters to be set for
          how malloc allocates and frees memory
/Qopt-jump-tables:<arg>
          control the generation of jump tables
            default - let the compiler decide when a jump table, a series of
                      if-then-else constructs or a combination is generated
            large   - generate jump tables up to a certain pre-defined size
                      (64K entries)
            <n>     - generate jump tables up to <n> in size
          use /Qopt-jump-tables- to lower switch statements as chains of
          if-then-else constructs
/Qopt-block-factor:<n>
          specify blocking factor for loop blocking
/Qfreestanding
          compile in a freestanding environment where the standard library
          may not be present
/Qopt-streaming-stores:<arg>
          specifies whether streaming stores are generated
            always - enables generation of streaming stores under the
            assumption that the application is memory bound
            auto   - compiler decides when streaming stores are used (DEFAULT)
            never - disables generation of streaming stores
/Qipp[:<arg>]
          link some or all of the Intel(R) Integrated Performance Primitives
          (Intel(R) IPP) libraries and bring in the associated headers
            common        - link using the main libraries set. This is the
                            default value when /Qipp is specified
            crypto        - link using the main libraries set and the crypto
                            library
/Qmkl[:<arg>]
          link to the Intel(R) Math Kernel Library (Intel(R) MKL) and bring
          in the associated headers
            parallel   - link using the threaded Intel(R) MKL libraries. This
                         is the default when /Qmkl is specified
            sequential - link using the non-threaded Intel(R) MKL libraries
            cluster    - link using the Intel(R) MKL Cluster libraries plus
                         the sequential Intel(R) MKL libraries
/Qtbb     link to the Intel(R) Threading Building Blocks (Intel(R) TBB)
          libraries and bring in the associated headers
/Qopt-subscript-in-range[-]
          assumes no overflows in the intermediate computation of the
          subscripts
/Quse-intel-optimized-headers[-]
          take advantage of the optimized header files
/Qcilk-serialize
          run a Cilk program as a C/C++ serialized program
/Qarray-notation[-]
          enable/disable(DEFAULT) C/C++ array extensions for data parallel
          programming
/Qopt-matmul[-]
          replace matrix multiplication with calls to intrinsics and threading
          libraries for improved performance (DEFAULT at /O3 /Qparallel)
/Qsimd[-]
          enables(DEFAULT)/disables vectorization using simd pragma
/Qguide-opts:<arg>
          tells the compiler to analyze certain code and generate
          recommendations that may improve optimizations
/Qguide-file[:<filename>]
          causes the results of guided auto-parallelization to be output to a
          file
/Qguide-file-append[:<filename>]
          causes the results of guided auto-parallelization to be appended to
          a file
/Qguide[:<level>]
          lets you set a level (1 - 4) of guidance for auto-vectorization,
          auto-parallelization, and data transformation (DEFAULT is 4 when the
          option is specified)
/Qguide-data-trans[:<level>]
          lets you set a level (1 - 4) of guidance for data transformation
          (DEFAULT is 4 when the option is specified)
/Qguide-par[:<level>]
          lets you set a level (1 - 4) of guidance for auto-parallelization
          (DEFAULT is 4 when the option is specified)
/Qguide-vec[:<level>]
          lets you set a level (1 - 4) of guidance for auto-vectorization
          (DEFAULT is 4 when the option is specified)

Profile Guided Optimization (PGO)
---------------------------------

/Qprof-dir <dir>
          specify directory for profiling output files (*.dyn and *.dpi)
/Qprof-src-root <dir>
          specify project root directory for application source files to
          enable relative path resolution during profile feedback on sources
          below that directory
/Qprof-src-root-cwd
          specify the current directory as the project root directory for
          application source files to enable relative path resolution during
          profile feedback on sources below that directory
/Qprof-src-dir[-]
          specify whether directory names of sources should be
          considered when looking up profile records within the .dpi file
/Qprof-file <file>
          specify file name for profiling summary file
/Qprof-data-order[-]
          enable/disable(DEFAULT) static data ordering with profiling
/Qprof-func-order[-]
          enable/disable(DEFAULT) function ordering with profiling
/Qprof-gen[:keyword]
          instrument program for profiling.
          Optional keyword may be srcpos or globdata
/Qprof-gen-
          disable profiling instrumentation
/Qprof-use[:<arg>]
          enable use of profiling information during optimization
            weighted - invokes profmerge with -weighted option to scale data
                        based on run durations
            [no]merge - enable(default)/disable the invocation of the profmerge
                        tool
/Qprof-use-
          disable use of profiling information during optimization
/Qcov-gen
          instrument program for profiling
/Qcov-dir <dir>
          specify directory for profiling output files (*.dyn and *.dpi)
/Qcov-file <file>
          specify file name for profiling summary file
/Qfnsplit[-]
          enable/disable function splitting (enabled with /Qprof-use)
/Qopt-prefetch[:n]
          enable levels of prefetch insertion, where 0 disables.
          n may be 0 through 4 inclusive. Default is 2.