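@echo off
rem Optional: set up the Visual Studio 2010 environment first.
rem call "D:/Microsoft Visual Studio 10.0/VC/bin/VCVARS32.BAT"
rem Set up the Intel compiler environment (icl, xilink).
call "C:/Program Files/Intel/Parallel Studio 2011/ips-vars.cmd"
rem Compile only (/c) with aggressive optimization; the switch is /O3 with a capital O.
icl /c /O3 fftsg_h.c currTime.c
icl /c /O3 test_speedFFT.cpp
rem Link the three object files into a console executable.
xilink /subsystem:console test_speedFFT.obj fftsg_h.obj currTime.obj
rem icl /help
rem Remove the intermediate object files, then wait for a key press before running the benchmark.
del *.obj
pause
test_speedFFT.exe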
Compiling with the Intel C++ compiler improves performance.
FFT results when compiled with Intel C++:
len      fft time    time/(n*log2(n))
64       0.00000057  0.000000001481
128      0.00000138  0.000000001544
256      0.00000269  0.000000001313
512      0.00000778  0.000000001688
1024     0.00001426  0.000000001393
2048     0.00003525  0.000000001565
4096     0.00007433  0.000000001512
8192     0.00016798  0.000000001577
16384    0.00033520  0.000000001461
32768    0.00079489  0.000000001617
65536    0.00160501  0.000000001531
131072   0.00372449  0.000000001672
262144   0.00771416  0.000000001635
524288   0.01943485  0.000000001951
1048576  0.05234874  0.000000002496
2097152  0.10692816  0.000000002428
Results when compiled with cl from VS2010:
len      fft time    time/(n*log2(n))
64       0.00000295  0.000000007686
128      0.00000363  0.000000004052
256      0.00000418  0.000000002039
512      0.00001109  0.000000002407
1024     0.00002207  0.000000002155
2048     0.00005774  0.000000002563
4096     0.00011116  0.000000002262
8192     0.00027207  0.000000002555
16384    0.00053441  0.000000002330
32768    0.00125360  0.000000002550
65536    0.00258079  0.000000002461
131072   0.00585856  0.000000002629
262144   0.01220177  0.000000002586
524288   0.03064302  0.000000003076
1048576  0.07136638  0.000000003403
2097152  0.14600080  0.000000003315
Commands:
Running icl /help prints the command-line reference:
Intel(R) C++ Compiler Help
==========================
usage: icl [options] file1 [file2 ...] [/link linker_options]
where options represents zero or more compiler options
fileN is a C/C++ source (.c .cc .cpp .cxx .i), assembly (.asm),
object (.obj), static library (.lib), or other linkable file
linker_options represents zero or more linker options
Notes
-----
1. Most Microsoft* Visual C++* compiler options are supported; a warning is
printed for most unsupported options. The precise behavior of performance
options does not always match that of the Microsoft Visual C++ compiler.
2. Intel C++ compiler options may be placed in your icl.cfg file.
3. Most options beginning with /Q are specific to the Intel C++ compiler:
(*I) indicates other options specific to the Intel C++ compiler
(*M) indicates /Q options supported by the Microsoft Visual C++ compiler
Some options listed are only available on a specific system
i32 indicates the feature is available on systems based on IA-32
architecture
i64em indicates the feature is available on systems using Intel(R) 64
architecture
Compiler Option List
--------------------
Optimization
------------
/O1 optimize for maximum speed, but disable some optimizations which
increase code size for a small speed benefit
/O2 optimize for maximum speed (DEFAULT)
/O3 optimize for maximum speed and enable more aggressive optimizations
that may not improve performance on some programs
/Ox enable maximum optimizations (same as /O2)
/Os enable speed optimizations, but disable some optimizations which
increase code size for small speed benefit (overrides /Ot)
/Ot enable speed optimizations (overrides /Os)
/Od disable optimizations
/Oi[-] enable/disable inline expansion of intrinsic functions
/Oy[-] enable/disable using EBP as a general purpose register (no frame
pointer) (i32 only)
/fast enable /QxHOST /O3 /Qipo /Qprec-div-
options set by /fast cannot be overridden with the exception of
/QxHOST, list options separately to change behavior
/Oa[-] assume no aliasing in program
/Ow[-] assume no aliasing within functions, but assume aliasing across calls
Code Generation
---------------
/Qx<code>
generate specialized code to run exclusively on processors
indicated by <code> as described below
Host generate instructions for the highest instruction set and
processor available on the compilation host machine
SSE2 Intel Pentium 4 and compatible Intel processors. Enables new
optimizations in addition to Intel processor-specific
optimizations
SSE3 Intel(R) Core(TM) processor family with Streaming SIMD
Extensions 3 (Intel(R) SSE3) instruction support
SSSE3 Intel(R) Core(TM)2 processor family with Supplemental
Streaming SIMD Extensions 3 (SSSE3)
SSE4.1 Intel(R) 45nm Hi-k next generation Intel Core(TM)
microarchitecture with support for Streaming SIMD
Extensions 4 (Intel(R) SSE4) Vectorizing
Compiler and Media Accelerator instructions
SSE4.2 Can generate Intel(R) SSE4 Efficient Accelerated String
and Text Processing instructions supported by Intel(R)
Core(TM) i7 processors. Can generate Intel(R) SSE4
Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3,
SSE3, SSE2, and SSE instructions and it can optimize for
the Intel(R) Core(TM) processor family.
AVX Enable Intel(R) Advanced Vector Extensions instructions
SSE3_ATOM Can generate MOVBE instructions for Intel processors and
can optimize for the Intel(R) Atom(TM) processor.
/Qax<code1>[,<code2>,...]
generate code specialized for processors specified by <codes>
while also generating generic IA-32 instructions.
<codes> includes one or more of the following:
SSE2 Intel Pentium 4 and compatible Intel processors. Enables new
optimizations in addition to Intel processor-specific
optimizations
SSE3 Intel(R) Core(TM) processor family with Streaming SIMD
Extensions 3 (Intel(R) SSE3) instruction support
SSSE3 Intel(R) Core(TM)2 processor family with Supplemental
Streaming SIMD Extensions 3 (SSSE3)
SSE4.1 Intel(R) 45nm Hi-k next generation Intel Core(TM)
microarchitecture with support for Streaming SIMD
Extensions 4 (Intel(R) SSE4) Vectorizing
Compiler and Media Accelerator instructions
SSE4.2 Can generate Intel(R) SSE4 Efficient Accelerated String
and Text Processing instructions supported by Intel(R)
Core(TM) i7 processors. Can generate Intel(R) SSE4
Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3,
SSE3, SSE2, and SSE instructions and it can optimize for
the Intel(R) Core(TM) processor family.
AVX Enable Intel(R) Advanced Vector Extensions instructions
/arch:<code>
generate specialized code to optimize for processors indicated by
<code> as described below
SSE Intel Pentium III and compatible Intel processors
SSE2 Intel Pentium 4 and compatible Intel processors. Enables new
optimizations in addition to Intel processor-specific
optimizations
SSE3 Intel(R) Core(TM) processor family. Code is expected to run
properly on any processor that supports SSE3, SSE2 and SSE
instruction sets
SSSE3 Intel(R) Core(TM)2 processor family with Supplemental
Streaming SIMD Extensions 3 (SSSE3)
SSE4.1 Intel(R) 45nm Hi-k next generation Intel Core(TM)
microarchitecture with support for Streaming SIMD
Extensions 4 (Intel(R) SSE4) Vectorizing
Compiler and Media Accelerator instructions
IA32 generate generic IA-32 architecture code for Intel Pentium
III and compatible Intel processors. Disables any default
or previously set extended instruction setting
/Qinstruction:<keyword>
Refine instruction set output for the selected target processor
[no]movbe - Do/do not generate MOVBE instructions with SSE3_ATOM
(requires /QxSSE3_ATOM)
/GR[-] enable/disable C++ RTTI
/Qcxx-features
enable standard C++ features (/GX /GR)
/EHa enable asynchronous C++ exception handling model
/EHs enable synchronous C++ exception handling model
/EHc assume extern "C" functions do not throw exceptions
/Qsafeseh[-]
Registers exceptions for safe exception handling (DEFAULT)
/Gd make __cdecl the default calling convention
/Gr make __fastcall the default calling convention
/Gz make __stdcall the default calling convention
/Qregcall
make __regcall the default calling convention
/hotpatch[:n]
generate padding bytes for function entries to enable image
hotpatching. If specified, use 'n' as the padding.
Interprocedural Optimization (IPO)
----------------------------------
/Qip[-] enable(DEFAULT)/disable single-file IP optimization
within files
/Qipo[n] enable multi-file IP optimization between files
/Qipo-c generate a multi-file object file (ipo_out.obj)
/Qipo-S generate a multi-file assembly file (ipo_out.asm)
/Qip-no-inlining
disable full and partial inlining
/Qip-no-pinlining
disable partial inlining
/Qipo-separate
create one object file for every source file (overrides /Qipo[n])
/Qipo-jobs<n>
specify the number of jobs to be executed simultaneously during the
IPO link phase
Advanced Optimizations
----------------------
/Qunroll[n]
set maximum number of times to unroll loops. Omit n to use default
heuristics. Use n=0 to disable the loop unroller
/Qunroll-aggressive[-]
enables more aggressive unrolling heuristics
/Qscalar-rep[-]
enable(DEFAULT)/disable scalar replacement (requires /O3)
/Qansi-alias[-]
enable/disable(DEFAULT) use of ANSI aliasing rules optimizations;
user asserts that the program adheres to these rules
/Qansi-alias-check[-]
enable(DEFAULT)/disable ANSI alias checking when using /Qansi-alias
/Qcomplex-limited-range[-]
enable/disable(DEFAULT) the use of the basic algebraic expansions of
some complex arithmetic operations. This can allow for some
performance improvement in programs which use a lot of complex
arithmetic at the loss of some exponent range.
/Qalias-const[-]
enable/disable(DEFAULT) a heuristic stating that if two arguments to
a function have pointer type, a pointer to const does not alias a
pointer to non-const. Also known as the input/output buffer rule, it
assumes that input and output buffer arguments do not overlap.
/Qalias-args[-]
enable(DEFAULT)/disable C/C++ rule that function arguments may be
aliased; when disabling the rule, the user asserts that this is safe
/Qopt-multi-version-aggressive[-]
enables more aggressive multi-versioning to check for pointer
aliasing and scalar replacement
/Qopt-ra-region-strategy[:<keyword>]
select the method that the register allocator uses to partition each
routine into regions
routine - one region per routine
block - one region per block
trace - one region per trace
loop - one region per loop
default - compiler selects best option
/Qvec[-] enables(DEFAULT)/disables vectorization
/Qvec-guard-write[-]
enables cache/bandwidth optimization for stores under conditionals
within vector loops
/Qvec-threshold[n]
sets a threshold for the vectorization of loops based on the
probability of profitable execution of the vectorized loop in
parallel
/Qopt-malloc-options:{0|1|2|3|4}
specify malloc configuration parameters. Specifying a non-zero <n>
value will cause alternate configuration parameters to be set for
how malloc allocates and frees memory
/Qopt-jump-tables:<arg>
control the generation of jump tables
default - let the compiler decide when a jump table, a series of
if-then-else constructs or a combination is generated
large - generate jump tables up to a certain pre-defined size
(64K entries)
<n> - generate jump tables up to <n> in size
use /Qopt-jump-tables- to lower switch statements as chains of
if-then-else constructs
/Qopt-block-factor:<n>
specify blocking factor for loop blocking
/Qfreestanding
compile in a freestanding environment where the standard library
may not be present
/Qopt-streaming-stores:<arg>
specifies whether streaming stores are generated
always - enables generation of streaming stores under the
assumption that the application is memory bound
auto - compiler decides when streaming stores are used (DEFAULT)
never - disables generation of streaming stores
/Qipp[:<arg>]
link some or all of the Intel(R) Integrated Performance Primitives
(Intel(R) IPP) libraries and bring in the associated headers
common - link using the main libraries set. This is the
default value when /Qipp is specified
crypto - link using the main libraries set and the crypto
library
/Qmkl[:<arg>]
link to the Intel(R) Math Kernel Library (Intel(R) MKL) and bring
in the associated headers
parallel - link using the threaded Intel(R) MKL libraries. This
is the default when /Qmkl is specified
sequential - link using the non-threaded Intel(R) MKL libraries
cluster - link using the Intel(R) MKL Cluster libraries plus
the sequential Intel(R) MKL libraries
/Qtbb link to the Intel(R) Threading Building Blocks (Intel(R) TBB)
libraries and bring in the associated headers
/Qopt-subscript-in-range[-]
assumes no overflows in the intermediate computation of the
subscripts
/Quse-intel-optimized-headers[-]
take advantage of the optimized header files
/Qcilk-serialize
run a Cilk program as a C/C++ serialized program
/Qarray-notation[-]
enable/disable(DEFAULT) C/C++ array extensions for data parallel
programming
/Qopt-matmul[-]
replace matrix multiplication with calls to intrinsics and threading
libraries for improved performance (DEFAULT at /O3 /Qparallel)
/Qsimd[-]
enables(DEFAULT)/disables vectorization using simd pragma
/Qguide-opts:<arg>
tells the compiler to analyze certain code and generate
recommendations that may improve optimizations
/Qguide-file[:<filename>]
causes the results of guided auto-parallelization to be output to a
file
/Qguide-file-append[:<filename>]
causes the results of guided auto-parallelization to be appended to
a file
/Qguide[:<level>]
lets you set a level (1 - 4) of guidance for auto-vectorization,
auto-parallelization, and data transformation (DEFAULT is 4 when the
option is specified)
/Qguide-data-trans[:<level>]
lets you set a level (1 - 4) of guidance for data transformation
(DEFAULT is 4 when the option is specified)
/Qguide-par[:<level>]
lets you set a level (1 - 4) of guidance for auto-parallelization
(DEFAULT is 4 when the option is specified)
/Qguide-vec[:<level>]
lets you set a level (1 - 4) of guidance for auto-vectorization
(DEFAULT is 4 when the option is specified)
Profile Guided Optimization (PGO)
---------------------------------
/Qprof-dir <dir>
specify directory for profiling output files (*.dyn and *.dpi)
/Qprof-src-root <dir>
specify project root directory for application source files to
enable relative path resolution during profile feedback on sources
below that directory
/Qprof-src-root-cwd
specify the current directory as the project root directory for
application source files to enable relative path resolution during
profile feedback on sources below that directory
/Qprof-src-dir[-]
specify whether directory names of sources should be
considered when looking up profile records within the .dpi file
/Qprof-file <file>
specify file name for profiling summary file
/Qprof-data-order[-]
enable/disable(DEFAULT) static data ordering with profiling
/Qprof-func-order[-]
enable/disable(DEFAULT) function ordering with profiling
/Qprof-gen[:keyword]
instrument program for profiling.
Optional keyword may be srcpos or globdata
/Qprof-gen-
disable profiling instrumentation
/Qprof-use[:<arg>]
enable use of profiling information during optimization
weighted - invokes profmerge with -weighted option to scale data
based on run durations
[no]merge - enable(default)/disable the invocation of the profmerge
tool
/Qprof-use-
disable use of profiling information during optimization
/Qcov-gen
instrument program for profiling
/Qcov-dir <dir>
specify directory for profiling output files (*.dyn and *.dpi)
/Qcov-file <file>
specify file name for profiling summary file
/Qfnsplit[-]
enable/disable function splitting (enabled with /Qprof-use)
/Qopt-prefetch[:n]
enable levels of prefetch insertion, where 0 disables.
n may be 0 through 4 inclusive. Default is 2.
......