SSE Basics

菜鸟决心努力A-A

Category: SIMD

Article link: https://blog.csdn.net/thefighteran/article/details/51012212

http://felix.abecassis.me/2011/09/cpp-getting-started-with-sse/

The content is roughly as follows (the formatting is messy):

In this article I will present how to use SSE instructions in C++ (or C).
My goal is not to show how to write the fastest possible program using SSE, but rather to introduce its usage.

What is SSE?

SSE stands for Streaming SIMD Extensions. It is a set of CPU instructions dedicated to applications like signal processing, scientific computation or 3D graphics.

SIMD is an acronym itself: Single Instruction, Multiple Data. A CPU instruction is said to be SIMD when the same operation is applied to multiple data elements at the same time.

SSE was first introduced in the Pentium III in 1999. Over time, this instruction set was improved by adding more sophisticated operations. Eight 128-bit registers were added to the CPU: xmm0 through xmm7.
Initially those registers could only be used for single-precision computations (i.e. for the type float). But since SSE2, those registers can be used for any primitive data type.
Given a standard 32-bit machine we can therefore store and process in parallel:
- 2 double
- 2 long long (64-bit integers)
- 4 float
- 4 int
- 8 short
- 16 char
Note that integer types can be signed or unsigned, but you will sometimes have to use different instructions for each.
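To make the signed/unsigned remark concrete, here is a minimal sketch using saturating 8-bit addition, where SSE2 provides one intrinsic for signed lanes (_mm_adds_epi8) and another for unsigned lanes (_mm_adds_epu8). The helper names are hypothetical and not from the original article. Each helper broadcasts its inputs to all 16 lanes and reads back only the lowest one.

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics
#include <cstdint>

// Hypothetical helper: signed saturating add (PADDSB), result clamped to [-128, 127].
std::int8_t saturating_add_signed(std::int8_t x, std::int8_t y)
{
    __m128i r = _mm_adds_epi8(_mm_set1_epi8(x), _mm_set1_epi8(y));
    // Extract the lowest byte of the 128-bit result.
    return static_cast<std::int8_t>(_mm_cvtsi128_si32(r) & 0xFF);
}

// Hypothetical helper: unsigned saturating add (PADDUSB), result clamped to [0, 255].
std::uint8_t saturating_add_unsigned(std::uint8_t x, std::uint8_t y)
{
    __m128i r = _mm_adds_epu8(_mm_set1_epi8(static_cast<char>(x)),
                              _mm_set1_epi8(static_cast<char>(y)));
    return static_cast<std::uint8_t>(_mm_cvtsi128_si32(r) & 0xFF);
}
```

With the same inputs 100 and 100, the signed version clamps at 127 while the unsigned version returns 200, which is why the instruction has to match the intended type.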

For instance, if you have to compute the sum of two int arrays, SSE lets you compute 4 additions in a single assembly instruction.
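The int array sum mentioned above could be sketched as follows. This is only an illustration under a few assumptions: the function name add_int_arrays is hypothetical, int is 32 bits wide, and unaligned loads and stores (_mm_loadu_si128, _mm_storeu_si128) are used so the arrays need no particular alignment. Elements left over when the length is not a multiple of 4 are handled by a plain scalar loop.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_add_epi32
#include <cassert>
#include <cstddef>

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
void add_int_arrays(const int* a, const int* b, int* out, std::size_t n)
{
    assert(a != nullptr && b != nullptr && out != nullptr);

    std::size_t i = 0;
    // Main loop: 4 ints per iteration, one PADDD instruction per addition.
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i sum = _mm_add_epi32(va, vb);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), sum);
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```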

Simple program

Getting started with SSE is not easy, but fortunately the MSDN documentation is complete and well written.
If you look at the list of arithmetic operations, you will notice that each one has a corresponding assembly instruction. Some operations, on the other hand (such as the set operations), are composite operations.
Using SSE in C++ is a very low-level construct: we directly manipulate the 128-bit registers through variables of type __m128 for float, __m128d for double and __m128i for int, short and char.

But the good news is that you don’t need to declare arrays of type __m128 in order to use SSE: for instance if you want to calculate the square root of a large array of float, you can simply cast your pointer to the type __m128* and then use SSE operations.

There is a catch, however: most SSE operations require the data to be 16-byte aligned. Here we will use a gcc variable attribute.
We use the aligned attribute:

aligned (alignment)
This attribute specifies a minimum alignment for the variable or structure field, measured in bytes.
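As a small sketch of what the attribute buys us, the snippet below declares one array with the gcc attribute syntax and one with the standard C++11 alignas specifier, which is assumed here to be an acceptable equivalent. The helper is_aligned_16 and the array names are hypothetical, introduced only to check the resulting addresses.

```cpp
#include <cstdint>

// Hypothetical helper: true when p lies on a 16-byte boundary.
inline bool is_aligned_16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// gcc/clang attribute syntax.
float gcc_aligned[4] __attribute__ ((aligned (16))) = { 1.f, 2.f, 3.f, 4.f };

// Standard C++11 spelling of the same requirement.
alignas(16) float std_aligned[4] = { 1.f, 2.f, 3.f, 4.f };
```

Both arrays start on a 16-byte boundary, whereas a pointer to their second element (4 bytes further) does not, so it cannot be used with the aligned SSE operations.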

Here is a simple example that uses SSE to compute the square root of 4 floats in a single operation with the _mm_sqrt_ps function.

float a[] __attribute__ ((aligned (16))) = { 41982., 81.5091, 3.14, 42.666 };
__m128* ptr = (__m128*)a;      // view the four floats as one SSE vector
__m128 t = _mm_sqrt_ps(*ptr);  // four square roots in one operation

The corresponding assembly instruction is SQRTPS. If we generate the assembly code of our program (using gcc -S), we can see that this instruction is indeed used on an SSE register:

sqrtps %xmm0, %xmm0

Don't forget the include!

#include <emmintrin.h>
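Extending the four-float example to a whole array, the pointer-cast approach described earlier might look like the sketch below. The function name sqrt_in_place is hypothetical, and the code assumes the array is 16-byte aligned and its length is a multiple of 4. Both assumptions are checked with assert.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_sqrt_ps
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: replaces every element of data with its square root.
void sqrt_in_place(float* data, std::size_t n)
{
    assert(reinterpret_cast<std::uintptr_t>(data) % 16 == 0);  // 16-byte aligned
    assert(n % 4 == 0);                                        // whole vectors only

    // Treat the float array as an array of 4-float SSE vectors.
    __m128* ptr = reinterpret_cast<__m128*>(data);
    for (std::size_t i = 0; i < n / 4; ++i)
        ptr[i] = _mm_sqrt_ps(ptr[i]);  // one SQRTPS per 4 floats
}
```

A caller would declare the buffer with the aligned attribute (or alignas(16)) and pass its length, for example sqrt_in_place(values, 8).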
