SSE基础

http://felix.abecassis.me/2011/09/cpp-getting-started-with-sse/

大概内容如下:(格式很乱)

In this article I will present how to use SSE instructions in C++ (or C).
My goal is not to show how to write the fastest possible program using SSE but rather to introduce to its usage.

What is SSE ?

SSE stands for Streaming SIMD Extensions. It is a set of CPU instructions dedicated to applications like signal processing, scientific computation or 3D graphics.

SIMD is an acronym itself: Single Instruction, Multiple Data. A CPU instruction is said to be SIMD when the same operation is applied on multiple data at the same time.

SSE was first introduced in the Pentium III in 1999. Over the time, this instruction set was improved by adding more sophisticated operations. Eight 128-bits registers were added to the CPU: xmm0 through xmm7.
这里写图片描述
Initially those registers could only be used for single-precision computations (i.e. for the type float). But since SSE2, those registers can be used for any primitive data type.
Given a standard 32-bit machine we can therefore store and process in parallel:
- 2 double
- 2 long
- 4 float
- 4 int
- 8 short
- 16 char
Note that integer types can be signed or unsigned, but you will have to use (sometimes) different instructions.

For instance if you have to compute the sum of two int arrays, using SSE you can compute 4 additions in a single assembly instruction.

Simple program

It is not easy getting started with SSE, fortunately the MSDN documentation is complete and well-written.
If you take a look at the list of arithmetic operations you can notice there is always a corresponding assembly instruction. Some operations on the other side (such as set operations) are composite operations.
Using SSE in C++ is indeed a very low-level construct: we will manipulate directly the 128-bits registers through variables of type __m128 for float, __m128d for double and __m128i for int, short, char.

But the good news is that you don’t need to declare arrays of type __m128 in order to use SSE: for instance if you want to calculate the square root of a large array of float, you can simply cast your pointer to the type __m128* and then use SSE operations.

There is a tweak however, most SSE operations requires the data to be 16-bytes aligned, here we will use another variable attributes of gcc.
We use the align attribute:

aligned (alignment)
This attribute specifies a minimum alignment for the variable or structure field, measured in bytes.

Here is a simple code on how to use SSE in order to compute the square root of 4 float in a single operation using the _mm_sqrt_ps function.

float a[] attribute ((aligned (16))) = { 41982., 81.5091, 3.14, 42.666 };
__m128* ptr = (__m128*)a;
__m128 t = _mm_sqrt_ps(*ptr);
The corresponding assembly instruction is SQRTPS, if we generate the assembly code of our program (using gcc -S) we can notice that this instruction is indeed used on a SSE register:

sqrtps %xmm0, %xmm0
Don’t forget the include !

#include

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值