Fixed-function graphics pipeline (GeForce 3 era, ATI 9700)
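Early versions of APIs such as DirectX and OpenGL are products of the fixed-function graphics pipeline era, before shading languages such as GLSL were available.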
Vertex Transformation
Primitive Assembly and Rasterization
Fragment Texturing and Coloring
Raster Operations
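Programmable real-time graphics pipeline (GeForce 8800 era, ATI 2900 XT)
Today's graphics cards let programmers write their own code for two stages of the pipeline above:
· A vertex shader implements the vertex transformation stage
· A fragment shader replaces the fragment texturing and coloring stage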
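Vertex processor
The vertex processor runs vertex shaders (shading programs). The input to a vertex shader is vertex data: position, color, normal, and so on.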
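To make the vertex shader's role concrete, here is a minimal Python sketch of the per-vertex work, not real shader code. The names `mat_vec4`, `vertex_shader`, and `mvp` are hypothetical, and a plain translation matrix stands in for a real model-view-projection matrix.

```python
# Hypothetical sketch of the work a vertex shader does: transform one vertex
# position by a 4x4 matrix and pass its other attributes through unchanged.

def mat_vec4(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def vertex_shader(vertex, mvp):
    """Map one input vertex to one output vertex.

    `vertex` is a dict with 'position' (x, y, z), 'color', and 'normal'.
    `mvp` is an assumed combined model-view-projection matrix.
    """
    x, y, z = vertex["position"]
    clip_position = mat_vec4(mvp, [x, y, z, 1.0])  # homogeneous coordinates
    return {
        "clip_position": clip_position,
        "color": vertex["color"],    # passed on for later interpolation
        "normal": vertex["normal"],  # a real shader would usually transform this too
    }

# Assumption: a translation by (1, 2, 3) stands in for a real MVP matrix.
translate = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
v_in = {"position": (1.0, 1.0, 1.0), "color": (1.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0)}
v_out = vertex_shader(v_in, translate)
print(v_out["clip_position"])  # [2.0, 3.0, 4.0, 1.0]
```

The function sees exactly one vertex at a time and keeps no state between calls, which is what lets the hardware run many copies of it in parallel.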
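Fragment processor
The fragment processor runs fragment shaders. This unit can perform the following operations:
· Compute colors and texture coordinates per pixel
· Apply textures
· Compute fog
· Compute normals, if per-pixel lighting is needed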
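The operations listed above can be sketched for a single fragment as follows. This is an illustrative Python model under simplifying assumptions (nearest-neighbour texture lookup, one directional light, a precomputed fog factor). All function and parameter names are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def sample_texture(texture, u, v):
    """Nearest-neighbour lookup in a texture stored as rows of RGB tuples."""
    height, width = len(texture), len(texture[0])
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row][col]

def fragment_shader(frag, texture, light_dir, fog_color, fog_factor):
    """Compute the final RGB color of one fragment."""
    # Per-pixel lighting: renormalize the interpolated normal, then take
    # the Lambert (diffuse) term against the light direction.
    n = normalize(frag["normal"])
    l = normalize(light_dir)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    # Apply the texture and modulate it by the interpolated color and lighting.
    texel = sample_texture(texture, *frag["uv"])
    lit = tuple(t * c * diffuse for t, c in zip(texel, frag["color"]))
    # Fog: blend toward the fog color; fog_factor = 1.0 means no fog.
    return tuple(fog_factor * c + (1.0 - fog_factor) * f
                 for c, f in zip(lit, fog_color))

# A 2x2 texture and one fragment facing the light head-on.
texture = [
    [(1.0, 1.0, 1.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)],
]
frag = {"normal": (0.0, 0.0, 2.0), "uv": (0.75, 0.25), "color": (1.0, 1.0, 1.0)}
pixel = fragment_shader(frag, texture, (0.0, 0.0, 1.0), (0.5, 0.5, 0.5), 0.5)
print(pixel)  # (0.75, 0.25, 0.25)
```

Here the fragment samples the red texel, is fully lit, and is then blended halfway toward the grey fog color.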
A unified processor array: the data makes three trips through the array, with one round of computation on each trip.
GeForce 8800 GTX Architecture
- 16 subunits each with 8 stream processors.
- The 8 stream processors execute a single instruction sequence (SIMD) on different data.
- Each group of 8 processors has its own data and instruction caches.
- Each stream processor can do a multiply and an addition (2 operations) in a single clock cycle.
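The figures in this list translate into a peak throughput estimate with simple arithmetic. The sketch below works it out and also models one SIMD step across the 8 processors of a subunit. The shader clock of 1.35 GHz is an assumption that the text does not state, and `simd_multiply_add` is a hypothetical helper, not a hardware or driver API.

```python
SUBUNITS = 16               # from the list above
PROCESSORS_PER_SUBUNIT = 8  # from the list above
OPS_PER_CLOCK = 2           # one multiply plus one add per processor per clock
SHADER_CLOCK_HZ = 1.35e9    # assumption: clock rate is not given in the text

stream_processors = SUBUNITS * PROCESSORS_PER_SUBUNIT
peak_ops_per_second = stream_processors * OPS_PER_CLOCK * SHADER_CLOCK_HZ

def simd_multiply_add(a, b, c):
    """One SIMD step: every lane runs the same multiply-add on its own data."""
    return [x * y + z for x, y, z in zip(a, b, c)]

# Eight lanes, one per stream processor in a subunit.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0] * 8
c = [1.0] * 8
lanes = simd_multiply_add(a, b, c)

print(stream_processors)  # 128
print(lanes)              # [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]
print(peak_ops_per_second / 1e9, "billion operations per second (peak)")
```

The peak number is an upper bound: it assumes every processor issues a multiply and an add on every clock, which real workloads rarely sustain.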

AMD Radeon HD 2900 XT Architecture
- 320 stream processing units arranged in 4 arrays of 80 units each.
- Each block of 80 is divided into 16 groups, each with 5 arithmetic logic units (ALUs) and one branching unit.
- Groups can operate on 4-component vectors (RGBA color, XYZW projective position, pqrs texture coordinates) simultaneously.
- The more recent Radeon HD 4870 has 800 stream processors.
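As a quick check of the unit counts in this list, and to show what operating on a 4-component vector in one step looks like, here is a small Python sketch. `group_scale_bias` is a hypothetical helper that models one group applying the same operation to all four components. It does not describe the actual instruction set.

```python
ARRAYS = 4             # from the list above
UNITS_PER_ARRAY = 80   # from the list above
GROUPS_PER_ARRAY = 16  # from the list above
HD4870_UNITS = 800     # from the list above

total_units = ARRAYS * UNITS_PER_ARRAY
total_groups = ARRAYS * GROUPS_PER_ARRAY
growth = HD4870_UNITS / total_units  # how many times more units the HD 4870 has

def group_scale_bias(vec4, scale, bias):
    """Model one group applying the same multiply-add to all 4 components."""
    return tuple(component * scale + bias for component in vec4)

rgba = (0.5, 0.25, 1.0, 1.0)
brightened = group_scale_bias(rgba, 2.0, 0.0)

print(total_units)   # 320
print(total_groups)  # 64
print(growth)        # 2.5
print(brightened)    # (1.0, 0.5, 2.0, 2.0)
```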

Above figures from:
GPU Computing: Graphics Processing Units -- powerful, programmable, and highly parallel -- are increasingly targeting general-purpose computing applications.
John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips
Proceedings of the IEEE, Vol. 96, No. 5, May 2008, pp. 879-899
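Although this generation brought some progress, a computation still has to pass its data along several times (conversion plus transfer), which is undeniably cumbersome.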
General-purpose GPU computing (G80-GT200, GeForce 8800 GTX)
Tesla GPU architecture
Scalable GPU (GeForce 6800)
As of 2015/1/13, Nvidia's top product model is the TESLA X1