8. How WebGPU works

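Let's try to explain how the GPU uses vertex shaders and fragment shaders by implementing something similar in JavaScript. Hopefully this gives you an intuitive feel for what is really happening.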

If you're familiar with Array.map, a little thought will show you how these two kinds of shader functions work. With Array.map, you provide a function that transforms a value.

Example:

const shader = v => v * 2;  // double the input
const input = [1, 2, 3, 4];
const output = input.map(shader);   // result [2, 4, 6, 8]

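The "shader" passed to Array.map is just an arrow function that, given a number, returns that number times 2. That is probably the closest analogy in JavaScript to what "shader" means: a function that returns or generates a value. You don't call it directly. Instead you specify it, and the system calls it for you.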

For a GPU vertex shader, you don't map over an input array. Instead, you just specify how many times you want the function to be called (count).

function draw(count /* number of calls */, vertexShaderFn /* shader function */) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
    // call the shader function, passing the current iteration index
    internalBuffer[i] = vertexShaderFn(i);
  }
  console.log(JSON.stringify(internalBuffer));
}

Here, const input is no longer needed.

const shader = v => v * 2;
// const input = [1, 2, 3, 4];  // the source array is no longer needed
const count = 4;
draw(count, shader);  // shader is passed in as the shader function
// outputs [0, 2, 4, 6]

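What makes working with the GPU complicated is that these functions run on a separate system in your computer, the GPU. This means all the data you create and reference has to somehow be sent to the GPU, and you then need to tell the shader where you put that data and how to access it.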

Vertex and fragment shaders can take data in 6 ways: uniforms, attributes, buffers, textures, varyings, and constants. The sections below walk through the first five.

1. Uniforms

Uniforms are values that are the same for each iteration of the shader. Think of them as constant global variables. You can set them before a shader is run but, while the shader is being used, they remain constant, or to put it another way, they remain uniform.

Let's change our draw function to pass uniforms to a shader. To do this, we'll make an array called bindings and use it to pass in the uniforms.

function draw(count, vertexShaderFn, bindings) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
    internalBuffer[i] = vertexShaderFn(i, bindings); //here
  }
  console.log(JSON.stringify(internalBuffer));
}

Then let's change the shader function vertexShader to use the uniforms

// shader function
const vertexShader = (v, bindings) => {
  const uniforms = bindings[0];  // the uniforms are used here
  return v * uniforms.multiplier;
};
const count = 4;
const uniforms1 = {multiplier: 3};
const uniforms2 = {multiplier: 5};
const bindings1 = [uniforms1];
const bindings2 = [uniforms2];
draw(count, vertexShader, bindings1);
// outputs [0, 3, 6, 9]
draw(count, vertexShader, bindings2);
// outputs [0, 5, 10, 15]

So, the concept of uniforms hopefully seems pretty straightforward. The indirection through bindings is there because this is "similar" to how things are done in WebGPU. We access things, in this case the uniforms, by location/index. Here they are found in bindings[0].
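
To make the location/index idea a little more concrete, here is a minimal sketch that places two separate uniform objects in bindings and has the shader read each one by its index. The names scaleUniforms and offsetUniforms are made up for this illustration, and this draw returns its buffer instead of logging it so the result is easy to check.

```javascript
// Same loop as draw above, but it returns the results instead of logging them.
function draw(count, vertexShaderFn, bindings) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
    internalBuffer[i] = vertexShaderFn(i, bindings);
  }
  return internalBuffer;
}

// Hypothetical uniforms, one object per binding slot.
const scaleUniforms = {multiplier: 3};
const offsetUniforms = {offset: 10};

const vertexShader = (v, bindings) => {
  const scale = bindings[0];   // found by index, not by name
  const offset = bindings[1];
  return v * scale.multiplier + offset.offset;
};

const result = draw(4, vertexShader, [scaleUniforms, offsetUniforms]);
console.log(JSON.stringify(result));  // [10,13,16,19]
```

Swapping the order of the two objects in the array would break the shader, which is the point: the shader and the caller have to agree on which index holds what.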

2. Attributes (vertex shaders only)

Attributes provide per-iteration data to a shader. In Array.map above, the value v was pulled from input and automatically provided to the function. This is very similar to an attribute in a shader.

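The difference is that we are not mapping over the input. Instead, because we are just counting, we need to tell WebGPU about these inputs and how to get data out of them.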

Imagine we updated draw like this

function draw(count, vertexShaderFn, bindings, attribsSpec) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
    // on each iteration, fetch the attribute data using the count index i
    const attribs = getAttribs(attribsSpec, i); //here 
    internalBuffer[i] = vertexShaderFn(i, bindings, attribs); //here
  }
  console.log(JSON.stringify(internalBuffer));
}
 
function getAttribs(attribs, ndx) {
  return attribs.map(({source, offset, stride}) => source[ndx * stride + offset]);
}

Then we could call it like this

const buffer1 = [0, 1, 2, 3, 4, 5, 6, 7];
const buffer2 = [11, 22, 33, 44];
const attribsSpec = [
  { source: buffer1, offset: 0, stride: 2, },
  { source: buffer1, offset: 1, stride: 2, },
  { source: buffer2, offset: 0, stride: 1, },
];
const vertexShader = (v, bindings, attribs) => (attribs[0] + attribs[1]) * attribs[2];
const bindings = [];
const count = 4;
draw(count, vertexShader, bindings, attribsSpec);
// outputs [11, 110, 297, 572]

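As you can see above, getAttribs uses offset and stride to compute indices into the corresponding source buffers and pull out values. The pulled-out values are then sent to the shader. On each iteration, attribs will be different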

 iteration |  attribs
 ----------+-------------
     0     | [0, 1, 11]
     1     | [2, 3, 22]
     2     | [4, 5, 33]
     3     | [6, 7, 44]
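
The table above can be reproduced with a few lines. This is only a sketch that reuses getAttribs and the same attribsSpec from the example; the rows array is added here just to collect the values.

```javascript
function getAttribs(attribs, ndx) {
  return attribs.map(({source, offset, stride}) => source[ndx * stride + offset]);
}

const buffer1 = [0, 1, 2, 3, 4, 5, 6, 7];
const buffer2 = [11, 22, 33, 44];
const attribsSpec = [
  { source: buffer1, offset: 0, stride: 2 },  // even elements of buffer1
  { source: buffer1, offset: 1, stride: 2 },  // odd elements of buffer1
  { source: buffer2, offset: 0, stride: 1 },  // every element of buffer2
];

// Collect the attribs each iteration would receive.
const rows = [];
for (let i = 0; i < 4; ++i) {
  rows.push(getAttribs(attribsSpec, i));
  console.log(i, JSON.stringify(rows[i]));
}
```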

3. Raw Buffers

Buffers are effectively arrays. Again, for our analogy, let's make a version of the example that uses buffers. We'll pass these buffers via bindings like we did with uniforms.

const buffer1 = [0, 1, 2, 3, 4, 5, 6, 7];
const buffer2 = [11, 22, 33, 44];
const attribsSpec = [];
const bindings = [
  buffer1,
  buffer2,
];
const vertexShader = (ndx, bindings, attribs) => 
    (bindings[0][ndx * 2] + bindings[0][ndx * 2 + 1]) * bindings[1][ndx];
const count = 4;
draw(count, vertexShader, bindings, attribsSpec);
// outputs [11, 110, 297, 572]

Here we got the same result as we did with attributes except this time, instead of the system pulling the values out of the buffers for us, we calculated our own indices into the bound buffers. This is more flexible than attributes since we basically have random access to the arrays. But it's potentially slower for that same reason. Given the way attributes work, the GPU knows the values will be accessed in order, which it can use to optimize. For example, in-order access is usually cache friendly. When we calculate our own indices, the GPU has no idea which part of a buffer we're going to access until we actually try to access it.
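
As a small illustration of that random access, the sketch below binds a second array, here called lookup (a made-up name), and lets the shader use it to decide which element of the first buffer to read. This draw is a simplified copy that returns its results instead of logging them.

```javascript
function draw(count, vertexShaderFn, bindings) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
    internalBuffer[i] = vertexShaderFn(i, bindings);
  }
  return internalBuffer;
}

const values = [11, 22, 33, 44];
const lookup = [3, 0, 2, 0];  // hypothetical: which element each iteration reads

const vertexShader = (ndx, bindings) => {
  const [data, indices] = bindings;
  return data[indices[ndx]];  // the index is computed inside the shader
};

const result = draw(4, vertexShader, [values, lookup]);
console.log(JSON.stringify(result));  // [44,11,33,11]
```

Nothing about count or ndx tells the system ahead of time that iteration 0 will read element 3, which is why this kind of access is harder to optimize than attributes.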

4. Textures

Textures are 1d, 2d, or 3d arrays of data. Of course, we could implement our own 2d or 3d arrays using buffers. What's special about textures is that they can be sampled. Sampling means that we can ask the GPU to compute a value between the values we supply. We'll cover what this means in the article on textures. For now, let's make a JavaScript analogy again.

First we'll create a function textureSample that samples an array between values

function textureSample(texture, ndx) {
  const startNdx = ndx | 0;  // round down to an int
  const fraction = ndx % 1;  // get the fractional part between indices
  const start = texture[startNdx];
  const end = texture[startNdx + 1];
  return start + (end - start) * fraction;  // compute value between start and end
}

A function like that already exists on the GPU.

Now let's use it in a shader.

const texture = [10, 20, 30, 40, 50, 60, 70, 80];
const attribsSpec = [];
const bindings = [
  texture,
];
const vertexShader = (ndx, bindings, attribs) =>
    textureSample(bindings[0], ndx * 1.75);  //here
const count = 4;
draw(count, vertexShader, bindings, attribsSpec);
// outputs [10, 27.5, 45, 62.5]

When ndx is 3, we pass 3 * 1.75, or 5.25, into textureSample. That computes a startNdx of 5, so we pull out indices 5 and 6, which are 60 and 70. fraction becomes 0.25, so we get 60 + (70 - 60) * 0.25, which is 62.5.

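Looking at the code above, we could have written textureSample ourselves in our shader function. We could manually pull out the 2 values and interpolate between them. The reason the GPU has this special functionality is that it can do it much faster and, depending on the settings, it might read as many as sixteen 4-float values to produce one 4-float value for us. That would be a lot of work to do manually.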
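
One detail the simple textureSample glosses over is what happens at the edge of the array. Sampling exactly at the last index reads one element past the end and yields NaN. The sketch below shows one possible way to handle that by clamping. textureSampleClamped is a made-up name, and clamping is just an assumed choice for this analogy, not a statement about how any particular GPU is configured.

```javascript
// textureSample as defined above.
function textureSample(texture, ndx) {
  const startNdx = ndx | 0;
  const fraction = ndx % 1;
  const start = texture[startNdx];
  const end = texture[startNdx + 1];
  return start + (end - start) * fraction;
}

// Hypothetical variant: clamp the coordinate and both indices to the valid range.
function textureSampleClamped(texture, ndx) {
  const last = texture.length - 1;
  const clamped = Math.min(Math.max(ndx, 0), last);
  const startNdx = Math.floor(clamped);
  const endNdx = Math.min(startNdx + 1, last);
  const fraction = clamped - startNdx;
  return texture[startNdx] + (texture[endNdx] - texture[startNdx]) * fraction;
}

const texture = [10, 20, 30, 40, 50, 60, 70, 80];
console.log(textureSample(texture, 7));            // NaN, because texture[8] is undefined
console.log(textureSampleClamped(texture, 7));     // 80
console.log(textureSampleClamped(texture, 9.5));   // 80
console.log(textureSampleClamped(texture, 5.25));  // 62.5
```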

5. Varyings (fragment shaders only)

Varyings are outputs from a vertex shader to a fragment shader. As was mentioned above, a vertex shader outputs positions that are used to draw/rasterize points, lines, and triangles.

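Let's assume we're drawing a line. Say our vertex shader ran twice: the first time it output the equivalent of (5,0) and the second time the equivalent of (25,4). Given those 2 points, the GPU will draw a line from (5,0) to (25,4). To do this, it will call our fragment shader 20 times, once for each pixel along that line. Each time it calls our fragment shader, it's up to us to decide what color to return.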

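Let's assume we have a pair of functions that help us draw a line between two points. The first function computes how many pixels we need to draw, plus some values that help draw them. The second takes that info plus a pixel number and gives us a pixel position. Example: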

const line = calcLine([10, 10], [13, 13]);
for (let i = 0; i < line.numPixels; ++i) {
  const p = calcLinePoint(line, i);
  console.log(p);
}
// prints
// 10,10
// 11,11
// 12,12
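
The text does not show calcLine and calcLinePoint themselves. The following is one possible implementation, a minimal sketch assuming one pixel per step along the longer axis, with the end point excluded. The fields stored in the returned line object are invented for this illustration.

```javascript
// Hypothetical implementation: one pixel per step along the longer axis.
function calcLine([x0, y0], [x1, y1]) {
  const dx = x1 - x0;
  const dy = y1 - y0;
  const numPixels = Math.max(Math.abs(dx), Math.abs(dy));
  return {x0, y0, dx, dy, numPixels};
}

function calcLinePoint({x0, y0, dx, dy, numPixels}, i) {
  // The end point itself is never produced because i stops at numPixels - 1.
  return [
    Math.round(x0 + (dx * i) / numPixels),
    Math.round(y0 + (dy * i) / numPixels),
  ];
}

const line = calcLine([10, 10], [13, 13]);
const points = [];
for (let i = 0; i < line.numPixels; ++i) {
  points.push(calcLinePoint(line, i));
}
console.log(points.map(p => p.join(',')).join('\n'));
// 10,10
// 11,11
// 12,12
```

With this definition, a line from (5,0) to (25,4) covers 20 pixels, which matches the number of fragment shader calls mentioned earlier.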

So, let's change our vertex shader so it outputs 2 values per iteration. We could do that in any number of ways. Here's one.

const buffer1 = [5, 0, 25, 4];
const attribsSpec = [
  {source: buffer1, offset: 0, stride: 2},
  {source: buffer1, offset: 1, stride: 2},
];
const bindings = [];
const dest = new Array(2);
const vertexShader = (ndx, bindings, attribs) => [attribs[0], attribs[1]];
const count = 2;
draw(count, vertexShader, bindings, attribsSpec);
// outputs [[5, 0], [25, 4]]

Now let's write some code that loops over the points 2 at a time and calls calcLine and calcLinePoint to rasterize a line.

function rasterizeLines(dest, destWidth, inputs, fragShaderFn, bindings) {
  for (let ndx = 0; ndx < inputs.length - 1; ndx += 2) {
    const p0 = inputs[ndx    ];
    const p1 = inputs[ndx + 1];
    const line = calcLine(p0, p1);
    for (let i = 0; i < line.numPixels; ++i) {
      const p = calcLinePoint(line, i);
      const offset = p[1] * destWidth + p[0];  // y * width + x
      dest[offset] = fragShaderFn(bindings);
    }
  }
}

We can update draw to use that code like this

//function draw(count, vertexShaderFn, bindings, attribsSpec) {
function draw(dest, destWidth,
              count, vertexShaderFn, fragmentShaderFn,
              bindings, attribsSpec,
) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
    const attribs = getAttribs(attribsSpec, i);
    internalBuffer[i] = vertexShaderFn(i, bindings, attribs);
  }
  //console.log(JSON.stringify(internalBuffer));
  rasterizeLines(dest, destWidth, internalBuffer,
                 fragmentShaderFn, bindings);
}

Now we're actually using internalBuffer 😃!

Let's update the code that calls draw.

const buffer1 = [5, 0, 25, 4];
const attribsSpec = [
  {source: buffer1, offset: 0, stride: 2},
  {source: buffer1, offset: 1, stride: 2},
];
const bindings = [];
const vertexShader = (ndx, bindings, attribs) => [attribs[0], attribs[1]];
const count = 2;
// draw(count, vertexShader, bindings, attribsSpec);
 
const width = 30;
const height = 5;
const pixels = new Array(width * height).fill(0);
const fragShader = (bindings) => 6;
 
draw(
   pixels, width,
   count, vertexShader, fragShader,
   bindings, attribsSpec);

If we print pixels as a rectangle where 0 becomes . we'd get this

.....666......................
........66666.................
.............66666............
..................66666.......
.......................66.....
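
The printing step itself is not shown. A minimal sketch of it might look like the following, where pixelsToString is a made-up helper name and the tiny 6x2 image is a stand-in so the expected output is easy to verify by hand.

```javascript
// Hypothetical helper: turn a flat pixel array into rows of text.
function pixelsToString(pixels, width) {
  const rows = [];
  for (let offset = 0; offset < pixels.length; offset += width) {
    const row = pixels.slice(offset, offset + width);
    rows.push(row.map(v => (v === 0 ? '.' : String(v))).join(''));
  }
  return rows.join('\n');
}

// Small stand-in image: 6 wide, 2 high, with three pixels set.
const width = 6;
const pixels = new Array(width * 2).fill(0);
pixels[1] = 6;              // x = 1, y = 0
pixels[1 * width + 4] = 6;  // x = 4, y = 1
pixels[1 * width + 5] = 6;  // x = 5, y = 1

const text = pixelsToString(pixels, width);
console.log(text);
// .6....
// ....66
```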

Unfortunately, our fragment shader gets no input that changes each iteration, so there is no way to output anything different for each pixel. This is where varyings come in. Let's change our first shader to output an extra value.

const buffer1 = [5, 0, 25, 4];
const buffer2 = [9, 3];
const attribsSpec = [
  {source: buffer1, offset: 0, stride: 2},
  {source: buffer1, offset: 1, stride: 2},
  {source: buffer2, offset: 0, stride: 1},
];
const bindings = [];
const dest = new Array(2);
const vertexShader = (ndx, bindings, attribs) => 
    // [attribs[0], attribs[1]];
    [[attribs[0], attribs[1]], [attribs[2]]];
 
...

If we changed nothing else, after the loop inside draw, internalBuffer would have these values

 [ 
   [[ 5, 0], [9]],
   [[25, 4], [3]],
 ]

We can easily compute a value from 0.0 to 1.0 that represents how far along the line we are. We can use this value to interpolate the extra value we just added.

function rasterizeLines(dest, destWidth, inputs, fragShaderFn, bindings) {
  for(let ndx = 0; ndx < inputs.length - 1; ndx += 2) {
    //const p0 = inputs[ndx    ];
    //const p1 = inputs[ndx + 1];
    const p0 = inputs[ndx    ][0];
    const p1 = inputs[ndx + 1][0];
    const v0 = inputs[ndx    ].slice(1);  // everything but the first value
    const v1 = inputs[ndx + 1].slice(1);
    const line = calcLine(p0, p1);
    for (let i = 0; i < line.numPixels; ++i) {
      const p = calcLinePoint(line, i);
      const t = i / line.numPixels;
      const varyings = interpolateArrays(v0, v1, t);
      const offset = p[1] * destWidth + p[0];  // y * width + x
      //dest[offset] = fragShaderFn(bindings);
      dest[offset] = fragShaderFn(bindings, varyings);
    }
  }
}
 
// interpolateArrays([[1,2]], [[3,4]], 0.25) => [[1.5, 2.5]]
function interpolateArrays(v0, v1, t) {
  return v0.map((array0, ndx) => {
    const array1 = v1[ndx];
    return interpolateValues(array0, array1, t);
  });
}
 
// interpolateValues([1,2], [3,4], 0.25) => [1.5, 2.5]
function interpolateValues(array0, array1, t) {
  return array0.map((a, ndx) => {
    const b = array1[ndx];
    return a + (b - a) * t;
  });
}

Now we can use those varyings in our fragment shader

// const fragShader = (bindings) => 6;
const fragShader = (bindings, varyings) => varyings[0] | 0; // convert to int

If we ran it now, we'd see a result like this

.....988......................
........87776.................
.............66655............
..................54443.......
.......................33.....

The first iteration of the vertex shader output [[5,0], [9]] and the 2nd iteration output [[25,4], [3]], and you can see that, as the fragment shader was called, the 2nd value of each of those varied (was interpolated) between the 2 values.

We could make another function, mapTriangle, that given 3 points rasterizes a triangle, calling the fragment shader function for each point inside the triangle. It would interpolate the varyings from 3 points instead of 2.
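
One common way to blend values from 3 points is to use three weights that sum to 1 (often called barycentric weights). The sketch below shows only that blending step, under the assumption that the weights are already known for a given pixel. It does not rasterize anything, and interpolateTriangle is a made-up name.

```javascript
// Hypothetical: blend per-vertex varyings with weights where w0 + w1 + w2 = 1.
function interpolateTriangle(v0, v1, v2, [w0, w1, w2]) {
  return v0.map((array0, ndx) =>
    array0.map((a, i) => a * w0 + v1[ndx][i] * w1 + v2[ndx][i] * w2));
}

// One varying with a single component at each of the 3 vertices.
const v0 = [[9]];
const v1 = [[3]];
const v2 = [[6]];

const atCorner = interpolateTriangle(v0, v1, v2, [1, 0, 0]);
const onEdge = interpolateTriangle(v0, v1, v2, [0.5, 0.5, 0]);
const inside = interpolateTriangle(v0, v1, v2, [0.25, 0.25, 0.5]);
console.log(JSON.stringify(atCorner));  // [[9]]
console.log(JSON.stringify(onEdge));    // [[6]]
console.log(JSON.stringify(inside));    // [[6]]
```

With the third weight set to 0 this reduces to the same two-point interpolation that rasterizeLines performs.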

Here is the output of all the examples above, in case you find it useful to run them yourself and compare.

======= [ run by count ] =======
[0,2,4,6]

======= [ uniforms ] =======
[0,3,6,9]
[0,5,10,15]

======= [ attributes ] =======
[11,110,297,572]

======= [ buffers ] =======
[11,110,297,572]

======= [ textures ] =======
[10,27.5,45,62.5]

======= [ varyings: calcLine ] =======
10,10
11,11
12,12

======= [ varyings: output 2 values per iteration ] =======
[[5,0],[25,4]]

======= [ varyings: draw line with constant value ] =======
.....666......................
........66666.................
.............66666............
..................66666.......
.......................66.....

======= [ varyings: draw line with varying value ] =======
.....988......................
........87776.................
.............66655............
..................54443.......
.......................33.....

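What happens in the JavaScript above is only an analogy. The details of how varyings are actually interpolated, how lines are drawn, how buffers are accessed, how textures are sampled, and how uniforms and attributes are specified are different in WebGPU, but the concepts are very similar, so I hope this JavaScript analogy helped give you a rough mental model of what is happening.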

Why is it this way? Well, if you look at draw and rasterizeLines you might notice that each iteration is entirely independent of the other iterations. Another way to say this: you could process the iterations in any order. Instead of 0, 1, 2, 3, 4 you could process them 3, 1, 4, 0, 2 and you'd get the exact same result. The fact that they are independent means each iteration can be run in parallel by a different processor. Modern 2021 top-end GPUs have ~10000 processors. That means up to 10000 things can be run in parallel. That is where the power of using the GPU comes from. By following these patterns, the system can massively parallelize the work.
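
The order independence is easy to check in the JavaScript analogy. In this sketch, drawInOrder is a made-up variant of draw that runs the iterations in whatever order it is given and returns the buffer.

```javascript
const shader = v => v * 2;

// Hypothetical: run the iterations in whatever order is given.
function drawInOrder(order, vertexShaderFn) {
  const internalBuffer = [];
  for (const i of order) {
    internalBuffer[i] = vertexShaderFn(i);  // each result lands in its own slot
  }
  return internalBuffer;
}

const inOrder = drawInOrder([0, 1, 2, 3, 4], shader);
const shuffled = drawInOrder([3, 1, 4, 0, 2], shader);
console.log(JSON.stringify(inOrder));   // [0,2,4,6,8]
console.log(JSON.stringify(shuffled));  // [0,2,4,6,8]
```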

The biggest limitations are:

  1. A shader function can only reference its inputs (attributes, buffers, textures, uniforms, varyings).
  2. A shader cannot allocate memory.
  3. A shader has to be careful if it references its destination, the thing it's generating values for.

When you think about it, this makes sense. Imagine fragShader above tried to reference dest directly. That would mean that, when trying to parallelize things, it would be impossible to coordinate. Which iteration would go first? If the 3rd iteration referenced dest[0] then the 0th iteration would need to run first, but if the 0th iteration referenced dest[3] then the 3rd iteration would need to run first.
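
The problem can be demonstrated with a deliberately bad shader in the same JavaScript analogy. In this sketch, badShader and drawInOrder are made-up names. Each iteration reads its left neighbor in dest, so the result depends on which iteration happens to run first.

```javascript
// Hypothetical "bad" shader: each iteration reads its left neighbor in dest.
const badShader = (i, dest) => (i === 0 ? 1 : dest[i - 1] + 1);

function drawInOrder(order, shaderFn) {
  const dest = [0, 0, 0, 0];
  for (const i of order) {
    dest[i] = shaderFn(i, dest);
  }
  return dest;
}

const forward = drawInOrder([0, 1, 2, 3], badShader);
const backward = drawInOrder([3, 2, 1, 0], badShader);
console.log(JSON.stringify(forward));   // [1,2,3,4]
console.log(JSON.stringify(backward));  // [1,1,1,1]
```

The same shader produces two different results from the same starting data, so there is no single answer a parallel system could be expected to reproduce.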

Designing around this limitation also happens with CPUs and multiple threads or processes, but in GPU land, with up to 10000 processors running at once, it takes special coordination. We'll try to cover some of the techniques in other articles.
