functional
Implements higher-order functions.
You can import these APIs from the algorithm package. For example:
from algorithm import map
Aliases:
- Static1DTileUnitFunc = fn[Int](Int, /) capturing -> None: Signature of a 1d tiled function that performs some work with a static tile size and an offset, i.e. func<tile_size: Int> (offset: Int)
- Dynamic1DTileUnitFunc = fn(Int, Int, /) capturing -> None: Signature of a 1d tiled function that performs some work with a dynamic tile size and an offset, i.e. func(offset: Int, tile_size: Int)
- BinaryTile1DTileUnitFunc = fn[Int](Int, Int, /) capturing -> None: Signature of a tiled function that performs some work with a dynamic tile size and a secondary static tile size.
- Static2DTileUnitFunc = fn[Int, Int](Int, Int, /) capturing -> None: Signature of a 2d tiled function that performs some work with a static tile size and an offset, i.e. func<tile_size_x: Int, tile_size_y: Int> (offset_x: Int, offset_y: Int)
- SwitchedFunction = fn[Bool]() capturing -> None
- SwitchedFunction2 = fn[Bool, Bool]() capturing -> None
- Static1DTileUnswitchUnitFunc = fn[Int, Bool](Int, Int, /) capturing -> None: Signature of a tiled function that performs some work with a static tile size and an offset, i.e. func<tile_size: Int> (offset: Int)
- Static1DTileUnitFuncWithFlag = fn[Int, Bool](Int, /) capturing -> None
- Dynamic1DTileUnswitchUnitFunc = fn[Bool](Int, Int, Int, /) capturing -> None
- Static1DTileUnitFuncWithFlags = fn[Int, Bool, Bool](Int, /) capturing -> None
map
map[func: fn(Int, /) capturing -> None](size: Int)
Maps a function over the range from 0 to size.
Parameters:
- func (fn(Int, /) capturing -> None): The function to map.
Args:
- size (Int): The number of elements.
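The behavior described above can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not the Mojo implementation: `map_range` and `store_square` are hypothetical names, and the sketch assumes the range is half-open (indices 0 through size - 1), which is what "the number of elements" suggests.

```python
from typing import Callable, List


def map_range(func: Callable[[int], None], size: int) -> None:
    """Call func(i) once for every index i in 0, 1, ..., size - 1."""
    for i in range(size):
        func(i)


squares: List[int] = [0] * 5


def store_square(i: int) -> None:
    # The callback returns nothing; it works through side effects on
    # captured state, like a capturing closure passed as `func`.
    squares[i] = i * i


map_range(store_square, len(squares))
print(squares)
```

Because the mapped function returns None, any result has to be written into state the function captures, which is why the signature requires a capturing closure.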
vectorize
vectorize[func: fn[Int](Int, /) capturing -> None, simd_width: Int, unroll_factor: Int](size: Int)
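Simplifies SIMD optimized loops by mapping a function across a range from 0 to size, incrementing by simd_width at each step. The remainder of size % simd_width will run in separate iterations.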
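The example below demonstrates how you can improve the performance of a loop by setting multiple values at the same time using the SIMD registers on the machine: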
from algorithm.functional import vectorize

# The number of elements to loop through
alias size = 10
# How many DType.int32 elements fit into the SIMD register (4 on 128-bit)
alias simd_width = simdwidthof[DType.int32]()

fn main():
    var p = DTypePointer[DType.int32].alloc(size)

    # @parameter allows the closure to capture the `p` pointer
    @parameter
    fn closure[simd_width: Int](i: Int):
        print("storing", simd_width, "els at pos", i)
        p.store[width=simd_width](i, i)

    vectorize[closure, simd_width](size)
    print(p.load[width=size]())
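On a machine with a SIMD register size of 128 bits, this sets 4 Int32 values on each iteration. The remainder of 10 % 4 is 2, so the last two elements are set in two separate iterations: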
storing 4 els at pos 0
storing 4 els at pos 4
storing 1 els at pos 8
storing 1 els at pos 9
[0, 0, 0, 0, 4, 4, 4, 4, 8, 9]
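The iteration order behind this output can be reproduced with a small Python model. It is a sketch of the documented behavior only, not Mojo's implementation: `vectorize_model` is a hypothetical name, the width is passed as an ordinary argument instead of a compile-time parameter, and a list slice assignment stands in for the SIMD store.

```python
from typing import Callable, List


def vectorize_model(func: Callable[[int, int], None], simd_width: int, size: int) -> None:
    """func(width, offset) stands in for closure[width](offset)."""
    vector_end = size - size % simd_width
    # Main loop: one full-width call per step of simd_width.
    for i in range(0, vector_end, simd_width):
        func(simd_width, i)
    # Remainder: one scalar (width 1) call per leftover element.
    for i in range(vector_end, size):
        func(1, i)


size = 10
buffer: List[int] = [0] * size
log: List[str] = []


def closure(width: int, i: int) -> None:
    log.append(f"storing {width} els at pos {i}")
    # Write the value i into `width` consecutive slots starting at i.
    buffer[i:i + width] = [i] * width


vectorize_model(closure, 4, size)
print("\n".join(log))
print(buffer)
```

With a width of 4 and a size of 10, the model makes two full-width calls followed by two scalar calls, matching the four "storing" lines above.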
You can also unroll the loop to potentially improve performance at the cost of binary size:
vectorize[closure, simd_width, unroll_factor=2](size)
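In the generated assembly the function calls are repeated, resulting in fewer arithmetic, comparison, and conditional jump operations. In pseudocode, the assembly would look like this: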
closure[4](0)
closure[4](4)
# Remainder loop won't unroll unless `size` is passed as a parameter
for i in range(8, 10):
    closure[1](i)
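If size is known at compile time, you can pass it as a parameter to reduce the number of iterations for the remainder. This only happens if the remainder is a power of 2 (2, 4, 8, 16, …). If it is not a power of 2, the remainder loop is still unrolled to improve performance.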
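The remainder rule can be illustrated with a Python model. It is a sketch of the behavior described here, not the library's code: `vectorize_static_model`, `is_power_of_two`, and `trace` are hypothetical names, and the assumption is that a power-of-2 remainder is handled by one call whose width equals the remainder, while any other remainder falls back to one scalar call per element.

```python
from typing import Callable, List, Tuple


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def vectorize_static_model(func: Callable[[int, int], None], simd_width: int, size: int) -> None:
    """func(width, offset) stands in for closure[width](offset)."""
    vector_end = size - size % simd_width
    remainder = size - vector_end
    # Main loop: full-width calls.
    for i in range(0, vector_end, simd_width):
        func(simd_width, i)
    if is_power_of_two(remainder):
        # Assumed behavior: the whole remainder is covered by a single call.
        func(remainder, vector_end)
    else:
        # Otherwise each leftover element gets its own scalar call.
        for i in range(vector_end, size):
            func(1, i)


def trace(simd_width: int, size: int) -> List[Tuple[int, int]]:
    """Record the (width, offset) pair of every call."""
    calls: List[Tuple[int, int]] = []
    vectorize_static_model(lambda w, i: calls.append((w, i)), simd_width, size)
    return calls


print(trace(4, 10))  # remainder 2: one call of width 2 at offset 8
print(trace(4, 11))  # remainder 3: three scalar calls at offsets 8, 9, 10
```

For size 10 and width 4, the two scalar remainder calls of the dynamic version collapse into a single width-2 call, which is the saving that passing size as a parameter is meant to provide.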
Parameters:
- func (fn[Int](Int, /) capturing -> None): The function that will be called in the loop body.
- simd_width (Int): The SIMD vector width.
- unroll_factor (Int): The unroll factor for the main loop (default 1).
Args:
- size (Int): The upper limit for the loop.
vectorize[func: fn[Int](Int, /) capturing -> None, simd_width: Int, size: Int, unroll_factor: Int]()
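Simplifies SIMD optimized loops by mapping a function across a range from 0 to size, incrementing by simd_width at each step. The remainder of size % simd_width will run in a single iteration if it is a power of 2.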
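The example below demonstrates how you can improve the performance of a loop by setting multiple values at the same time using the SIMD registers on the machine: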
from algorithm.functional import vectorize
# The amount of elements to