Implementing higher-order functions in Mojo (algorithm)

Latest recommended article published on 2024-09-13 14:59:21

启航学途

Article link: https://blog.csdn.net/j610152753/article/details/141143863

functional

Implements higher-order functions.

You can import these APIs from the algorithm package. For example:

from algorithm import map

Aliases:

Static1DTileUnitFunc = fn[Int](Int, /) capturing -> None: Signature of a 1d tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size: Int> (offset: Int)
Dynamic1DTileUnitFunc = fn(Int, Int, /) capturing -> None: Signature of a 1d tiled function that performs some work with a dynamic tile size and an offset. i.e. func(offset: Int, tile_size: Int)
BinaryTile1DTileUnitFunc = fn[Int](Int, Int, /) capturing -> None: Signature of a tiled function that performs some work with a dynamic tile size and a secondary static tile size.
Static2DTileUnitFunc = fn[Int, Int](Int, Int, /) capturing -> None: Signature of a 2d tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size_x: Int, tile_size_y: Int> (offset_x: Int, offset_y: Int)
SwitchedFunction = fn[Bool]() capturing -> None
SwitchedFunction2 = fn[Bool, Bool]() capturing -> None
Static1DTileUnswitchUnitFunc = fn[Int, Bool](Int, Int, /) capturing -> None: Signature of a tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size: Int> (offset: Int)
Static1DTileUnitFuncWithFlag = fn[Int, Bool](Int, /) capturing -> None
Dynamic1DTileUnswitchUnitFunc = fn[Bool](Int, Int, Int, /) capturing -> None
Static1DTileUnitFuncWithFlags = fn[Int, Bool, Bool](Int, /) capturing -> None

`map`

map[func: fn(Int, /) capturing -> None](size: Int)

Maps a function over the range from 0 to size.

Parameters:

func (fn(Int, /) capturing -> None): The function to map.

Args:

size (Int): The number of elements.
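The behaviour described for `map` can be sketched in plain Python. This is only a model of the control flow, not the Mojo API: `map_range` and `store_square` are hypothetical names, and the sketch assumes the upper bound `size` is exclusive.

```python
def map_range(func, size):
    """Call func(i) once for every index i in [0, size)."""
    for i in range(size):
        func(i)


squares = [0] * 5


def store_square(i):
    # The function captures `squares` from the enclosing scope,
    # which is the role `capturing` plays in the Mojo signature.
    squares[i] = i * i


map_range(store_square, len(squares))
print(squares)
```

As in the Mojo signature, the mapped function returns nothing; any result has to be written through captured state.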

`vectorize`

vectorize[func: fn[Int](Int, /) capturing -> None, simd_width: Int, unroll_factor: Int](size: Int)

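Simplifies SIMD-optimized loops by mapping a function over the range from 0 to `size`, incrementing by `simd_width` at each step. The remainder, `size % simd_width`, runs in separate iterations.

The example below demonstrates how to improve the performance of a loop by setting multiple values at once using the SIMD registers on your machine: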

from algorithm.functional import vectorize

# The amount of elements to loop through
alias size = 10
# How many Dtype.int32 elements fit into the SIMD register (4 on 128bit)
alias simd_width = simdwidthof[DType.int32]()

fn main():
    var p = DTypePointer[DType.int32].alloc(size)

    # @parameter allows the closure to capture the `p` pointer
    @parameter
    fn closure[simd_width: Int](i: Int):
        print("storing", simd_width, "els at pos", i)
        p.store[width=simd_width](i, i)

    vectorize[closure, simd_width](size)
    print(p.load[width=size]())

On a machine with 128-bit SIMD registers, this sets 4 Int32 values on each iteration. The remainder of 10 % 4 is 2, so the last two elements are set in two separate iterations:

storing 4 els at pos 0
storing 4 els at pos 4
storing 1 els at pos 8
storing 1 els at pos 9
[0, 0, 0, 0, 4, 4, 4, 4, 8, 9]
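To make the iteration pattern above concrete, here is a small Python model of it. It is a sketch of the control flow only, not the Mojo implementation: `vectorize_model` is a hypothetical name, the compile-time width parameter is passed as an ordinary first argument, and a Python list stands in for the pointer.

```python
def vectorize_model(func, simd_width, size):
    """Model: func(width, offset) stands in for Mojo's func[width](offset)."""
    vector_end = size - size % simd_width
    # Main loop: one call per full SIMD-width chunk.
    for i in range(0, vector_end, simd_width):
        func(simd_width, i)
    # Remainder: one call of width 1 per leftover element.
    for i in range(vector_end, size):
        func(1, i)


buf = [0] * 10
log = []


def closure(width, i):
    log.append((width, i))
    # Emulate a SIMD store that broadcasts the value i into `width` lanes.
    for lane in range(width):
        buf[i + lane] = i


vectorize_model(closure, 4, 10)
print(log)
print(buf)
```

With a width of 4 and a size of 10, the model makes two width-4 calls followed by two width-1 calls, matching the printed trace in the example.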

You can also unroll the loop to potentially improve performance, at the cost of binary size:

vectorize[closure, simd_width, unroll_factor=2](size)

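In the generated assembly, the function calls are repeated, which results in fewer arithmetic, comparison, and conditional jump operations. In pseudocode, the assembly looks like this: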

closure[4](0)
closure[4](4)
# Remainder loop won't unroll unless `size` is passed as a parameter
for i in range(8, 10):
    closure[1](i)

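If `size` is known at compile time, you can pass it as a parameter to reduce the number of iterations for the remainder. This only happens when the remainder is a power of 2 (2, 4, 8, 16, …). If it is not a power of 2, the remainder loop is still unrolled to improve performance.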
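The remainder rule for a compile-time `size` can be modelled in Python as follows. This is a sketch built from the description in this article rather than from the Mojo source: `vectorize_static_model` and `is_power_of_two` are hypothetical names, the width is passed as a runtime argument, and unrolling is not modelled.

```python
def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def vectorize_static_model(func, simd_width, size):
    """Model: func(width, offset) stands in for Mojo's func[width](offset)."""
    remainder = size % simd_width
    vector_end = size - remainder
    for i in range(0, vector_end, simd_width):
        func(simd_width, i)
    if is_power_of_two(remainder):
        # A power-of-2 remainder is handled by a single narrower call.
        func(remainder, vector_end)
    else:
        # Otherwise fall back to one width-1 call per leftover element.
        for i in range(vector_end, size):
            func(1, i)


calls_10 = []
vectorize_static_model(lambda w, i: calls_10.append((w, i)), 4, 10)
print(calls_10)

calls_11 = []
vectorize_static_model(lambda w, i: calls_11.append((w, i)), 4, 11)
print(calls_11)
```

For size 10 the remainder 2 is covered by one width-2 call; for size 11 the remainder 3 is not a power of 2, so the model issues three width-1 calls.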

Parameters:

func (fn[Int](Int, /) capturing -> None): The function that will be called in the loop body.
simd_width (Int): The SIMD vector width.
unroll_factor (Int): The unroll factor for the main loop (default 1).

Args:

size (Int): The upper limit for the loop.

vectorize[func: fn[Int](Int, /) capturing -> None, simd_width: Int, size: Int, unroll_factor: Int]()

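Simplifies SIMD-optimized loops by mapping a function over the range from 0 to `size`, incrementing by `simd_width` at each step. If the remainder, `size % simd_width`, is a power of 2, it runs in a single iteration.

The example below demonstrates how to improve the performance of a loop by setting multiple values at once using the SIMD registers on your machine: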

from algorithm.functional import vectorize

# The amount of elements to
