Good News for Small MCUs | An Embedded AI Framework That Runs on a Microcontroller with 2 KB of Memory

Posted by 嵌入式Linux on 2025-03-05 08:04:33

Tags: microcontroller, artificial intelligence, embedded hardware

Original article: https://mp.weixin.qq.com/s?__biz=MzA5NTM3MjIxMw==&mid=2247515695&idx=1&sn=e35ac1ea5196e550a7e5b6a45c69d643&chksm=9124369e692fc9896d7dcfe6189cd0dd999c10b915b02ccec844fd1f1523074f2cf04c18cc4c&scene=126&sessionid=0

Author | strongerHuang

WeChat official account | strongerHuang

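Over the past two years, with the popularity of ChatGPT and DeepSeek, AI has reached nearly every corner of work and daily life, and on-device embedded AI has been growing along with it.

Today I want to share an embedded AI inference framework that can be used on microcontrollers with as little as 2 KB of memory: uTensor.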

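About uTensor

uTensor is an extremely lightweight machine learning inference framework built on TensorFlow and optimized for Arm processors. It consists of a runtime library and an offline tool that handles most of the model conversion work.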

Repository:

https://github.com/uTensor/uTensor

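The repository contains the core runtime plus some example implementations of operators, memory managers, schedulers, and more. The core runtime is only about 2 KB in size!

uTensor reaches this roughly 2 KB footprint through aggressive trimming: it converts a TensorFlow model into .cpp and .hpp source files, which removes redundant dependencies. It also pre-allocates its memory regions, which guards against memory leaks at runtime.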
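To make the idea of pre-allocated memory regions concrete, here is a minimal sketch of a fixed-size arena that hands out memory from a buffer reserved at build time. The class name `StaticArena` and its methods are hypothetical and written only for illustration; this is not uTensor's actual allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size arena (illustration only, not uTensor code).
// All memory comes from one buffer whose size is known at compile time,
// so there is no heap growth and nothing to leak: reset() reclaims everything.
template <std::size_t Capacity>
class StaticArena {
 public:
  // Returns `size` bytes whose offset is a multiple of `align`,
  // or nullptr when the arena does not have enough room left.
  // `align` must be a power of two no larger than alignof(std::max_align_t).
  void* allocate(std::size_t size,
                 std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const std::size_t start = (offset_ + align - 1) & ~(align - 1);
    if (start > Capacity || size > Capacity - start) {
      return nullptr;  // out of space: fail loudly instead of growing
    }
    offset_ = start + size;
    return buffer_ + start;
  }

  // Releases every allocation at once.
  void reset() { offset_ = 0; }

  std::size_t used() const { return offset_; }
  std::size_t capacity() const { return Capacity; }

 private:
  alignas(std::max_align_t) std::uint8_t buffer_[Capacity];
  std::size_t offset_ = 0;
};
```

A `StaticArena<2048>` declared as a global would, for example, reserve 2 KB up front; every tensor buffer would then come from `allocate()` and be released together with `reset()` after an inference pass.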

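In practice, the combined code size of the core runtime and the basic operators is reported to be only about 2 KB, roughly 1/1000 the size of a typical 2 MB photo.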

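How uTensor Works

The uTensor workflow is roughly as follows:

You build and train a model in TensorFlow, then uTensor takes the model and generates .cpp and .hpp source files. These files contain the C++ code needed for inference. You only need to copy the generated sources into your embedded project, so the process is very simple.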
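As a rough picture of what "a model turned into C++ source" can mean, the sketch below bakes the weights of one tiny fully connected layer into constant arrays and exposes a single inference function. Everything here (the namespace `my_model`, the array names, the layer shape, and the values) is made up for illustration; the files that uTensor actually generates look different and use its own tensor and operator classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for generated model sources (not real uTensor output).
// The trained parameters live in constant arrays, so they can be placed in
// flash and no model file has to be parsed at runtime.
namespace my_model {

constexpr std::size_t kInputs = 3;
constexpr std::size_t kOutputs = 2;

constexpr std::int8_t kWeights[kOutputs][kInputs] = {
    {1, -2, 3},
    {0, 4, -1},
};
constexpr std::int32_t kBias[kOutputs] = {5, -3};

// One fully connected layer followed by ReLU.
// `input` must point to kInputs values, `output` to kOutputs values.
inline void infer(const std::int8_t* input, std::int32_t* output) {
  assert(input != nullptr && output != nullptr);
  for (std::size_t o = 0; o < kOutputs; ++o) {
    std::int32_t acc = kBias[o];
    for (std::size_t i = 0; i < kInputs; ++i) {
      acc += static_cast<std::int32_t>(kWeights[o][i]) *
             static_cast<std::int32_t>(input[i]);
    }
    output[o] = acc > 0 ? acc : 0;  // ReLU
  }
}

}  // namespace my_model
```

The application would then only need to include such a header, fill an input array from a sensor, and call `my_model::infer(input, output)`.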

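The uTensor runtime consists of two main components:

uTensor Core: contains the basic data structures, interfaces, and types required to meet the uTensor performance runtime contract.
uTensor Library: a series of default implementations built on top of uTensor Core.

The build system compiles these two components separately, which makes it easy for users to extend and override the implementations built on top of uTensor Core, such as custom memory managers, tensors, operators, and error handlers.

Error handlers: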

SimpleErrorHandler errH(50); // Maintain a history of 50 events
Context::get_default_context()->set_ErrorHandler(&errH);
...
// A bunch of allocations
...


// Check to make sure a rebalance has occurred inside our allocator
// (std::find requires #include <algorithm>)
bool has_rebalanced =
    std::find(errH.begin(), errH.end(),
              localCircularArenaAllocatorRebalancingEvent()) != errH.end();
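The snippet above relies on the handler keeping a bounded history of events that can be searched with `std::find`. A minimal sketch of that pattern, assuming a fixed-capacity buffer that overwrites its oldest entry, could look like this. `Event` and `EventHistory` are hypothetical names and do not reflect how `SimpleErrorHandler` is implemented in uTensor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical event type: identified by a numeric id only.
struct Event {
  std::uint32_t id;
  bool operator==(const Event& other) const { return id == other.id; }
};

// Hypothetical bounded event history (illustration only).
// Keeps at most N events in a fixed array; once full, the oldest is overwritten.
template <std::size_t N>
class EventHistory {
  static_assert(N > 0, "history needs at least one slot");

 public:
  void notify(const Event& e) {
    events_[next_] = e;
    next_ = (next_ + 1) % N;
    if (count_ < N) ++count_;
    assert(count_ <= N);
  }

  // Iteration covers the stored events in slot order, not chronological order,
  // which is enough for membership checks such as std::find.
  const Event* begin() const { return events_; }
  const Event* end() const { return events_ + count_; }

  bool contains(const Event& e) const {
    return std::find(begin(), end(), e) != end();
  }

 private:
  Event events_[N] = {};
  std::size_t next_ = 0;
  std::size_t count_ = 0;
};
```

With a capacity of 50, as in the snippet above, memory use stays constant no matter how many events are raised; the trade-off is that events older than the last 50 can no longer be found.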

Tensor read and write interfaces:

uint8_t myBuffer[4] = { 0xde, 0xad, 0xbe, 0xef };
Tensor mTensor = new BufferTensor({2,2}, u8, myBuffer); // define a 2x2 tensor of uint8_ts


uint8_t a1 = mTensor(0,0);  // implicitly casts the memory referenced at this index to a uint8_t
printf("0x%hhx\n", a1);     // prints 0xde


uint16_t a2 = mTensor(0,0); // implicitly casts the memory referenced at this index to a uint16_t
printf("0x%hx\n", a2);      // prints 0xdead


uint32_t a3 = mTensor(0,0); // implicitly casts the memory referenced at this index to a uint32_t
printf("0x%x\n", a3);      // prints 0xdeadbeef


// You can also write and read values with explicit casting and get similar behavior
mTensor(0,0) = static_cast<uint8_t>(0xFF);
printf("0x%hhx\n", static_cast<uint8_t>(mTensor(0,0)));

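For performance reasons, the various Tensor read/write interfaces behave more like buffers than fully fledged typed C++ objects, even though the high-level interface looks very Pythonic. What is actually read or written depends on how the user casts this buffer.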
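The "buffer plus cast" behaviour can be illustrated without uTensor at all. The sketch below reads the same four bytes as 8-, 16-, and 32-bit values in two ways: `read_native` copies the bytes as they sit in memory, so its result depends on the CPU's byte order, while `read_big_endian` treats the first byte as the most significant one, which reproduces the values printed in the comments of the example above. Both helpers are hypothetical; which of the two the uTensor accessors correspond to on a given target is not stated in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads sizeof(T) bytes exactly as they are laid out in memory.
// The numeric result depends on the CPU's endianness.
template <typename T>
T read_native(const std::uint8_t* buf) {
  assert(buf != nullptr);
  T value;
  std::memcpy(&value, buf, sizeof(T));  // memcpy avoids aliasing/alignment issues
  return value;
}

// Reads sizeof(T) bytes with the first byte as the most significant one.
// The result is the same on every CPU. T must be an unsigned integer type.
template <typename T>
T read_big_endian(const std::uint8_t* buf) {
  assert(buf != nullptr);
  T value = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    value = static_cast<T>((value << 8) | buf[i]);
  }
  return value;
}
```

For the buffer `{0xde, 0xad, 0xbe, 0xef}`, `read_big_endian<std::uint16_t>` yields `0xdead` and `read_big_endian<std::uint32_t>` yields `0xdeadbeef`, whereas `read_native<std::uint32_t>` yields `0xefbeadde` on a little-endian core. This is worth keeping in mind when casting tensor memory to wider types.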

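Building, Running, and Testing uTensor

The official repository describes several ways to build, run, and test uTensor.

For example, building and testing locally: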

git clone git@github.com:uTensor/uTensor.git
cd uTensor/
git checkout proposal/rearch
git submodule init
git submodule update
mkdir build
cd build/
cmake -DPACKAGE_TESTS=ON -DCMAKE_BUILD_TYPE=Debug ..
make
make test

Building and running on Arm Mbed OS:

mbed new my_project
cd my_project
mbed import https://github.com/uTensor/uTensor.git
# Create main file
# Run uTensor-cli workflow and copy model directory here
mbed compile # as normal

Building and running on Arm systems:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=../extern/CMSIS_5/CMSIS/DSP/gcc.cmake  ..


# With CMSIS optimized kernels
mkdir build && cd build
cmake -DARM_PROJECT=1 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=../extern/CMSIS_5/CMSIS/DSP/gcc.cmake  ..

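The above is only meant as a reference and a starting point. For the implementation details, you will need to dig further into uTensor and tune it for your own project.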
