Getting Started with Microsoft Phi-3-mini: Try Running Phi-3-mini on iPhone with ONNX Runtime

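Microsoft, Google, and Apple have each released SLMs for edge devices at different times (Microsoft phi3-mini, Google Gemma, and Apple OpenELM). Developers already deploy SLMs offline on Nvidia Jetson Orin, Raspberry Pi, and AI PCs, which opens up more application scenarios for generative AI. The previous article covered several ways to deploy applications, so how do we deploy an SLM application to a mobile device?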

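This article is a first exploration on iPhone. Microsoft phi3-mini has been published on Hugging Face in three formats, of which gguf and onnx are quantized models. We can deploy a quantized phi3-mini model that matches the hardware at hand, so let's start with the quantized model in the Phi-3-mini onnx format. If you would rather use the GGUF format, the LLM Farm app is recommended.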

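Generative AI with ONNX Runtime

In the AI era, the portability of AI models is very important. ONNX Runtime makes it easy to deploy trained models to different devices. Developers do not need to care about the inference framework and can run model inference through a unified API. For the generative AI era, ONNX Runtime has also been optimized (https://onnxruntime.ai/docs/genai/). With the optimized ONNX Runtime, quantized generative AI models can run inference on different devices. Generative AI with ONNX Runtime exposes its model inference API in Python, C#, and C/C++. Deployment on iPhone can use the C++ API of Generative AI with ONNX Runtime.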

Steps

A. Preparation

macOS 14+
Xcode 15+
iOS SDK 17.x
Install Python 3.10+ (Conda is recommended)
Install the Python library python-flatbuffers
Install CMake

B. Compile ONNX Runtime for iOS

git clone https://github.com/microsoft/onnxruntime.git

cd onnxruntime

./build.sh --build_shared_lib --skip_tests --parallel --build_dir ./build_ios --ios --apple_sysroot iphoneos --osx_arch arm64 --apple_deploy_target 17.4 --cmake_generator Xcode --config Release

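Notes

1. Before compiling, make sure Xcode is configured correctly and select it as the active developer directory in the terminal.

sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

2. ONNX Runtime must be compiled separately for each platform. For iOS, you can build for arm64 or x86_64.

3. It is recommended to compile with the latest iOS SDK. You can also lower the version to stay compatible with earlier SDKs.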

C. Compile Generative AI with ONNX Runtime for iOS

Note: Generative AI with ONNX Runtime is in preview, so expect the steps below to change.

git clone https://github.com/microsoft/onnxruntime-genai

cd onnxruntime-genai

git checkout yguo/ios-build-genai


mkdir ort

cd ort

mkdir include

mkdir lib

cd ../


cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build_ios/Release/Release-iphoneos/libonnxruntime*.dylib* ort/lib

python3 build.py --parallel --build_dir ./build_ios_simulator --ios --ios_sysroot iphoneos --osx_arch arm64 --apple_deployment_target 17.4 --cmake_generator Xcode

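D. Create an App in Xcode

I chose Objective-C as the app development language because it interoperates more easily with the C++ API of Generative AI with ONNX Runtime. You can also make the same calls from Swift through a bridging layer.

E. Copy the ONNX quantized INT4 model into the App project

We need to import the INT4 quantized model in ONNX format, so download the model first.

After downloading, add it to the Resources directory of the project in Xcode.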

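F. Add the C++ API in ViewControllers

Notes:

1. Add the corresponding C++ header files to the project.

2. Add onnxruntime-genai.dylib in Xcode.

3. This example tests directly with the code from the C Samples. You can also add more code to run it (for example, a ChatUI).

4. Because the code calls C++, rename ViewController.m to ViewController.mm.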

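    // The model files are bundled as app resources, so the bundle's
    // resource directory is used as the model directory.
    NSString *llmPath = [[NSBundle mainBundle] resourcePath];
    const char *modelPath = llmPath.UTF8String;

    // Load the model and its tokenizer.
    auto model = OgaModel::Create(modelPath);
    auto tokenizer = OgaTokenizer::Create(*model);

    // Single-turn prompt in the Phi-3 chat layout.
    const char *prompt = "<|system|>You are a helpful AI assistant.<|end|><|user|>Can you introduce yourself?<|end|><|assistant|>";

    // Tokenize the prompt.
    auto sequences = OgaSequences::Create();
    tokenizer->Encode(prompt, *sequences);

    // Configure generation: at most 100 tokens in total.
    auto params = OgaGeneratorParams::Create(*model);
    params->SetSearchOption("max_length", 100);
    params->SetInputSequences(*sequences);

    // Run generation and decode the first (only) output sequence.
    auto output_sequences = model->Generate(*params);
    const auto output_sequence_length = output_sequences->SequenceCount(0);
    const auto *output_sequence_data = output_sequences->SequenceData(0);
    auto out_string = tokenizer->Decode(output_sequence_data, output_sequence_length);

    // Print the decoded text instead of copying the string object.
    NSLog(@"%s", static_cast<const char *>(out_string));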
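The prompt above is hard-coded in the Phi-3 chat layout (system turn, user turn, then an open assistant turn). To send different questions, for example from a ChatUI, the same layout can be assembled from two strings. This is a minimal sketch: BuildPhi3Prompt is a hypothetical helper name, and it only reproduces the single-turn layout shown in the snippet above.

```cpp
#include <string>

// Builds a single-turn prompt in the layout used by the snippet above:
//   <|system|>...<|end|><|user|>...<|end|><|assistant|>
// Hypothetical helper for illustration; not part of any library.
std::string BuildPhi3Prompt(const std::string& system_message,
                            const std::string& user_message) {
  std::string prompt;
  if (!system_message.empty()) {
    // The system turn is optional and is skipped when empty.
    prompt += "<|system|>" + system_message + "<|end|>";
  }
  prompt += "<|user|>" + user_message + "<|end|>";
  prompt += "<|assistant|>";  // left open so the model writes the reply
  return prompt;
}
```

The result can be passed to the tokenizer through c_str(), in place of the literal prompt.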

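G. View the results

Sample code: Phi-3MiniSamples/ios at main · Azure-Samples/Phi-3MiniSamples · GitHub

Summary

This is a very preliminary result. I used an iPhone 12, so it runs fairly slowly, and CPU usage reached 130% during inference. It would be better if the Apple MLX framework could assist inference on iOS, so what I hope for from this project is that Generative AI with ONNX Runtime can offer hardware acceleration on iOS. You can also try testing on a newer iPhone.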
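To compare devices with a number instead of an impression, generation can be timed and converted to tokens per second. The sketch below uses only the C++ standard library. MeasureSeconds and TokensPerSecond are hypothetical helper names, and nothing here was measured for this article.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Runs `fn` once and returns the elapsed wall-clock time in seconds.
template <typename Fn>
double MeasureSeconds(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Converts a token count and a duration into tokens per second.
// Returns 0 when the duration is not positive, to avoid dividing by zero.
double TokensPerSecond(std::size_t token_count, double seconds) {
  return seconds > 0.0 ? static_cast<double>(token_count) / seconds : 0.0;
}
```

One way to use it is to wrap the generation call in a lambda passed to MeasureSeconds and feed the output sequence length to TokensPerSecond. The output sequence may include the prompt tokens, so treat the figure as a rough estimate.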

This is only a first exploration, but it is a good start. I look forward to further improvements in Generative AI with ONNX Runtime.