A First Taste of pytorch-directml (x64) on Windows on Arm

A while ago I manually upgraded my Lenovo ThinkPad X13s to Windows 11 22H2. To get a real feel for what the platform can do, I decided to start with Stable Diffusion.

In theory, running some simple inference with Windows' built-in ML stack, paired with ONNX Runtime or DirectML, shouldn't be much of a problem. But when I tried Microsoft's official Stable Diffusion C# sample, across multiple versions of olive-ai[directml] (0.2.1 through 0.3.1) and multiple versions of ONNX Runtime, it always ended with an error about a missing GroupNorm kernel implementation:

OnnxRuntimeException: [ErrorCode:NotImplemented] Failed to find kernel for GroupNorm(1) (node GroupNorm_0). Kernel not found 
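The error above says the DirectML execution provider in that ONNX Runtime build has no kernel registered for the contrib GroupNorm op. A quick way to at least confirm which execution providers the installed onnxruntime package exposes is the sketch below (guarded so it also runs where onnxruntime isn't installed; it only assumes the standard onnxruntime module API):

```python
import importlib.util

# Guarded import: the snippet degrades gracefully without onnxruntime.
if importlib.util.find_spec("onnxruntime"):
    import onnxruntime as ort

    # A DirectML build should list "DmlExecutionProvider" alongside
    # the always-present "CPUExecutionProvider".
    print(ort.get_available_providers())
else:
    print("onnxruntime not installed")
```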

So I fell back to checking how other deep learning frameworks fare on the WOA platform, which unexpectedly led to these notes.

Environment Setup

Updating the OS and Graphics Driver

The machine shipped with Windows 10 and never received the Windows 11 push, so I used Microsoft's official tool to download the Windows 11 installer and manually upgraded to 22H2.
After the upgrade, I installed all pending update packages and drivers through Windows Update.
Finally, I downloaded the latest graphics driver from Lenovo's official site.

  • Platform: Lenovo ThinkPad X13s Gen 1 5G
  • OS: Windows 11 Pro 22h2 (Build 22621.2428)
  • Graphics Driver: 30.0.3741.8500

Building the Software Stack

To run Stable Diffusion at all, GPU support is practically a must. The Adreno GPU in the X13s has no CUDA, but Qualcomm already provides DirectX 12 support, so DirectML looked like a workable detour. Even better, after some searching I found that someone had already put together a DirectML fork of stable-diffusion-webui.
Python has shipped official Windows on Arm builds since 3.11, so my first instinct was naturally to grab the latest Python for Windows Arm64. But it turns out that, to this day, PyTorch has not released a WOA build, let alone a DirectML build of PyTorch.

Since Windows 11 supports x64 code via Arm64EC, and the native Arm64 ecosystem is still incomplete, why not try running the whole software stack as x64?
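Whether a given interpreter is a native Arm64 build or an x64 build running under emulation can be checked from Python itself. A small stdlib-only sketch (note: under emulation, an x64 process is typically presented an x64 view of the machine, so platform.machine() reports the build's architecture rather than the physical CPU):

```python
import platform
import struct

# Architecture string the interpreter sees: typically "AMD64" for an
# x64 build on Windows, "ARM64" for a native Arm64 build.
print(platform.machine())

# Pointer width: confirms a 64-bit build either way.
print(struct.calcsize("P") * 8)
```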

First, download the following from the respective official sites and install each:

  • git: 2.43.0 (64-bit)
  • python: 3.10.11 (amd64)

Then clone the code from the stable-diffusion-webui-directml repo:

git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml.git

The version pulled at the time of writing is 1.6.1.

C:\xeng\stable-diffusion-webui-directml>git show
commit 03eec1791be011e087985ae93c1f66315d5a250e (HEAD -> master, origin/master, origin/HEAD)
Merge: 64e6b068 4afaaf8a
Author: Seunghoon Lee <lshqqytiger@naver.com>
Date:   Wed Nov 8 13:09:37 2023 +0900

    Merge remote-tracking branch 'upstream/master'

From here the setup flow is the same as stock stable-diffusion-webui:

  • Configure network access (a proxy may be needed)
  • Download an SD v1.5 base model and place it in the models/Stable-diffusion directory
  • Double-click webui-user.bat to install dependencies and launch the webui

It went better than expected and started up smoothly:

Creating venv in directory C:\xeng\stable-diffusion-webui-directml\venv using python "C:\Users\xeng\AppData\Local\Programs\Python\Python310\python.exe"
venv "C:\xeng\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: 1.6.1
Commit hash: 03eec1791be011e087985ae93c1f66315d5a250e
Installing torch and torchvision
Collecting torch==2.0.0
  Using cached torch-2.0.0-cp310-cp310-win_amd64.whl (172.3 MB)
Collecting torchvision==0.15.1
  Using cached torchvision-0.15.1-cp310-cp310-win_amd64.whl (1.2 MB)
Collecting torch-directml
  Using cached torch_directml-0.2.0.dev230426-cp310-cp310-win_amd64.whl (8.2 MB)
Collecting networkx
  Using cached networkx-3.2.1-py3-none-any.whl (1.6 MB)
Collecting jinja2
  Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting typing-extensions
  Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Collecting filelock
  Using cached filelock-3.13.1-py3-none-any.whl (11 kB)
Collecting sympy
  Using cached sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting numpy
  Using cached numpy-1.26.2-cp310-cp310-win_amd64.whl (15.8 MB)
Collecting pillow!=8.3.*,>=5.3.0
  Using cached Pillow-10.1.0-cp310-cp310-win_amd64.whl (2.6 MB)
Collecting requests
  Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting MarkupSafe>=2.0
  Using cached MarkupSafe-2.1.3-cp310-cp310-win_amd64.whl (17 kB)
Collecting urllib3<3,>=1.21.1
  Using cached urllib3-2.1.0-py3-none-any.whl (104 kB)
Collecting charset-normalizer<4,>=2
  Using cached charset_normalizer-3.3.2-cp310-cp310-win_amd64.whl (100 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2023.11.17-py3-none-any.whl (162 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.6-py3-none-any.whl (61 kB)
Collecting mpmath>=0.19
  Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, torch, torchvision, torch-directml
Successfully installed MarkupSafe-2.1.3 certifi-2023.11.17 charset-normalizer-3.3.2 filelock-3.13.1 idna-3.6 jinja2-3.1.2 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.2 pillow-10.1.0 requests-2.31.0 sympy-1.12 torch-2.0.0 torch-directml-0.2.0.dev230426 torchvision-0.15.1 typing-extensions-4.9.0 urllib3-2.1.0

[notice] A new release of pip is available: 23.0.1 -> 23.3.2
[notice] To update, run: C:\xeng\stable-diffusion-webui-directml\venv\Scripts\python.exe -m pip install --upgrade pip
Installing clip
Installing open_clip
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments:
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Loading weights [7c819b6d13] from C:\xeng\stable-diffusion-webui-directml\models\Stable-diffusion\majicmixRealistic_v7.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Creating model from config: C:\xeng\stable-diffusion-webui-directml\configs\v1-inference.yaml
Startup time: 608.9s (prepare environment: 579.0s, import torch: 13.7s, import gradio: 3.6s, setup paths: 3.9s, initialize shared: 3.4s, other imports: 1.2s, setup codeformer: 0.3s, load scripts: 1.9s, create ui: 1.0s, gradio launch: 0.7s).
Applying attention optimization: InvokeAI... done.
Model loaded in 15.0s (load weights from disk: 1.7s, create model: 7.5s, apply weights to model: 4.5s, move model to device: 0.3s, calculate empty prompt: 0.8s).
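With the install finished, a quick smoke test that the DirectML backend actually works is worthwhile before burning minutes inside the webui. A minimal guarded sketch using the torch-directml wheel installed above (torch_directml.device() returns the first DirectML adapter):

```python
import importlib.util

# Guarded so the script also runs where the wheel isn't installed.
if importlib.util.find_spec("torch_directml"):
    import torch
    import torch_directml

    dml = torch_directml.device()          # first DirectML adapter
    x = torch.ones(2, 2, device=dml) * 3   # tiny op executed on the GPU
    print(x.cpu())                         # a 2x2 tensor of 3.0s if all is well
else:
    print("torch_directml not installed; skipping GPU smoke test")
```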

Generating Portraits

With the default parameters, my first attempt at a portrait OOM'd almost immediately. Fortunately it was nothing worse than an out-of-memory error, so there was hope!
Dropping the resolution to 256x256 to see whether it could keep up: the GPU ran at full throttle while CPU usage stayed low.
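The OOM at the default resolution and the recovery at 256x256 fit how SD v1.5 scales (assuming the usual webui default of 512x512): the UNet operates on a latent 1/8 the image side, and self-attention builds a matrix quadratic in the latent token count. A back-of-the-envelope check (the 8x latent downscale is the standard SD v1.5 figure; the rest is arithmetic):

```python
def attention_tokens(side_px, downscale=8):
    """Latent tokens for a square image in SD v1.5's UNet."""
    latent = side_px // downscale
    return latent * latent

t512 = attention_tokens(512)  # 64 * 64 = 4096 tokens
t256 = attention_tokens(256)  # 32 * 32 = 1024 tokens

# The attention matrix is tokens x tokens, so halving each image side
# shrinks that matrix by a factor of 16:
print(t512, t256, (t512 ** 2) / (t256 ** 2))  # 4096 1024 16.0
```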

Here she comes, here she comes: the portrait finally arrived.

With the restart sampler it took nearly 6 minutes. Same resolution, switching to Euler a:

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00,  8.43s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [02:57<00:00,  8.87s/it]

About three minutes, though 20 steps clearly isn't enough at this resolution.

320x320 still renders, in about 10 minutes. She certainly took her time, but better late than never.

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [09:44<00:00, 29.22s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [09:55<00:00, 29.78s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [09:55<00:00, 28.29s/it]
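The per-step rates tqdm prints can be sanity-checked against the wall-clock totals (tqdm's displayed rate is smoothed and includes a little non-step overhead, so it won't match total/steps exactly). A quick check of the two runs above:

```python
def seconds_per_step(minutes, seconds, steps):
    """Average seconds per sampler step from an MM:SS total."""
    return (minutes * 60 + seconds) / steps

print(seconds_per_step(2, 48, 20))  # 256x256, Euler a: 8.4 s/step (tqdm shows 8.43)
print(seconds_per_step(9, 44, 20))  # 320x320: 29.2 s/step (tqdm shows 29.22)
```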


Summary

Compared with the CUDA ecosystem I have on hand, both the performance and the achievable image sizes are admittedly humble.

Still, being able to run Stable Diffusion end to end with PyTorch on a WOA machine using an x64 software stack was a genuine surprise. The onnxruntime team has some catching up to do; WOA's future looks promising.

Software Versions

python:

C:\xeng\>python -V
Python 3.10.11

wheels:

absl-py==2.0.0
accelerate==0.21.0
addict==2.4.0
aenum==3.1.15
aiofiles==23.2.1
aiohttp==3.9.1
aiosignal==1.3.1
altair==5.2.0
antlr4-python3-runtime==4.9.3
anyio==3.7.1
async-timeout==4.0.3
attrs==23.1.0
basicsr==1.4.2
beautifulsoup4==4.12.2
blendmodes==2022
boltons==23.1.1
cachetools==5.3.2
certifi==2023.11.17
charset-normalizer==3.3.2
clean-fid==0.1.35
click==8.1.7
clip==1.0
colorama==0.4.6
contourpy==1.2.0
cycler==0.12.1
deprecation==2.1.0
diffusers==0.24.0
einops==0.4.1
exceptiongroup==1.2.0
facexlib==0.3.0
fastapi==0.94.0
ffmpy==0.3.1
filelock==3.13.1
filterpy==1.4.5
fonttools==4.47.0
frozenlist==1.4.1
fsspec==2023.12.2
ftfy==6.1.3
future==0.18.3
gdown==4.7.1
gfpgan==1.3.8
gitdb==4.0.11
GitPython==3.1.32
google-auth==2.25.2
google-auth-oauthlib==1.2.0
gradio==3.41.2
gradio_client==0.5.0
grpcio==1.60.0
h11==0.12.0
httpcore==0.15.0
httpx==0.24.1
huggingface-hub==0.19.4
idna==3.6
imageio==2.33.1
importlib-metadata==7.0.0
importlib-resources==6.1.1
inflection==0.5.1
Jinja2==3.1.2
jsonmerge==1.8.0
jsonschema==4.20.0
jsonschema-specifications==2023.11.2
kiwisolver==1.4.5
kornia==0.6.7
lark==1.1.2
lazy_loader==0.3
lightning-utilities==0.10.0
llvmlite==0.41.1
lmdb==1.4.1
lpips==0.1.4
Markdown==3.5.1
MarkupSafe==2.1.3
matplotlib==3.8.2
mpmath==1.3.0
multidict==6.0.4
networkx==3.2.1
numba==0.58.1
numpy==1.23.5
oauthlib==3.2.2
omegaconf==2.2.3
open-clip-torch==2.20.0
opencv-python==4.8.1.78
orjson==3.9.10
packaging==23.2
pandas==2.1.4
piexif==1.1.3
Pillow==9.5.0
platformdirs==4.1.0
protobuf==3.20.0
psutil==5.9.5
pyasn1==0.5.1
pyasn1-modules==0.3.0
pydantic==1.10.13
pydub==0.25.1
pyparsing==3.1.1
PySocks==1.7.1
python-dateutil==2.8.2
python-multipart==0.0.6
pytorch-lightning==1.9.4
pytz==2023.3.post1
PyWavelets==1.5.0
PyYAML==6.0.1
realesrgan==0.3.0
referencing==0.32.0
regex==2023.10.3
requests==2.31.0
requests-oauthlib==1.3.1
resize-right==0.0.2
rpds-py==0.15.2
rsa==4.9
safetensors==0.3.1
scikit-image==0.21.0
scipy==1.11.4
semantic-version==2.10.0
sentencepiece==0.1.99
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
soupsieve==2.5
starlette==0.26.1
sympy==1.12
tb-nightly==2.16.0a20231219
tensorboard-data-server==0.7.2
tf-keras-nightly==2.16.0.dev2023121910
tifffile==2023.12.9
timm==0.9.2
tokenizers==0.13.3
tomesd==0.1.3
tomli==2.0.1
toolz==0.12.0
torch==2.0.0
torch-directml==0.2.0.dev230426
torchdiffeq==0.2.3
torchmetrics==1.2.1
torchsde==0.2.5
torchvision==0.15.1
tqdm==4.66.1
trampoline==0.1.2
transformers==4.30.2
typing_extensions==4.9.0
tzdata==2023.3
urllib3==2.1.0
uvicorn==0.24.0.post1
wcwidth==0.2.12
websockets==11.0.3
Werkzeug==3.0.1
yapf==0.40.2
yarl==1.9.4
zipp==3.17.0