The Hailo Dataflow Compiler compiles deep learning models into HEF files that can run on the Hailo-8.
Installation requirement: more than 16 GB of system RAM.
To make installation easier, you can rent a cloud instance on featurize and install everything there.
Producing an HEF file takes roughly three steps:
1. Build a deep learning model and convert it to an HAR file.
2. Quantize the model using a representative (calibration) dataset, producing a quantized HAR file.
3. Compile the quantized HAR file into an HEF file.
First, update the package index and install the required system dependencies:
sudo apt-get update  # avoids install errors later
sudo apt-get install python3-dev graphviz libgraphviz-dev pkg-config
Then install hailo_dataflow_compiler:
pip install hailo_dataflow_compiler-3.27.0-py3-none-linux_x86_64.whl
Once installation succeeds, compile a model following the user guide.
from hailo_sdk_client import ClientRunner
import tensorflow as tf

# Building a simple Keras model
def build_small_example_net():
    inputs = tf.keras.Input(shape=(24, 24, 96), name="img")
    x = tf.keras.layers.Conv2D(24, 1, name='conv1')(inputs)
    x = tf.keras.layers.BatchNormalization(momentum=0.9, name='bn1')(x)
    outputs = tf.keras.layers.ReLU(max_value=6.0, name='relu1')(x)
    model = tf.keras.Model(inputs, outputs, name="small_example_net")
    return model
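As a sanity check, the toy network's parameter count can be worked out by hand; this sketch reproduces the total that Keras' model.count_params() would report:

```python
# Hand-counting the toy network's parameters:
in_ch, out_ch, k = 96, 24, 1
conv_params = k * k * in_ch * out_ch + out_ch  # 1x1 kernel weights + bias = 2328
bn_params = 4 * out_ch                         # gamma, beta, moving mean, moving variance = 96
total_params = conv_params + bn_params
print(total_params)  # 2424
```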
# Converting the model to TFLite
model = build_small_example_net()
model_name = 'small_example'
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops
    tf.lite.OpsSet.SELECT_TF_OPS     # enable TensorFlow ops
]
tflite_model = converter.convert()  # may emit warnings in a Jupyter notebook; these are harmless
tflite_model_path = 'small_example.tflite'
with tf.io.gfile.GFile(tflite_model_path, 'wb') as f:
    f.write(tflite_model)
chosen_hw_arch = 'hailo8'
# Parsing the model to Hailo format
runner = ClientRunner(hw_arch=chosen_hw_arch)
hn, npz = runner.translate_tf_model(tflite_model_path, model_name)
hailo_model_har_name = f'{model_name}_hailo_model.har'
runner.save_har(hailo_model_har_name)
# Visualize the saved model graph
from IPython.display import SVG
!hailo visualizer {hailo_model_har_name} --no-browser
# The calibration flow uses a representative dataset
import numpy as np
calib_dataset = np.random.randint(low=0, high=10, size=(50, 24, 24, 96))
np.save('calib_set.npy', calib_dataset)
runner.optimize(calib_dataset)
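Note that random integers are only a placeholder for demonstration: the optimizer warns (see the log below) when fewer than the recommended 1024 calibration entries are supplied, and reduces the optimization level. A minimal sketch of padding a small sample set up to 1024 entries by tiling; distinct real preprocessed inputs are always preferable to duplicates:

```python
import numpy as np

# Stand-in for real preprocessed inputs with the network's (N, H, W, C) shape
real_samples = np.random.rand(50, 24, 24, 96).astype(np.float32)
reps = -(-1024 // len(real_samples))                 # ceiling division
calib_dataset = np.tile(real_samples, (reps, 1, 1, 1))[:1024]
np.save('calib_set.npy', calib_dataset)
print(calib_dataset.shape)  # (1024, 24, 24, 96)
```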
Output printed by runner.optimize:
[info] Starting Model Optimization
[warning] Reducing optimization level to 1 (the accuracy won't be optimized and compression won't be used) because there's less data than the recommended amount (1024)
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.31)
[info] create_layer_norm skipped
[info] Starting Stats Collector
[info] Using dataset with 50 entries for calibration
Calibration: 100%|██████████| 50/50 [00:05<00:00, 9.34entries/s]
[info] Stats Collector is done (completion time is 00:00:05.47)
[info] No shifts available for layer small_example/conv1/conv_op, using max shift instead. delta=0.10578701765651832
[info] Starting Bias Correction
[info] The algorithm Bias Correction will use up to 0.02 GB of storage space
[info] Using dataset with 50 entries for Bias Correction
Bias Correction: 100%|██████████| 1/1 [00:02<00:00, 2.59s/blocks, Layers=['small_example/conv1_output_0']]
[info] Bias Correction is done (completion time is 00:00:02.89)
[info] Adaround skipped
[info] Fine Tune skipped
[info] Starting Layer Noise Analysis
Full Quant Analysis: 100%|██████████| 2/2 [00:03<00:00, 1.51s/iterations]
[info] Layer Noise Analysis is done (completion time is 00:00:03.77)
[info] Output layers signal-to-noise ratio (SNR): measures the quantization noise (higher is better)
[info] small_example/output_layer1 SNR: 47.17 dB
[info] Model Optimization is done
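The reported SNR compares output signal power to quantization noise power in decibels, so a higher value means less quantization error. A small numpy illustration of the idea (not Hailo's actual measurement algorithm), quantizing a float tensor to 8 bits and computing the resulting SNR:

```python
import numpy as np

# SNR in dB: 10 * log10(signal_power / noise_power)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 6.0, size=(24, 24, 24)).astype(np.float32)  # e.g. a ReLU6 output

scale = x.max() / 255.0                      # simple per-tensor scale
x_q = np.round(x / scale).astype(np.uint8)   # quantize to 8 bits
x_dq = x_q.astype(np.float32) * scale        # dequantize

noise = x - x_dq
snr_db = 10 * np.log10(np.sum(x**2) / np.sum(noise**2))
print(f"{snr_db:.1f} dB")
```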
Converting to a quantized HAR file
# Save the result state to a Quantized HAR file
quantized_model_har_path = f'{model_name}_quantized_model.har'
runner.save_har(quantized_model_har_path)
Converting to an HEF file
from hailo_sdk_client import ClientRunner
runner = ClientRunner(har=quantized_model_har_path)
hef = runner.compile()
file_name = f'{model_name}.hef'
with open(file_name, 'wb') as f:
    f.write(hef)
Model profiling
har_path = f'{model_name}_compiled_model.har'
runner.save_har(har_path)
!hailo profiler {har_path}
Profiler output:
[info] Saved HAR to: /home/featurize/small_example_compiled_model.har
[info] Current Time: 00:52:07, 04/11/24
[info] CPU: Architecture: x86_64, Model: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Number Of Cores: 6, Utilization: 0.7%
[info] Memory: Total: 25GB, Available: 22GB
[info] System info: OS: Linux, Kernel: 5.4.0-91-generic
[info] Hailo DFC Version: 3.27.0
[info] HailoRT Version: Not Installed
[info] PCIe: No Hailo PCIe device was found
[info] Running `hailo profiler small_example_compiled_model.har`
[info] Running profile for small_example in state compiled_model
[info]
Model Details
-------------------------------- ----------
Input Tensors Shapes 24x24x96
Operations per Input Tensor 0.00 GOPs
Operations per Input Tensor 0.00 GMACs
Pure Operations per Input Tensor 0.00 GOPs
Pure Operations per Input Tensor 0.00 GMACs
Model Parameters 0.01 M
-------------------------------- ----------
Profiler Input Settings
----------------- -----------------
Optimization Goal Reach Highest FPS
Profiler Mode Compiled
----------------- -----------------
Performance Summary
---------------------- --------------------
Number of Devices 1
Number of Contexts 1
Throughput 57463.01 FPS
Latency 0.02 ms
Operations per Second 152.52 GOP/s
MACs per Second 77.05 GMAC/s
Total Input Bandwidth 2.96 Gigabytes/sec
Total Output Bandwidth 757.57 Megabytes/sec
Context Switch Configs N/A
---------------------- --------------------
[info] Saved Profiler HTML Report to: /home/featurize/small_example_compiled_model.html
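The throughput figures in the report are consistent with the layer shapes. A quick back-of-envelope check (assuming uint8 input/output tensors and that only the 1x1 convolution contributes MACs; the reported bandwidths appear to use binary GiB/MiB units):

```python
# Values copied from the profiler report above
fps = 57463.01
h, w, cin, cout = 24, 24, 96, 24  # input 24x24x96, conv1 has 24 output channels

macs_per_frame = h * w * cin * cout       # 1x1 conv: 1,327,104 MACs per frame
gmacs = macs_per_frame * fps / 1e9        # close to the reported 77.05 GMAC/s

in_bw_gib = h * w * cin * fps / 2**30     # uint8 input bandwidth
out_bw_mib = h * w * cout * fps / 2**20   # uint8 output bandwidth
print(round(gmacs, 1), round(in_bw_gib, 2), round(out_bw_mib, 1))  # 76.3 2.96 757.6
```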
The generated report file is shown below.
An analysis of running the model on the device itself will follow in a later post...
You can also refer to my earlier articles:
Hailo-8 Accelerator Card: First Test Notes
[Hailo-8 Accelerator Card Inference Test]