InferVision NN model deployment webinar
Model life span
- build time: load model into memory (zip, encode, send, read, unzip, decode, certificate…)
- launch time: load model onto the GPU (API call, check load, free, status verification, launch config)
- run time: read input, provide output (input API, output API, execute)
- free: instance is tied to its process -> kill the process -> memory is freed
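As a rough illustration, the four stages map onto a small state machine. A minimal Python sketch with invented names (ModelState, ModelInstance), not InferVision's actual API:

```python
from enum import Enum, auto

class ModelState(Enum):
    """Illustrative lifecycle states from the notes above."""
    BUILT = auto()      # archive read + unzipped + decoded into host memory
    LAUNCHED = auto()   # weights loaded onto a GPU, instance process running
    RUNNING = auto()    # serving inference requests
    FREED = auto()      # process killed, memory reclaimed

class ModelInstance:
    def __init__(self, archive_path: str):
        # build time: unpack the archive into host memory
        self.archive_path = archive_path
        self.state = ModelState.BUILT

    def launch(self, gpu_id: int) -> None:
        # launch time: copy weights to the GPU, verify status, apply launch config
        self.gpu_id = gpu_id
        self.state = ModelState.LAUNCHED

    def infer(self, batch):
        # run time: read input, execute, provide output
        self.state = ModelState.RUNNING
        return [f"output for {x}" for x in batch]  # placeholder execution

    def free(self) -> None:
        # free: the instance lives in a process; killing it frees the memory
        self.state = ModelState.FREED
```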
gRPC: inter-process communication
Scheduler center
- Register (receive a model archive, return an id)
- Inference (receive input, return output)
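The two RPCs could look roughly like the following. A hedged sketch: in a real deployment these would be gRPC methods generated from a .proto service definition; the class and method names here are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class SchedulerCenter:
    """Sketch of the scheduler center's two RPCs; in practice each method
    would be a gRPC handler behind a .proto service (names are assumed)."""
    models: dict = field(default_factory=dict)  # model_id -> archive bytes

    def register(self, archive: bytes) -> str:
        # Register: receive a model archive, return an id
        model_id = str(uuid.uuid4())
        self.models[model_id] = archive
        return model_id

    def inference(self, model_id: str, input_blob: bytes) -> bytes:
        # Inference: receive input, route to a GPU instance, return output
        if model_id not in self.models:
            raise KeyError(f"unknown model {model_id}")
        return b"output"  # placeholder for dispatch to a running instance
```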
Instance status:
- working; can still receive new tasks
- marked to be killed; cannot receive new tasks
Assumption: a model can be fully loaded onto a single GPU
How to find an instance (dispatch logic, sketched below):
- a matching instance is working now -> just send the request to it
- no matching instance, but an available GPU -> launch a new instance
- no matching instance, no available GPU -> kill a working instance (find it and kill it), launch a new instance …
- no matching instance, no available GPU, nobody to kill -> wait… [bounded by a timeout from the client]
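A minimal sketch of this dispatch logic, folding in the two instance statuses above. All names (Status, Instance, dispatch) are illustrative, and a real scheduler would wait for the drained victim's in-flight work before reusing its GPU:

```python
from enum import Enum, auto
from typing import Optional

class Status(Enum):
    WORKING = auto()   # serving; can still accept new tasks
    DRAINING = auto()  # marked to be killed; must not accept new tasks

class Instance:
    def __init__(self, model_id: str, gpu_id: int):
        self.model_id, self.gpu_id = model_id, gpu_id
        self.status = Status.WORKING

def dispatch(model_id: str, instances: list, free_gpus: list) -> Optional[Instance]:
    """Return an instance for model_id, following the four cases above;
    None means the caller should wait (the client-side timeout applies)."""
    # Case 1: a matching instance is working -> just send
    for inst in instances:
        if inst.model_id == model_id and inst.status == Status.WORKING:
            return inst
    # Case 2: no match, but a GPU is available -> launch a new instance
    if free_gpus:
        inst = Instance(model_id, free_gpus.pop())
        instances.append(inst)
        return inst
    # Case 3: no match, no free GPU -> pick a victim, drain it, relaunch
    for victim in instances:
        if victim.status == Status.WORKING:
            victim.status = Status.DRAINING   # stops accepting new tasks
            inst = Instance(model_id, victim.gpu_id)  # simplification: reuse at once
            instances.append(inst)
            return inst
    # Case 4: nobody to kill -> wait; the client's timeout bounds the wait
    return None
```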
Scheduler challenges
- Better monitoring and analysis
- Better instance matching
- Better load balancing
- Avoid race conditions (see the lock sketch after this list)
- Error handling
- Logging
- Mimic production environment (load, profile, test, optimization)
- Test (unit test, integration test, etc.)
- Interface
- Distributed
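On race conditions: two concurrent Inference calls must not both claim the last free GPU, so find-and-claim has to be one atomic step. A small sketch, assuming a single-process scheduler whose shared state is guarded by a threading lock:

```python
import threading

class SchedulerState:
    """Guard shared scheduler state so concurrent requests cannot both
    claim the same free GPU (one source of the race conditions above)."""
    def __init__(self, gpu_ids):
        self._lock = threading.Lock()
        self._free_gpus = list(gpu_ids)

    def claim_gpu(self):
        # check-then-take must happen under the lock, not as two steps
        with self._lock:
            return self._free_gpus.pop() if self._free_gpus else None

    def release_gpu(self, gpu_id):
        with self._lock:
            self._free_gpus.append(gpu_id)
```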
Client
- I/O cost -> highly efficient IPC (inter-process communication via shared memory)
- Client sends no requests, server sits idle -> keep the scheduler supplied with requests
Based on available client memory, determine the number of in-flight requests n -> size the shared memory accordingly
Preprocess input on the client CPU -> write to shared memory -> push onto the request queue -> hand off to the scheduler (sketched below)
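A sketch of that client-side pipeline using Python's multiprocessing.shared_memory. The function name and descriptor format are assumptions; on POSIX the segment outlives the writer until it is unlinked, which is what lets the scheduler read it without copying the tensor over the wire:

```python
import numpy as np
from multiprocessing import shared_memory

def preprocess_to_shm(raw: np.ndarray, name: str) -> dict:
    """Preprocess on the client CPU and write the result into shared
    memory, so only a small descriptor travels through the request queue."""
    tensor = raw.astype(np.float32) / 255.0  # example preprocessing step
    shm = shared_memory.SharedMemory(create=True, size=tensor.nbytes, name=name)
    dst = np.ndarray(tensor.shape, dtype=tensor.dtype, buffer=shm.buf)
    dst[:] = tensor
    del dst      # drop the view so the handle can be closed cleanly
    shm.close()  # on POSIX the segment stays alive; the reader unlinks it later
    # this descriptor is all that goes onto the request queue / over gRPC
    return {"shm_name": name, "shape": tensor.shape, "dtype": "float32"}

# usage (hypothetical request id):
# desc = preprocess_to_shm(np.zeros((3, 224, 224), dtype=np.uint8), "req-0001")
```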
Same model on one GPU: e.g. 4 instances in parallel vs. batching
Operator combining -> send inputs as batches to one instance, not across multiple instances (see the batching sketch below)
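A minimal sketch of that batching choice: a worker drains up to max_batch queued inputs and runs them as one batched call on a single instance, instead of spreading them over multiple instances. The names and the 5 ms wait are assumptions:

```python
import queue

def batch_worker(requests: queue.Queue, run_batch, max_batch: int = 4,
                 max_wait: float = 0.005):
    """Collect up to max_batch requests for the same model and execute
    them as one batch on a single GPU instance."""
    while True:
        batch = [requests.get()]          # block until at least one request
        while len(batch) < max_batch:
            try:
                batch.append(requests.get(timeout=max_wait))
            except queue.Empty:
                break                     # send a partial batch rather than wait
        run_batch(batch)
```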
docker container layers vs. model layers [LARGE vs. SMALL]