Transformers
https://kserve.github.io/website/0.10/modelserving/v1beta1/transformer/feast/
Deploy InferenceService with Transformer using Feast online feature store
A Transformer is an InferenceService component that performs pre/post-processing alongside model inference. In this example, instead of the typical input transformation of raw data to tensors, we demonstrate a use case of online feature augmentation as part of preprocessing. We use a Feast Transformer to gather online features, a SKLearn predictor to run the inference, and leave post-processing as a pass-through.
Before you begin
1. Your ~/.kube/config should point to a cluster with KServe installed.
2. Your cluster's Istio Ingress gateway must be network-accessible.
3. You can find the code samples in the kserve repository.
Note
This example uses Feast version 0.30.2.
Create the Redis server
This example uses Redis as the online store. Deploy the Redis server with the following command.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-server
  template:
    metadata:
      labels:
        app: redis-server
      name: redis-server
    spec:
      containers:
        - name: redis-server
          image: redis
          args: [ "--appendonly", "yes" ]
          ports:
            - name: redis-server
              containerPort: 6379
          env:
            - name: ALLOW_EMPTY_PASSWORD
              value: "yes"
---
apiVersion: v1
kind: Service
metadata:
  name: redis-service
spec:
  type: LoadBalancer
  selector:
    app: redis-server
  ports:
    - protocol: TCP
      port: 6379
      targetPort: 6379
EOF
Expected output
$ deployment.apps/redis-server created
$ service/redis-service created
Create the Feast server
Build the feature store initializer docker image
The feature store initializer is an init container that initializes a new example feature repository, materializes example driver data into the online store, and copies the feature repository to the volume mount. The feature store initializer Dockerfile can be found in the code example directory. Check out the Feast code example and run the following commands under the example directory:
docker build -t $USERNAME/feature-store-initializer:latest -f feature_store_initializer.Dockerfile .
docker push $USERNAME/feature-store-initializer:latest
Build the Feast server docker image
The Feast server Dockerfile can be found in the code example directory.
docker build -t $USERNAME/feast-server:latest -f feast_server.Dockerfile .
docker push $USERNAME/feast-server:latest
Deploy the Feast server
Wait for the Redis Deployment to become available. Then, update the image fields of the init container and the container in the command below and deploy the Feast server.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: feature-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: feature-server
  template:
    metadata:
      labels:
        app: feature-server
      name: feature-server
    spec:
      initContainers:
        - name: feature-store-initializer
          image: "{username}/feature-store-initializer:latest"
          volumeMounts:
            - mountPath: /mnt
              name: feature-store-volume
      containers:
        - name: feature-server
          image: "{username}/feast-server:latest"
          args: [ -c, /mnt/driver_feature_repo/feature_repo, serve, -h, 0.0.0.0 ]
          ports:
            - name: feature-server
              containerPort: 6566
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
          volumeMounts:
            - mountPath: /mnt
              name: feature-store-volume
      volumes:
        - name: feature-store-volume
          emptyDir:
            sizeLimit: 100Mi
---
apiVersion: v1
kind: Service
metadata:
  name: feature-server-service
spec:
  type: LoadBalancer
  selector:
    app: feature-server
  ports:
    - protocol: TCP
      port: 6566
      targetPort: 6566
EOF
Expected output
$ deployment.apps/feature-server created
$ service/feature-server-service created
Create the Transformer with Feast
Extend the Model class and implement pre/post-processing functions
The KServe.Model base class mainly defines three handlers: preprocess, predict and postprocess. These handlers are executed in sequence: the output of preprocess is passed to predict as the input, and when a predictor_host is passed, the predict handler by default makes an HTTP call to the predictor url and gets back a response, which is then passed to the postprocess handler. KServe automatically fills in the predictor_host for the Transformer and handles the call to the predictor; for a gRPC predictor, you currently need to override the predict handler to make the gRPC call.
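The handler chain described above can be sketched in plain Python without the kserve dependency. The class and its toy doubling/summing logic are purely illustrative, not KServe code:

```python
# Standalone sketch of the preprocess -> predict -> postprocess chain.
# In KServe, predict would make the HTTP call to predictor_host instead.
class SketchModel:
    def preprocess(self, payload):
        # e.g. feature augmentation or raw-data-to-tensor conversion
        return {"instances": [[x * 2 for x in row] for row in payload["instances"]]}

    def predict(self, payload):
        # stand-in for the call to the predictor
        return {"predictions": [sum(row) for row in payload["instances"]]}

    def postprocess(self, result):
        # pass-through, as in this Feast example
        return result

    def __call__(self, payload):
        # handlers run in sequence, each feeding the next
        return self.postprocess(self.predict(self.preprocess(payload)))

print(SketchModel()({"instances": [[1, 2], [3, 4]]}))  # {'predictions': [6, 14]}
```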
To implement a Transformer, derive from the base Model class and then overwrite the preprocess and postprocess handlers with your own customized transformation logic.
We created a class DriverTransformer which extends Model for this driver ranking example. The transformer takes additional arguments to interact with Feast:
- feast_serving_url: the Feast serving URL in the form of <host_name:port> or <ip:port>
- entity_id_name: the name of the entity id for which to retrieve features from the Feast feature store
- feature_refs: the feature references for the features to be retrieved
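To illustrate how these arguments fit together, here is a hedged sketch of how a preprocess step might build the request body for Feast's /get-online-features HTTP endpoint from v1-style instances. The build_feast_request helper is our illustration, not part of the DriverTransformer code:

```python
import json

def build_feast_request(instances, entity_id_name, feature_refs):
    """Turn v1 instances like [[1001], [1002]] into a Feast
    /get-online-features request body (illustrative helper)."""
    entity_ids = [row[0] for row in instances]
    return {
        "features": feature_refs,
        "entities": {entity_id_name: entity_ids},
    }

payload = build_feast_request(
    [[1001], [1002]],
    entity_id_name="driver_id",
    feature_refs=["driver_hourly_stats:conv_rate"],
)
print(json.dumps(payload))
```

The transformer would POST this body to the feast_serving_url and merge the returned feature values into the tensor it sends to the predictor.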
Build the Transformer docker image
The driver transformer Dockerfile can be found in the code example directory. Check out the Feast code example and run the following commands under the example directory:
docker build -t $USERNAME/driver-transformer:latest -f driver_transformer.Dockerfile .
docker push $USERNAME/driver-transformer:latest
Create the InferenceService
In the Feast Transformer image we packaged the driver transformer class so that KServe knows to use the preprocess implementation to augment inputs with online features before making the model inference request. Then the InferenceService uses SKLearn to serve the driver ranking model, which is trained with Feast offline features and is available in a GCS bucket specified under storageUri. Update the container's image field and the feast_serving_url argument to create the InferenceService, which includes a Feast Transformer and a SKLearn Predictor.
New Schema
cat <<EOF | kubectl apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-driver-transformer"
spec:
  transformer:
    containers:
      - image: "kserve/driver-transformer:latest"
        name: driver-container
        command:
          - "python"
          - "-m"
          - "driver_transformer"
        args:
          - --feast_serving_url
          - "feature-server-service.default.svc.cluster.local:6566"
          - --entity_id_name
          - "driver_id"
          - --feature_refs
          - "driver_hourly_stats:conv_rate"
          - "driver_hourly_stats:acc_rate"
          - "driver_hourly_stats:avg_daily_trips"
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/feast/driver"
EOF
Old Schema
cat <<EOF | kubectl apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-driver-transformer"
spec:
  transformer:
    containers:
      - image: "kserve/driver-transformer:latest"
        name: driver-container
        command:
          - "python"
          - "-m"
          - "driver_transformer"
        args:
          - --feast_serving_url
          - "feature-server-service.default.svc.cluster.local:6566"
          - --entity_id_name
          - "driver_id"
          - --feature_refs
          - "driver_hourly_stats:conv_rate"
          - "driver_hourly_stats:acc_rate"
          - "driver_hourly_stats:avg_daily_trips"
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/feast/driver"
EOF
Expected output
$ inferenceservice.serving.kserve.io/sklearn-driver-transformer created
Run a prediction
Prepare the input for the inference request. Copy the following JSON into a file named driver-input.json.
{
  "instances": [[1001], [1002], [1003], [1004], [1005]]
}
Before testing the InferenceService, first check that it is in a ready state. Then, determine the ingress IP and port and set INGRESS_HOST and INGRESS_PORT.
SERVICE_NAME=sklearn-driver-transformer
MODEL_NAME=sklearn-driver-transformer
INPUT_PATH=@./driver-input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
Expected output
> POST /v1/models/sklearn-driver-transformer:predict HTTP/1.1
> Host: sklearn-driver-transformer.default.example.com
> User-Agent: curl/7.85.0
> Accept: */*
> Content-Length: 57
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 115
< content-type: application/json
< date: Thu, 30 Mar 2023 09:46:52 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 112
<
* Connection #0 to host 1.2.3.4 left intact
{"predictions":[0.45905828209879473,1.5118208033011165,0.21514156911776539,0.5555778492605103,0.49638665080127176]}
Deploy Transformer with InferenceService
A Transformer is an InferenceService component that performs pre/post-processing alongside model inference. It usually takes raw inputs and transforms them into the input tensors the model server expects. In this example we demonstrate running inference with a custom Transformer communicating over the REST and gRPC protocols.
Create custom image transformer
Implement pre/post-processing with the KServe Model API
The KServe.Model base class mainly defines three handlers: preprocess, predict and postprocess. These handlers are executed in sequence: the output of the preprocess handler is passed to the predict handler as the input. When a predictor_host is passed, the predict handler makes a call to the predictor and gets back a response, which is then passed to the postprocess handler. KServe automatically fills in the predictor_host for the Transformer and hands over the call to the predictor. By default the transformer makes a REST call to the predictor; to make a gRPC call to the predictor instead, pass the --protocol argument with the value grpc-v2.
To implement a Transformer, derive from the base Model class and then overwrite the preprocess and postprocess handlers with your own customized transformation logic. For the Open (v2) Inference Protocol, KServe provides the InferRequest and InferResponse API objects for the predict, preprocess and postprocess handlers to abstract away the implementation details of REST/gRPC decoding and encoding.
import io
import logging
from typing import Dict

import kserve
import numpy as np
import torchvision.transforms as transforms
from kserve import InferInput, InferRequest, Model, ModelServer, model_server
from PIL import Image

logging.basicConfig(level=kserve.constants.KSERVE_LOGLEVEL)


def image_transform(byte_array):
    """Converts the input image bytes array into a tensor.

    Args:
        byte_array (bytes): The request input image bytes.
    Returns:
        numpy.ndarray: The converted tensor, used as input for the predict
        handler with the v1/v2 inference protocol.
    """
    image_processing = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    image = Image.open(io.BytesIO(byte_array))
    tensor = image_processing(image).numpy()
    return tensor


# for the v1 REST predictor the preprocess handler converts the input image
# bytes to a float tensor dict in the v1 inference REST protocol format
class ImageTransformer(kserve.Model):
    def __init__(self, name: str, predictor_host: str, headers: Dict[str, str] = None):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.ready = True

    def preprocess(self, inputs: Dict, headers: Dict[str, str] = None) -> Dict:
        return {'instances': [image_transform(instance) for instance in inputs['instances']]}

    def postprocess(self, inputs: Dict, headers: Dict[str, str] = None) -> Dict:
        return inputs


# for the v2 gRPC predictor the preprocess handler converts the input image
# bytes tensor to a float tensor in the v2 inference protocol format
class ImageTransformer(kserve.Model):
    def __init__(self, name: str, predictor_host: str, protocol: str, headers: Dict[str, str] = None):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.protocol = protocol
        self.ready = True

    def preprocess(self, request: InferRequest, headers: Dict[str, str] = None) -> InferRequest:
        input_tensors = [image_transform(instance) for instance in request.inputs[0].data]
        input_tensors = np.asarray(input_tensors)
        infer_inputs = [InferInput(name="INPUT__0", datatype='FP32', shape=list(input_tensors.shape),
                                   data=input_tensors)]
        infer_request = InferRequest(model_name=self.name, infer_inputs=infer_inputs)
        return infer_request
Please find the full code example here.
Transformer Server Entrypoint
For a single model, you just create a transformer object and register it to the model server.
if __name__ == "__main__":
    model = ImageTransformer(args.model_name, predictor_host=args.predictor_host,
                             protocol=args.protocol)
    ModelServer().start(models=[model])
For the multi-model case, you can register the same transformer for different models if they can all share the same transformation, or different transformers if each model requires its own.
if __name__ == "__main__":
    for model_name in model_names:
        transformer = ImageTransformer(model_name, predictor_host=args.predictor_host)
        models.append(transformer)
    kserve.ModelServer().start(models=models)
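The args object used in the entrypoints above comes from command-line parsing. A minimal standalone sketch of the flags involved, using plain argparse (the real entrypoint would build on kserve's model_server parser instead):

```python
import argparse

# Sketch of the transformer entrypoint flags; defaults are illustrative.
parser = argparse.ArgumentParser(description="image transformer sketch")
parser.add_argument("--model_name", required=True,
                    help="name the model is served under")
parser.add_argument("--predictor_host", default=None,
                    help="host:port of the predictor")
parser.add_argument("--protocol", default="v1",
                    help="protocol for calling the predictor, e.g. grpc-v2")

# Parsing the flags the gRPC example below passes on the command line:
args = parser.parse_args(["--model_name", "mnist", "--protocol", "grpc-v2"])
print(args.model_name, args.protocol)  # mnist grpc-v2
```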
Build the Transformer docker image
Under the kserve/python directory, build the transformer docker image using the Dockerfile.
cd python
docker build -t $DOCKER_USER/image-transformer:latest -f transformer.Dockerfile .
docker push $DOCKER_USER/image-transformer:latest
Deploy the InferenceService with REST Predictor
Create the InferenceService
By default the InferenceService uses TorchServe to serve the PyTorch models, and the models can be loaded from a model repository in cloud storage according to the TorchServe model repository layout. In this example the model repository contains an MNIST model, but you can store more than one model there.
New Schema
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torch-transformer
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier
  transformer:
    containers:
      - image: kserve/image-transformer:latest
        name: kserve-container
        command:
          - "python"
          - "-m"
          - "model"
        args:
          - --model_name
          - mnist
Old Schema
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torch-transformer
spec:
  predictor:
    pytorch:
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier
  transformer:
    containers:
      - image: kserve/image-transformer:latest
        name: kserve-container
        command:
          - "python"
          - "-m"
          - "model"
        args:
          - --model_name
          - mnist
Note
STORAGE_URI is a built-in environment variable used to inject the storage initializer for custom containers, similar to the StorageURI field for prepackaged predictors.
The downloaded artifacts are stored under /mnt/models.
Apply the InferenceService transformer-new.yaml
kubectl apply -f transformer-new.yaml
Expected output
$ inferenceservice.serving.kserve.io/torch-transformer created
Run a prediction
First, download the request input payload.
Then, determine the ingress IP and port and set INGRESS_HOST and INGRESS_PORT.
SERVICE_NAME=torch-transformer
MODEL_NAME=mnist
INPUT_PATH=@./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
Expected output
> POST /v1/models/mnist:predict HTTP/1.1
> Host: torch-transformer.default.example.com
> User-Agent: curl/7.73.0
> Accept: */*
> Content-Length: 401
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 401 out of 401 bytes
Handling connection for 8080
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 20
< content-type: application/json; charset=UTF-8
< date: Tue, 12 Jan 2021 09:52:30 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 83
<
* Connection #0 to host localhost left intact
{"predictions": [2]}
Deploy the InferenceService calling Predictor with gRPC protocol
Compared to REST, gRPC is faster due to the tight packing of the protocol buffers and the use of HTTP/2 by gRPC. In many cases, gRPC can be a more efficient communication protocol between the Transformer and the Predictor, since you may need to transfer large tensors between them.
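A rough way to see why the packed encoding wins is to compare the size of a tensor serialized as JSON text against the same values packed as raw float32 bytes, as protocol buffers do for repeated floats. This is an illustration of the encoding overhead only, not a KServe benchmark; the 3*32*32 shape mirrors the CIFAR-10 input used below:

```python
import json
import struct

# JSON spells each float out as decimal text; protobuf packs raw
# little-endian float32s at 4 bytes each.
values = [0.123456789] * (3 * 32 * 32)
json_size = len(json.dumps(values).encode())                # REST/JSON encoding
packed_size = len(struct.pack(f"{len(values)}f", *values))  # packed float32
print(json_size, packed_size)
```

For this tensor the packed form is a fixed 12288 bytes (3072 floats at 4 bytes), while the JSON text is several times larger, before even counting parse time.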
Create the InferenceService
Create the InferenceService with the following yaml, which includes a Transformer and a Triton Predictor. Since KServe by default uses the TorchServe serving runtime for PyTorch models, here you need to override the serving runtime to kserve-tritonserver to use the gRPC protocol. The transformer calls out to the predictor with the V2 gRPC protocol by specifying the --protocol argument.
New Schema
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torch-grpc-transformer
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchscript
      runtime: kserve-tritonserver
      runtimeVersion: 20.10-py3
      ports:
        - name: h2c
          protocol: TCP
          containerPort: 9000
  transformer:
    containers:
      - image: kserve/image-transformer:latest
        name: kserve-container
        command:
          - "python"
          - "-m"
          - "model"
        args:
          - --model_name
          - cifar10
          - --protocol
          - grpc-v2
Old Schema
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torch-grpc-transformer
spec:
  predictor:
    triton:
      storageUri: gs://kfserving-examples/models/torchscript
      runtimeVersion: 20.10-py3
      ports:
        - name: h2c
          protocol: TCP
          containerPort: 9000
  transformer:
    containers:
      - image: kserve/image-transformer:latest
        name: kserve-container
        command:
          - "python"
          - "-m"
          - "model"
        args:
          - --model_name
          - cifar10
          - --protocol
          - grpc-v2
Apply the InferenceService grpc_transformer.yaml
kubectl apply -f grpc_transformer.yaml
Expected output
$ inferenceservice.serving.kserve.io/torch-grpc-transformer created
Run a prediction
First, download the request input payload.
Then, determine the ingress IP and port and set INGRESS_HOST and INGRESS_PORT.
SERVICE_NAME=torch-grpc-transformer
MODEL_NAME=cifar10
INPUT_PATH=@./image.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
Expected output
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> POST /v1/models/cifar10:predict HTTP/1.1
> Host: torch-transformer.default.example.com
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 3394
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
Handling connection for 8080
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< content-length: 222
< content-type: application/json; charset=UTF-8
< date: Thu, 03 Feb 2022 01:50:07 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 73
<
* Connection #0 to host localhost left intact
{"predictions": [[-1.192867636680603, -0.35750141739845276, -2.3665435314178467, 3.9186441898345947, -2.0592284202575684, 4.091977119445801, 0.1266237050294876, -1.8284690380096436, 2.628898859024048, -4.255198001861572]]}* Closing connection 0
Performance comparison between gRPC and REST
From the latency stats of the transformer and the predictor below, you can see that the transformer-to-predictor call takes longer for REST than for gRPC (92ms vs 55ms): REST spends more time on serialization and deserialization of the 3*32*32 shape tensor, while gRPC transmits it as tightly packed numpy array serialized bytes.
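The predict_ms figures quoted above can be pulled out of such log lines with a small parser (a sketch over the log format shown below):

```python
import re

# One of the REST v1 transformer log lines quoted below.
line = ("2023-01-09 07:15:55.263 79476 root INFO [__call__():128] "
        "requestId: N.A., preprocess_ms: 6.083965302, explain_ms: 0, "
        "predict_ms: 92.653036118, postprocess_ms: 0.007867813")

# Extract the predict_ms value as a float.
match = re.search(r"predict_ms: ([\d.]+)", line)
print(float(match.group(1)))  # 92.653036118
```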
# from REST v1 transformer log
2023-01-09 07:15:55.263 79476 root INFO [__call__():128] requestId: N.A., preprocess_ms: 6.083965302, explain_ms: 0, predict_ms: 92.653036118, postprocess_ms: 0.007867813
# from REST v1 predictor log
2023-01-09 07:16:02.581 79402 root INFO [__call__():128] requestId: N.A., preprocess_ms: 13.532876968, explain_ms: 0, predict_ms: 48.450231552, postprocess_ms: 0.006914139
# from REST v1 transformer log
2023-01-09 07:27:52.172 79715 root INFO [__call__():128] requestId: N.A., preprocess_ms: 2.567052841, explain_ms: 0, predict_ms: 55.0532341, postprocess_ms: 0.101804733
# from gRPC v2 predictor log
2023-01-09 07:27:52.171 79711 root INFO [__call__():128] requestId: , preprocess_ms: 0.067949295, explain_ms: 0, predict_ms: 51.237106323, postprocess_ms: 0.049114227