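Problems encountered when deploying sklearn_iris with kfserving

This post records the problems I ran into while publishing a model service with kfserving.

Core principles

1. Read the GitHub issues

2. Read the project's source code

Diagnostic commands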

  • kubectl get inferenceservice -A
  • kubectl describe inferenceservice.serving.kubeflow.org/sklearn-iris -n kubeflow
  • View the kfserving logs: kubectl logs StatefulSet/kfserving-controller-manager -c manager -n kubeflow
  • kubectl get revision sklearn-iris-predictor-default-ngztb -n kubeflow -o yaml
  • kubectl get ksvc -n kubeflow
  • kubectl describe ksvc/sklearn-iris-predictor-default -n kubeflow
  • kubectl get configuration -A
  • kubectl -n kubeflow get events
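
The checks above can be collected into a small script so they are easy to rerun after each fix. This is only a sketch: the function names are my own, the service name and namespace default to the values used in this post, and the revision check is left out because revision names are generated.

```python
import subprocess


def diagnostic_commands(name="sklearn-iris", namespace="kubeflow"):
    """Return the kubectl checks listed above as argument lists."""
    return [
        ["kubectl", "get", "inferenceservice", "-A"],
        ["kubectl", "describe",
         f"inferenceservice.serving.kubeflow.org/{name}", "-n", namespace],
        ["kubectl", "logs", "StatefulSet/kfserving-controller-manager",
         "-c", "manager", "-n", namespace],
        ["kubectl", "get", "ksvc", "-n", namespace],
        ["kubectl", "describe", f"ksvc/{name}-predictor-default", "-n", namespace],
        ["kubectl", "get", "configuration", "-A"],
        ["kubectl", "-n", namespace, "get", "events"],
    ]


def run_diagnostics(name="sklearn-iris", namespace="kubeflow"):
    """Run every check and print its output; a failing check does not stop the rest."""
    for cmd in diagnostic_commands(name, namespace):
        print("$", " ".join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

Calling run_diagnostics() on a machine where kubectl is already configured for the cluster prints the output of each command in turn.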

Problem 1: no endpoints available for service "kfserving-webhook-server-service"

Solution:

kubectl get mutatingwebhookconfigurations

kubectl delete mutatingwebhookconfigurations inferenceservice.serving.kubeflow.org

kubectl delete validatingwebhookconfigurations inferenceservice.serving.kubeflow.org

kubectl delete po kfserving-controller-manager-0 -n kfserving-system

Redeploy: kfctl apply -f $(pwd)/kfctl_k8s_istio.v1.0.1.yaml -V

 

kubectl get revision -n kubeflow

kubectl describe revision xgb-kfserving-predictor-default -n kubeflow

Problem 2: image registry authentication

Warning InternalError 12m (x32 over 7h20m) revision-controller failed to resolve image to digest: failed to fetch image information: Get https:/xxx.xxx.com/v2/: x509: certificate is valid for *.parkingcrew.net, parkingcrew.net, not harbor.prd.com

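Solution:

Command: kubectl -n knative-serving edit configmap config-deployment

Change registriesSkippingTagResolving: "ko.local,dev.local" to registriesSkippingTagResolving: "xx.xx.com"

Remember to comment out the example: the setting only takes effect when it sits outside the _example block.

To make the change permanent, edit the configuration file: kustomize/knative-install/base/config-map.yaml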
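
To see why putting the private registry into registriesSkippingTagResolving avoids the x509 error, here is a rough sketch of the idea: images whose registry host is in the comma-separated list are not resolved to a digest, so the registry is never contacted at that step. This is not Knative's code; both function names are hypothetical and the host detection is a simplified version of the usual Docker convention.

```python
def registry_of(image):
    """Return the registry host of an image reference, or None if it has none."""
    first, sep, _ = image.partition("/")
    # The first component counts as a registry only if it looks like a host name.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return None


def skips_tag_resolution(image, registries_skipping):
    """True if the image's registry appears in the comma-separated setting."""
    skipped = {r.strip() for r in registries_skipping.split(",") if r.strip()}
    return registry_of(image) in skipped
```

With the default value "ko.local,dev.local" an image under xx.xx.com is still resolved; once the value is "xx.xx.com" it is skipped.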

 

kubectl get pods -n kubeflow

kubectl describe pod sklearn-iris-predictor-default-zxgdn-deployment-b89978b-fhzw6 -n kubeflow

Problem 3: Normal BackOff 28s kubelet, docker-dsu-sitsvr-kubeflow011 Back-off pulling image "gcr.io/kfserving/storage-initializer:0.2.2"

Solution: mirror the image into the private Harbor registry.

Download knative-releases_knative_dev_serving_cmd_queue:0.0.2 again and push it to Harbor

Download location: https://hub.docker.com/search?q=sklearnserver&type=image

docker pull adamjm32/storage-initializer:0.2.2

docker tag adamjm32/storage-initializer:0.2.2 xx.xx.com/kubeflow/storage-initializer:0.2.2

docker push xx.xx.com/kubeflow/storage-initializer:0.2.2

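Update the storageInitializer value in kustomize/kfserving-install/base/config-map.yaml

If you then hit: standard_init_linux.go:211: exec user process caused "exec format error"

the image itself is at fault (this error typically means the binary in the image does not match the node's CPU architecture); switching to a different image fixes it.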

 

 

kubectl get pods -n kubeflow

kubectl logs sklearn-iris-predictor-default-vj4jx-deployment-66f78fb5cd4btsr -n kubeflow -c storage-initializer

Problem 4: Warning InspectFailed 9s (x2 over 24s) kubelet, docker-dsu-sitsvr-kubeflow011 Failed to apply default image tag "harbor.prd.com/kubeflow/sklearnserver:": couldn't parse image reference "harbor.prd.com/kubeflow/sklearnserver:": invalid reference format

The image reference ends in a bare colon, so the version part is empty. Add the following to kustomize/kfserving-install/base/config-map.yaml:

        "sklearn": {
            "image": "harbor.prd.com/kubeflow/sklearnserver",
            "defaultImageVersion": "0.2.2",
            "allowedImageVersions": [
                "0.2.2",
                "0.2.3",
                "0.2.4"
            ]
        },

The code contains a check that uses runtimeVersion:

https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha2/framework_scikit.go

 

Add a runtimeVersion to the InferenceService YAML:

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: kubeflow
spec:
  default:
    predictor:
      minReplicas: 1
      sklearn:
        storageUri: "pvc://kfserving-pvc-source/sklearn_iris/model.joblib"
        runtimeVersion: "0.2.4"
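
How the sklearn config entry and runtimeVersion fit together can be sketched as follows. This is an illustration in Python, not the controller's Go code: resolve_image is a hypothetical name, and the fallback and allow-list behaviour is my reading of the config keys, so treat it as an assumption.

```python
SKLEARN_CONFIG = {
    "image": "harbor.prd.com/kubeflow/sklearnserver",
    "defaultImageVersion": "0.2.2",
    "allowedImageVersions": ["0.2.2", "0.2.3", "0.2.4"],
}


def resolve_image(config, runtime_version=""):
    """Build '<image>:<version>', falling back to the default version if none is given."""
    version = runtime_version or config.get("defaultImageVersion", "")
    allowed = config.get("allowedImageVersions", [])
    if version and allowed and version not in allowed:
        raise ValueError(f"runtimeVersion {version!r} is not in {allowed}")
    # With no runtimeVersion and no default, this yields a reference ending in ':',
    # which is the invalid reference reported in the warning above.
    return config["image"] + ":" + version
```

With neither a runtimeVersion nor a defaultImageVersion the result is "harbor.prd.com/kubeflow/sklearnserver:", which matches the reference the kubelet refused to parse.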

kubectl logs sklearn-iris-predictor-default-pjfxj-deployment-6c95bbf6557l4rl -n kubeflow -c kfserving-container

Problem 5

[I 201118 12:17:40 storage:35] Copying contents of /mnt/models to local
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/sklearnserver/sklearnserver/__main__.py", line 33, in <module>
    model.load()
  File "/sklearnserver/sklearnserver/model.py", line 33, in load
    model_path = kfserving.Storage.download(self.model_dir)
  File "/kfserving/kfserving/storage.py", line 58, in download
    (_GCS_PREFIX, _S3_PREFIX, _LOCAL_PREFIX))

Analysis

The model path cannot be found; a PVC has to be mounted.

Reference code: https://github.com/kubeflow/kfserving/blob/master/pkg/webhook/admission/pod/storage_initializer_injector.go

PvcSourceMountName = "kfserving-pvc-source"

 

Solution

Make the PVC name match: kfserving-pvc-source

Create a new PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    name: kfserving-pvc-source
    namespace: kubeflow
spec:
    resources:
        requests:
            storage: 10Gi
    accessModes:
    - ReadWriteMany
    storageClassName: cbs

Update the InferenceService YAML accordingly:

storageUri: "pvc://kfserving-pvc-source/sklearn_iris/model.joblib"
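
The storageUri above can be read as a claim name followed by a path inside the volume. The sketch below assumes the shape pvc://<claim-name>/<path>; the function names are hypothetical, and /mnt/models is taken from the log line in Problem 5.

```python
import posixpath

PVC_PREFIX = "pvc://"
MODEL_MOUNT = "/mnt/models"  # mount point seen in the storage log above


def split_pvc_uri(uri):
    """Split 'pvc://<claim>/<path>' into (claim, path)."""
    if not uri.startswith(PVC_PREFIX):
        raise ValueError(f"not a pvc:// URI: {uri}")
    claim, _, path = uri[len(PVC_PREFIX):].partition("/")
    if not claim:
        raise ValueError(f"missing claim name in: {uri}")
    return claim, path


def expected_model_path(uri):
    """Where the model file would be expected if the volume is mounted at MODEL_MOUNT."""
    _, path = split_pvc_uri(uri)
    return posixpath.join(MODEL_MOUNT, path)
```

For the URI used in this post the claim is kfserving-pvc-source and the path is sklearn_iris/model.joblib, so the claim part has to name a PVC that actually exists in the kubeflow namespace.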

 

 

Problem 6:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/sklearnserver/sklearnserver/__main__.py", line 33, in <module>
    model.load()
  File "/sklearnserver/sklearnserver/model.py", line 37, in load
    self._model = joblib.load(model_file) #pylint:disable=attribute-defined-outside-init
  File "/usr/local/lib/python3.6/dist-packages/joblib/numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/usr/local/lib/python3.6/dist-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)
  File "/usr/lib/python3.6/pickle.py", line 1338, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib/python3.6/pickle.py", line 1388, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.svm._classes'

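Solution

https://github.com/kubeflow/kfserving/issues/1214

The traceback shows that the pickled model refers to sklearn.svm._classes, a module that the scikit-learn installed in the serving container does not have. The version used to save the model and the version in the serving image need to match: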

pip3 install --upgrade scikit-learn==0.20.3
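
A quick guard before exporting a model can catch this mismatch early. The sketch below assumes the serving image expects scikit-learn 0.20.3, as the pip command above suggests; the function names are hypothetical.

```python
from importlib import metadata

SERVER_SKLEARN_VERSION = "0.20.3"  # taken from the pip command above


def installed_version(package="scikit-learn"):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_server(installed, required=SERVER_SKLEARN_VERSION):
    """True only if the local version is exactly the one the serving image expects."""
    return installed == required
```

Running matches_server(installed_version()) in the training environment before calling joblib.dump tells you whether the saved model.joblib is likely to load in the container.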

 

 

 

 

 

 
