MLU-K8S-PLUGIN Installation

1 Installation

1.1 Requirements

MLU100, MLU270, x5k, or MLU220 devices

MLU100 driver > 3.5; MLU270 driver > 2.2.0; MLU220 driver > 4.1.1

libcndev.so >= V1.8.0

Kubernetes >= v1.11.2
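
The requirements mix strict (>) and inclusive (>=) version comparisons, which are easy to get wrong by eye. The sketch below is one way to script them, assuming GNU `sort -V` is available; the helper names `version_ge` and `version_gt` are illustrative and not part of the plugin.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (a leading "v"/"V" is ignored).
version_ge() {
    local a="${1#[vV]}" b="${2#[vV]}"
    # sort -V orders version strings; if B sorts first (or is equal), then A >= B.
    [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# version_gt A B: succeeds when version A is strictly greater than version B.
version_gt() {
    local a="${1#[vV]}" b="${2#[vV]}"
    [ "$a" != "$b" ] && version_ge "$a" "$b"
}

# Example calls (the left-hand versions are made-up sample values):
#   version_ge v1.18.3 v1.11.2 && echo "Kubernetes version OK"
#   version_gt 2.2.1 2.2.0     && echo "MLU270 driver version OK"
```

Pass in the versions reported by your own driver and cluster tooling; the helpers only compare strings and do not query any device.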

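1.2 Download and build

(The following steps require Internet access. For an offline installation, download on a node that has network access.)

1.2.1 Clone the repository:

git clone https://github.com/Cambricon/cambricon-k8s-device-plugin

1.2.2 Enter the build directory:

cd cambricon-k8s-device-plugin/device-plugin

1.2.3 Build the image:

For an offline installation, first modify build_image.sh and the Dockerfile; see the appendix for the changes.

./build_image.sh

Note: after the build completes, transfer the image to the node where it will be installed.

Make sure Cambricon Neuware is installed before building, because the image build requires libcndev.so.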
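
Because the build depends on libcndev.so, it can help to confirm that the library is present before running build_image.sh. The following is a minimal sketch; the helper name `find_libcndev` is hypothetical, and the directories in the usage line (including /usr/local/neuware) are assumptions about where Neuware might be installed, not paths confirmed by this guide.

```shell
#!/usr/bin/env bash
# find_libcndev DIR...: print the first libcndev.so found under the given directories.
# Returns non-zero when no directory contains the library.
find_libcndev() {
    local dir hit
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        hit="$(find "$dir" -name 'libcndev.so*' 2>/dev/null | head -n 1)"
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return 0
        fi
    done
    return 1
}

# Example call (the search paths are assumptions, adjust them to your installation):
#   find_libcndev /usr/local/neuware /usr/lib /usr/lib64 || echo "libcndev.so not found" >&2
```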

1.3 Load the image

docker load -i image/cambricon-k8s-device-plugin-amd64.tar

1.4 Deploy the DaemonSet

1.4.1 Edit the YAML file

vim ./example/cambricon-device-plugin-daemonset.yaml

Configurable parameters:

args:
  - --mode=default # device plugin mode: default, sriov or env-share
  - --virtualization-num=1 # virtualization number for each MLU, used only in sriov mode or env-share mode

1.4.2 Start the DaemonSet

kubectl create -f ./example/cambricon-device-plugin-daemonset.yaml
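
Once the DaemonSet is running, each node with working MLUs should advertise the cambricon.com/mlu resource. The sketch below filters a two-column node listing for nodes that report at least one MLU. The helper name `list_mlu_nodes` and the column labels NAME and MLU are choices made for this example, and the kubectl command in the comment needs access to a live cluster.

```shell
#!/usr/bin/env bash
# list_mlu_nodes: read "NAME MLU" rows on stdin (first row is a header)
# and print the names of nodes whose MLU column is a positive integer.
list_mlu_nodes() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Example call against a cluster:
#   kubectl get nodes \
#     -o custom-columns='NAME:.metadata.name,MLU:.status.allocatable.cambricon\.com/mlu' \
#     | list_mlu_nodes
```

Nodes that print `<none>` in the MLU column have not registered the resource, which usually means the plugin pod on that node deserves a closer look.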

1.5 Run jobs on MLUs

Cambricon MLUs can now be requested through container-level resource requirements, using the resource name cambricon.com/mlu.

Example:

apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  restartPolicy: OnFailure
  containers:
    - image: ubuntu:16.04
      name: pod1-ctr
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          cambricon.com/mlu: 1
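
To request a different number of MLUs without editing the manifest by hand, the same Pod can be generated from a small template. This is only a sketch: `render_mlu_pod` and its NAME and COUNT parameters are hypothetical, while the image, command, and resource name are taken from the example above.

```shell
#!/usr/bin/env bash
# render_mlu_pod NAME COUNT: print a Pod manifest that requests COUNT MLUs.
# NAME and COUNT are illustrative parameters; everything else mirrors the sample Pod.
render_mlu_pod() {
    local name="$1" count="$2"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: OnFailure
  containers:
    - image: ubuntu:16.04
      name: ${name}-ctr
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          cambricon.com/mlu: ${count}
EOF
}

# Example call (requires kubectl access to the cluster):
#   render_mlu_pod pod2 2 | kubectl create -f -
```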

2 Troubleshooting

2.1 The following errors appear while building the image:

Err:1 http://deb.debian.org/debian buster InRelease
  Temporary failure resolving 'deb.debian.org'
Err:2 http://security.debian.org/debian-security buster/updates InRelease
  Temporary failure resolving 'security.debian.org'
Err:3 http://deb.debian.org/debian buster-updates InRelease
  Temporary failure resolving 'deb.debian.org'
Reading package lists... Done
W: Failed to fetch http://deb.debian.org/debian/dists/buster/InRelease  Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/buster/updates/InRelease  Temporary failure resolving 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/buster-updates/InRelease  Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.

Solution:

Edit the Docker daemon configuration:

vim /etc/docker/daemon.json

Add the line "dns": ["114.114.114.114","8.8.8.8"]

Restart Docker: systemctl restart docker
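
For reference, a complete daemon.json that contains only this DNS setting could be written as follows. The sketch assumes that no daemon.json exists yet; the helper name `write_docker_dns` is hypothetical, and it deliberately refuses to overwrite an existing file so that other settings are not lost.

```shell
#!/usr/bin/env bash
# write_docker_dns FILE: create FILE with only a "dns" entry.
# Refuses to overwrite an existing file, since that would discard other settings.
write_docker_dns() {
    local file="$1"
    if [ -e "$file" ]; then
        echo "refusing to overwrite $file; add the dns key by hand" >&2
        return 1
    fi
    cat > "$file" <<'EOF'
{
  "dns": ["114.114.114.114", "8.8.8.8"]
}
EOF
}

# Example call (run as root):
#   write_docker_dns /etc/docker/daemon.json && systemctl restart docker
```

If the file already exists, add the "dns" key next to the existing keys inside the same top-level JSON object.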

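2.2 Error: file not found in build context or excluded by .dockerignore

Cause: a Dockerfile cannot access files outside the build context, such as files in a parent directory.

Solution: copy the required files into the current directory.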

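2.3 Pause image download fails

If the Kubernetes cluster runs on an internal network and cannot reach gcr.io, pull the pause image on a machine that can reach gcr.io, export it, and import it into the internal private Docker registry. Then add --pod_infra_container_image to the kubelet startup parameters and restart kubelet.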

docker pull kubernetes/pause
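
The export and import steps described above can be sketched as a command list. The helper below only prints the commands for review and executes nothing. Its name, the default tarball name pause.tar, and the registry address in the usage line are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# print_pause_mirror_cmds SRC DST [TARBALL]: print the docker commands that move
# image SRC into a private registry as DST through a tar archive. Nothing is executed.
print_pause_mirror_cmds() {
    local src="$1" dst="$2" tarball="${3:-pause.tar}"
    echo "# on a host that can reach the public registry"
    echo "docker pull $src"
    echo "docker save -o $tarball $src"
    echo "# after copying $tarball to a host on the internal network"
    echo "docker load -i $tarball"
    echo "docker tag $src $dst"
    echo "docker push $dst"
}

# Example call (the registry address is a placeholder):
#   print_pause_mirror_cmds kubernetes/pause registry.example.internal:5000/pause
```

After the push, point the kubelet flag mentioned above at the DST image reference.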

2.4 spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by policy

Solution: add --allow_privileged=true to the startup configuration of both kube-apiserver and kubelet.

Steps:

1. On the master node: vim /etc/sysconfig/kube-apiserver

2. Set KUBE_APISERVER_OPTS='--allow_privileged=true'

3. systemctl daemon-reload

  systemctl restart kube-apiserver

  systemctl status -l kube-apiserver

4. On each compute node: vi /etc/sysconfig/kubelet

5. Set KUBELET_OPTS='--allow_privileged=true'

6. systemctl daemon-reload

  systemctl restart kubelet

  systemctl status -l kubelet

2.5 http: server gave HTTP response to HTTPS client

Solution: edit the Docker daemon configuration:

vim /etc/docker/daemon.json

Set { "insecure-registries":["xxxxxxxxx:5000"] }

Then reload and restart Docker:

systemctl daemon-reload

systemctl restart docker
