A Summary of Problems Encountered While Testing LXCFS

Environment

Software | Version
k8s | v1.19.10
docker | 20.10.7
OS | Ubuntu 20.04.2

Deployment

The deployment is based on the community project lxcfs-admission-webhook, with some changes to the lxcfs image. The modified Dockerfile:

# Build stage: download, configure, and compile lxcfs
FROM ubuntu:20.04 AS build
ENV LXCFS_VERSION=4.0.12
RUN apt update \
    && DEBIAN_FRONTEND=noninteractive apt install -y build-essential wget meson python3-pip cmake fuse libfuse-dev pkg-config
RUN pip3 install jinja2
RUN wget https://linuxcontainers.org/downloads/lxcfs/lxcfs-$LXCFS_VERSION.tar.gz \
    && mkdir /lxcfs \
    && tar xzvf lxcfs-$LXCFS_VERSION.tar.gz -C /lxcfs --strip-components=1 \
    && cd /lxcfs \
    && ./configure \
    && make

# Runtime stage: the lxcfs binary and library, plus copies under /lxcfs
# (including libfuse and libulockmgr) that start.sh installs at startup
FROM ubuntu:20.04
STOPSIGNAL SIGINT
COPY --from=build /lxcfs/src/lxcfs /usr/local/bin/lxcfs
COPY --from=build /lxcfs/src/.libs/liblxcfs.so /usr/local/lib/lxcfs/liblxcfs.so
COPY --from=build /lxcfs/src/lxcfs /lxcfs/lxcfs
COPY --from=build /lxcfs/src/.libs/liblxcfs.so /lxcfs/liblxcfs.so
COPY --from=build /usr/lib/x86_64-linux-gnu/libfuse.so.2.9.9 /lxcfs/libfuse.so.2.9.9
COPY --from=build /usr/lib/x86_64-linux-gnu/libulockmgr.so.1.0.1 /lxcfs/libulockmgr.so.1.0.1

COPY start.sh /
CMD ["/start.sh"]

The start.sh script:

#!/bin/bash

# Cleanup
nsenter -m/proc/1/ns/mnt fusermount -u /var/lib/lxcfs 2> /dev/null || true
nsenter -m/proc/1/ns/mnt [ -L /etc/mtab ] || \
        sed -i "/^lxcfs \/var\/lib\/lxcfs fuse.lxcfs/d" /etc/mtab

# remove /var/lib/lxcfs
rm -rf /var/lib/lxcfs/*

# Prepare
mkdir -p /usr/local/lib/lxcfs /var/lib/lxcfs

# Update lxcfs
cp -f /lxcfs/lxcfs /usr/local/bin/lxcfs
cp -f /lxcfs/liblxcfs.so /usr/local/lib/lxcfs/liblxcfs.so

cp -f /lxcfs/libfuse.so.2.9.9 /usr/lib64/libfuse.so.2.9.9
cp -f /lxcfs/libulockmgr.so.1.0.1 /usr/lib64/libulockmgr.so.1.0.1

# -f keeps the script idempotent when the links already exist after a restart
ln -sf /usr/lib64/libfuse.so.2.9.9 /usr/lib64/libfuse.so.2
ln -sf /usr/lib64/libulockmgr.so.1.0.1 /usr/lib64/libulockmgr.so.1

# Mount
exec nsenter -m/proc/1/ns/mnt /usr/local/bin/lxcfs /var/lib/lxcfs/  --enable-cfs -l

Problems Encountered During Testing

Roles

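Role | Purpose | Notes
lxcfs | An open-source FUSE (userspace) filesystem, originally written for LXC containers, that lets processes inside a container see their actual CPU and memory limits | Deployed as a DaemonSet
lxcfs-admission-webhook | Intercepts pod creation on demand and patches in the lxcfs-related volumes | A Deployment
Pods | Programs that need to read the correct resource figures | nginx, Java, etc.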

Test pod

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-lxcfs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testlxcfs
  template:
    metadata:
      labels:
        app: testlxcfs
    spec:
      containers:
        - name: alpine
          #image: ubuntu:20.04
          image: alpine:3.16
          command: ["/bin/sh"]
          args: ["-c", "sleep 1d"]
          imagePullPolicy: Always
          resources:
            requests:
              memory: "256Mi"
              cpu: "0.2"
            limits:
              memory: "1024Mi"
              cpu: "0.5"


Normal behavior

### 1 cpu
# kubectl exec -it test-lxcfs-5b449777dd-fl7m9 -- cat /proc/cpuinfo | grep processor | wc -l
1
### 1G memory
# kubectl exec -it test-lxcfs-5b449777dd-fl7m9 -- cat /proc/meminfo  | grep MemTotal
MemTotal:        1048576 kB
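
The two values above follow from the limits in the test Deployment: 1024Mi is 1024 × 1024 = 1048576 kB, and a 0.5 CPU limit shows up as one processor. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the round-up rule for CPUs is an assumption inferred from the observed output, not taken from the lxcfs documentation.

```shell
# Expected MemTotal (in kB, as printed by /proc/meminfo) for a memory limit
# given in Mi.
expected_memtotal_kb() {
  echo $(( $1 * 1024 ))
}

# Expected processor count for a CPU limit given in millicores (0.5 CPU = 500),
# rounded up to a whole CPU. The rounding rule is an assumption.
expected_cpu_count() {
  echo $(( ($1 + 999) / 1000 ))
}

expected_memtotal_kb 1024   # limits.memory: "1024Mi"
expected_cpu_count 500      # limits.cpu: "0.5"
```

Comparing these numbers with the output of the two kubectl exec commands is a quick way to confirm that a pod is reading from lxcfs and not from the host.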


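Problems

These problems occur only occasionally.

lxcfs failure

  • Pods that are already running can no longer read CPU and memory information, and this remains the case even after lxcfs recovers

Workaround: see container_remount_lxcfs in the article "Lxcfs调研测试" (LXCFS research and testing). In short, it uses the Linux mount namespace together with nsenter to remount the lxcfs files inside the affected container.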

### lxcfs down
# kubectl exec -it test-lxcfs-5b449777dd-fl7m9 -- cat /proc/cpuinfo 
cat: can't open '/proc/cpuinfo': Socket not connected
command terminated with exit code 
# kubectl exec -it test-lxcfs-5b449777dd-fl7m9 -- cat /proc/meminfo 
cat: can't open '/proc/meminfo': Socket not connected
command terminated with exit code 1
### lxcfs recovered
# kubectl exec -it test-lxcfs-5b449777dd-fl7m9 -- cat /proc/cpuinfo 
cat: can't open '/proc/cpuinfo': Socket not connected
command terminated with exit code 1
# kubectl exec -it test-lxcfs-5b449777dd-fl7m9 -- cat /proc/meminfo
cat: can't open '/proc/meminfo': Socket not connected
command terminated with exit code 1
### after recreating the pod
# kubectl exec -it test-lxcfs-5b449777dd-bzgqm -- cat /proc/cpuinfo | grep processor | wc -l
1
# kubectl exec -it test-lxcfs-5b449777dd-bzgqm -- cat /proc/meminfo  | grep MemTotal
MemTotal:        1048576 kB
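
As an alternative to recreating the pod, the nsenter remount workaround mentioned above could be sketched roughly as follows. This is not the referenced script itself. The helper name is hypothetical, the list of /proc files is an assumption, and the sketch also assumes that the recovered lxcfs mount at /var/lib/lxcfs is visible inside the container's mount namespace (for example through a volume with mount propagation), which the setup in this post may not provide. The function only prints commands so that they can be reviewed before anything is run.

```shell
# Hypothetical helper: print the nsenter commands that would bind-mount the
# lxcfs files over /proc inside the mount namespace of the container whose
# init process has host PID $1. Nothing is executed here.
LXCFS_ROOT=${LXCFS_ROOT:-/var/lib/lxcfs}
# Assumed list of files that lxcfs exposes under <mountpoint>/proc.
PROC_FILES="cpuinfo meminfo loadavg stat diskstats swaps uptime"

remount_cmds() {
  pid=$1
  for f in $PROC_FILES; do
    echo "nsenter --target $pid --mount -- mount -o bind $LXCFS_ROOT/proc/$f /proc/$f"
  done
}
```

On a Docker node the PID can be looked up with docker inspect --format '{{.State.Pid}}' followed by the container ID. After reviewing the printed commands, they could be piped to sh as root on the node.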

  • Newly created pods cannot start until lxcfs recovers
# kubectl get pods -w 
NAME                          READY   STATUS    RESTARTS   AGE
test-lxcfs-5b449777dd-r5g68   0/1     Pending   0          0s
test-lxcfs-5b449777dd-r5g68   0/1     Pending   0          0s
test-lxcfs-5b449777dd-r5g68   0/1     Init:0/1   0          0s
test-lxcfs-5b449777dd-r5g68   0/1     Init:0/1   0          1s
test-lxcfs-5b449777dd-r5g68   0/1     PodInitializing   0          2s
test-lxcfs-5b449777dd-r5g68   0/1     RunContainerError   0          4s
test-lxcfs-5b449777dd-r5g68   0/1     RunContainerError   1          5s
test-lxcfs-5b449777dd-r5g68   0/1     CrashLoopBackOff    1          17s
test-lxcfs-5b449777dd-r5g68   0/1     RunContainerError   2          38s
test-lxcfs-5b449777dd-r5g68   0/1     RunContainerError   3          51s
test-lxcfs-5b449777dd-r5g68   0/1     CrashLoopBackOff    3          52s
test-lxcfs-5b449777dd-r5g68   1/1     Running             4          90s

### describe output
Error: failed to start container "alpine": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/lxcfs/proc/loadavg" to rootfs at "/proc/loadavg" caused: mount through procfd: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type

### once Running, the pod reads the correct values
# kubectl exec -it test-lxcfs-5b449777dd-r5g68 -- cat /proc/cpuinfo | grep processor | wc -l
1
# kubectl exec -it test-lxcfs-5b449777dd-r5g68 -- cat /proc/meminfo  | grep MemTotal
MemTotal:        1048576 kB
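
The describe output above asks to check that the host path exists and is the expected type, which suggests that the files under /var/lib/lxcfs/proc were missing or of the wrong type while lxcfs was down. A minimal sketch of such a check, meant to be run on the node, is shown below. The function name is hypothetical and the file list is an assumption.

```shell
# Hypothetical check: succeed only if every expected lxcfs file under
# <root>/proc exists and is a regular file (not a missing path or a directory).
lxcfs_files_ready() {
  root=${1:-/var/lib/lxcfs}
  for f in cpuinfo meminfo loadavg stat diskstats swaps uptime; do
    if [ ! -f "$root/proc/$f" ]; then
      echo "not ready: $root/proc/$f"
      return 1
    fi
  done
  echo "ready"
}
```

Running lxcfs_files_ready with no argument on a node checks the default /var/lib/lxcfs mount point. A non-zero exit status means that new pods scheduled to that node are likely to hit the RunContainerError shown above.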

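lxcfs-admission-webhook failure

Pods created while the webhook is down are not patched with the lxcfs-related volumes, so they see the same values as the host. Once lxcfs-admission-webhook is healthy again, the only fix is to recreate those pods.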
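
One way to find pods that were created while the webhook was down is to look at their volumes. The sketch below assumes that the webhook injects hostPath volumes under /var/lib/lxcfs, as the mount path in the earlier error message suggests. The function name is hypothetical.

```shell
# Hypothetical filter: read hostPath paths (whitespace separated) on stdin and
# report whether any of them points under the lxcfs mount point.
has_lxcfs_volume() {
  if tr ' ' '\n' | grep -q '^/var/lib/lxcfs'; then
    echo "patched"
  else
    echo "not patched"
  fi
}
```

A possible use is to pipe the output of kubectl get pod followed by the pod name and -o jsonpath='{.spec.volumes[*].hostPath.path}' into has_lxcfs_volume. Pods reported as "not patched" are candidates for recreation.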

Startup priority of lxcfs and lxcfs-admission-webhook

Both components must start before other pods, for example after a power loss or a node reboot.
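
One option that may help here is to give both workloads a high-priority PriorityClass. The sketch below only prints the kubectl command. The helper name, the workload name lxcfs, and the namespace default are assumptions, while system-node-critical is a built-in PriorityClass. Note that pod priority influences scheduling and preemption; it does not by itself guarantee a strict start order after a reboot.

```shell
# Hypothetical helper: print a kubectl command that sets priorityClassName on
# a workload's pod template. Arguments: kind, name, namespace, optional class.
priority_patch_cmd() {
  kind=$1 name=$2 ns=$3 class=${4:-system-node-critical}
  patch="{\"spec\":{\"template\":{\"spec\":{\"priorityClassName\":\"$class\"}}}}"
  echo "kubectl -n $ns patch $kind $name --type merge -p '$patch'"
}

# Example with assumed names; adjust to the actual DaemonSet and namespace.
priority_patch_cmd daemonset lxcfs default
```

The same helper can be called for the webhook Deployment once its real name and namespace are known.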

Thanks

Official LXCFS project
Using the latest LXCFS in Kubernetes (在 Kubernetes 中使用最新的 LXCFS)
How lxcfs works (lxcfs 原理)
lxcfs-admission-webhook
