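Tonight, while rolling out an application change, I got a report that the application could not be built. I had run into a similar problem before, so this time I took a closer look. The cluster is OCP 3.11 and the nodes run RHEL 7.
My first suspicion was that the container mount paths were too long, so I logged in to the affected node and checked its mount points. The check below converts each mount point into its systemd mount unit name with systemd-escape and prints those of 256 bytes or more. Some of them are already past that limit: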
[root@tr730a30-ose ~]# for point in $(findmnt -lk -o TARGET) ; do bytes=$(systemd-escape --path --suffix=mount $point | wc -c); if [[ $bytes -ge 256 ]]; then echo "$bytes $point";fi ; done |grep prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247
277 /var/lib/origin/openshift.local.volumes/pods/e26ac721-e710-11ec-9f01-0050568a55cd/volume-subpaths/prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247-4/prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247/9
278 /var/lib/origin/openshift.local.volumes/pods/e26ac721-e710-11ec-9f01-0050568a55cd/volume-subpaths/prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247-5/prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247/10
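The numbers above are the byte counts of the systemd mount unit names derived from the paths, not of the paths themselves. wc -c also counts the newline that systemd-escape prints, so 277 means a 276-character name. When escaping a path, systemd drops the leading "/", turns every "/" into "-", and rewrites every character outside [A-Za-z0-9:_.] (and a leading ".") as \xNN. As a result, each "-" already in the path grows to four bytes (\x2d). The function below is only a rough, ASCII-only sketch of `systemd-escape --path --suffix=mount` that makes this arithmetic visible. The name escape_mount_path is hypothetical, and the real systemd-escape remains the reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough, ASCII-only imitation of
# `systemd-escape --path --suffix=mount`, for illustration only.
escape_mount_path() {
  local path c out="" i
  # Collapse repeated slashes, then drop the leading and trailing "/".
  path=$(printf '%s' "$1" | tr -s '/')
  path=${path#/}
  path=${path%/}
  # The root directory maps to "-".
  if [[ -z $path ]]; then
    printf '%s\n' '-.mount'
    return
  fi
  for (( i = 0; i < ${#path}; i++ )); do
    c=${path:i:1}
    if [[ $c == / ]]; then
      out+='-'                              # path separator becomes "-"
    elif [[ $c == [A-Za-z0-9:_] || ( $c == . && $i -gt 0 ) ]]; then
      out+=$c                               # safe character, kept as-is
    else
      out+=$(printf '\\x%02x' "'$c")        # e.g. "-" becomes "\x2d"
    fi
  done
  printf '%s.mount\n' "$out"
}

p=/var/lib/origin/openshift.local.volumes/pods/e26ac721-e710-11ec-9f01-0050568a55cd/volume-subpaths/prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247-4/prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247/9
name=$(escape_mount_path "$p")
echo "${#name}"   # prints 276
```

The raw path is only 193 characters long. Its 26 hyphens add 78 bytes once escaped, and that is what pushes the unit name past 256.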
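Next, I looked at why the paths are so long. In the DC (DeploymentConfig), the secrets are mounted with subPath. For every subPath mount, Kubernetes creates a bind mount several directory levels deep. As the output above shows, the pattern is .../pods/<pod UID>/volume-subpaths/<volume name>/<container name>/<mount index>, so the long name generated for the DC appears twice in each path. The relevant volumeMounts from the DC: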
- mountPath: /opt/appdata/output/shared/uqb2b/UQTrip.jpg
  name: prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247-7
  subPath: UQTrip.jpg
- mountPath: /opt/appdata/output/shared/uqb2b/cdptrustcacerts.jks
  name: prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247-8
  subPath: cdptrustcacerts.jks
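Given the subPath layout visible in the findmnt output (volume-subpaths/<volume name>/<container name>/<mount index>), the unit-name length of a subPath mount can be estimated before deploying. The estimate assumes the path contains only letters, digits, ".", "-" and "/", with no "//". The arithmetic is: drop the leading "/", add three bytes for every "-" (it becomes \x2d), and add six for ".mount". The helper name subpath_unit_len and its argument order are hypothetical. This is a sketch of that arithmetic, not a kubelet or systemd API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the systemd mount unit name length of a
# kubelet subPath bind target. Assumes only [A-Za-z0-9._/-] and no "//".
subpath_unit_len() {
  local volumes_dir=$1 pod_uid=$2 volume=$3 container=$4 index=$5
  local target="$volumes_dir/pods/$pod_uid/volume-subpaths/$volume/$container/$index"
  local rel=${target#/}          # systemd drops the leading "/"
  local dashes=${rel//[!-]/}     # keep only the "-" characters
  # "/" -> "-" keeps the length, each "-" -> "\x2d" adds 3, ".mount" adds 6
  echo $(( ${#rel} + 3 * ${#dashes} + 6 ))
}

subpath_unit_len /var/lib/origin/openshift.local.volumes \
  e26ac721-e710-11ec-9f01-0050568a55cd \
  prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247-4 \
  prod-uqb2bgrp-v-2-17-9-1-v-2-0-20220608174247 \
  9    # prints 276
```

In this example, the volume name (80 bytes escaped) and the container name (75 bytes escaped) account for 155 of the 276 bytes. They are also the only components the DC itself controls.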
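One thing was still puzzling, though: with the same configuration, the problem showed up only on some nodes. My guess was:
256 is not a hard limit. Such mounts normally work, and creating them is refused only when the node is already in an abnormal state.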
So I checked the node's system log, and it was indeed being flooded with errors like these:
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
Aug 03 19:00:34 tr730a30-ose systemd[1]: Failed to set up mount unit: Invalid argument
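To compare affected and unaffected nodes, it may help to put a number on the flood. The function below reads journal text on stdin and prints, per minute, how many "Failed to set up mount unit" lines were logged. It assumes journalctl's default short output format, as in the lines above. The function name count_mount_unit_failures is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Failed to set up mount unit" messages per
# minute from journal text on stdin.
# Assumes the default "short" format: "Mon DD HH:MM:SS host ident: msg".
count_mount_unit_failures() {
  awk '/Failed to set up mount unit/ {
         minute = $1 " " $2 " " substr($3, 1, 5)   # e.g. "Aug 03 19:00"
         count[minute]++
       }
       END { for (m in count) print count[m], m }' | sort -k2
}

# Usage on a node:
# journalctl --since today --no-pager | count_mount_unit_failures
```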
Later I looked through Red Hat's documentation. It says there is no final fix for this issue yet and offers only the following workaround:
Running systemctl daemon-reload and systemctl reset-failed periodically should prevent the issue from happening until resolution can be provided.
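The quoted workaround does not specify an interval. One way to run it "periodically" is a cron.d entry. In the sketch below, the function name install_mount_unit_workaround, the default file path and the 10-minute interval are all assumptions, not values from Red Hat. A systemd timer would serve equally well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a cron.d entry that runs the quoted workaround.
# The default file name and the */10 schedule are assumptions.
install_mount_unit_workaround() {
  local cron_file=${1:-/etc/cron.d/systemd-mount-unit-workaround}
  cat > "$cron_file" <<'EOF'
# Workaround for "Failed to set up mount unit: Invalid argument"
*/10 * * * * root /usr/bin/systemctl daemon-reload && /usr/bin/systemctl reset-failed
EOF
  chmod 0644 "$cron_file"   # cron ignores cron.d files writable by group/other
}

# Usage (as root on each affected node):
# install_mount_unit_workaround
```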