概述
kube-controller-manager组件中,有两个controller与存储相关,分别是PV controller与AD controller。
基于tag v1.17.4
https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4
PV Cotroller分析
这节先对PV controller进行分析。
涉及主要对象
(1)Persistent Volume (PV): 持久化存储卷,详细定义了存储的各项参数。
(2)Persistent Volume Claim (PVC):持久化存储卷的使用声明,也即说明需要什么样的多大的存储。
(3)StorageClass:创建pv的模板,定义了创建存储的模板参数。
PV对象主要状态变更
(1)available --> bound:一个pv对象创建出来后,处于available状态。pv controller会为pvc对象寻找合适的pv对象与之绑定,随即pv对象状态变更为bound。
(2)bound --> released:当与pv绑定的pvc对象被删除后,如果回收逻辑为retain,则pv对象状态变更为released。
pvc对象主要状态变更
(1)pending --> bound:一个pvc对象创建出来后,处于pending状态。pv controller会为pvc对象寻找合适的pv对象与之绑定,随即pvc对象状态变更为bound。
pvc如何选择合适的pv来绑定?
(1)volumeName匹配:当pvc对象中指定了volumeName属性,则会直接查询名称为该volumeName属性值一致的pv,并与之绑定,当该pv不存在时,该pvc会一直处于pending状态;
(2)volumeMode匹配:选择具有与pvc相同的volumeMode的pv(Block/FileSystem);
(3)storageclass匹配:选择具有与pvc相同的stroageclass名称的pv;
(4)accessMode匹配:选择具有与pvc相同的accessMode的pv;
(5)size检查:选择size大于等于且最接近pvc的size声明的pv。
其他:当一个 PVC 找不到合适的 PV 时,相应的volume plugin就会根据 StorageClass对象的参数配置去做一个动态创建 PV 的操作;而当存在一个合适的 PV 时,就会直接与现有的 PV绑定,而不再去动态创建。当pvc的volumeName属性不为空时,任何情况下都不会触发动态创建pv的操作。
pv与pvc提前绑定特性
当一个pv的spec.claimRef
属性指定了pvc时,则该pv只会与指定pvc绑定,不会与其他pvc绑定。示例如下:
apiVersion: v1
kind: PersistentVolume
metadata:
name: pvc-226aad72-c9ca-48d7-a0b2-c0f7599a3132
spec:
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: rbd-test-1
namespace: test
resourceVersion: "260347739"
uid: 226aad72-c9ca-48d7-a0b2-c0f7599a3132
...
由out-tree volume plugin来创建的pv,就使用了pv与pvc提前绑定的特性,创建pv时带上spec.claimRef
属性,指定了特定的pvc,防止并发操作时出现多个pvc与多个pv交叉绑定导致出错的情况。
PV Cotroller作用
PV Cotroller全称PersistentVolume controller,主要负责:
(1)pv、pvc对象的绑定;
(2)pv、pvc对象的生命周期管理(如创建/删除底层存储,创建/删除pv对象,pv与pvc对象的状态变更)。
注意
(1)当一个pvc创建出来后,pv controller会先寻找现存的合适的pv与之绑定,当找不到合适的pv时,才会去创建新的pv。
(2)前面说过,根据源码所在位置,volume plugin分为in-tree与out-tree两个部分。
(3)创建/删除底层存储、创建/删除pv对象的操作,由PV controller调用volume plugin(in-tree)来完成。如果是k8s通过ceph-csi(csi plugin)来使用ceph存储,volume plugin为ceph-csi,属于out-tree,所以创建/删除底层存储、创建/删除pv对象的操作由external-provisioner来完成。
PV和PVC的源码处理逻辑都在kubernetes/pkg/controller/volume/persistentvolume/pv_controller_base.go
和kubernetes/pkg/controller/volume/persistentvolume/pv_controller.go
这两个文件中。
源码分析入口
直接看到PersistentVolumeController
的Run
方法,主要就是起了三个Goroutine,分别运行3个方法,下面将一一分析。
// kubernetes/pkg/controller/volume/persistentvolume/pv_controller_base.go
func (ctrl *PersistentVolumeController) Run(stopCh <-chan struct{}) {
defer utilruntime.HandleCrash()
defer ctrl.claimQueue.ShutDown()
defer ctrl.volumeQueue.ShutDown()
klog.Infof("Starting persistent volume controller")
defer klog.Infof("Shutting down persistent volume controller")
if !cache.WaitForNamedCacheSync("persistent volume", stopCh, ctrl.volumeListerSynced, ctrl.claimListerSynced, ctrl.classListerSynced, ctrl.podListerSynced, ctrl.NodeListerSynced) {
return
}
ctrl.initializeCaches(ctrl.volumeLister, ctrl.claimLister)
go wait.Until(ctrl.resync, ctrl.resyncPeriod, stopCh)
go wait.Until(ctrl.volumeWorker, time.Second, stopCh)
go wait.Until(ctrl.claimWorker, time.Second, stopCh)
metrics.Register(ctrl.volumes.store, ctrl.claims)
<-stopCh
}
1 ctrl.resync
resync方法十分简单,主要作用是定时循环查询出pv和pvc列表,然后放入到队列volumeQueue和claimQueue中,让volumeWorker和claimWorker进行消费。
//kubernetes/pkg/controller/volume/persistentvolume/pv_controller_base.go
func (ctrl *PersistentVolumeController) resync() {
klog.V(4).Infof("resyncing PV controller")
pvcs, err := ctrl.claimLister.List(labels.NewSelector())
if err != nil {
klog.Warningf("cannot list claims: %s", err)
return
}
for _, pvc := range pvcs {
ctrl.enqueueWork(ctrl.claimQueue, pvc)
}
pvs, err := ctrl.volumeLister.List(labels.NewSelector())
if err != nil {
klog.Warningf("cannot list persistent volumes: %s", err)
return
}
for _, pv := range pvs {
ctrl.enqueueWork(ctrl.volumeQueue, pv)
}
}
2 ctrl.volumeWorker
volumeWorker方法主要是维护pv的状态,根据不同的状况对pv对象的状态值进行更新。当找不到与pv绑定的pvc时,会调用volume plugin来删除底层存储,并删除pv对象(这里包含了in-tree与out-tree两条逻辑)。
volumeWorker会不断循环消费volumeQueue队列里面的数据,然后获取到相应的PV执行updateVolume操作。
//kubernetes/pkg/controller/volume/persistentvolume/pv_controller_base.go
func (ctrl *PersistentVolumeController) volumeWorker() {
workFunc := func() bool {
keyObj, quit := ctrl.volumeQueue.Get()
if quit {
return true
}
defer ctrl.volumeQueue.Done(keyObj)
key := keyObj.(string)
klog.V(5).Infof("volumeWorker[%s]", key)
_, name, err := cache.SplitMetaNamespaceKey(key)
if err != nil {
klog.V(4).Infof("error getting name of volume %q to get volume from informer: %v", key, err)
return false
}
volume, err := ctrl.volumeLister.Get(name)
if err == nil {
// The volume still exists in informer cache, the event must have
// been add/update/sync
ctrl.updateVolume(volume)
return false
}
if !errors.IsNotFound(err) {
klog.V(2).Infof("error getting volume %q from informer: %v", key, err)
return false
}
// The volume is not in informer cache, the event must have been
// "delete"
volumeObj, found, err := ctrl.volumes.store.GetByKey(key)
if err != nil {
klog.V(2).Infof("error getting volume %q from cache: %v", key, err)
return false
}
if !found {
// The controller has already processed the delete event and
// deleted the volume from its cache
klog.V(2).Infof("deletion of volume %q was already processed", key)
return false
}
volume, ok := volumeObj.(*v1.PersistentVolume)
if !ok {
klog.Errorf("expected volume, got %+v", volumeObj)
return false
}
ctrl.deleteVolume(volume)
return false
}
for {
if quit := workFunc(); quit {
klog.Infof("volume worker queue shutting down")
return
}
}
}
2.1 ctrl.updateVolume
updateVolume方法会调用syncVolume方法,执行核心流程。
func (ctrl *PersistentVolumeController) updateVolume(volume *v1.PersistentVolume) {
// Store the new volume version in the cache and do not process it if this
// is an old version.
//更新缓存
new, err := ctrl.storeVolumeUpdate(volume)
if err != nil {
klog.Errorf("%v", err)
}
if !new {
return
}
//核心方法,根据当前 PV 对象的规格对 PV 和 PVC 进行绑定或者解绑
err = ctrl.syncVolume(volume)
if err != nil {
if errors.IsConflict(err) {
// Version conflict error happens quite often and the controller
// recovers from it easily.
klog.V(3).Infof("could not sync volume %q: %+v", volume.Name, err)
} else {
klog.Errorf("could not sync volume %q: %+v", volume.Name, err)
}
}
}
ctrl.syncVolume
syncVolume方法为核心方法,主要调谐更新pv的状态:
(1)如果spec.claimRef未设置,则是未使用过的pv,则调用updateVolumePhase函数更新状态设置 phase 为 available;
(2)如果spec.claimRef不为空,则该pv已经与pvc bound过了,此时若对应的pvc不存在,则更新pv状态为released;
(3)如果pv对应的pvc被删除了,调用ctrl.reclaimVolume根据pv的回收策略进行相应操作,如果是retain,则不做操作,如果是delete,则调用volume plugin来删除底层存储,并删除pv对象(当volume plugin为csi时,将走out-tree逻辑,pv controller不做删除存储与pv对象的操作,由external provisioner组件来完成该操作)。
func (ctrl *PersistentVolumeController) syncVolume(volume *v1.PersistentVolume) error {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: %s", volume.Name, getVolumeStatusForLogging(volume))
...
//如果spec.claimRef未设置,则是未使用过的pv,则调用updateVolumePhase函数更新状态设置 phase 为 available
if volume.Spec.ClaimRef == nil {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: volume is unused", volume.Name)
if _, err := ctrl.updateVolumePhase(volume, v1.VolumeAvailable, ""); err != nil {
return err
}
return nil
} else /* pv.Spec.ClaimRef != nil */ {
//正在被bound中,更新状态available
if volume.Spec.ClaimRef.UID == "" {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: volume is pre-bound to claim %s", volume.Name, claimrefToClaimKey(volume.Spec.ClaimRef))
if _, err := ctrl.updateVolumePhase(volume, v1.VolumeAvailable, ""); err != nil {
return err
}
return nil
}
klog.V(4).Infof("synchronizing PersistentVolume[%s]: volume is bound to claim %s", volume.Name, claimrefToClaimKey(volume.Spec.ClaimRef))
// Get the PVC by _name_
var claim *v1.PersistentVolumeClaim
//根据 pv 的 claimRef 获得 pvc
claimName := claimrefToClaimKey(volume.Spec.ClaimRef)
obj, found, err := ctrl.claims.GetByKey(claimName)
if err != nil {
return err
}
//如果在队列未发现,可能是volume被删除了,或者失败了,重新同步pvc
if !found && metav1.HasAnnotation(volume.ObjectMeta, pvutil.AnnBoundByController) {
if volume.Status.Phase != v1.VolumeReleased && volume.Status.Phase != v1.VolumeFailed {
obj, err = ctrl.claimLister.PersistentVolumeClaims(volume.Spec.ClaimRef.Namespace).Get(volume.Spec.ClaimRef.Name)
if err != nil && !apierrors.IsNotFound(err) {
return err
}
found = !apierrors.IsNotFound(err)
if !found {
obj, err = ctrl.kubeClient.CoreV1().PersistentVolumeClaims(volume.Spec.ClaimRef.Namespace).Get(context.TODO(), volume.Spec.ClaimRef.Name, metav1.GetOptions{})
if err != nil && !apierrors.IsNotFound(err) {
return err
}
found = !apierrors.IsNotFound(err)
}
}
}
if !found {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: claim %s not found", volume.Name, claimrefToClaimKey(volume.Spec.ClaimRef))
} else {
var ok bool
claim, ok = obj.(*v1.PersistentVolumeClaim)
if !ok {
return fmt.Errorf("Cannot convert object from volume cache to volume %q!?: %#v", claim.Spec.VolumeName, obj)
}
klog.V(4).Infof("synchronizing PersistentVolume[%s]: claim %s found: %s", volume.Name, claimrefToClaimKey(volume.Spec.ClaimRef), getClaimStatusForLogging(claim))
}
if claim != nil && claim.UID != volume.Spec.ClaimRef.UID {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: claim %s has different UID, the old one must have been deleted", volume.Name, claimrefToClaimKey(volume.Spec.ClaimRef))
// Treat the volume as bound to a missing claim.
claim = nil
}
//claim可能被删除了,或者pv被删除了
if claim == nil {
if volume.Status.Phase != v1.VolumeReleased && volume.Status.Phase != v1.VolumeFailed {
// Also, log this only once:
klog.V(2).Infof("volume %q is released and reclaim policy %q will be executed", volume.Name, volume.Spec.PersistentVolumeReclaimPolicy)
if volume, err = ctrl.updateVolumePhase(volume, v1.VolumeReleased, ""); err != nil {
return err
}
}
//根据persistentVolumeReclaimPolicy配置做相应的处理,Retain 保留/ Delete 删除/ Recycle 回收
if err = ctrl.reclaimVolume(volume); err != nil {
return err
}
if volume.Spec.PersistentVolumeReclaimPolicy == v1.PersistentVolumeReclaimRetain {
// volume is being retained, it references a claim that does not exist now.
klog.V(4).Infof("PersistentVolume[%s] references a claim %q (%s) that is not found", volume.Name, claimrefToClaimKey(volume.Spec.ClaimRef), volume.Spec.ClaimRef.UID)
}
return nil
} else if claim.Spec.VolumeName == "" {
if pvutil.CheckVolumeModeMismatches(&claim.Spec, &volume.Spec) {
volumeMsg := fmt.Sprintf("Cannot bind PersistentVolume to requested PersistentVolumeClaim %q due to incompatible volumeMode.", claim.Name)
ctrl.eventRecorder.Event(volume, v1.EventTypeWarning, events.VolumeMismatch, volumeMsg)
claimMsg := fmt.Sprintf("Cannot bind PersistentVolume %q to requested PersistentVolumeClaim due to incompatible volumeMode.", volume.Name)
ctrl.eventRecorder.Event(claim, v1.EventTypeWarning, events.VolumeMismatch, claimMsg)
// Skipping syncClaim
return nil
}
if metav1.HasAnnotation(volume.ObjectMeta, pvutil.AnnBoundByController) {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: volume not bound yet, waiting for syncClaim to fix it", volume.Name)
} else {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: volume was bound and got unbound (by user?), waiting for syncClaim to fix it", volume.Name)
}
ctrl.claimQueue.Add(claimToClaimKey(claim))
return nil
// 已经绑定更新状态status phase为Bound
} else if claim.Spec.VolumeName == volume.Name {
// Volume is bound to a claim properly, update status if necessary
klog.V(4).Infof("synchronizing PersistentVolume[%s]: all is bound", volume.Name)
if _, err = ctrl.updateVolumePhase(volume, v1.VolumeBound, ""); err != nil {
// Nothing was saved; we will fall back into the same
// condition in the next call to this method
return err
}
return nil
// PV绑定到PVC上,但是PVC被绑定到其他PV上,重置
} else {
// Volume is bound to a claim, but the claim is bound elsewhere
if metav1.HasAnnotation(volume.ObjectMeta, pvutil.AnnDynamicallyProvisioned) && volume.Spec.PersistentVolumeReclaimPolicy == v1.PersistentVolumeReclaimDelete {
if volume.Status.Phase != v1.VolumeReleased && volume.Status.Phase != v1.VolumeFailed {
klog.V(2).Infof("dynamically volume %q is released and it will be deleted", volume.Name)
if volume, err = ctrl.updateVolumePhase(volume, v1.VolumeReleased, ""); err != nil {
return err
}
}
if err = ctrl.reclaimVolume(volume); err != nil {
return err
}
return nil
} else {
if metav1.HasAnnotation(volume.ObjectMeta, pvutil.AnnBoundByController) {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: volume is bound by controller to a claim that is bound to another volume, unbinding", volume.Name)
if err = ctrl.unbindVolume(volume); err != nil {
return err
}
return nil
} else {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: volume is bound by user to a claim that is bound to another volume, waiting for the claim to get unbound", volume.Name)
if err = ctrl.unbindVolume(volume); err != nil {
return err
}
return nil
}
}
}
}
}
ctrl.reclaimVolume
(1)根据pv对象的annotation:pv.kubernetes.io/provisioned-by
配置,决定走in-tree或out-tree逻辑来删除存储、删除pv对象。
(2)out-tree逻辑:让相应的plugin来删除存储,并删除pv对象(例:当volume plugin为ceph-csi时,由external provisioner来完成删除存储与pv对象的操作)。
(3)in-tree逻辑:在ctrl.doDeleteVolume方法中进行删除存储的操作,在ctrl.deleteVolumeOperation方法中进行删除pv对象的操作。
// reclaimVolume implements volume.Spec.PersistentVolumeReclaimPolicy and
// starts appropriate reclaim action.
func (ctrl *PersistentVolumeController) reclaimVolume(volume *v1.PersistentVolume) error {
if migrated := volume.Annotations[pvutil.AnnMigratedTo]; len(migrated) > 0 {
// PV is Migrated. The PV controller should stand down and the external
// provisioner will handle this PV
return nil
}
switch volume.Spec.PersistentVolumeReclaimPolicy {
case v1.PersistentVolumeReclaimRetain:
klog.V(4).Infof("reclaimVolume[%s]: policy is Retain, nothing to do", volume.Name)
case v1.PersistentVolumeReclaimRecycle:
klog.V(4).Infof("reclaimVolume[%s]: policy is Recycle", volume.Name)
opName := fmt.Sprintf("recycle-%s[%s]", volume.Name, string(volume.UID))
ctrl.scheduleOperation(opName, func() error {
ctrl.recycleVolumeOperation(volume)
return nil
})
case v1.PersistentVolumeReclaimDelete:
klog.V(4).Infof("reclaimVolume[%s]: policy is Delete", volume.Name)
opName := fmt.Sprintf("delete-%s[%s]", volume.Name, string(volume.UID))
// create a start timestamp entry in cache for deletion operation if no one exists with
// key = volume.Name, pluginName = provisionerName, operation = "delete"
ctrl.operationTimestamps.AddIfNotExist(volume.Name, ctrl.getProvisionerNameFromVolume(volume), "delete")
ctrl.scheduleOperation(opName, func() error {
_, err := ctrl.deleteVolumeOperation(volume)
if err != nil {
// only report error count to "volume_operation_total_errors"
// latency reporting will happen when the volume get finally
// deleted and a volume deleted event is captured
metrics.RecordMetric(volume.Name, &ctrl.operationTimestamps, err)
}
return err
})
default:
// Unknown PersistentVolumeReclaimPolicy
if _, err := ctrl.updateVolumePhaseWithEvent(volume, v1.VolumeFailed, v1.EventTypeWarning, "VolumeUnknownReclaimPolicy", "Volume has unrecognized PersistentVolumeReclaimPolicy"); err != nil {
return err
}
}
return nil
}
// deleteVolumeOperation deletes a volume. This method is running in standalone
// goroutine and already has all necessary locks.
func (ctrl *PersistentVolumeController) deleteVolumeOperation(volume *v1.PersistentVolume) (string, error) {
klog.V(4).Infof("deleteVolumeOperation [%s] started", volume.Name)
// This method may have been waiting for a volume lock for some time.
// Previous deleteVolumeOperation might just have saved an updated version, so
// read current volume state now.
newVolume, err := ctrl.kubeClient.CoreV1().PersistentVolumes().Get(volume.Name, metav1.GetOptions{})
if err != nil {
klog.V(3).Infof("error reading persistent volume %q: %v", volume.Name, err)
return "", nil
}
needsReclaim, err := ctrl.isVolumeReleased(newVolume)
if err != nil {
klog.V(3).Infof("error reading claim for volume %q: %v", volume.Name, err)
return "", nil
}
if !needsReclaim {
klog.V(3).Infof("volume %q no longer needs deletion, skipping", volume.Name)
return "", nil
}
pluginName, deleted, err := ctrl.doDeleteVolume(volume)
if err != nil {
// Delete failed, update the volume and emit an event.
klog.V(3).Infof("deletion of volume %q failed: %v", volume.Name, err)
if volerr.IsDeletedVolumeInUse(err) {
// The plugin needs more time, don't mark the volume as Failed
// and send Normal event only
ctrl.eventRecorder.Event(volume, v1.EventTypeNormal, events.VolumeDelete, err.Error())
} else {
// The plugin failed, mark the volume as Failed and send Warning
// event
if _, err := ctrl.updateVolumePhaseWithEvent(volume, v1.VolumeFailed, v1.EventTypeWarning, events.VolumeFailedDelete, err.Error()); err != nil {
klog.V(4).Infof("deleteVolumeOperation [%s]: failed to mark volume as failed: %v", volume.Name, err)
// Save failed, retry on the next deletion attempt
return pluginName, err
}
}
// Despite the volume being Failed, the controller will retry deleting
// the volume in every syncVolume() call.
return pluginName, err
}
if !deleted {
// The volume waits for deletion by an external plugin. Do nothing.
return pluginName, nil
}
klog.V(4).Infof("deleteVolumeOperation [%s]: success", volume.Name)
// Delete the volume
if err = ctrl.kubeClient.CoreV1().PersistentVolumes().Delete(volume.Name, nil); err != nil {
// Oops, could not delete the volume and therefore the controller will
// try to delete the volume again on next update. We _could_ maintain a
// cache of "recently deleted volumes" and avoid unnecessary deletion,
// this is left out as future optimization.
klog.V(3).Infof("failed to delete volume %q from database: %v", volume.Name, err)
return pluginName, nil
}
return pluginName, nil
}
3 ctrl.claimWorker
claimWorker主要是维护pvc的状态,根据不同的状况对pvc对象的状态值进行更新,主要是给pvc找到合适的pv做绑定bound操作;当找不到合适的pv时,会调用volume plugin来创建底层存储,并创建pv对象(这里包含了in-tree和out-tree两条逻辑)。
claimWorker会不断循环消费claimQueue队列里面的数据,然后获取到相应的PVC执行updateClaim方法做进一步处理。
//kubernetes/pkg/controller/volume/persistentvolume/pv_controller_base.go
func (ctrl *PersistentVolumeController) claimWorker() {
workFunc := func() bool {
keyObj, quit := ctrl.claimQueue.Get()
if quit {
return true
}
defer ctrl.claimQueue.Done(keyObj)
key := keyObj.(string)
klog.V(5).Infof("claimWorker[%s]", key)
namespace, name, err := cache.SplitMetaNamespaceKey(key)
if err != nil {
klog.V(4).Infof("error getting namespace & name of claim %q to get claim from informer: %v", key, err)
return false
}
claim, err := ctrl.claimLister.PersistentVolumeClaims(namespace).Get(name)
if err == nil {
// The claim still exists in informer cache, the event must have
// been add/update/sync
ctrl.updateClaim(claim)
return false
}
if !errors.IsNotFound(err) {
klog.V(2).Infof("error getting claim %q from informer: %v", key, err)
return false
}
// The claim is not in informer cache, the event must have been "delete"
claimObj, found, err := ctrl.claims.GetByKey(key)
if err != nil {
klog.V(2).Infof("error getting claim %q from cache: %v", key, err)
return false
}
if !found {
// The controller has already processed the delete event and
// deleted the claim from its cache
klog.V(2).Infof("deletion of claim %q was already processed", key)
return false
}
claim, ok := claimObj.(*v1.PersistentVolumeClaim)
if !ok {
klog.Errorf("expected claim, got %+v", claimObj)
return false
}
ctrl.deleteClaim(claim)
return false
}
for {
if quit := workFunc(); quit {
klog.Infof("claim worker queue shutting down")
return
}
}
}
3.1 ctrl.updateClaim
主要调用ctrl.syncClaim做进一步处理。
//kubernetes/pkg/controller/volume/persistentvolume/pv_controller_base.go
// updateClaim runs in worker thread and handles "claim added",
// "claim updated" and "periodic sync" events.
func (ctrl *PersistentVolumeController) updateClaim(claim *v1.PersistentVolumeClaim) {
// Store the new claim version in the cache and do not process it if this is
// an old version.
new, err := ctrl.storeClaimUpdate(claim)
if err != nil {
klog.Errorf("%v", err)
}
if !new {
return
}
err = ctrl.syncClaim(claim)
if err != nil {
if errors.IsConflict(err) {
// Version conflict error happens quite often and the controller
// recovers from it easily.
klog.V(3).Infof("could not sync claim %q: %+v", claimToClaimKey(claim), err)
} else {
klog.Errorf("could not sync volume %q: %+v", claimToClaimKey(claim), err)
}
}
}
3.1.1 ctrl.syncClaim
主要逻辑:
(1)检查pvc中key为"pv.kubernetes.io/bind-completed"的annotation,有则说明该pvc已经完成了绑定操作;
(2)没有该annotation,则调用ctrl.syncUnboundClaim,给Unbound的pvc,找到对应的PV,执行绑定操作;
(3)有该annotation,则调用ctrl.syncBoundClaim,看是否需要做修复逻辑。
//kubernetes/pkg/controller/volume/persistentvolume/pv_controller.go
func (ctrl *PersistentVolumeController) syncClaim(claim *v1.PersistentVolumeClaim) error {
klog.V(4).Infof("synchronizing PersistentVolumeClaim[%s]: %s", claimToClaimKey(claim), getClaimStatusForLogging(claim))
// Set correct "migrated-to" annotations on PVC and update in API server if
// necessary
newClaim, err := ctrl.updateClaimMigrationAnnotations(claim)
if err != nil {
// Nothing was saved; we will fall back into the same
// condition in the next call to this method
return err
}
claim = newClaim
if !metav1.HasAnnotation(claim.ObjectMeta, pvutil.AnnBindCompleted) {
return ctrl.syncUnboundClaim(claim)
} else {
return ctrl.syncBoundClaim(claim)
}
}
ctrl.syncUnboundClaim主要逻辑:
(1)调用ctrl.volumes.findBestMatchForClaim
,给Unbound的pvc,找到对应的PV,调用bind执行绑定操作;
(2)当找不到合适的pv时,调用ctrl.provisionClaim来做进一步操作。
ctrl.volumes.findBestMatchForClaim主要逻辑:给pvc找到合适的pv列表。怎么找呢,其实一开始介绍pvc如何选择合适的pv来绑定的时候就有介绍过,具体逻辑可以自己看一下代码,pv与pvc提前绑定特性也在该方法里有所体现。
ctrl.provisionClaim主要逻辑:
(1)获取pvc对应的storageclass对象,根据volume plugin的配置,决定走in-tree或out-tree逻辑来创建存储、创建pv对象。
(2)out-tree逻辑:调用ctrl.provisionClaimOperationExternal来给pvc对象设置annotation:volume.beta.kubernetes.io/storage-provisioner:{plugin name}
,让相应的plugin来创建存储,并创建pv对象(例:当volume plugin为ceph-csi时,由external provisioner来完成创建存储与pv对象的操作)。
(3)in-tree逻辑:调用ctrl.provisionClaimOperation来创建存储,并创建pv对象。
//kubernetes/pkg/controller/volume/persistentvolume/pv_controller.go
func (ctrl *PersistentVolumeController) syncUnboundClaim(claim *v1.PersistentVolumeClaim) error {
// This is a new PVC that has not completed binding
// OBSERVATION: pvc is "Pending"
if claim.Spec.VolumeName == "" {
// User did not care which PV they get.
delayBinding, err := pvutil.IsDelayBindingMode(claim, ctrl.classLister)
if err != nil {
return err
}
// [Unit test set 1]
volume, err := ctrl.volumes.findBestMatchForClaim(claim, delayBinding)
if err != nil {
klog.V(2).Infof("synchronizing unbound PersistentVolumeClaim[%s]: Error finding PV for claim: %v", claimToClaimKey(claim), err)
return fmt.Errorf("Error finding PV for claim %q: %v", claimToClaimKey(claim), err)
}
if volume == nil {
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: no volume found", claimToClaimKey(claim))
// No PV could be found
// OBSERVATION: pvc is "Pending", will retry
switch {
case delayBinding && !pvutil.IsDelayBindingProvisioning(claim):
ctrl.eventRecorder.Event(claim, v1.EventTypeNormal, events.WaitForFirstConsumer, "waiting for first consumer to be created before binding")
case v1helper.GetPersistentVolumeClaimClass(claim) != "":
if err = ctrl.provisionClaim(claim); err != nil {
return err
}
return nil
default:
ctrl.eventRecorder.Event(claim, v1.EventTypeNormal, events.FailedBinding, "no persistent volumes available for this claim and no storage class is set")
}
// Mark the claim as Pending and try to find a match in the next
// periodic syncClaim
if _, err = ctrl.updateClaimStatus(claim, v1.ClaimPending, nil); err != nil {
return err
}
return nil
} else /* pv != nil */ {
// Found a PV for this claim
// OBSERVATION: pvc is "Pending", pv is "Available"
claimKey := claimToClaimKey(claim)
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: volume %q found: %s", claimKey, volume.Name, getVolumeStatusForLogging(volume))
if err = ctrl.bind(volume, claim); err != nil {
// On any error saving the volume or the claim, subsequent
// syncClaim will finish the binding.
// record count error for provision if exists
// timestamp entry will remain in cache until a success binding has happened
metrics.RecordMetric(claimKey, &ctrl.operationTimestamps, err)
return err
}
// OBSERVATION: claim is "Bound", pv is "Bound"
// if exists a timestamp entry in cache, record end to end provision latency and clean up cache
// End of the provision + binding operation lifecycle, cache will be cleaned by "RecordMetric"
// [Unit test 12-1, 12-2, 12-4]
metrics.RecordMetric(claimKey, &ctrl.operationTimestamps, nil)
return nil
}
} else /* pvc.Spec.VolumeName != nil */ {
// [Unit test set 2]
// User asked for a specific PV.
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: volume %q requested", claimToClaimKey(claim), claim.Spec.VolumeName)
obj, found, err := ctrl.volumes.store.GetByKey(claim.Spec.VolumeName)
if err != nil {
return err
}
if !found {
// User asked for a PV that does not exist.
// OBSERVATION: pvc is "Pending"
// Retry later.
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: volume %q requested and not found, will try again next time", claimToClaimKey(claim), claim.Spec.VolumeName)
if _, err = ctrl.updateClaimStatus(claim, v1.ClaimPending, nil); err != nil {
return err
}
return nil
} else {
volume, ok := obj.(*v1.PersistentVolume)
if !ok {
return fmt.Errorf("Cannot convert object from volume cache to volume %q!?: %+v", claim.Spec.VolumeName, obj)
}
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: volume %q requested and found: %s", claimToClaimKey(claim), claim.Spec.VolumeName, getVolumeStatusForLogging(volume))
if volume.Spec.ClaimRef == nil {
// User asked for a PV that is not claimed
// OBSERVATION: pvc is "Pending", pv is "Available"
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: volume is unbound, binding", claimToClaimKey(claim))
if err = checkVolumeSatisfyClaim(volume, claim); err != nil {
klog.V(4).Infof("Can't bind the claim to volume %q: %v", volume.Name, err)
// send an event
msg := fmt.Sprintf("Cannot bind to requested volume %q: %s", volume.Name, err)
ctrl.eventRecorder.Event(claim, v1.EventTypeWarning, events.VolumeMismatch, msg)
// volume does not satisfy the requirements of the claim
if _, err = ctrl.updateClaimStatus(claim, v1.ClaimPending, nil); err != nil {
return err
}
} else if err = ctrl.bind(volume, claim); err != nil {
// On any error saving the volume or the claim, subsequent
// syncClaim will finish the binding.
return err
}
// OBSERVATION: pvc is "Bound", pv is "Bound"
return nil
} else if pvutil.IsVolumeBoundToClaim(volume, claim) {
// User asked for a PV that is claimed by this PVC
// OBSERVATION: pvc is "Pending", pv is "Bound"
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: volume already bound, finishing the binding", claimToClaimKey(claim))
// Finish the volume binding by adding claim UID.
if err = ctrl.bind(volume, claim); err != nil {
return err
}
// OBSERVATION: pvc is "Bound", pv is "Bound"
return nil
} else {
// User asked for a PV that is claimed by someone else
// OBSERVATION: pvc is "Pending", pv is "Bound"
if !metav1.HasAnnotation(claim.ObjectMeta, pvutil.AnnBoundByController) {
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: volume already bound to different claim by user, will retry later", claimToClaimKey(claim))
// User asked for a specific PV, retry later
if _, err = ctrl.updateClaimStatus(claim, v1.ClaimPending, nil); err != nil {
return err
}
return nil
} else {
// This should never happen because someone had to remove
// AnnBindCompleted annotation on the claim.
klog.V(4).Infof("synchronizing unbound PersistentVolumeClaim[%s]: volume already bound to different claim %q by controller, THIS SHOULD NEVER HAPPEN", claimToClaimKey(claim), claimrefToClaimKey(volume.Spec.ClaimRef))
return fmt.Errorf("Invalid binding of claim %q to volume %q: volume already claimed by %q", claimToClaimKey(claim), claim.Spec.VolumeName, claimrefToClaimKey(volume.Spec.ClaimRef))
}
}
}
}
}
// provisionClaim starts new asynchronous operation to provision a claim if
// provisioning is enabled.
func (ctrl *PersistentVolumeController) provisionClaim(claim *v1.PersistentVolumeClaim) error {
if !ctrl.enableDynamicProvisioning {
return nil
}
klog.V(4).Infof("provisionClaim[%s]: started", claimToClaimKey(claim))
opName := fmt.Sprintf("provision-%s[%s]", claimToClaimKey(claim), string(claim.UID))
plugin, storageClass, err := ctrl.findProvisionablePlugin(claim)
// findProvisionablePlugin does not return err for external provisioners
if err != nil {
ctrl.eventRecorder.Event(claim, v1.EventTypeWarning, events.ProvisioningFailed, err.Error())
klog.Errorf("error finding provisioning plugin for claim %s: %v", claimToClaimKey(claim), err)
// failed to find the requested provisioning plugin, directly return err for now.
// controller will retry the provisioning in every syncUnboundClaim() call
// retain the original behavior of returning nil from provisionClaim call
return nil
}
ctrl.scheduleOperation(opName, func() error {
// create a start timestamp entry in cache for provision operation if no one exists with
// key = claimKey, pluginName = provisionerName, operation = "provision"
claimKey := claimToClaimKey(claim)
ctrl.operationTimestamps.AddIfNotExist(claimKey, ctrl.getProvisionerName(plugin, storageClass), "provision")
var err error
if plugin == nil {
_, err = ctrl.provisionClaimOperationExternal(claim, storageClass)
} else {
_, err = ctrl.provisionClaimOperation(claim, plugin, storageClass)
}
// if error happened, record an error count metric
// timestamp entry will remain in cache until a success binding has happened
if err != nil {
metrics.RecordMetric(claimKey, &ctrl.operationTimestamps, err)
}
return err
})
return nil
}
至此,pv controller的分析已经完毕,下面进行一下简单的总结。
总结
PV Cotroller全称PersistentVolume controller,主要负责pv、pvc对象的绑定与pv、pvc对象的生命周期管理(如创建/删除底层存储,创建/删除pv对象,pv与pvc对象的状态变更)。
创建存储、pv对象
当一个pvc对象创建出来后,pv controller会为其寻找合适的pv进行绑定,当一个 PVC 找不到合适的 PV 时,相应的volume plugin就会根据 StorageClass对象的参数配置去做一个动态创建 PV 的操作。
根据pvc对应的storageclass对象中volume plugin的配置,决定走in-tree或out-tree逻辑来创建存储、创建pv对象:
(1)out-tree逻辑:pv controller来给pvc对象设置annotation:volume.beta.kubernetes.io/storage-provisioner:{plugin name}
,让相应的plugin来创建存储,并创建pv对象(例:当volume plugin为ceph-csi时,由external provisioner来完成创建存储与pv对象的操作)。
(2)in-tree逻辑:调用内置的volume plungin来创建存储,并创建pv对象。
删除存储、pv对象
当与pv绑定的pvc被删除后,pv的状态变更为released,如果pv配置的回收策略为retain,则不会对pv以及底层存储资源做删除操作,如果pv的回收策略为delete,则调用volume plugin来做底层存储以及pv对象的删除操作。(动态创建的pv,其配置的回收策略继承自storageclass对象)
根据pv对象的annotation:pv.kubernetes.io/provisioned-by
配置,决定走in-tree或out-tree逻辑来删除存储、删除pv对象:
(1)out-tree逻辑:让相应的plugin来删除存储,并删除pv对象(例:当volume plugin为ceph-csi时,由external provisioner来完成删除存储与pv对象的操作)。
(2)in-tree逻辑:pv controller调用内置的volume plungin来删除存储,并创建pv对象。