Disaster Recovery on Kubernetes

weixin_26637765

Original article: https://medium.com/dev-genius/disaster-recovery-on-kubernetes-98c5c78382bb

Using VMware’s Velero to back up and restore, perform disaster recovery, and migrate Kubernetes resources.

Although Kubernetes (and especially managed Kubernetes services such as GKE, EKS, and AKS) provides out-of-the-box reliability and resiliency through self-healing and horizontal scaling, production systems still require disaster recovery solutions to protect against human error (e.g. accidentally deleting a namespace or secret) and infrastructure failures outside of Kubernetes (e.g. the disks backing persistent volumes). More companies are embracing multi-region solutions, but that is a complicated and potentially expensive option if all you need is simple backup and restore. In this post, we’ll look at using Velero to back up and restore Kubernetes resources and demonstrate its use as a disaster recovery or migration tool.

Are Backups Still Needed?

A key point that is often lost when running services in high availability (HA) mode is that HA (and thus replication) is not the same as having backups. HA protects against zonal failures, but it will not protect against data corruption or accidental removals. It is very easy to mix up contexts or namespaces and accidentally delete or update the wrong Kubernetes resources, whether that is a Custom Resource Definition (CRD), a secret, or a namespace. Some may argue that with infrastructure-as-code tools like Terraform and external solutions to manage some of these Kubernetes resources (e.g. Vault for secrets, ChartMuseum for Helm charts), backups become unnecessary. Still, if you are running a StatefulSet in your cluster (e.g. an ELK stack for logging, or a self-hosted Postgres to install plugins not supported on RDS or Cloud SQL), backups are needed to recover from persistent volume failures.

Velero

Velero (formerly known as Ark) is an open-source tool from Heptio (acquired by VMware) to back up and restore Kubernetes cluster resources and persistent volumes. Velero runs inside the Kubernetes cluster and integrates with various storage providers (e.g. AWS S3, Google Cloud Storage, MinIO) as well as restic to take snapshots either on demand or on a schedule.

Installation

Velero can be installed via Helm or via the CLI tool. In general, the CLI seems to get the latest updates first, while the Helm chart lags slightly behind and ships with compatible Docker images. However, with each release the Velero team does a great job of documenting how to patch the CRDs and update the Velero container image, so upgrading the Helm chart to the latest version isn’t a huge concern.
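
As a rough sketch of the Helm route, the function below adds the vmware-tanzu chart repository and installs the chart with a local values file. The function name, the release name velero, the namespace velero, and the values.yaml path are assumptions made for illustration; Helm 3 is assumed for --create-namespace.

```shell
# Sketch only: install Velero from the vmware-tanzu Helm chart repository.
# Assumes Helm 3 and a values file (defaults to ./values.yaml).
install_velero_with_helm() {
  values_file="${1:-values.yaml}"
  helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts &&
  helm repo update &&
  helm install velero vmware-tanzu/velero \
    --namespace velero \
    --create-namespace \
    --values "$values_file"
}
```

Calling install_velero_with_helm my-values.yaml would run the three Helm commands in order and stop at the first failure.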

Configuration

Once you have the server installed, you can configure Velero via CLI or by modifying values.yaml for the Helm chart. The key configuration steps are installing the plugins for the storage provider and defining the Storage Location as well as the Volume Snapshot Location:

configuration:
  provider: aws
  backupStorageLocation:
    name: aws
    bucket: <aws-bucket-name>
    prefix: velero
    config:
      kmsKeyId: <my-kms-key>
      region: <aws-region>
  volumeSnapshotLocation:
    name: aws
    config:
      region: <aws-region>
  logLevel: debug
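
For comparison, roughly the same configuration can be expressed with the Velero CLI. This is a sketch under assumptions: the function name is hypothetical, the plugin image tag must be chosen to match your Velero version, and ./credentials-velero is assumed to be an AWS credentials file you have prepared. The KMS key is left out here; it could be appended to --backup-location-config as kmsKeyId=<my-kms-key>.

```shell
# Sketch only: CLI counterpart of the Helm values above.
# $1 = bucket name, $2 = AWS region, $3 = AWS plugin image (e.g. velero/velero-plugin-for-aws:<tag>)
install_velero_with_cli() {
  bucket="$1"; region="$2"; plugin_image="$3"
  velero install \
    --provider aws \
    --plugins "$plugin_image" \
    --bucket "$bucket" \
    --prefix velero \
    --backup-location-config "region=${region}" \
    --snapshot-location-config "region=${region}" \
    --secret-file ./credentials-velero
}
```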

(Note: At the time of writing, there is an issue with the CRDs in the latest Helm chart that prevents the configured backup storage and volume snapshot locations from being set as the defaults. If you decide to name the storage and snapshot locations, add --storage-location <name> --volume-snapshot-locations <name> to the Velero commands that follow.)

Creating a Backup

To create a backup, simply apply the backup command to a namespace or select by labels:

$ velero backup create postgres-backup --selector release=postgres
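
The command above selects by label. A namespace-scoped backup might look like the sketch below, which also waits for completion and then prints the details. The function name and the example arguments are hypothetical.

```shell
# Sketch only: back up a whole namespace, wait for it to finish, then describe it.
# $1 = backup name, $2 = namespace
backup_namespace() {
  backup_name="$1"; namespace="$2"
  velero backup create "$backup_name" --include-namespaces "$namespace" --wait &&
  velero backup describe "$backup_name" --details
}
```

For example, backup_namespace postgres-ns-backup postgres would back up everything in the postgres namespace.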

When the backup command is issued, Velero runs through the following steps:

1. The Velero client calls the Kubernetes API to create a Backup custom resource.
2. The Velero BackupController validates the request.
3. Once the request is validated, the controller queries the Kubernetes resources to back up, takes snapshots of the disks, and creates a tarball.
4. Finally, it uploads the backup objects to the configured storage service.
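
Because the backup is itself a Kubernetes object, its progress through the steps above can be checked with kubectl or the Velero CLI. The helper names below are hypothetical, and the sketch assumes Velero is installed in the velero namespace.

```shell
# Sketch only: read the phase of a Backup object (for example InProgress, Completed, or Failed).
# $1 = backup name
backup_phase() {
  kubectl get backups.velero.io "$1" --namespace velero \
    --output jsonpath='{.status.phase}'
}

# Sketch only: fetch the logs Velero recorded for a backup.
# $1 = backup name
show_backup_logs() {
  velero backup logs "$1"
}
```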

Restoring Data

To list the available backups, first run:

$ velero backup get

Now you can restore from backup by issuing:

$ velero restore create RESTORE_NAME \
  --from-backup BACKUP_NAME

Velero also supports restoring objects into a different namespace if you do not wish to overwrite the existing resources (append --namespace-mappings old-ns-1:new-ns-1 to the above command). This is useful during an outage when you want to restore the service immediately while keeping the broken resources around to diagnose later.
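
A minimal sketch of such a restore, with a hypothetical function name and placeholder arguments:

```shell
# Sketch only: restore a backup into a different namespace.
# $1 = backup name, $2 = namespace recorded in the backup, $3 = namespace to restore into
restore_into_namespace() {
  backup_name="$1"; old_ns="$2"; new_ns="$3"
  velero restore create --from-backup "$backup_name" \
    --namespace-mappings "${old_ns}:${new_ns}"
}
```

For example, restore_into_namespace postgres-backup postgres postgres-restored would leave the original postgres namespace untouched.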

Velero can change the storage class of persistent volumes during restores. This may be a good way to migrate workloads from HDD to SSD storage, or to a smaller disk if you over-provisioned the persistent volume (see the documentation for the configuration).
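
As far as I know, the storage class mapping is configured through a labelled ConfigMap in the Velero namespace. The sketch below prints such a manifest so it can be reviewed or piped to kubectl; the function name, the ConfigMap name, and the velero namespace are assumptions, so check the Velero restore reference for your version.

```shell
# Sketch only: print a ConfigMap that maps an old storage class to a new one on restore.
# $1 = storage class recorded in the backup, $2 = storage class to use when restoring
storage_class_mapping_manifest() {
  old_class="$1"; new_class="$2"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  ${old_class}: ${new_class}
EOF
}
```

The output could then be applied with storage_class_mapping_manifest standard premium-ssd | kubectl apply -f - before running the restore.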

Finally, you can also selectively restore sub-components of the backup. Inspect the backup tarball by running:

$ velero backup download <backup-name>

From the tarball, you can pick the manifest for a specific resource and apply it individually with kubectl apply -f <manifest-file>. This is useful if you took a backup of the entire namespace rather than filtering by labels.
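
A small helper for inspecting the downloaded archive could look like this. The function name is hypothetical. I am assuming the download is a gzipped tarball (typically named <backup-name>-data.tar.gz) whose manifests are JSON files; the exact directory layout inside the archive may vary between Velero versions.

```shell
# Sketch only: unpack a downloaded backup archive and list the JSON manifests it contains.
# $1 = path to the archive, $2 = directory to extract into
unpack_backup() {
  archive="$1"; dest="$2"
  mkdir -p "$dest" &&
  tar -xzf "$archive" -C "$dest" &&
  find "$dest" -type f -name '*.json' | sort
}
```

Any path it prints can be passed to kubectl apply -f. Backed-up manifests may carry cluster-specific metadata, so review a file before applying it.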

Scheduled Backups

Instead of only creating backups on-demand, you can also configure scheduled backups for critical components:

Via CLI:

$ velero schedule create mysql --schedule="0 2 * * *" --include-namespaces mysql

Via Helm values:

schedules:
  mysql:
    schedule: 0 2 * * *
    template:
      labelSelector:
        matchLabels:
          app: mysql
      snapshotVolumes: true
      ttl: 720h

Notice the ttl setting, which specifies how long scheduled backups are kept before they expire. If you are using a cloud storage provider, you can reduce storage costs either with the provider’s lifecycle policies or with Velero’s ttl as shown above.
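
The Helm values above have a CLI counterpart. The sketch below mirrors the label selector, volume snapshots, and ttl; the function names are hypothetical and the 720h default simply repeats the value used above.

```shell
# Sketch only: create a schedule with a label selector, volume snapshots, and a ttl.
# $1 = schedule name, $2 = cron expression, $3 = label selector, $4 = ttl (optional)
create_schedule() {
  name="$1"; cron="$2"; selector="$3"; ttl="${4:-720h}"
  velero schedule create "$name" \
    --schedule "$cron" \
    --selector "$selector" \
    --snapshot-volumes \
    --ttl "$ttl"
}

# Sketch only: trigger an immediate backup using an existing schedule as the template.
# $1 = schedule name
backup_now_from_schedule() {
  velero backup create --from-schedule "$1"
}
```

For example, create_schedule mysql "0 2 * * *" app=mysql would run at 02:00 every day and keep each backup for 30 days.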

Other Uses

Besides simply taking backups, Velero can be used as a disaster recovery solution by combining schedules and read-only backup storage locations. First, configure Velero to create a daily schedule:
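
The original command for this step is not shown here, so the following is only a sketch of what a daily schedule might look like. The function names and the 07:00 run time are assumptions. Backups created by a schedule carry a velero.io/schedule-name label, which the second helper uses to list them.

```shell
# Sketch only: create a schedule that backs up the cluster once a day (07:00 is an arbitrary choice).
# $1 = schedule name
create_daily_schedule() {
  velero schedule create "$1" --schedule "0 7 * * *"
}

# Sketch only: list the backups that a given schedule has produced.
# $1 = schedule name
list_schedule_backups() {
  velero backup get --selector "velero.io/schedule-name=$1"
}
```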

If you need to recreate resources due to human error or infrastructure outage, change the backup location to be read-only to prevent new backup objects from being created:

$ kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
    --namespace velero \
    --type merge \
    --patch '{"spec":{"accessMode":"ReadOnly"}}'

Restore from the backup in another location:
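
The restore command for this step is not shown here. As a sketch, a restore can be created either from a specific backup or from a schedule, in which case Velero picks that schedule's most recent successful backup. The function names are hypothetical.

```shell
# Sketch only: restore from one named backup and wait for the restore to finish.
# $1 = backup name (scheduled backups are named <schedule name>-<timestamp>)
restore_from_backup() {
  velero restore create --from-backup "$1" --wait
}

# Sketch only: restore from the latest successful backup of a schedule.
# $1 = schedule name
restore_latest_from_schedule() {
  velero restore create --from-schedule "$1" --wait
}
```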

And finally, revert the backup storage location to read-write mode:

$ kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
   --namespace velero \
   --type merge \
   --patch '{"spec":{"accessMode":"ReadWrite"}}'

This process also works for migrating clusters to a different region (if the provider supports it) or for capturing the last working state before a Kubernetes upgrade. Finally, even though Velero does not natively support migrating persistent volumes across clouds, you can configure restic to make backups at the filesystem level and migrate the data for a hybrid-cloud backup solution.
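
With the restic integration, volumes are opted in per pod through an annotation. The sketch below assumes the integration was enabled when Velero was installed (the --use-restic install flag in the releases this post covers; newer releases may differ), and the function name is hypothetical.

```shell
# Sketch only: mark a pod volume for filesystem-level backup via restic.
# $1 = namespace, $2 = pod name, $3 = volume name as declared in the pod spec
opt_in_volume_for_restic() {
  namespace="$1"; pod="$2"; volume="$3"
  kubectl --namespace "$namespace" annotate "pod/${pod}" \
    "backup.velero.io/backup-volumes=${volume}"
}
```

Subsequent backups that include the pod would then copy the annotated volume's files to the backup storage location instead of relying on a provider snapshot.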

Other Solutions

While Velero is very easy to use and configure, it may not fit your specific use case (e.g. cross-cloud backup). As mentioned above, Velero integrates with other solutions such as restic or OpenEBS, but if you are looking for alternatives, the following list provides both open-source and enterprise options:

Translated from: https://medium.com/dev-genius/disaster-recovery-on-kubernetes-98c5c78382bb
