Production Checklist for Redis on Kubernetes

Latest recommended article published on 2023-11-21 15:23:34

weixin_26637765

Tags: redis

Original article: https://medium.com/swlh/production-checklist-for-redis-on-kubernetes-60173d5a5325

Redis is a popular open-source in-memory data store and cache that has become an integral part of building a scalable microservice system. While all the major cloud providers offer a fully-managed Redis service (Amazon ElastiCache, Azure Cache for Redis, and GCP Memorystore), it can also be easily deployed in Kubernetes if you need more control over the Redis configurations. Redis provides decent performance out of the box, but if you are preparing to run a production workload, make sure to go over this checklist before you go live.

Hardware Optimization

As with any database, Redis performance is tied to the underlying VM specifications. Create a node pool of memory-optimized machines with high network bandwidth limits to minimize latency between clients and Redis servers. Redis executes commands on a single thread, so fast CPUs with large caches (e.g. VMs backed by Intel Skylake or Cascade Lake) perform better, and adding more cores does not directly improve performance. If your workload consists mostly of small objects (<10 KB), RAM speed and memory bandwidth are not as critical to Redis performance. You can read more about Redis performance on different hardware in the Redis benchmarking documentation.

Choose A Deployment Method

To deploy a Redis cluster on Kubernetes, you can use either Bitnami's Redis Helm Chart or one of the Redis Operators. While I am normally in favor of Kubernetes Operators, there does not seem to be a Redis Operator as popular and mature as Bitnami's Helm Chart. Redis Labs, the company behind Redis, provides an official Redis Enterprise Kubernetes Operator, but if you need a truly open-source version, you can choose between Spotahome's Operator and the Operator from Amadeus IT Group (in alpha). I have no personal experience with either of these operators, but the engineers at Flant wrote a blog post about the failures they ran into using the Redis Operator by Spotahome.

Bitnami supports two deployment options for Redis: a master-slave cluster with Redis Sentinel and a Redis Cluster topology with sharding. If you have high read throughput, the master-slave cluster helps offload read operations to the slave pods. The sentinels are configured to promote a slave pod to master if the master fails. Redis Cluster, on the other hand, shards data across multiple instances and is a good fit when memory requirements exceed what a single master can hold (>100 GB) or when the CPU of a single master becomes the bottleneck. Redis Cluster also supports high availability, with each master connected to one or more slave pods. When a master pod crashes, one of its slaves is promoted to master.
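
To make the sharding idea concrete, here is a minimal Python sketch of how Redis Cluster maps a key to one of its 16384 hash slots (CRC16 of the key, modulo 16384, with optional hash tags). The function name `key_slot` is made up for this illustration; in practice a cluster-aware client performs this calculation for you.

```python
import binascii

SLOT_COUNT = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot for a key (illustrative helper)."""
    raw = key.encode("utf-8")
    # Hash tags: if the key contains a non-empty "{...}" section, only that
    # section is hashed, so related keys can be forced into the same slot.
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    # crc_hqx seeded with 0 is the CRC16 (XMODEM) variant that the
    # Redis Cluster specification uses.
    return binascii.crc_hqx(raw, 0) % SLOT_COUNT


if __name__ == "__main__":
    for k in ("user:1000", "{user:1000}:profile", "{user:1000}:cart"):
        print(k, "->", key_slot(k))
```

Keys that share a hash tag such as `{user:1000}` land in the same slot, which is what makes multi-key operations on them possible in a sharded deployment.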

Persistent Storage

Redis stores some data in ephemeral storage, but using persistent volumes is critical for high availability. Redis provides two persistence options:

RDB (Redis Database File): point-in-time snapshots
AOF (Append Only File): logs of every Redis write operation

It’s possible to combine both types of persistence, but it’s important to understand the tradeoffs between the two options for the best performance.

RDB is a compact snapshot format well suited to typical backup operations. Creating an RDB backup has minimal impact on Redis performance because the parent process forks a child process to write the snapshot. In disaster recovery scenarios, Redis restarts faster from an RDB file than from an AOF because the file is smaller and more compact. However, since RDB is a point-in-time snapshot, any data written after the last snapshot is lost if a failure occurs.

AOF, on the other hand, keeps a log of every write operation and is more durable than RDB, as it can be configured to fsync every second or on every query. In the event of an outage, Redis can run through the AOF and replay every operation to rebuild the data set. Redis can also automatically and safely rewrite the AOF in the background if it gets too big. The downside of AOF is file size and speed. With replication turned on, the slaves sometimes cannot sync with the master fast enough to receive all the data. AOF can also be much slower than RDB, depending on the fsync policy.
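
The durability tradeoff can be illustrated with a toy Python model. This is not how Redis implements persistence; it only mimics the idea of periodic snapshots versus a log of every write, and it assumes the log is flushed on every write. The name `recover_after_crash` and the `snapshot_every` parameter are invented for this sketch.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str]  # (key, value) for a SET-style write


def recover_after_crash(
    ops: List[Op], snapshot_every: int
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Apply ops, then simulate a crash.

    Returns (state recovered from the last snapshot,
             state recovered by replaying the append-only log).
    """
    live: Dict[str, str] = {}
    rdb_snapshot: Dict[str, str] = {}
    aof_log: List[Op] = []
    for i, (key, value) in enumerate(ops, start=1):
        live[key] = value
        aof_log.append((key, value))   # AOF-style: log every write
        if i % snapshot_every == 0:
            rdb_snapshot = dict(live)  # RDB-style: periodic point-in-time copy

    # Crash: the in-memory state is gone. Rebuild from the log by replaying it.
    from_aof: Dict[str, str] = {}
    for key, value in aof_log:
        from_aof[key] = value
    return rdb_snapshot, from_aof


if __name__ == "__main__":
    writes = [("k1", "a"), ("k2", "b"), ("k3", "c"), ("k1", "d"), ("k4", "e")]
    from_rdb, from_aof = recover_after_crash(writes, snapshot_every=2)
    print("snapshot recovery:", from_rdb)
    print("log replay recovery:", from_aof)
```

In this run the snapshot-based recovery misses the write made after the last snapshot, while the log replay recovers everything at the cost of storing and replaying every operation.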

By default, the Bitnami Redis Helm chart enables AOF and disables RDB, but you can override the configmap with a different fsync strategy or with RDB persistence:

configmap: |-
  # Enable AOF https://redis.io/topics/persistence#append-only-file
  appendonly yes
  # Disable RDB persistence, AOF persistence already enabled.
  save ""

For a deep-dive into Redis persistence, make sure to read the Redis Persistence Post on the official Redis website.

Disable THP

After deploying Redis to Kubernetes, you will most likely see the following warning message:

WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled

In order to disable THP, you can add an init-container to run the command or deploy a DaemonSet to the node pool running Redis. For example, I have a database node pool running on GKE so I deployed a DaemonSet like the following:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: thp-disable
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: thp-disable
  template:
    metadata:
      labels:
        name: thp-disable
    spec:
      tolerations: 
      - key: "database"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      affinity:
        nodeAffinity: 
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                - database
      restartPolicy: Always
      terminationGracePeriodSeconds: 1
      volumes:
        - name: host-sys
          hostPath:
            path: /sys
      initContainers:
        - name: disable-thp
          image: busybox
          volumeMounts:
            - name: host-sys
              mountPath: /host-sys
          command: ["sh", "-c", "echo never >/host-sys/kernel/mm/transparent_hugepage/enabled"]
      containers:
        - name: busybox
          image: busybox
          command: ["watch", "-n", "600", "cat", "/sys/kernel/mm/transparent_hugepage/enabled"]
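
To confirm that the setting took effect, you can read the same sysfs file and look at which mode is shown in brackets (the kernel marks the active mode that way, e.g. `always madvise [never]`). The Python helpers below are a small sketch for that check; the function names are made up for this example.

```python
def active_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) mode from text like 'always madvise [never]'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {sysfs_text!r}")


def thp_disabled(sysfs_text: str) -> bool:
    """True when Transparent Huge Pages are set to 'never'."""
    return active_thp_mode(sysfs_text) == "never"
```

On a node (or in a pod that can read the host's `/sys`), you would pass in the contents of `/sys/kernel/mm/transparent_hugepage/enabled`, for example `thp_disabled(open(path).read())`.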

Benchmarking Performance

After setting up the Redis cluster, you can now benchmark the performance using various, well-maintained tools:

Redis Benchmark: included with Redis
Memtier Benchmark: also developed by Redis Labs
Redis Memory Analyzer: Python tool by GameNet
YCSB: the Yahoo! Cloud Serving Benchmark
PerfKit Benchmarker: from Google Cloud
Redis RDB tools: parses Redis dump.rdb files
Harvest: samples Redis keys and shows top key prefixes

Chie Hayashida from Google Cloud wrote an excellent guide to using YCSB to benchmark Redis performance for Memorystore (GCP’s managed Redis). The same tools can be used to test Redis in Kubernetes. Use port forwarding to map Redis to localhost and run YCSB with various usage patterns. Combine this result with memory analyzer tools to fine-tune Redis performance:

Compress data using Snappy/LZO (low latency) or GZIP (maximum compression) for long strings or JSON/XML values.
Use MessagePack format instead of JSON for efficient serialization.
Set an appropriate eviction policy: the allkeys-* policies consider every key for eviction, while the volatile-* policies only consider keys with a TTL/expiration set (you can specify the policy in the extra flags section of the Helm chart):

master:
  ## Redis command arguments
  ##
  ## Can be used to specify command line arguments, for example:
  ##
  command: "/run.sh"
  ## Additional Redis configuration for the master nodes
  ## ref: https://redis.io/topics/config
  ##
  configmap:
  ## Redis additional command line flags
  ##
  ## Can be used to specify command line flags, for example:
  ## extraFlags:
  ##  - "--maxmemory-policy volatile-ttl"
  ##  - "--repl-backlog-size 1024mb"

Monitoring

Finally, connect Redis metrics to Prometheus (or a monitoring tool of your choice) to detect performance degradation and trigger alerts. The Bitnami Helm Chart uses Bitnami's Redis Exporter by default, but you can also use Oliver006's popular Redis Exporter chart. There is also a companion Grafana dashboard to visualize all the metrics.

As for configuring alerts, Bitnami provides some example rules:

  ## Custom PrometheusRule to be defined
  ## The value is evaluated as a template, so, for example, the value can depend on .Release or .Chart
  ## ref: https://github.com/coreos/prometheus-operator#customresourcedefinitions
  prometheusRule:
    enabled: false
    additionalLabels: {}
    namespace: ""
    ## Redis prometheus rules
    ## These are just example rules, please adapt them to your needs.
    ## Make sure to constrain the rules to the current Redis service.
    # rules:
    #   - alert: RedisDown
    #     expr: redis_up{service="{{ template "redis.fullname" . }}-metrics"} == 0
    #     for: 2m
    #     labels:
    #       severity: error
    #     annotations:
    #       summary: Redis instance {{ "{{ $labels.instance }}" }} down
    #       description: Redis instance {{ "{{ $labels.instance }}" }} is down
    #   - alert: RedisMemoryHigh
    #     expr: >
    #       redis_memory_used_bytes{service="{{ template "redis.fullname" . }}-metrics"} * 100
    #       /
    #       redis_memory_max_bytes{service="{{ template "redis.fullname" . }}-metrics"}
    #       > 90 <= 100
    #     for: 2m
    #     labels:
    #       severity: error
    #     annotations:
    #       summary: Redis instance {{ "{{ $labels.instance }}" }} is using too much memory
    #       description: |
    #         Redis instance {{ "{{ $labels.instance }}" }} is using {{ "{{ $value }}" }}% of its available memory.
    #   - alert: RedisKeyEviction
    #     expr: |
    #       increase(redis_evicted_keys_total{service="{{ template "redis.fullname" . }}-metrics"}[5m]) > 0
    #     for: 1s
    #     labels:
    #       severity: error
    #     annotations:
    #       summary: Redis instance {{ "{{ $labels.instance }}" }} has evicted keys
    #       description: |
    #         Redis instance {{ "{{ $labels.instance }}" }} has evicted {{ "{{ $value }}" }} keys in the last 5 minutes.
    rules: []
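
The memory rule above boils down to "used memory as a percentage of the configured maximum". As a quick sanity check outside Prometheus, the same figure can be computed from the text returned by the Redis `INFO memory` command. The Python sketch below assumes the `used_memory` and `maxmemory` fields of that output correspond to the two exporter metrics in the rule; the function names are made up for this example.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse 'key:value' lines from INFO output, skipping '# Section' headers."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_used_percent(info_text: str) -> Optional[float]:
    """Return used memory as a percentage of maxmemory, or None if no limit is set."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields["maxmemory"])
    if limit == 0:  # maxmemory 0 means no memory limit is configured
        return None
    return used * 100 / limit
```

A value above 90 would correspond to the threshold used in the example rule. When `maxmemory` is 0 the function returns `None`, since a percentage of an unlimited maximum is undefined.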

Alternatively, you can use the Redis alerts from the Awesome Prometheus Alerts project. If you need a guide on setting up Prometheus on Kubernetes, you can check out the Practical Monitoring with Prometheus and Grafana series.

At this point, you should have a production-ready Redis cluster running on Kubernetes. As usage grows, some performance degradation is expected. Make sure to run the benchmark and memory analyzer tool periodically to deal with the new load.
