High memory usage with small database

Etcd 3.4.3. Very small database (about 8 keys, probably just a few KB) used for Traefik; reproducible both on a single-node test cluster and in our production (3-node) cluster.

Upon startup, memory (RSS) immediately balloons to about 6 GB (!), despite there being essentially no transaction traffic. The problem persists after compacting and defragmenting.
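For reference, a minimal sketch of the maintenance sequence referred to above, roughly following the etcd maintenance guide (the endpoint is illustrative):

# read the current revision, compact up to it, then defragment
rev=$(etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out=json | grep -o '"revision":[0-9]*' | grep -o '[0-9].*')
etcdctl --endpoints=http://127.0.0.1:2379 compact "$rev"
etcdctl --endpoints=http://127.0.0.1:2379 defrag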

Command line from our Kubernetes StatefulSet:

      /usr/local/bin/etcd \
        --name=${POD_NAME} \
        --enable-v2=false \
        --logger=zap \
        --data-dir=/var/data/etcd \
        --max-wals=5 \
        --max-snapshots=5 \
        --snapshot-count=1000 \
        --auto-compaction-retention=1 \
        --listen-client-urls=http://0.0.0.0:2379 \
        --listen-peer-urls=http://0.0.0.0:2380 \
        --advertise-client-urls=http://0.0.0.0:2379 \
        --initial-cluster-token=t11e-staging \
        --initial-cluster-state=new \
        --initial-advertise-peer-urls=http://${POD_NAME}.etcd-headless:2380 \
        --initial-cluster=etcd-0=http://etcd-0.etcd-headless:2380

Prometheus metrics.

We were running 3.3 before. Initially I noticed there were maybe 20 WAL files lying around, despite the --max-wals setting. After deploying 3.4 with my config a few times, I still saw the same memory usage, but then I suddenly got some WAL file purges, and memory usage dropped to 150 MB around the same time. I assume these are connected.

However, I'm still seeing hugely excessive memory usage (about 2 GB) in our production setup after upgrading to 3.4, even after the purging has happened. After purging, though, there are still 9 WAL files rather than 5.
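A quick way to confirm that count from inside the pod (the member/wal layout under the data dir is etcd's default; the path matches --data-dir above):

ls /var/data/etcd/member/wal/*.wal | wc -l
# currently prints 9, not the 5 suggested by --max-wals=5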

Current production status:

$ etcdctl endpoint status -w table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | ae8462dd9b20dd9e |   3.4.3 |   67 MB |     false |      false |      7649 |     148530 |             148530 |        |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
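For comparison with the RSS figure above, a minimal spot-check from inside the container (assuming etcd runs as PID 1 and the default data-dir layout):

# resident set size of the etcd process
grep VmRSS /proc/1/status
# on-disk size of the bbolt backend, which endpoint status reports as DB SIZE
du -h /var/data/etcd/member/snap/db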

xiang90 added the area/question label on 28 Nov 2019

evgenii-petrov-arrival commented on 10 Feb

I'm having a similar issue. An etcd cluster used for grafana/loki, which uses a single key, shows unbounded memory growth and is eventually OOMKilled due to the Kubernetes memory limit (128Mi, which seems sensible for a single key plus our rate of change and compaction retention settings).

I've set the following environment variables:

etcdEnv:
- name: GOGC
  value: "50"
- name: ETCD_AUTO_COMPACTION_MODE
  value: "revision"
- name: ETCD_AUTO_COMPACTION_RETENTION
  value: "50"
- name: ETCD_QUOTA_BACKEND_BYTES
  # 64*1024*1024
  value: "67108864"
- name: ETCD_ENABLE_PPROF
  value: "true"

Cluster status:

# etcdctl --endpoints http://loki-etcd-2zh8s7hzfs.loki-etcd:2379/,http://loki-etcd-7n4xpflp9r.loki-etcd:2379/,http://loki-etcd-pjtgl82q4q.loki-etcd:2379/ endpoint status -w table
+---------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|                  ENDPOINT                   |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://loki-etcd-2zh8s7hzfs.loki-etcd:2379/ | 104f273760f75ae6 |   3.4.3 |  2.0 MB |      true |      false |        25 |       5425 |               5425 |        |
| http://loki-etcd-7n4xpflp9r.loki-etcd:2379/ | e41c03e60c0c8ef6 |   3.4.3 |  4.3 MB |     false |      false |        25 |       5425 |               5425 |        |
| http://loki-etcd-pjtgl82q4q.loki-etcd:2379/ | 337d428bd67e6189 |   3.4.3 |  3.8 MB |     false |      false |        25 |       5425 |               5425 |        |
+---------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Pprof shows that `raftpb.(*Entry).Unmarshal` is the largest consumer of heap space.

On leader:

/ # go tool pprof http://loki-etcd-2zh8s7hzfs.loki-etcd:2379/debug/pprof/heap
Fetching profile over HTTP from http://loki-etcd-2zh8s7hzfs.loki-etcd:2379/debug/pprof/heap
Saved profile in /root/pprof/pprof.etcd.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
File: etcd
Type: inuse_space
Time: Feb 10, 2020 at 2:41pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 57.12MB, 100% of 57.12MB total
Showing top 10 nodes out of 55
      flat  flat%   sum%        cum   cum%
   24.70MB 43.24% 43.24%    24.70MB 43.24%  go.etcd.io/etcd/raft/raftpb.(*Entry).Unmarshal
   13.10MB 22.94% 66.17%    13.10MB 22.94%  go.etcd.io/etcd/etcdserver/etcdserverpb.(*InternalRaftRequest).Marshal
    5.98MB 10.48% 76.65%    11.97MB 20.95%  go.etcd.io/etcd/etcdserver/api/rafthttp.startPeer
    5.98MB 10.48% 87.13%     5.98MB 10.48%  go.etcd.io/etcd/etcdserver/api/rafthttp.startStreamWriter
    2.31MB  4.05% 91.18%     2.31MB  4.05%  go.etcd.io/etcd/etcdserver/api/rafthttp.newMsgAppV2Decoder
    2.31MB  4.05% 95.23%     2.31MB  4.05%  go.etcd.io/etcd/etcdserver/api/rafthttp.newMsgAppV2Encoder
    1.16MB  2.02% 97.25%     1.73MB  3.02%  go.etcd.io/etcd/wal.newEncoder
    0.57MB     1% 98.25%     0.57MB     1%  go.etcd.io/etcd/pkg/ioutil.NewPageWriter
    0.50MB  0.88% 99.12%     0.50MB  0.88%  github.com/prometheus/client_golang/prometheus.newHistogram
    0.50MB  0.88%   100%     0.50MB  0.88%  go.etcd.io/etcd/etcdserver/api/v2stats.(*ServerStats).RecvAppendReq

On follower:

/ # go tool pprof http://loki-etcd-7n4xpflp9r.loki-etcd:2379/debug/pprof/heap
Fetching profile over HTTP from http://loki-etcd-7n4xpflp9r.loki-etcd:2379/debug/pprof/heap
Saved profile in /root/pprof/pprof.etcd.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz
File: etcd
Type: inuse_space
Time: Feb 10, 2020 at 2:41pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 69.19MB, 100% of 69.19MB total
Showing top 10 nodes out of 71
      flat  flat%   sum%        cum   cum%
   51.90MB 75.02% 75.02%    51.90MB 75.02%  go.etcd.io/etcd/raft/raftpb.(*Entry).Unmarshal
    4.49MB  6.49% 81.51%     4.49MB  6.49%  go.etcd.io/etcd/etcdserver/api/rafthttp.startStreamWriter
    2.99MB  4.32% 85.83%     7.48MB 10.81%  go.etcd.io/etcd/etcdserver/api/rafthttp.startPeer
    2.31MB  3.34% 89.17%     2.31MB  3.34%  go.etcd.io/etcd/etcdserver/api/rafthttp.newMsgAppV2Decoder
    2.31MB  3.34% 92.52%     2.31MB  3.34%  go.etcd.io/etcd/etcdserver/api/rafthttp.newMsgAppV2Encoder
       2MB  2.89% 95.41%        2MB  2.89%  github.com/prometheus/client_golang/prometheus.makeLabelPairs
    1.16MB  1.67% 97.08%     1.16MB  1.67%  go.etcd.io/etcd/wal.newEncoder
    0.77MB  1.11% 98.19%     0.77MB  1.11%  go.etcd.io/etcd/raft.(*MemoryStorage).Append
    0.75MB  1.08% 99.28%     0.75MB  1.08%  go.uber.org/zap/zapcore.newCounters
    0.50MB  0.72%   100%     0.50MB  0.72%  github.com/golang/protobuf/proto.getPropertiesLocked
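For reference, two pprof commands that can drill further into the top consumer shown above: `list` annotates the allocation sites inside raftpb.(*Entry).Unmarshal, and `top -cum` ranks frames by cumulative rather than flat allocation.

(pprof) list Unmarshal
(pprof) top -cum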

Prometheus metrics on leader are here

How do I debug this further?

Contributor

xiang90 commented on 11 Feb

You need to limit the snapshot count (https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/maintenance.md#raft-log-retention), which reduces the number of log entries kept in memory.
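A minimal sketch of what that looks like, either as a flag or as the equivalent environment variable (the value here is purely illustrative; the linked guide discusses the trade-offs):

# flag form
etcd --snapshot-count=10000 ...
# equivalent environment variable
ETCD_SNAPSHOT_COUNT=10000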

evgenii-petrov-arrival commented on 11 Feb

@xiang90, thanks! Somehow I thought that auto compaction took care of this.

Is it possible to add something similar to etcd_debugging_mvcc_(compact|current)_revision for snapshots? I think I might have been able to correlate an ever-increasing metric with the ever-increasing memory, if there were one. That is actually what happened with auto revision compaction for me.

evgenii-petrov-arrival commented on 12 Feb

I've set ETCD_SNAPSHOT_COUNT=100 and tried again. go_memstats_heap_inuse_bytes oscillated between 70 and 90 MiB while the put rate was a steady 0.6 puts per second, but it started growing again after the put rate temporarily increased to 1 put per second; it eventually reached the configured limit of 128 MiB, resulting in the containers being OOMKilled.

Is it possible to provide an approximate memory consumption function for etcd? Currently it is too hard to predict what memory use will be, which makes it hard to set container resource limits in Kubernetes.

Update: I noticed that etcd_debugging_mvcc_current_revision - etcd_debugging_mvcc_compact_revision was oscillating between 50 and 200 during steady state, but after the put spike it started oscillating between 50 and 350. I guess the combination of the increased put rate and the fact that revision-based compaction runs every 5m fully explains what happened.
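For reference, a quick way to spot-check that gap directly from the metrics endpoint (endpoint name taken from the cluster status above); the observed expression is simply the difference of the two gauges:

curl -s http://loki-etcd-2zh8s7hzfs.loki-etcd:2379/metrics \
  | grep -E '^etcd_debugging_mvcc_(current|compact)_revision'
# PromQL: etcd_debugging_mvcc_current_revision - etcd_debugging_mvcc_compact_revision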

Contributor

xiang90 commented on 12 Feb

Not really. etcd is not designed to run in a memory-limited environment, and the Go runtime does not release memory to the OS as soon as GC finishes. There are many factors at play.

128 MB is kind of too small to run any meaningful Go program in practice.

evgenii-petrov-arrival commented on 12 Feb

"go runtime does not release memory to OS as soon as GC finishes"

I was referring to go_memstats_heap_inuse_bytes rather than OS-level metrics, which, as far as I understand, is independent of that.

The number of live revisions, the number of keys, the sizes of keys and values, and the put rate seem like straightforward inputs into a heap-in-use-bytes formula. Snapshots also fit into this somehow. While the number of variables is relatively high and their relationships are complex, I think this is doable for a set of workloads with some parameters fixed (for example, assuming that all values are of the same specific size).

While it is on users of etcd to load test their installation with a workload representative of their use case, it would be a lot easier to make sense of the results if a formula like the one proposed above were provided.

Contributor

xiang90 commented on 13 Feb

"The number of live revisions, the number of keys, the sizes of keys and values, and the put rate seem like straightforward inputs into a heap-in-use-bytes formula. Snapshots also fit into this somehow. While the number of variables is relatively high and their relationships are complex, I think this is doable for a set of workloads with some parameters fixed (for example, assuming that all values are of the same specific size)."

It can be, but I am afraid it will not be very accurate for a 128 MB memory-limited environment. As I said, too many moving parts affect it at the 100 MB scale. But you can give it a try and hopefully make it work. It would be great if you could improve the documentation on operational memory size with your findings! Thanks!

stale bot commented on 13 May

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
