etcd 3.4.3. Very small (about 8 keys, probably just a few KB) database used for Traefik; reproducible both on a single-node test cluster and in our production (3-node) cluster.
Upon startup, memory (RSS) immediately balloons to about 6GB (!), despite there being essentially no transaction traffic. Problem persists after compacting and defragmenting.
We were running 3.3 before. Initially I noticed there were maybe 20 WAL files lying around, despite the --max-wals setting. After deploying 3.4 with my config a few times I still saw the same memory usage, but then some WAL file purges suddenly kicked in, and memory usage dropped to 150MB around the same time. I assume the two are connected.
However, I'm still seeing hugely excessive memory usage (about 2GB) in our production setup after upgrading to 3.4, even after the purging has happened. Even then, there are still 9 WAL files rather than the 5 that --max-wals should allow.
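For reference, these are the etcd 3.4 flags that govern WAL and snapshot retention; the values shown are the upstream defaults, not our production config:

```shell
# etcd keeps at most --max-wals WAL files once its purge routine runs;
# 5 is the default for both WALs and snapshots.
etcd --max-wals=5 \
     --max-snapshots=5 \
     --quota-backend-bytes=2147483648   # 2 GiB backend quota (default)
```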
Current production status:
```
$ etcdctl endpoint status -w table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | ae8462dd9b20dd9e |  3.4.3  |   67 MB |   false   |   false    |      7649 |     148530 |             148530 |        |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
I'm having a similar issue. An etcd cluster used for Grafana Loki, which uses a single key, has unbounded memory growth and is eventually OOMKilled due to its Kubernetes memory limit (128Mi, which seems sensible given a single key plus our rate of change and compaction retention settings).
@xiang90 , thanks! Somehow I thought that auto compaction takes care of this.
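For anyone landing here later, a sketch of enabling auto compaction versus compacting by hand, using standard etcd 3.4 flags and etcdctl commands (the retention value and the grep-based revision extraction are illustrative, and the commands assume a running cluster):

```shell
# Periodic auto compaction: discard history older than 5 minutes.
etcd --auto-compaction-mode=periodic --auto-compaction-retention=5m

# Manual equivalent: compact up to the current revision, then defragment
# to return the freed backend space.
rev=$(etcdctl endpoint status -w json \
      | grep -o '"revision":[0-9]*' | head -1 | cut -d: -f2)
etcdctl compact "$rev"
etcdctl defrag
```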
Is it possible to add something similar to etcd_debugging_mvcc_(compact|current)_revision for snapshots? If such a metric existed, I might have been able to correlate an ever-increasing value with the ever-increasing memory; that is exactly how I tracked this down for auto revision compaction.
I've set ETCD_SNAPSHOT_COUNT=100 and tried again. go_memstats_heap_inuse_bytes oscillated between 70 and 90 MiB while the put rate was a steady 0.6 puts per second, but it started growing again after the put rate temporarily increased to 1 put per second, reached the configured limit of 128 MiB, and the container was OOMKilled.
Is it possible to provide an approximate memory consumption function for etcd? Currently it is too hard to predict what memory use will be, which makes it hard to set container resource limits in Kubernetes.
Update: I noticed that etcd_debugging_mvcc_current_revision - etcd_debugging_mvcc_compact_revision was oscillating between 50 and 200 during steady state, but after the put spike it started oscillating between 50 and 350. I guess the combination of the increased put rate and the fact that revision-based compaction runs every 5 minutes fully explains what happened.
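That revision lag can be read straight off etcd's /metrics endpoint without a Prometheus setup; a sketch, assuming the default client port and that the debugging metrics are exposed:

```shell
# Scrape the two revision gauges and print their difference
# (the "revision lag" that compaction periodically collapses).
curl -s http://127.0.0.1:2379/metrics \
  | awk '/^etcd_debugging_mvcc_current_revision/ {cur=$2}
         /^etcd_debugging_mvcc_compact_revision/ {comp=$2}
         END {print cur - comp}'
```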
Not really. etcd is not designed to run in a memory-limited environment, and the Go runtime does not release memory to the OS as soon as GC finishes. There are many factors at play.
128MB is kind of too small to run any meaningful Go program in practice.
> go runtime does not release memory to OS as soon as GC finishes
I was referring to go_memstats_heap_inuse_bytes rather than OS-level metrics; as far as I understand, that metric is not affected by whether the runtime has returned freed memory to the OS.
The number of live revisions, the number of keys, the sizes of keys and values, and the put rate seem to be straightforward inputs into a heap-in-use-bytes formula. Snapshots also fit in somehow. While the number of variables is relatively high and their relationships are complex, I think this is doable for a set of workloads with some parameters fixed (for example, assuming all values are of the same specific size).
While it is on etcd users to load test their installation with a workload representative of their use case, it would be much easier to make sense of the results if the formula proposed above were provided.
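To make the shape of such a formula concrete, here is a purely hypothetical back-of-envelope sketch; the overhead multiplier is invented for illustration, not measured from etcd:

```shell
#!/bin/sh
# Hypothetical estimate: live revisions x (key size + value size) x overhead.
live_revisions=350   # e.g. current_revision - compact_revision at peak
kv_bytes=1024        # assumed average key + value size in bytes
overhead=3           # invented multiplier for index/btree/raft copies
est=$((live_revisions * kv_bytes * overhead))
echo "rough live-data heap estimate: $est bytes"
# prints: rough live-data heap estimate: 1075200 bytes
```

This obviously ignores snapshot buffers, gRPC overhead, and GC headroom, which is why any real formula would need to be fitted against load-test measurements.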
> The number of live revisions, the number of keys, the sizes of keys and values, and the put rate seem to be straightforward inputs into a heap-in-use-bytes formula. Snapshots also fit in somehow. While the number of variables is relatively high and their relationships are complex, I think this is doable for a set of workloads with some parameters fixed (for example, assuming all values are of the same specific size).
It can be, but I am afraid it will not be very accurate for a 128MB memory-limited environment. As I said, too many moving parts affect it at the 100MB scale. But you can give it a try, and hopefully make it work. It would be great if you could improve the documentation on operational memory sizing with your findings! Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.