ZooKeeper does not recover

ZooKeeper does not recover from a crash when the disk was full

Description

The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:

2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)

Then many subsequent exceptions like:

2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?

  • Answers

  • Deleting the partially written ZooKeeper snapshot file solved it.

1. I was able to work around the issue by deleting the partially written snapshot file.

2. I believe the exception is being thrown while reading the snapshot, and the "partial transaction" message is not an indication of what is causing it to crash. It sounds right that we should try a different snapshot, but according to the log messages you posted, it sounds like the problem is that we are not catching EOFException.
3. So these exceptions are thrown when ZooKeeper is running? I'm not sure why it's exiting so many times. Do you restart the ZK server if it dies?

4. We run ZooKeeper with runit, so yes, it is restarted when it dies. It ends up in a loop of:

  • No space left on device
  • Starting server
  • Last transaction was partial
  • Snapshotting: 0x19a3d to /opt/zookeeper-3.4.3/data/version-2/snapshot.19a3d
  • No space left on device

5. I thought you said it does not recover when the disk was full, but it looks like the disk is still full, no?

6. Here is the full sequence of events (sorry for the confusion):

  • Noticed disk was full
  • Cleaned up disk space
  • Tried zkCli.sh, got errors
  • Checked ZK log, loop of:

2013-01-16 15:01:35,194 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:01:35,196 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

  • Stopped ZK
  • Listed ZK data directory

ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ ls -lat
total 18096
drwxr-xr-x 2 zookeeper zookeeper     4096 Jan 16 06:41 .
-rw-r--r-- 1 zookeeper zookeeper        0 Jan 16 06:41 log.19a3e
-rw-r--r-- 1 zookeeper zookeeper   585377 Jan 16 06:41 snapshot.19a3d
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.19a2a
-rw-r--r-- 1 zookeeper zookeeper   585911 Jan 16 03:11 snapshot.19a29
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.11549
-rw-r--r-- 1 zookeeper zookeeper   585190 Jan 15 17:28 snapshot.11547
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 15 17:28 log.1
-rw-r--r-- 1 zookeeper zookeeper      296 Jan 14 16:44 snapshot.0
drwxr-xr-x 3 zookeeper zookeeper     4096 Jan 14 16:44 ..

  • Removed log.19a3e and snapshot.19a3d

ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm log.19a3e
ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm snapshot.19a3d

  • Started ZK
  • Back to normal

1. Attaching zookeeper.log

2. FYI, this issue is a duplicate of ZOOKEEPER-1612 (curiously, a permutation of the last two digits, heh). I'd suggest closing 1612 as the duplicate instead, if possible.

3. I'll mark 1612 as a duplicate. Thanks for pointing that out, Edward.

4. Looks like the header was incomplete. Unfortunately, we do not handle a corrupt header, but we do handle corrupt txns later. I'm surprised that this happened twice in a row for two users. I'll upload a patch and test case.

5. Should FileTxnIterator.goToNextLog() return false if the header is corrupted/incomplete, or should it skip the log file and go to the next log file if it exists?
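For reference, the txn log header is a fixed 16 bytes (4-byte magic "ZKLG", 4-byte version, 8-byte dbid). Below is a simplified, hypothetical illustration of the check being debated, detecting a truncated header so the iterator could either stop or skip that file; the class and method names are illustrative, not ZooKeeper internals.

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Illustrative helper only; not the actual FileTxnIterator code.
public class HeaderCheckSketch {

    // Returns true if the file contains a complete 16-byte txn log header.
    static boolean hasCompleteHeader(File logFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(logFile))) {
            in.readInt();   // magic ("ZKLG")
            in.readInt();   // version
            in.readLong();  // dbid
            return true;
        } catch (EOFException e) {
            // Header was only partially written, e.g. because the disk filled up.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            File f = new File(path);
            System.out.println(f + " -> complete header: " + hasCompleteHeader(f));
        }
    }
}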

6. -1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
against trunk revision 1596284.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//console

This message is automatically generated.

1. Here's a different option: intuitively, once ZooKeeper fails to write to disk, continuing to operate normally violates its promise to users (that if a majority acked a write, the data is always there, even across reboots). Once we realize the promise can't be kept, it may be better to crash the server at that point and give up liveness (no availability) rather than continue and risk coming up with a partial log later, violating safety (inconsistent state, lost transactions, etc.).

2. I'm fine with Alex's suggestion. We should document how to manually recover when the server doesn't start because the log file doesn't contain a complete header.

3. I actually like Alexander Shraer's suggestion. However, if this is going to be the recommended way of recovering from a corrupt log file, there should be a script that does it for users: zk-recover.sh or some such. In this world of deployment automation, it's not nice to say "go delete the most recent log segment from ZK's data dir". Much better for the application to handle it through a script or command.

4. Apart from these corrective measures, there should be some preventive measures as well.

Can we have a disk space availability checker that periodically checks whether enough disk space is available and, if not, shuts ZooKeeper down gracefully?

5. You mean, like a ZK thread dedicated to this? What would the behavior be, only shut down if it's the leader?

6. Yes, a dedicated thread for this, like org.apache.zookeeper.server.DatadirCleanupManager.

  • Shut down in every case, because without disk space ZooKeeper cannot serve any purpose.
  • The idea is as follows (see the sketch after this list):
    • Add two new ZooKeeper properties:
      diskspace.min.threshold=5% (values can be a percentage of the data directory's available space or an amount in GB)
      diskspace.check.interval=5 seconds (default: 5, min: 1, max: Long.MAX_VALUE)
    • Add a dedicated disk check thread
      • which runs every diskspace.check.interval seconds
      • and shuts down the ZooKeeper instance if the available disk space drops below diskspace.min.threshold
  • Some clarifications:
    • Query: Suppose diskspace.check.interval=5 and the disk can be filled within 5 seconds by ZooKeeper or by another process. How is that handled?
      Ans: Users should know their usage scenario and which other processes share the same disk, and tune diskspace.check.interval accordingly.
    • Query: Say diskspace.check.interval=1, but the disk can be filled even within 1 second by ZooKeeper and other processes.
      Ans: Yes, it can still be filled if diskspace.min.threshold is too low; again, based on disk space usage, the user needs to tune diskspace.min.threshold.
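A minimal sketch of such a checker thread, assuming the hypothetical property names above (diskspace.min.threshold, diskspace.check.interval) and a caller-supplied shutdown callback; this is not an existing ZooKeeper class.

import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed checker thread; not part of ZooKeeper.
public class DiskSpaceChecker {
    private final File dataDir;
    private final long minFreeBytes;     // threshold derived from diskspace.min.threshold
    private final long checkIntervalSec; // diskspace.check.interval
    private final Runnable shutdownHook; // e.g. a callback that stops the server
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public DiskSpaceChecker(File dataDir, long minFreeBytes,
                            long checkIntervalSec, Runnable shutdownHook) {
        this.dataDir = dataDir;
        this.minFreeBytes = minFreeBytes;
        this.checkIntervalSec = checkIntervalSec;
        this.shutdownHook = shutdownHook;
    }

    public void start() {
        scheduler.scheduleAtFixedRate(() -> {
            // Check free space on the volume holding the data directory.
            if (dataDir.getUsableSpace() < minFreeBytes) {
                // Shut down gracefully instead of failing mid-write later.
                shutdownHook.run();
            }
        }, checkIntervalSec, checkIntervalSec, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}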

7. Reviving this old thread. Alexander Shraer has a valid concern about trading off consistency for availability. However, for the specific issue being addressed here, we can have both.

The patch skips transaction logs with an incomplete header (the first 16 bytes). Skipping such files should not cause any loss of data, as the header is an internal bookkeeping write from ZooKeeper and does not contain any user data. This avoids the current behavior of ZooKeeper crashing on encountering an incomplete header, which compromises availability.

This has been a recurring problem for us in production because our app's operating environment occasionally causes a ZooKeeper server's disk to become full. After that, the server invariably runs into this problem, perhaps because something else deterministically triggers a log rotation when the previous txn log throws an IOException due to a full disk?

That said, we can tighten the exception being caught in Michi Mutsuzaki's patch to EOFException instead of IOException to make sure that the log we are skipping indeed only has a partially written header and nothing else (in FileTxnLog.goToNextLog).

Additionally, I have written a test to verify that EOFException is thrown if and only if the header is truncated. ZooKeeper already ignores any other partially written transactions in the txn log. If that's useful, I can upload the test; thanks.
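A rough sketch of such a check (not the actual test from the patch), assuming the pre-fix behavior described in this thread, where reading a txn log whose 16-byte header is truncated surfaces an EOFException from FileTxnLog.read():

import java.io.EOFException;
import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;

import org.apache.zookeeper.server.persistence.FileTxnLog;

public class TruncatedHeaderCheck {
    public static void main(String[] args) throws Exception {
        // Create a txn log file whose header is truncated (fewer than 16 bytes).
        File logDir = Files.createTempDirectory("txnlog-test").toFile();
        try (FileOutputStream out = new FileOutputStream(new File(logDir, "log.1"))) {
            out.write(new byte[8]);
        }
        try {
            // Pre-fix, reading the log directory hits the truncated header and fails.
            new FileTxnLog(logDir).read(1);
            System.out.println("no exception: header was readable");
        } catch (EOFException e) {
            System.out.println("EOFException, as expected for a truncated header");
        }
    }
}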

8. Agreed. Forcing users to manually clean up the partial/empty header in this scenario seems undesirable, and if we only catch EOFException instead of IOException, we shouldn't run into any problems with correctness. Additionally, since this issue should only occur "legitimately" in the most recent txn log file, we can be even more conservative and only continue in that case.

9. Thanks, Meyer Kizner. Your suggestion of doing this only for the most recent txn log file is sound. Are you also suggesting that we delete the truncated txn log file? Because if we skip it and don't delete it, newer txn log files will get created in the future, so the truncated txn log file will no longer be the latest txn log when we do a purge afterwards. Deletion seems consistent with this approach as well as with PurgeTxnLog's behavior.

10. Yes, we would have to delete such a log file upon encountering it. I don't believe this would cause any problems, and it seems desirable to have the extra check this enables.

11. The proposal of the fix makes sense to me.

Is it feasible to make a stronger guarantee for the ZooKeeper serialization semantics, namely that under no circumstances (disk full, power failure, hardware failure) would ZooKeeper generate invalid persistent files (for both snapshots and txn logs)? This might be possible by serializing to a swap file first and then, at some point, doing an atomic rename of the file. With that guarantee about the sanity of the on-disk format, the deserialization logic would be simplified, as there would not be many corner cases to consider beyond the existing basic checksum check.
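A minimal sketch of that write-to-a-swap-file-then-atomic-rename idea, using standard java.nio APIs; writeSnapshotAtomically is an illustrative helper, not an existing ZooKeeper method, and ZooKeeper's actual snapshot writer works differently.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicSnapshotWrite {

    static void writeSnapshotAtomically(Path finalPath, byte[] serializedSnapshot)
            throws IOException {
        // Write the full snapshot to a temp file in the same directory so the
        // rename stays on one filesystem and can be atomic.
        Path tmp = Files.createTempFile(finalPath.getParent(), "snapshot", ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(serializedSnapshot));
            ch.force(true); // flush the data to disk before publishing the file
        }
        // Publish under the final name in one atomic step, so a reader never
        // observes a partially written snapshot, even after a crash or full disk.
        Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }
}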

I can think of two potential drawbacks of this approach:

  • Performance: if we write to a swap file and then rename for every write, we will be making more syscalls per write. That might impact write performance/latency.
  • Potential data loss during recovery: to improve performance, we could batch writes and only do the rename at certain points (e.g. every 1000 writes). In case of a failure, part of the data might be lost, as the data (possibly corrupted or partially serialized) living in the swap file will not be parsed by ZK during startup (we only load and parse renamed files).

My feeling is that the best approach might be a mix of efforts on both the serialization and deserialization sides:

  • When serializing, we make our best effort to avoid generating corrupted files (e.g. through atomic writes to files).
  • When deserializing, we make our best effort to detect corrupt files and recover conservatively. The success of recovery might be case by case: for this disk-full case the proposed fix sounds pretty safe to perform, while in other cases it might not be straightforward to tell which data is good and which is bad.
  • As a result, the expectation is that when things crash and files get corrupted, ZK should be able to recover later without manual intervention. That would be good for users.

For details, see: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
