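Symptom:
A replica set has four members: one primary, two secondaries, and one arbiter. After one of the secondaries is shut down, changing a user's password on the primary hangs until it times out and fails.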
udb-aqmp5a:PRIMARY> db.changeUserPassword("root","123123")
2016-08-23T17:05:30.879+0800 E QUERY Error: Updating user failed: timeout
at Error (<anonymous>)
at DB.updateUser (src/mongo/shell/db.js:1152:11)
at DB.changeUserPassword (src/mongo/shell/db.js:1156:10)
at (shell):1:4 at src/mongo/shell/db.js:1152
Cause:
Check the MongoDB error log:
2016-08-19T12:37:08.897+0800 W NETWORK [ReplExecNetThread-12] Failed to connect to 10.19.66.62:27017, reason: errno:115 Operation now in progress
2016-08-19T12:37:08.897+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 10.19.66.62:27017; Location18915 Failed attempt to connect to 10.19.66.62:27017; couldn't connect to server 10.19.66.62:27017 (10.19.66.62), connection attempt failed
2016-08-19T12:37:15.524+0800 I COMMAND [conn601] command admin.$cmd command: getLastError { getLastError: 1, w: "majority", wtimeout: 30000.0 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:270 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 1, W: 2 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 30001ms
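The log shows that the password change is issued with write concern w: "majority" and a wtimeout of 30000 ms, and in this state the change cannot satisfy the "majority" requirement. The likely explanation is that the arbiter is counted when the size of the majority is computed, yet it holds no data and can never acknowledge a write, so it effectively always counts against the write. With four voting members the majority is three, but only the primary and the one remaining secondary can acknowledge, so getLastError waits the full 30 seconds and then times out.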
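The arithmetic behind this can be sketched in a few lines of plain JavaScript. This is only a simplified model of the counting described above, not MongoDB's actual implementation: the helper name majorityStatus and the member fields (votes, arbiter, up) are hypothetical and chosen for illustration, and it assumes the majority is taken over all voting members while only reachable data-bearing members can acknowledge a write.

```javascript
// Hypothetical helper, not part of MongoDB: a simplified model of
// whether a w:"majority" write can be acknowledged.
function majorityStatus(members) {
  // Assumption: every member with votes > 0 counts toward the majority size,
  // arbiters included.
  const voting = members.filter((m) => m.votes > 0);
  const required = Math.floor(voting.length / 2) + 1;

  // Only data-bearing members that are up can acknowledge a write;
  // an arbiter stores no data and never acknowledges.
  const ackCapable = voting.filter((m) => m.up && !m.arbiter).length;

  return { required, ackCapable, satisfiable: ackCapable >= required };
}

// The topology from this incident: P + S + S + A, with one secondary shut down.
const members = [
  { name: "primary", votes: 1, arbiter: false, up: true },
  { name: "secondary1", votes: 1, arbiter: false, up: true },
  { name: "secondary2", votes: 1, arbiter: false, up: false }, // shut down
  { name: "arbiter", votes: 1, arbiter: true, up: true },
];

const status = majorityStatus(members);
console.log(status);
```

Under these assumptions the model reports that three acknowledgements are required while only two members can give them, which matches the timeout seen in the log. Setting up to true for secondary2 makes the write satisfiable again.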
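Reproduction:
Setup: one primary, two secondaries, and one arbiter, with one of the secondaries shut down.
Method 1: use an ordinary write, for example inserting one document into a database,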