2. Stop the HMaster process on the node hosting the active HMaster;
3. During the active/standby switchover, watch the HBase client logs for the status of read and write operations and measure the HMaster failover time;
4. After the switchover completes, check that the failover alarm is reported correctly;
Kill the HMaster process on the active HBase master node:
[hadoop@masternode1 test]$ jps
28268 NameNode
30892 Jps
26006 DFSZKFailoverController
26534 HMaster
26193 ResourceManager
[hadoop@masternode1 test]$ kill -9 26534
[hadoop@masternode1 test]$ jps
28268 NameNode
26006 DFSZKFailoverController
30924 HMaster
26193 ResourceManager
31033 Jps
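The kill step above can also be scripted. A minimal sketch, assuming `jps` prints one `<pid> <name>` pair per line as in the session above (`find_pid` and `kill_hmaster` are hypothetical helper names, not part of HBase):

```python
import re
import subprocess


def find_pid(jps_output, proc_name):
    """Scan jps output ("<pid> <name>" per line) for the given process name."""
    for line in jps_output.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+)", line)
        if m and m.group(2) == proc_name:
            return int(m.group(1))
    return None


def kill_hmaster():
    """Locate the local HMaster via jps and send SIGKILL, as in the test above."""
    out = subprocess.run(["jps"], capture_output=True, text=True).stdout
    pid = find_pid(out, "HMaster")
    if pid is not None:
        subprocess.run(["kill", "-9", str(pid)])
```

Run against the output shown above, `find_pid(out, "HMaster")` would return 26534, the PID that was killed manually.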
Every second, run scan and put operations against t1 in parallel:
hbase(main):004:0> desc 't1'
Table t1 is ENABLED
t1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'f2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.0520 seconds
hbase(main):005:0> put 't1','rowkey001','f1:col1','value01'
0 row(s) in 0.1050 seconds
hbase(main):006:0> scan 't1'
ROW COLUMN+CELL
rowkey001 column=f1:col1, timestamp=1478682508077, value=value01
1 row(s) in 0.0280 seconds
hbase(main):007:0> put 't1','rowkey001','f1:col1','value01'
0 row(s) in 0.0110 seconds
hbase(main):008:0> scan 't1'
ROW COLUMN+CELL
rowkey001 column=f1:col1, timestamp=1478682516022, value=value01
1 row(s) in 0.0100 seconds
hbase(main):009:0> put 't1','rowkey001','f1:col1','value02'
0 row(s) in 0.0070 seconds
hbase(main):010:0> scan 't1'
ROW COLUMN+CELL
rowkey001 column=f1:col1, timestamp=1478682530267, value=value02
1 row(s) in 0.0130 seconds
hbase(main):011:0> scan 't1'
ROW COLUMN+CELL
rowkey001 column=f1:col1, timestamp=1478682530267, value=value02
1 row(s) in 0.0160 seconds
hbase(main):012:0> scan 't1'
ROW COLUMN+CELL
rowkey001 column=f1:col1, timestamp=1478682530267, value=value02
1 row(s) in 0.0160 seconds
hbase(main):013:0> put 't1','rowkey001','f1:col1','value03'
0 row(s) in 0.0080 seconds
hbase(main):014:0> put 't1','rowkey001','f1:col1','value04'
0 row(s) in 0.0070 seconds
hbase(main):015:0> put 't1','rowkey001','f1:col1','value05'
0 row(s) in 0.0080 seconds
hbase(main):016:0> scan 't1'
ROW COLUMN+CELL
rowkey001 column=f1:col1, timestamp=1478682581351, value=value05
1 row(s) in 0.0160 seconds
hbase(main):017:0> put 't1','rowkey001','f1:col1','value06'
0 row(s) in 0.0060 seconds
hbase(main):018:0> put 't1','rowkey001','f1:col1','value07'
0 row(s) in 0.0120 seconds
hbase(main):019:0> scan 't1'
ROW COLUMN+CELL
rowkey001 column=f1:col1, timestamp=1478682595754, value=value07
1 row(s) in 0.0170 seconds
hbase(main):020:0>
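The per-second scan/put load above was driven by hand; it can be scripted as well. A minimal sketch that pipes commands into `hbase shell` via stdin (assumes the `hbase` launcher is on PATH; `hbase_cmds` and `run_load` are hypothetical helper names, and the loop runs sequentially rather than truly in parallel):

```python
import subprocess
import time


def hbase_cmds(i):
    """Build the shell commands for one iteration: a put with a fresh value, then a scan."""
    return [
        f"put 't1','rowkey001','f1:col1','value{i:02d}'",
        "scan 't1'",
    ]


def run_load(iterations=60, interval=1.0):
    """Pipe commands into hbase shell once per interval, printing each result
    so read/write status stays visible while the active HMaster is killed."""
    for i in range(1, iterations + 1):
        script = "\n".join(hbase_cmds(i)) + "\nexit\n"
        out = subprocess.run(["hbase", "shell"], input=script,
                             capture_output=True, text=True)
        print(out.stdout)
        time.sleep(interval)
```

Calling `run_load()` in one terminal while killing the HMaster in another reproduces the transcript above.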
The test shows that HBase fails over from the active to the standby master automatically, and that the database remains fully operable after the former active master is killed. Test case passed.
HMaster log on masternode2 during the takeover:
2016-11-09 17:10:52,695 INFO [masternode2:16000.activeMasterManager] master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/masternode2,16000,1478674057517 from backup master directory
2016-11-09 17:10:52,707 INFO [masternode2:16000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=masternode2,16000,1478674057517
2016-11-09 17:10:53,053 INFO [masternode2:16000.activeMasterManager] fs.HFileSystem: Added intercepting call to namenode#getBlockLocations so can do block reordering using class class org.apache.hadoop.hbase.fs.HFileSystem$ReorderWALBlocks
2016-11-09 17:10:53,061 INFO [masternode2:16000.activeMasterManager] coordination.SplitLogManagerCoordination: Found 0 orphan tasks and 0 rescan nodes
2016-11-09 17:10:53,123 INFO [masternode2:16000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x29059ca1 connecting to ZooKeeper ensemble=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181
2016-11-09 17:10:53,123 INFO [masternode2:16000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181 sessionTimeout=30000 watcher=hconnection-0x29059ca10x0, quorum=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181, baseZNode=/hbase
2016-11-09 17:10:53,126 INFO [masternode2:16000.activeMasterManager-SendThread(slavenode5:2181)] zookeeper.ClientCnxn: Opening socket connection to server slavenode5/192.168.237.236:2181. Will not attempt to authenticate using SASL (unknown error)
2016-11-09 17:10:53,128 INFO [masternode2:16000.activeMasterManager-SendThread(slavenode5:2181)] zookeeper.ClientCnxn: Socket connection established to slavenode5/192.168.237.236:2181, initiating session
2016-11-09 17:10:53,137 INFO [masternode2:16000.activeMasterManager-SendThread(slavenode5:2181)] zookeeper.ClientCnxn: Session establishment complete on server slavenode5/192.168.237.236:2181, sessionid = 0x55846d84cf0007f, negotiated timeout = 30000
2016-11-09 17:10:53,158 INFO [masternode2:16000.activeMasterManager] balancer.StochasticLoadBalancer: loading config
2016-11-09 17:10:53,216 INFO [masternode2:16000.activeMasterManager] master.HMaster: Server active/primary master=masternode2,16000,1478674057517, sessionid=0x75846d7777e0071, setting cluster-up flag (Was=true)
2016-11-09 17:10:53,244 INFO [masternode2:16000.activeMasterManager] procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/online-snapshot/acquired /hbase/online-snapshot/reached /hbase/online-snapshot/abort
2016-11-09 17:10:53,261 INFO [masternode2:16000.activeMasterManager] procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/flush-table-proc/acquired /hbase/flush-table-proc/reached /hbase/flush-table-proc/abort
2016-11-09 17:10:53,293 INFO [masternode2:16000.activeMasterManager] master.MasterCoprocessorHost: System coprocessor loading is enabled
2016-11-09 17:10:53,306 INFO [masternode2:16000.activeMasterManager] procedure2.ProcedureExecutor: Starting procedure executor threads=25
2016-11-09 17:10:53,306 INFO [masternode2:16000.activeMasterManager] wal.WALProcedureStore: Starting WAL Procedure Store lease recovery
2016-11-09 17:10:53,310 INFO [masternode2:16000.activeMasterManager] util.FSHDFSUtils: Recovering lease on dfs file hdfs://cluster-ha/hbase/MasterProcWALs/state-00000000000000000153.log
2016-11-09 17:10:53,343 INFO [masternode2:16000.activeMasterManager] util.FSHDFSUtils: recoverLease=true, attempt=0 on file=hdfs://cluster-ha/hbase/MasterProcWALs/state-00000000000000000153.log after 32ms
2016-11-09 17:10:53,357 INFO [masternode2:16000.activeMasterManager] util.FSHDFSUtils: Recovering lease on dfs file hdfs://cluster-ha/hbase/MasterProcWALs/state-00000000000000000154.log
2016-11-09 17:10:53,373 INFO [masternode2:16000.activeMasterManager] util.FSHDFSUtils: recoverLease=true, attempt=0 on file=hdfs://cluster-ha/hbase/MasterProcWALs/state-00000000000000000154.log after 16ms
2016-11-09 17:10:53,373 WARN [masternode2:16000.activeMasterManager] wal.WALProcedureStore: Remove uninitialized log: FileStatus{path=hdfs://cluster-ha/hbase/MasterProcWALs/state-00000000000000000154.log; isDirectory=false; length=0; replication=3; blocksize=134217728; modification_time=1478673921926; access_time=1478673921926; owner=hadoop; group=supergroup; permission=rw-r--r--; isSymlink=false}
2016-11-09 17:10:53,438 INFO [masternode2:16000.activeMasterManager] wal.WALProcedureStore: Lease acquired for flushLogId: 155
2016-11-09 17:10:53,456 INFO [masternode2:16000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181
2016-11-09 17:10:53,456 INFO [masternode2:16000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181 sessionTimeout=30000 watcher=replicationLogCleaner0x0, quorum=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181, baseZNode=/hbase
2016-11-09 17:10:53,458 INFO [masternode2:16000.activeMasterManager-SendThread(slavenode3:2181)] zookeeper.ClientCnxn: Opening socket connection to server slavenode3/192.168.237.234:2181. Will not attempt to authenticate using SASL (unknown error)
2016-11-09 17:10:53,458 INFO [masternode2:16000.activeMasterManager-SendThread(slavenode3:2181)] zookeeper.ClientCnxn: Socket connection established to slavenode3/192.168.237.234:2181, initiating session
2016-11-09 17:10:53,468 INFO [masternode2:16000.activeMasterManager-SendThread(slavenode3:2181)] zookeeper.ClientCnxn: Session establishment complete on server slavenode3/192.168.237.234:2181, sessionid = 0x35846d7bab3006e, negotiated timeout = 30000
2016-11-09 17:10:53,485 INFO [masternode2:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2016-11-09 17:10:54,990 INFO [masternode2:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1505 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2016-11-09 17:10:56,494 INFO [masternode2:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3009 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2016-11-09 17:10:57,999 INFO [masternode2:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 4514 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2016-11-09 17:10:58,179 INFO [PriorityRpcServer.handler=1,queue=1,port=16000] master.ServerManager: Registering server=slavenode4,16020,1478673919642
2016-11-09 17:10:58,179 INFO [PriorityRpcServer.handler=2,queue=0,port=16000] master.ServerManager: Registering server=slavenode2,16020,1478673931809
2016-11-09 17:10:58,180 INFO [PriorityRpcServer.handler=0,queue=0,port=16000] master.ServerManager: Registering server=slavenode3,16020,1478673936169
2016-11-09 17:10:58,199 INFO [masternode2:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 3, slept for 4714 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2016-11-09 17:10:59,704 INFO [masternode2:16000.activeMasterManager] master.ServerManager: Finished waiting for region servers count to settle; checked in 3, slept for 6219 ms, expecting minimum of 1, maximum of 2147483647, master is running
2016-11-09 17:10:59,713 INFO [masternode2:16000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1475891578930-splitting doesn't belong to a known region server, splitting
2016-11-09 17:10:59,718 INFO [masternode2:16000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1478485353232-splitting doesn't belong to a known region server, splitting
2016-11-09 17:10:59,720 INFO [masternode2:16000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1478673931809 belongs to an existing region server
2016-11-09 17:10:59,722 INFO [masternode2:16000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://cluster-ha/hbase/WALs/slavenode3,16020,1475975653763-splitting doesn't belong to a known region server, splitting
2016-11-09 17:10:59,727 INFO [masternode2:16000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://cluster-ha/hbase/WALs/slavenode3,16020,1478673936169 belongs to an existing region server
2016-11-09 17:10:59,729 INFO [masternode2:16000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://cluster-ha/hbase/WALs/slavenode4,16020,1478673919642 belongs to an existing region server
2016-11-09 17:10:59,799 INFO [masternode2:16000.activeMasterManager] master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1478682659745, server=null} to {1588230740 state=OPEN, ts=1478682659798, server=slavenode2,16020,1478673931809}
2016-11-09 17:10:59,799 INFO [masternode2:16000.activeMasterManager] master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
2016-11-09 17:10:59,800 INFO [masternode2:16000.activeMasterManager] master.HMaster: hbase:meta with replicaId 0 assigned=0, rit=false, location=slavenode2,16020,1478673931809
2016-11-09 17:10:59,871 INFO [masternode2:16000.activeMasterManager] hbase.MetaMigrationConvertingToPB: META already up-to date with PB serialization
2016-11-09 17:10:59,906 INFO [masternode2:16000.activeMasterManager] master.AssignmentManager: Found regions out on cluster or in RIT; presuming failover
2016-11-09 17:10:59,918 INFO [masternode2:16000.activeMasterManager] master.AssignmentManager: Joined the cluster in 47ms, failover=true
2016-11-09 17:10:59,926 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-1] handler.ServerShutdownHandler: Splitting logs for slavenode2,16020,1475891578930 before assignment; region count=0
2016-11-09 17:10:59,926 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-2] handler.ServerShutdownHandler: Splitting logs for slavenode2,16020,1478485353232 before assignment; region count=0
2016-11-09 17:10:59,932 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-1] master.SplitLogManager: dead splitlog workers [slavenode2,16020,1475891578930]
2016-11-09 17:10:59,935 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-0] handler.ServerShutdownHandler: Splitting logs for slavenode3,16020,1475975653763 before assignment; region count=0
2016-11-09 17:10:59,938 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-1] master.SplitLogManager: hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1475891578930-splitting is empty dir, no logs to split
2016-11-09 17:10:59,938 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-2] master.SplitLogManager: dead splitlog workers [slavenode2,16020,1478485353232]
2016-11-09 17:10:59,939 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-1] master.SplitLogManager: started splitting 0 logs in [hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1475891578930-splitting] for [slavenode2,16020,1475891578930]
2016-11-09 17:10:59,942 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-0] master.SplitLogManager: dead splitlog workers [slavenode3,16020,1475975653763]
2016-11-09 17:10:59,942 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-2] master.SplitLogManager: hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1478485353232-splitting is empty dir, no logs to split
2016-11-09 17:10:59,942 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-2] master.SplitLogManager: started splitting 0 logs in [hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1478485353232-splitting] for [slavenode2,16020,1478485353232]
2016-11-09 17:10:59,946 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-0] master.SplitLogManager: hdfs://cluster-ha/hbase/WALs/slavenode3,16020,1475975653763-splitting is empty dir, no logs to split
2016-11-09 17:10:59,946 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-0] master.SplitLogManager: started splitting 0 logs in [hdfs://cluster-ha/hbase/WALs/slavenode3,16020,1475975653763-splitting] for [slavenode3,16020,1475975653763]
2016-11-09 17:10:59,950 WARN [MASTER_SERVER_OPERATIONS-masternode2:16000-1] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1475891578930-splitting
2016-11-09 17:10:59,950 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-1] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1475891578930-splitting] in 11ms
2016-11-09 17:10:59,950 WARN [MASTER_SERVER_OPERATIONS-masternode2:16000-2] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1478485353232-splitting
2016-11-09 17:10:59,950 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-2] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://cluster-ha/hbase/WALs/slavenode2,16020,1478485353232-splitting] in 8ms
2016-11-09 17:10:59,951 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-1] handler.ServerShutdownHandler: Reassigning 0 region(s) that slavenode2,16020,1475891578930 was carrying (and 0 regions(s) that were opening on this server)
2016-11-09 17:10:59,951 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-2] handler.ServerShutdownHandler: Reassigning 0 region(s) that slavenode2,16020,1478485353232 was carrying (and 0 regions(s) that were opening on this server)
2016-11-09 17:10:59,951 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-1] handler.ServerShutdownHandler: Finished processing of shutdown of slavenode2,16020,1475891578930
2016-11-09 17:10:59,951 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-2] handler.ServerShutdownHandler: Finished processing of shutdown of slavenode2,16020,1478485353232
2016-11-09 17:10:59,952 WARN [MASTER_SERVER_OPERATIONS-masternode2:16000-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://cluster-ha/hbase/WALs/slavenode3,16020,1475975653763-splitting
2016-11-09 17:10:59,952 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-0] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://cluster-ha/hbase/WALs/slavenode3,16020,1475975653763-splitting] in 6ms
2016-11-09 17:10:59,952 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-0] handler.ServerShutdownHandler: Reassigning 0 region(s) that slavenode3,16020,1475975653763 was carrying (and 0 regions(s) that were opening on this server)
2016-11-09 17:10:59,952 INFO [MASTER_SERVER_OPERATIONS-masternode2:16000-0] handler.ServerShutdownHandler: Finished processing of shutdown of slavenode3,16020,1475975653763
2016-11-09 17:11:00,020 INFO [masternode2:16000.activeMasterManager] master.HMaster: Master has completed initialization
2016-11-09 17:11:00,030 INFO [masternode2:16000.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled
2016-11-09 17:11:00,031 INFO [masternode2:16000.activeMasterManager] zookeeper.ZooKeeperWatcher: not a secure deployment, proceeding
2016-11-09 17:21:17,254 INFO [515912987@qtp-1859203454-0] logs: Aliases are enabled
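The failover time asked for in step 3 can be read off the standby master's log: from "Registered Active Master" at 17:10:52,707 to "Master has completed initialization" at 17:11:00,020 is roughly 7.3 seconds. A small sketch of that computation, assuming the default log4j timestamp format shown above (`takeover_seconds` is a hypothetical helper, not an HBase API):

```python
from datetime import datetime


def parse_ts(s):
    """Parse the log4j-style timestamp used in the HMaster log above."""
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f")


def takeover_seconds(registered, initialized):
    """Failover time: from 'Registered Active Master' to
    'Master has completed initialization'."""
    return (parse_ts(initialized) - parse_ts(registered)).total_seconds()
```

With the timestamps from this log, `takeover_seconds("2016-11-09 17:10:52,707", "2016-11-09 17:11:00,020")` gives 7.313 seconds, which is the HMaster switchover time recorded for this test.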