HBase source code analysis: what happens when a regionserver dies

When the Master learns through ZK that a regionserver has died, it handles the failure by calling the [b][size=large][color=red]expireServer[/color][/size][/b] method
public synchronized void HMaster.ServerManager.expireServer(final HServerInfo hsi){
get the serverName
look up the serverInfo in the online-server list
if serverInfo is null, log a "server is not online" warning and return
if the server is already in deadservers, log a warning and return
add it to deadservers
remove it from onlineServers
tear down the RPC connection to that server
if the cluster is shutting down, return
use the CatalogTracker to determine whether the dead server was carrying the ROOT or META region
then submit either
this.services.getExecutorService().submit(new MetaServerShutdownHandler(this.master,this.services, this.deadservers, info, carryingRoot, carryingMeta));
or
this.services.getExecutorService().submit(new ServerShutdownHandler(this.master,this.services, this.deadservers, info));
//the difference: during recovery MetaServerShutdownHandler must additionally call assignRoot() or assignMeta()
}
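The guard logic above (skip servers that are not online, skip servers already marked dead, then move the server from the online set to the dead set) can be sketched as a standalone model. All names here are simplified stand-ins for the real ServerManager fields, not the actual HBase API:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of ServerManager.expireServer: only the online/dead
// bookkeeping described above, with plain String server names.
public class ExpireServerSketch {
    final Set<String> onlineServers = new HashSet<>();
    final Set<String> deadServers = new HashSet<>();

    // Returns true if a shutdown handler would be submitted for this server.
    public synchronized boolean expireServer(String serverName) {
        if (!onlineServers.contains(serverName)) {
            System.out.println("WARN: " + serverName + " is not online, ignoring");
            return false;
        }
        if (deadServers.contains(serverName)) {
            System.out.println("WARN: " + serverName + " is already dead, ignoring");
            return false;
        }
        deadServers.add(serverName);      // remember it as dead
        onlineServers.remove(serverName); // so it no longer receives regions
        // ...the real code then drops the RPC connection and submits a
        // (Meta)ServerShutdownHandler to the executor service...
        return true;
    }
}
```

The two early returns make the method idempotent: a duplicate ZK expiry event for the same server is silently ignored.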


[color=red][size=large]1. ServerShutdownHandler's process method[/size][/color] is as follows
public void ServerShutdownHandler.process(){
1.1 split the HLog
get the list of RegionStates that were on the dead regionserver
decide whether the ROOT or META region needs to be assigned
while(!this.server.isStopped()){
try{
wait for META to come back online
read from the META table all HRegionInfo entries (hris) that were hosted on the dead regionserver
break
}catch(){}
}
walk the RegionState list and remove from hris every region whose state is neither CLOSING nor PENDING_CLOSE
for each region left in hris:
//returns false if the region's table is disabled or the region has already been split
if(check whether the region needs to be assigned)
1.2 this.services.getAssignmentManager().assign(e.getKey(), true);
}
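The pruning step above (drop every region whose in-transition state is neither CLOSING nor PENDING_CLOSE) can be modelled with plain collections. The enum and method names here are illustrative, not the real RegionState types:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the hris pruning in ServerShutdownHandler.process:
// regions in transition with any state other than CLOSING/PENDING_CLOSE
// are removed from the set of regions that will be reassigned.
public class RegionFilterSketch {
    public enum State { CLOSING, PENDING_CLOSE, OPEN, OFFLINE, SPLIT }

    public static Set<String> prune(Set<String> hris, Map<String, State> ritStates) {
        Set<String> result = new HashSet<>(hris);
        for (Map.Entry<String, State> e : ritStates.entrySet()) {
            State s = e.getValue();
            if (s != State.CLOSING && s != State.PENDING_CLOSE) {
                result.remove(e.getKey()); // someone else is handling this region
            }
        }
        return result;
    }
}
```

Regions that are not in transition at all stay in the set and get reassigned normally.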


[color=red][b][size=large]1.1 Splitting the HLog[/size][/b][/color]
public void hbase.master.MasterFileSystem.splitLog(final String serverName){
acquire the splitLog lock
locate the dead regionserver's HLog directory on HDFS (by default .logs/servername)
create an HLogSplitter object, splitter
1.1.1 call splitter.splitLog() to split the HLog
release the splitLog lock
}


[color=red][size=large]1.1.1 splitter.splitLog()[/size][/color]
public List<Path> hbase.regionserver.wal.HLogSplitter.splitLog(){
check hasSplit
set hasSplit to true
run some sanity checks on the HLog directory in HDFS (does it exist, does it contain files); if everything looks good, call
1.1.1.a List<Path> splitLog(logfiles)
}


[color=red][size=large][b]1.1.1.a[/b][/size][/color] During splitLog there is an entryBuffers structure: a reader thread reads edits out of the HLog and puts them into entryBuffers, while writer threads take buffers from it and write them out to the recovered.edits directory under each region's directory on HDFS
private List<Path> hbase.regionserver.wal.HLogSplitter.splitLog(final FileStatus[] logfiles){
create two lists, processedLogs and corruptedLogs, for logs handled successfully and logs found to be corrupted
read skipErrors = conf.getBoolean("hbase.hlog.split.skip.errors", true);
//responsible for writing the data in entryBuffers into the recovered.edits directory under each region's directory on HDFS
start the writer threads: outputSink.startWriterThreads(entryBuffers); 3 by default ("hbase.regionserver.hlog.splitlog.writer.threads", 3);
for each logfile in logfiles{
use the lease mechanism to make sure the logfile can no longer be appended to
parse the logfile and load its contents into memory, i.e. into entryBuffers
}
archive the logfiles: move the corrupted-and-skipped ones into the .corrupt directory
move the successfully processed ones into .oldlogs, then delete the .logs/<dead regionserver> directory
wait for the writer threads to finish
return the paths of the newly written logfiles under each region's recovered.edits directory
}
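The reader/writer pipeline above is a classic producer-consumer setup. A toy model using a BlockingQueue: one "reader" fills the buffer with (region, edit) pairs, several writer threads drain it into per-region edit lists standing in for the recovered.edits files. All types here are illustrative, not the real HLogSplitter classes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of the entryBuffers pipeline: reader puts edits in, writer
// threads group them by region, mimicking the recovered.edits output.
public class EntryBuffersSketch {
    public static final class Entry { // one (region, edit) pair from the HLog
        final String region, edit;
        public Entry(String region, String edit) { this.region = region; this.edit = edit; }
    }
    private static final Entry POISON = new Entry(null, null); // shutdown marker

    public static Map<String, List<String>> split(List<Entry> logEntries, int writers) {
        final BlockingQueue<Entry> entryBuffers = new LinkedBlockingQueue<>();
        final Map<String, List<String>> recoveredEdits = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int i = 0; i < writers; i++) {          // writer threads drain the buffer
            pool.submit(() -> {
                try {
                    for (Entry e = entryBuffers.take(); e != POISON; e = entryBuffers.take()) {
                        recoveredEdits
                            .computeIfAbsent(e.region, k -> Collections.synchronizedList(new ArrayList<>()))
                            .add(e.edit);
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            for (Entry e : logEntries) entryBuffers.put(e);             // "reader" side
            for (int i = 0; i < writers; i++) entryBuffers.put(POISON); // one marker per writer
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return recoveredEdits;
    }
}
```

With multiple writers the per-region ordering within one buffer is not guaranteed in this sketch; the real splitter preserves edit ordering per region, which is one reason it buffers and sorts per region before writing.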


[size=large][color=red]1.2 Assigning the regions[/color][/size]. Before assigning, check whether the region's table is disabled and whether the cluster is shutting down; if either is true, skip the region, otherwise add it to RIT (RegionsInTransition) and start the assignment
private void hbase.master.AssignmentManager.assign(final RegionState state, final boolean setOfflineInZK, final boolean forceNewPlan)
{
set the region's state in ZK to offline
get a RegionPlan (assign the region to a randomly chosen online regionserver)
set the region's state to PENDING_OPEN
1.2.1 send an openRegion RPC to the chosen regionserver (the regionserver handles it in its openRegion(HRegionInfo region) method)
}
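The "random plan" step is simple enough to model directly: with no better placement hint, the master just picks one of the currently online regionservers at random as the region's destination. This is a hedged sketch, not the real AssignmentManager code (which also honours pre-existing plans):

```java
import java.util.List;
import java.util.Random;

// Sketch of random region placement: pick any online regionserver.
public class RandomPlanSketch {
    public static String randomAssignment(List<String> onlineServers, Random rng) {
        if (onlineServers.isEmpty()) return null; // nobody to assign to
        return onlineServers.get(rng.nextInt(onlineServers.size()));
    }
}
```

Returning null on an empty server list mirrors the situation where assignment must wait until some regionserver checks in.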


[size=large][b][color=red]1.2.1 Handling openRegion[/color][/b][/size]
the regionserver runs an OpenRegionHandler asynchronously
the regionserver sets the region's state in ZK to RS_ZK_REGION_OPENING
once the master sees RS_ZK_REGION_OPENING in ZK, it sets the region's state to OPENING and periodically checks whether the OPENING state has timed out
the HRegion runs replayRecoveredEditsIfAny(), i.e. restores the logged data into the table: the edits are parsed out of the log and written into memory (the store), and once they accumulate past a threshold they are flushed straight to disk
the META table is updated with the region's info
the region's state in ZK is set to RS_ZK_REGION_OPENED
once the master sees RS_ZK_REGION_OPENED, it updates the region's state to OPEN
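The replay step above (accumulate recovered edits in memory, flush once a threshold is crossed, plus a final flush before the region opens) can be sketched like this. Threshold, field names, and the string "edits" are all illustrative, not the real HRegion API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of replayRecoveredEditsIfAny: edits go into an in-memory store
// and are flushed to "disk" whenever the store grows past a threshold.
public class ReplayEditsSketch {
    final List<String> memstore = new ArrayList<>();
    final List<List<String>> flushedFiles = new ArrayList<>();
    final int flushThreshold;

    public ReplayEditsSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    // Replays the recovered edits; returns the number of flushes performed.
    public int replay(List<String> recoveredEdits) {
        for (String edit : recoveredEdits) {
            memstore.add(edit);                              // apply edit in memory
            if (memstore.size() >= flushThreshold) flush();  // periodic flush
        }
        if (!memstore.isEmpty()) flush();                    // final flush before opening
        return flushedFiles.size();
    }

    private void flush() {
        flushedFiles.add(new ArrayList<>(memstore));
        memstore.clear();
    }
}
```

The final flush matters: any edits still sitting in memory must reach disk (or at least the store) before the region reports RS_ZK_REGION_OPENED, otherwise a second crash could lose them.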