HBase HMaster Architecture

This article walks through the architecture of HBase's HMaster: its external interfaces, executor services, ZooKeeper-related threads, file-system interfaces, and other components. It digs into the RPC service, the master's operation thread pools, ZooKeeperWatcher, RegionServerTracker, health checks, the balancer, archived-file cleaning, and log cleaning, and also covers the HMaster startup flow, its thread structure, and the execution logic of its key tasks.

 

Overall structure of the HMaster


A master consists of the following parts:

1. External interfaces

   RPC service

   Jetty web service

   Master MBean

  The RPC service consists of a number of listener, reader, and handler threads (regular IPC handlers plus IPC handlers dedicated to replication).

2. Executor services

These are thread pools; when a task arrives, it is handed to one of them (a minimal sketch of this pattern follows the table below).

The thread pools are:

MASTER_SERVER_OPERATIONS

MASTER_META_SERVER_OPERATIONS

MASTER_CLOSE_REGION

MASTER_OPEN_REGION

MASTER_TABLE_OPERATIONS

The related handlers are:

OpenRegionHandler

ClosedRegionHandler

ServerShutdownHandler

MetaServerShutdownHandler

DeleteTableHandler

DisableTableHandler

EnableTableHandler

ModifyTableHandler

CreateTableHandler

 

Executor Service | Event(s) | Handler(s) | Threads (default)
Master Open Region | RS_ZK_REGION_OPENED | OpenRegionHandler | 5
Master Close Region | RS_ZK_REGION_CLOSED | ClosedRegionHandler | 5
Master Server Operations | RS_ZK_REGION_SPLIT, M_SERVER_SHUTDOWN | SplitRegionHandler, ServerShutdownHandler | 3
Master Meta Server Operations | M_META_SERVER_SHUTDOWN | MetaServerShutdownHandler | 5
Master Table Operations | C_M_DELETE_TABLE, C_M_DISABLE_TABLE, C_M_ENABLE_TABLE, C_M_MODIFY_TABLE, C_M_CREATE_TABLE | DeleteTableHandler, DisableTableHandler, EnableTableHandler, ModifyTableHandler, CreateTableHandler | 1
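
To make the thread-pool structure above concrete, here is a minimal, hypothetical sketch of the same idea using plain java.util.concurrent: each event type is owned by a named fixed-size pool, and handlers are submitted to the pool for their event type. This is only an illustration, not HBase's actual org.apache.hadoop.hbase.executor.ExecutorService; the class and enum names are invented.

import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of the master's executor-service layout:
// one fixed-size pool per executor type, handlers routed by event type.
public class MasterExecutorSketch {

	enum ExecutorType { MASTER_OPEN_REGION, MASTER_CLOSE_REGION, MASTER_TABLE_OPERATIONS }

	private final Map<ExecutorType, ExecutorService> pools = new EnumMap<>(ExecutorType.class);

	MasterExecutorSketch() {
		// Pool sizes mirror the defaults listed in the table above.
		pools.put(ExecutorType.MASTER_OPEN_REGION, Executors.newFixedThreadPool(5));
		pools.put(ExecutorType.MASTER_CLOSE_REGION, Executors.newFixedThreadPool(5));
		pools.put(ExecutorType.MASTER_TABLE_OPERATIONS, Executors.newFixedThreadPool(1));
	}

	// An "event handler" is just a Runnable here; the real handlers
	// (OpenRegionHandler, ClosedRegionHandler, ...) carry much more state.
	void submit(ExecutorType type, Runnable handler) {
		pools.get(type).submit(handler);
	}

	public static void main(String[] args) {
		MasterExecutorSketch master = new MasterExecutorSketch();
		master.submit(ExecutorType.MASTER_OPEN_REGION,
			() -> System.out.println("handling RS_ZK_REGION_OPENED"));
		// Let queued tasks finish, then let the JVM exit.
		master.pools.values().forEach(ExecutorService::shutdown);
	}
}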

 

3. ZooKeeper-related threads

1. ActiveMasterManager
Creates the ephemeral /hbase/master znode and records the master's information under it.
A backup master blocks here until that node disappears (see the ZooKeeper sketch after this list).

2. RegionServerTracker
Monitors region servers by watching the /hbase/rs znode to learn their state.
ZooKeeper fires a notification whenever a region server comes online or goes offline.

3. DrainingServerTracker
Tracks region servers listed under the draining znode: such servers are being drained of their regions and are excluded from new region assignments.

4. CatalogTracker
Tracks the META and ROOT tables.

5. ClusterStatusTracker
Watches the /shutdown znode, which records whether the cluster is up or has been shut down.

6. AssignmentManager
Manages and assigns regions.

7. RootRegionTracker
Manages and watches the /root-region-server znode.

8. LoadBalancer
Balances regions across the region servers.

9. MetaNodeTracker
Watches the node for the META region under /unassigned, i.e. tracks the META region while it is in transition.
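
As a concrete illustration of the ActiveMasterManager behavior in item 1, the following hypothetical sketch uses the plain ZooKeeper client API to create the ephemeral master node and, if it already exists, to wait (as a backup master would) until it disappears. It is a simplified stand-in, not the actual ActiveMasterManager code; the connect string and payload are assumptions.

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch of active-master election over ZooKeeper.
public class MasterElectionSketch {
	public static void main(String[] args) throws Exception {
		CountDownLatch connected = new CountDownLatch(1);
		ZooKeeper zk = new ZooKeeper("localhost:2181", 180_000, event -> {
			if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
				connected.countDown();
			}
		});
		connected.await();

		String masterZNode = "/hbase/master";      // path as listed below
		byte[] myInfo = "host:60000".getBytes();   // assumed payload

		while (true) {
			try {
				// Ephemeral node: disappears automatically if this master's session dies.
				zk.create(masterZNode, myInfo, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
				System.out.println("became the active master");
				break;
			} catch (KeeperException.NodeExistsException e) {
				// Another master is active: behave like a backup master and wait
				// for the node to go away before retrying.
				CountDownLatch gone = new CountDownLatch(1);
				if (zk.exists(masterZNode, ev -> {
						if (ev.getType() == Watcher.Event.EventType.NodeDeleted) gone.countDown();
					}) == null) {
					continue; // node vanished between create() and exists(); retry immediately
				}
				gone.await();
			}
		}
	}
}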

In addition, org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher manages a set of ZK nodes:
baseZNode			/hbase
assignmentZNode			/unassigned
rsZNode				/rs
drainingZNode			/draining
masterTableZNode		/table
masterTableZNode92		/table92	(used by HBase 0.92)
splitLogZNode			/splitlog
backupMasterAddressesZNode	/backup-masters
clusterStateZNode		/shutdown
masterAddressZNode		/master
clusterIdZNode			/hbaseid

 

Class diagram of the ZK listener-related classes


4. File interfaces and other components

MasterFileSystem

Creates the META and ROOT tables, the .oldlogs directory, the hbase.version file, and so on.

 

LogCleaner

Periodically cleans up the contents of the .oldlogs directory.

 

HFileCleaner

Periodically cleans up the contents of the archive directory.

 

Other background threads include the LogCleaner and HFileCleaner above.

ServerManager maintains the lists of online and dead region servers.

Balancer is the background thread that carries out region balancing.

 

 

 

 

 

HMaster configuration parameters

Parameter | Default | Meaning
hbase.master.handler.count | 25 | Number of RPC handler (worker) threads
hbase.master.buffer.for.rs.fatals | 1 MB |
mapred.task.id |  |
hbase.master.wait.for.log.splitting | false |
zookeeper.session.timeout | 180 s |
hbase.master.backup |  |
hbase.master.impl |  |
hbase.master.event.waiting.time | 1000 |
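
These settings normally live in hbase-site.xml. For illustration, a minimal sketch of reading (or overriding) them through the standard Hadoop Configuration API, assuming an HBase client/server classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MasterConfigSketch {
	public static void main(String[] args) {
		// Loads hbase-default.xml and hbase-site.xml from the classpath.
		Configuration conf = HBaseConfiguration.create();

		// Read values with the same defaults shown in the table above.
		int handlers = conf.getInt("hbase.master.handler.count", 25);
		int zkTimeout = conf.getInt("zookeeper.session.timeout", 180 * 1000);

		// Programmatic override (normally done in hbase-site.xml instead).
		conf.setInt("hbase.master.handler.count", 50);

		System.out.println("handlers=" + handlers + ", zk session timeout=" + zkTimeout + "ms");
	}
}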

 

 

 

 

 

HMaster startup entry class

org.apache.hadoop.hbase.master.HMaster

The hbase.master.impl parameter in hbase-site.xml can be set to a custom implementation, which must extend HMaster.

Startup then goes through HMasterCommandLine (a subclass of ServerCommandLine).

HMasterCommandLine is launched via Hadoop's ToolRunner:

ToolRunner#run(Configuration, Tool, String[])

ToolRunner invokes GenericOptionsParser to parse the generic options such as -conf, -D, -fs, and -files.

After parsing, it populates the Configuration object and passes the remaining startup arguments to the Tool implementation.

In other words, ToolRunner is just a utility that parses startup arguments, configures the Configuration object, and hands both to the Tool implementation.
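
A minimal sketch of that ToolRunner pattern, using a generic Tool rather than the real HMasterCommandLine: the generic options are parsed and applied to the Configuration before run() is invoked with the remaining arguments.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical command line following the same ToolRunner pattern
// as HMasterCommandLine / ServerCommandLine.
public class DemoCommandLine extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
		// By the time we get here, -conf/-D/-fs/-files have already been
		// parsed by GenericOptionsParser and applied to getConf().
		Configuration conf = getConf();
		System.out.println("remaining args: " + String.join(" ", args));
		System.out.println("hbase.master.handler.count = "
			+ conf.getInt("hbase.master.handler.count", 25));
		return 0;
	}

	public static void main(String[] args) throws Exception {
		int exitCode = ToolRunner.run(new Configuration(), new DemoCommandLine(), args);
		System.exit(exitCode);
	}
}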

 

The call sequence is:

1. HMaster#main()

2. HMasterCommandLine#doMain()

3. ToolRunner#run()

4. HMasterCommandLine#run()

5. HMasterCommandLine#startMaster()

6. HMaster#constructMaster()

7. Reflective invocation of the HMaster constructor (sketched below)
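
Steps 6 and 7 boil down to reflection on the class named by hbase.master.impl. A hedged sketch of the idea, assuming the configured class exposes a constructor that takes a Configuration:

import java.lang.reflect.Constructor;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.master.HMaster;

public class ConstructMasterSketch {
	// Mirrors the idea behind HMaster#constructMaster(): look up the
	// configured implementation class and invoke its constructor reflectively.
	static HMaster constructMaster(Configuration conf) throws Exception {
		Class<?> cls = conf.getClass("hbase.master.impl", HMaster.class);
		Constructor<?> c = cls.getConstructor(Configuration.class);
		return (HMaster) c.newInstance(conf);
	}

	public static void main(String[] args) throws Exception {
		// Note: actually constructing a master requires a reachable
		// ZooKeeper quorum and a valid hbase-site.xml on the classpath.
		HMaster master = constructMaster(HBaseConfiguration.create());
		System.out.println("constructed " + master.getClass().getName());
	}
}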


 

 

 

 

 

Initialization: the HMaster constructor

1. Set up host and DNS related configuration

2. Set up and create the RPC connection

3. Initialize ZK authentication

4. Create the ZooKeeperWatcher (the ZK-related threads), the RPC server, and metrics

5. Create the HealthCheckChore

6. Set up split-log related configuration

 

 

 

 

 

Startup: HMaster#run (runs in a new thread)

//Make the current master the active master (a backup master waits here),
//then complete initialization
HMaster#run() {
	becomeActiveMaster(startupStatus);
	finishInitialization(startupStatus, false);	
}


//If the current master is not the active one, keep waiting
HMaster#becomeActiveMaster() {
    this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,this);
    this.zooKeeper.registerListener(activeMasterManager);
    while (!amm.isActiveMaster()) {
    	Thread.sleep(c.getInt("zookeeper.session.timeout", 180 * 1000));	
    }
    this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
    this.clusterStatusTracker.start();
    return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus,this.clusterStatusTracker);
}

//Initialize the master components: the file system, ServerManager,
//AssignmentManager, RegionServerTracker, CatalogTracker, and so on
//Set the cluster state in ZooKeeper
//Wait for the region server check-in to finish
//If there are files under the .logs directory, run the split-log task
//Assign the ROOT and META regions
//Handle the live region servers and the dead ones
HMaster#finishInitialization() {
	//Check whether the ROOT and META tables exist and create them if not;
	//this also creates the tmp directory and the .oldlogs directory
	fileSystemManager = new MasterFileSystem();
	tableDescriptors = new FSTableDescriptors(fileSystemManager.getFileSystem(),fileSystemManager.getRootDir());
	
	//Create the CatalogTracker, LoadBalancer, AssignmentManager,
	//RegionServerTracker, DrainingServerTracker,
	//ClusterStatusTracker, and SnapshotManager
	initializeZKBasedSystemTrackers();
	
	//Start the service threads (open-region, close-region, server-operations, etc.),
	//then start the Jetty web UI and the RPC service
	startServiceThreads();
	
	//Register every known region server with ServerManager, which manages
	//all online and dead servers and handles bringing them up and down
	for (ServerName sn: regionServerTracker.getOnlineServers()) {
		ServerManager.recordNewServer(sn, HServerLoad.EMPTY_HSERVERLOAD);
	}
	
	//If there are WALs to split, prepare the tasks and publish them to ZK so the region servers can process them
	if (waitingOnLogSplitting) {
		fileSystemManager.splitAllLogs(servers);	
	}
	
	//If the ROOT and META regions are not yet assigned, assign them first
	assignRoot();
	assignMeta();
	enableServerShutdownHandler();
	
	//Handle all dead servers
	for (ServerName curServer : failedServers) {
		serverManager.expireServer(curServer);
    }	
    DefaultLoadBalancer.setMasterServices();
	startCatalogJanitorChore();
	registerMBean();
}



HMaster#assignRoot() {
	//First check whether the root region is in transition;
	//if so, handle that state and wait until it is resolved before continuing
	processRegionInTransitionAndBlockUntilAssigned();
	verifyRootRegionLocation();
	getRootLocation();
	expireIfOnline();
	//First delete "/hbase/root-region-server", whether or not it exists
	//(KeeperException.NoNodeException is ignored)
	//Write EventType.M_ZK_REGION_OFFLINE, the current timestamp, the root region name (-ROOT-,,0),
	//and the master's versioned ServerName
	//to /hbase/unassigned/70236052; the payload is null, so it is not written
}

The sequence diagram of HMaster#run is as follows


 

 

 

 

 

Some of the fields held by HMaster

InfoServer

ZooKeeperWatcher

ActiveMasterManager

RegionServerTracker

DrainingServerTracker

RPCServer

MasterMetrics

MasterFileSystem

ServerManager

AssignmentManager

CatalogTracker

ClusterStatusTracker

CatalogJanitor

LogCleaner

HFileCleaner

TableDescriptors

SnapshotManager

HealthCheckChore

 

 

 

 

 

Threads in the HMaster

RPC-related listener, reader, and handler threads

Daemon Thread [IPC Server listener on 60000] (Suspended)

Daemon Thread [IPC Reader 3 on port 60000] (Suspended)

Daemon Thread [IPC Server handler 0 on 60000] (Suspended)

Daemon Thread [REPL IPC Server handler 2 on 60000] (Running)

Daemon Thread [IPC Server Responder] (Running)

 

ZooKeeper-related threads

Daemon Thread [main-EventThread] (Suspended)

Daemon Thread [main-SendThread(myhost:2181)] (Suspended) 

 

Background threads

Daemon Thread [myhost,60000,1427458363875-BalancerChore] (Running)

Daemon Thread [myhost,60000,1427458363875-CatalogJanitor] (Running)

Daemon Thread [master-myhost,60000,1427458363875.archivedHFileCleaner] (Running)

Daemon Thread [master-myhost,60000,1427458363875.oldLogCleaner] (Running)

Daemon Thread [myhost,60000,1427458363875.splitLogManagerTimeoutMonitor] (Running)

Daemon Thread [myhost,60000,1427458363875.timerUpdater] (Running)

 

Monitoring threads

Daemon Thread [Timer thread for monitoring hbase] (Running)

Daemon Thread [Timer thread for monitoring jvm] (Running)

Daemon Thread [Timer thread for monitoring rpc] (Running)

Daemon Thread [myhost,60000,1427458363875.timeoutMonitor] (Running)

 

 

Jetty-related threads

Thread [1008881877@qtp-314160763-0] (Running)

 

 

 

 

 

How the timeoutMonitor thread (used for region assignment) works (AssignmentManager$TimeoutMonitor)


The execution logic is as follows:

//Runs in its own thread,
//invoked from Chore#run()
AssignmentManager$TimeoutMonitor#chore() {
	for (RegionState regionState : regionsInTransition.values()) {
		if (regionState.getStamp() + timeout <= now) {
			//decide on action upon timeout
            actOnTimeOut(regionState);
		} else if (this.allRegionServersOffline && !allRSsOffline) {
			RegionPlan existingPlan = regionPlans.get(regionState.getRegion().getEncodedName());
			if (existingPlan == null || !this.serverManager.isServerOnline(existingPlan.getDestination())) {
				actOnTimeOut(regionState);
			}
		}
	}
}

//Check the region's current state and, if it has gone offline, assign it
AssignmentManager$TimeoutMonitor#actOnTimeOut() {
	HRegionInfo regionInfo = regionState.getRegion();
	switch (regionState.getState()) {
	case CLOSED:
		regionState.updateTimestampToNow();
		break;
	case OFFLINE:
		invokeAssign(regionInfo);
        break;			
	case PENDING_OPEN:
        invokeAssign(regionInfo);
        break;
	case OPENING:
        processOpeningState(regionInfo);
        break;        	                
	case OPEN:
		regionState.updateTimestampToNow();
		break;
	case PENDING_CLOSE:
		invokeUnassign(regionInfo);
		break;
	case CLOSING:
		invokeUnassign(regionInfo);
		break;		
	}
}

//Invoked via AssignCallable#call()
//Assigns a region: first update the znode in ZK,
//then call sendRegionOpen(), which triggers HRegionServer#openRegion() on the region server;
//that creates an OpenRegionHandler and submits it to a thread pool,
//which in turn calls HRegion#openRegion()
AssignmentManager#assign() {
	for (int i = 0; i < this.maximumAssignmentAttempts; i++) {
		String tableName = region.getTableNameAsString();
		if (!zkTable.isEnablingTable(tableName) && !zkTable.isEnabledTable(tableName)) {
			setEnabledTable(region);				
		}	
		RegionOpeningState regionOpenState = ServerManager.sendRegionOpen();	
		if (regionOpenState == RegionOpeningState.OPENED) {
			return;	
		} else if (regionOpenState == RegionOpeningState.ALREADY_OPENED) {
			ZKAssign.deleteOfflineNode(master.getZooKeeper(), encodedRegionName);	
		}
	}
}

//Unassigns a region, i.e. closes it
AssignmentManager#unassign() {
	state = regionsInTransition.get(encodedName);
	if (state == null) {
		ZKAssign.createNodeClosing(master.getZooKeeper(), region, master.getServerName());	
	} else if (force && (state.isPendingClose() || state.isClosing())) {
		state.update(state.getState());	
	} else {
		return;	
	}
	ServerName server = regions.get(region);
	if (server == null) {
		deleteClosingOrClosedNode(region);	
	}
	ServerManager.sendRegionClose();
}

 

 

 

 

 

CatalogJanitor thread (CatalogJanitor)

This thread scans for leftovers from region splits; for example, after a split the parent region's entry in the META table can be removed,

and the info:splitA and info:splitB columns written to META during the split can be removed as well.

The main logic is as follows:

//Runs in its own thread,
//invoked from Chore#run()
CatalogJanitor#scan() {
	Pair<Integer, Map<HRegionInfo, Result>> pair = getSplitParents();
    Map<HRegionInfo, Result> splitParents = pair.getSecond();
    int cleaned = 0;
    for (Map.Entry<HRegionInfo, Result> e : splitParents.entrySet()) {
    	if (!parentNotCleaned.contains(e.getKey().getEncodedName())) {
    		cleanParent(e.getKey(), e.getValue());
    		cleaned++;    			
    	} else {
    		//the info:splitA and info:splitB columns
    		parentNotCleaned.add(getDaughterRegionInfo("splitA"));
    		parentNotCleaned.add(getDaughterRegionInfo("splitB"));
    	}
    }
}

//If the daughter regions splitA and splitB no longer reference
//the parent region, the parent region is removed:
//its files are archived, then a Delete object removes its row from the META table
CatalogJanitor#cleanParent() {
	HRegionInfo a_region = getDaughterRegionInfo(rowContent, "splitA");
    HRegionInfo b_region = getDaughterRegionInfo(rowContent, "splitB");
    Pair<Boolean, Boolean> a = checkDaughterInFs(parent, a_region, "splitA");
    Pair<Boolean, Boolean> b = checkDaughterInFs(parent, b_region, "splitB");
    removeDaughtersFromParent(parent);
    FileSystem fs = this.services.getMasterFileSystem().getFileSystem();
	HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, parent);
	Delete delete = new Delete(regionInfo.getRegionName());
    deleteFromMetaTable(catalogTracker, delete);
}

//Check whether the daughter regions splitA and splitB still reference the parent region
CatalogJanitor#checkDaughterInFs() {
	FileSystem fs = this.services.getMasterFileSystem().getFileSystem();
    Path rootdir = this.services.getMasterFileSystem().getRootDir();
    Path tabledir = new Path(rootdir, split.getTableNameAsString());
    Path regiondir = new Path(tabledir, split.getEncodedName());
    exists = fs.exists(regiondir);	
    HTableDescriptor parentDescriptor = getTableDescriptor(parent.getTableName());
	for (HColumnDescriptor family: parentDescriptor.getFamilies()) {
		Path p = Store.getStoreHomedir(tabledir, split.getEncodedName(),family.getName());
      	if (!fs.exists(p)) {
			continue;	
		}
		// Look for reference files.  Call listStatus with anonymous instance of PathFilter.
      	FileStatus [] ps = FSUtils.listStatus(fs, p,
		new PathFilter () {
			public boolean accept(Path path) {
				return StoreFile.isReference(path);
            }
		});		
	}
}

//Build a Delete object that removes the splitA and splitB columns from META;
//they were written during the split and are no longer needed
CatalogJanitor#removeDaughtersFromParent() {
	Delete delete = new Delete(parent.getRegionName());
    delete.deleteColumns("info","splitA");
    delete.deleteColumns("info","splitB");
    deleteFromMetaTable(catalogTracker, delete);
}

 

 

 

 

 

BalancerChore thread (HMaster#balance)

This thread carries out the balancing process; the logic is as follows:

//Runs in its own thread, invoked via HMaster$2#run()
//Collects all region assignments grouped by table, asks the balancer
//for region plans, and then applies each plan
HMaster#balance() {
	Map<String, Map<ServerName, List<HRegionInfo>>> assignmentsByTable =
        this.assignmentManager.getAssignmentsByTable();	
 	List<RegionPlan> plans = new ArrayList<RegionPlan>();
	for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) {
		List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments);
        if (partialPlans != null) {
        	plans.addAll(partialPlans);
        }
	}
	for (RegionPlan plan: plans) {
		AssignmentManager.balance(plan);
	}        
}

//Performs the balance step: record the plan in the regionPlans map,
//then unassign the region so it can be reopened at the plan's destination
AssignmentManager#balance() {
	synchronized (this.regionPlans) {
		this.regionPlans.put(plan.getRegionName(), plan);
    }
    unassign(plan.getRegionInfo());	
}

 

 

 

 

 

archivedHFileCleaner thread (HFileCleaner#chore)

This thread deletes archived files under the archive directory; the logic is as follows:

//The work happens in the parent class CleanerChore#chore();
//it cleans up the archived files under the .archive directory
HFileCleaner#chore() {
	FileStatus[] files = FSUtils.listStatus(this.fs, this.oldFileDir, null);
	for (FileStatus file : files) {
		if (file.isDir()) {
			checkAndDeleteDirectory(file.getPath());
		} else {
			checkAndDelete(file.getPath());
		}
	}
}

//Check and delete a directory
CleanerChore#checkAndDeleteDirectory() {
	FileStatus[] children = FSUtils.listStatus(fs, toCheck, null);
	HBaseFileSystem.deleteFileFromFileSystem(fs, toCheck);
}

//Check and delete a file
CleanerChore#checkAndDelete() {
	HBaseFileSystem.deleteDirFromFileSystem(fs, filePath);
}

 

 

 

 

 

oldLogCleaner thread (LogCleaner)

This thread cleans up the files under the .oldlogs directory.

Its execution logic is the same as that of the archivedHFileCleaner thread:

both delegate to the parent class CleanerChore#chore() (a sketch of the general pattern follows).
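
For illustration only, here is a self-contained sketch of the general pattern both cleaners follow: a periodic scan of a directory plus a time-to-live check before deleting. It uses the local filesystem and plain Java scheduling rather than the real CleanerChore/HDFS API, and the directory and TTL are assumptions.

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the CleanerChore pattern: every period,
// scan a directory and delete entries older than a TTL.
public class CleanerChoreSketch {
	public static void main(String[] args) {
		Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/oldlogs"); // assumed directory
		long ttlMillis = 10 * 60 * 1000;                                  // assumed 10-minute TTL

		ScheduledExecutorService chore = Executors.newSingleThreadScheduledExecutor();
		chore.scheduleAtFixedRate(() -> {
			try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
				long now = System.currentTimeMillis();
				for (Path file : files) {
					FileTime mtime = Files.getLastModifiedTime(file);
					if (now - mtime.toMillis() > ttlMillis) {
						Files.deleteIfExists(file);        // the "checkAndDelete" step
						System.out.println("deleted " + file);
					}
				}
			} catch (IOException e) {
				e.printStackTrace();                       // keep the chore alive on errors
			}
		}, 0, 60, TimeUnit.SECONDS);
	}
}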

 

 

 

 

 

timerUpdater thread (AssignmentManager$TimerUpdater#chore)

This thread refreshes the timestamps of regions that are currently in transition.

The main logic is as follows:

//Runs in its own thread, invoked from Chore#run()
AssignmentManager$TimerUpdater#chore() {
	while (!serversInUpdatingTimer.isEmpty() && !stopper.isStopped()) {
		if (serverToUpdateTimer == null) {
			serverToUpdateTimer = serversInUpdatingTimer.first();
		} else {
			serverToUpdateTimer = serversInUpdatingTimer.higher(serverToUpdateTimer);
		}
		updateTimers(serverToUpdateTimer);
	}		        
}

//Update the timestamps of regions in transition:
//for the given server, walk its pending region plans and refresh each region's timestamp
AssignmentManager#updateTimers() {
	for (Map.Entry<String, RegionPlan> e: copy.entrySet()) {
		rs = this.regionsInTransition.get(e.getKey());
		rs.updateTimestampToNow();
	}
}

 

 

 

 

 

splitLogManagerTimeoutMonitor thread (SplitLogManager$TimeoutMonitor#chore)

This thread periodically checks for split-log tasks that have timed out (tasks published on the ZK split-log nodes, which region servers pick up in order to split WALs). Timed-out tasks are resubmitted; tasks are also resubmitted when a region goes offline or a server dies. Finally, previously failed node deletions are retried.

The logic is as follows:

//Runs in its own thread, invoked from Chore#run()
//Periodically checks for split-log tasks that have timed out, or whose
//worker region server has died, and resubmits them;
//finally, previously failed znode deletions are retried
SplitLogManager$TimeoutMonitor#chore() {
	for (Map.Entry<String, Task> e : tasks.entrySet()) {
		if (localDeadWorkers != null && localDeadWorkers.contains(cur_worker)) {
			if (resubmit(path, task, FORCE)) {
				resubmitted++;
			} else {
				//Record the dead worker region server
				handleDeadWorker(cur_worker);
	        }	
		} else if (resubmit(path, task, CHECK)) {
          resubmitted++;
        }	
	}
	for (Map.Entry<String, Task> e : tasks.entrySet()) {
		String path = e.getKey();
		Task task = e.getValue();
		if (task.isUnassigned() && (task.status != FAILURE)) {
			// We just touch the znode to make sure its still there
            tryGetDataSetWatch(path);
		}			
	}	
	createRescanNode(Long.MAX_VALUE);
	
	// Retry previously failed deletes
	if (failedDeletions.size() > 0) {
		for (String tmpPath : tmpPaths) {
			// deleteNode is an async call
			deleteNode(tmpPath, zkretries);
		}      	
    }
}

//Delete the znode asynchronously
SplitLogManager#deleteNode() {
	ZooKeeper.delete(path, -1, new DeleteAsyncCallback(),retries);	
}

 

 

 

 

 


 

 

 

 

 
