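Let's start with an overall architecture diagram:
Our client program reads data (taking get as the example) through HTable and Get, so we trace the read path starting from the client code.
The conf object mainly carries the ZooKeeper connection settings, such as hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort.
The HTable constructor: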
public HTable(Configuration conf, final TableName tableName)
    throws IOException {
  this.tableName = tableName;
  this.cleanupPoolOnClose = this.cleanupConnectionOnClose = true;
  if (conf == null) {
    this.connection = null;
    return;
  }
  this.connection = HConnectionManager.getConnection(conf);
  this.configuration = conf;
  this.pool = getDefaultExecutor(conf);
  this.finishSetup();
}
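This constructor performs three important operations:
1. Get the HConnection
HConnectionManager internally caches a mapping from HConnectionKey to HConnectionImplementation. If a connection already exists for the key, it is taken straight from the cache; otherwise a new connection is created and cached: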
HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
if (connection == null) {
  connection = (HConnectionImplementation)createConnection(conf, true);
  CONNECTION_INSTANCES.put(connectionKey, connection);
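The get-or-create pattern in this excerpt can be tried out with JDK classes only. The following is a minimal sketch, not HBase code: FakeConnection, ConnectionCacheSketch, and the string key are hypothetical stand-ins for HConnectionImplementation, HConnectionManager, and HConnectionKey, and the synchronized block is an assumption about how such a shared map would be guarded.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a connection object; not an HBase class. */
class FakeConnection {
    final String key;

    FakeConnection(String key) {
        this.key = key;
    }
}

/** Get-or-create cache in the spirit of the CONNECTION_INSTANCES lookup. */
class ConnectionCacheSketch {
    private static final Map<String, FakeConnection> INSTANCES = new HashMap<>();

    /** Counts how many connections were actually constructed. */
    static int created = 0;

    static FakeConnection getConnection(String connectionKey) {
        // Assumption: the shared map is guarded by a lock so that two callers
        // with the same key never create two connections.
        synchronized (INSTANCES) {
            FakeConnection connection = INSTANCES.get(connectionKey);
            if (connection == null) {
                connection = new FakeConnection(connectionKey);
                created++;
                INSTANCES.put(connectionKey, connection);
            }
            return connection;
        }
    }
}
```

Calling getConnection twice with the same key returns the same object, which is why several HTable instances built from equivalent configurations can share one underlying connection.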
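The createConnection method instantiates the connection object (HConnectionImplementation by default) via reflection. It first resolves the implementation class from the configuration: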
String className = conf.get("hbase.client.connection.impl",
    HConnectionManager.HConnectionImplementation.class.getName());
Class<?> clazz = null;
try {
  clazz = Class.forName(className);
Once the class has been loaded, the connection is created by invoking its constructor reflectively:
Constructor<?> constructor =
    clazz.getDeclaredConstructor(Configuration.class,
        boolean.class, ExecutorService.class, User.class);
constructor.setAccessible(true);
return (HConnection) constructor.newInstance(conf, managed, pool, user);
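The two reflection steps above (resolve a class name from configuration with a default, then call a non-public constructor) can be reproduced in isolation. This is a sketch under stated assumptions: SketchConnection, DefaultSketchConnection, ReflectiveFactorySketch, and the property name sketch.connection.impl are invented for illustration, and java.util.Properties stands in for the HBase Configuration class.

```java
import java.lang.reflect.Constructor;
import java.util.Properties;

/** Hypothetical connection interface used only by this sketch. */
interface SketchConnection {
    String describe();
}

/** Default implementation whose constructor is private, so reflection must open it. */
class DefaultSketchConnection implements SketchConnection {
    private final boolean managed;

    private DefaultSketchConnection(Properties conf, boolean managed) {
        this.managed = managed;
    }

    @Override
    public String describe() {
        return "default(managed=" + managed + ")";
    }
}

class ReflectiveFactorySketch {
    static SketchConnection create(Properties conf, boolean managed) {
        // Step 1: read the implementation class name, falling back to the default.
        String className = conf.getProperty("sketch.connection.impl",
                DefaultSketchConnection.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            // Step 2: look up the constructor by parameter types and invoke it.
            Constructor<?> constructor =
                    clazz.getDeclaredConstructor(Properties.class, boolean.class);
            constructor.setAccessible(true);
            return (SketchConnection) constructor.newInstance(conf, managed);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + className, e);
        }
    }
}
```

The benefit of this design is that an alternative implementation can be swapped in purely through configuration, as long as it offers a constructor with the expected parameter types.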
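2. Get the ExecutorService. This is a thread pool in which each thread corresponds to one regionserver.
3. finishSetup: configures the HTable parameters and creates rpcCallerFactory and rpcControllerFactory, which are used for RPC calls to the regionserver. It also initializes an AsyncProcess, which handles puts when autoflush is false as well as multiputs: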
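To make the thread pool concrete, here is a minimal JDK-only sketch of a pool that starts empty and adds worker threads on demand, so it can grow to roughly one worker per regionserver being contacted at the same time. ClientPoolSketch, the thread name, and the parameter values are assumptions for illustration; this is not the body of getDefaultExecutor.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ClientPoolSketch {
    /**
     * Builds an on-demand pool. keepAliveSeconds must be greater than zero,
     * otherwise allowCoreThreadTimeOut(true) throws IllegalArgumentException.
     */
    static ThreadPoolExecutor newPool(int maxThreads, long keepAliveSeconds) {
        // Daemon threads so an idle pool never keeps the JVM alive.
        ThreadFactory daemonFactory = runnable -> {
            Thread thread = new Thread(runnable, "htable-sketch");
            thread.setDaemon(true);
            return thread;
        };
        // A SynchronousQueue holds no tasks: each submitted task is handed
        // directly to an idle worker, or a new worker is started for it
        // (up to maxThreads).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, maxThreads, keepAliveSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(), daemonFactory);
        // Let even the core thread exit after the keep-alive period.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

With this configuration no thread exists until the first task is submitted, and idle workers disappear again after the keep-alive period.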
/** The Async process for puts with autoflush set to false or multiputs */
protected AsyncProcess<Object> ap;
……
this.rpcCallerFactory = RpcRetryingCallerFactory.instantiate(configuration);
this.rpcControllerFactory = RpcControllerFactory.instantiate(configuration);
ap = new AsyncProcess<Object>(connection, tableName, pool, null,
configuration, rpcCallerFactory, rpcControllerFactory);
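The idea of handling puts differently when autoflush is false can be illustrated with a small client-side buffer. This is only a sketch of the buffering concept, not of AsyncProcess itself: BufferedWriterSketch is hypothetical, rows are plain strings, and the flush threshold counts rows, which is a simplification chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical write buffer: collects puts and sends them in batches. */
class BufferedWriterSketch {
    private final boolean autoFlush;
    private final int flushThreshold;
    private final List<String> buffer = new ArrayList<>();

    /** Every batch that would have been sent to the server, in order. */
    final List<List<String>> sentBatches = new ArrayList<>();

    BufferedWriterSketch(boolean autoFlush, int flushThreshold) {
        this.autoFlush = autoFlush;
        this.flushThreshold = flushThreshold;
    }

    void put(String row) {
        buffer.add(row);
        // With autoflush on, every put is sent immediately; otherwise the
        // buffer is sent only once it reaches the threshold.
        if (autoFlush || buffer.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sentBatches.add(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With autoflush disabled, several puts travel together in one batch, which trades a little latency for fewer round trips.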
OK, HTable initialization is essentially complete. Next, we step into the HTable.get method:
public Result get(final Get get) throws IOException {
  // have to instanatiate this and set the priority here since in protobuf util we don't pass in
  // the tablename... an unfortunate side-effect of public interfaces :-/ In 0.99+ we put all the
  // logic back into HTable
  final PayloadCarryingRpcController controller = rpcControllerFactory.newController();
  controller.setPriority(tableName);
  RegionServerCallable<Result>