Hive Metastore: Overall Code Analysis and Walkthrough

Following the brief analysis of the Hive metastore table structure in the previous post, I will now walk through the overall code structure, guided by the entity objects behind that data model. Let's start by opening the metadata directory; its layout looks like this:

As you can see, the whole hivemeta directory contains metastore (the client/server call logic), events (listeners invoked during a table's lifecycle for checks, permission validation, and so on), hooks (currently only the JDO connection-related interfaces), parser (parsing of expression trees), spec (proxy classes for partitions), tools (JDO execute helpers), plus txn and model. Next we will go through the metadata code piece by piece, with analysis and comments.

We have not even expanded the packages and there are already this many classes. Feeling overwhelmed? Same here, but let's keep going. At first it all looks like a tangled mess; stay calm and start from the Hive class, because it is the entry point for all metastore metadata calls. The lifecycle we will walk through is: creation and loading of the HiveMetaStoreClient, creation and loading of the HiveMetaStore server, then createTable, dropTable, alterTable, createPartition, dropPartition, and alterPartition. Of course, this is only a small part of the complete metadata layer.

1. Creation and loading of the HiveMetaStoreClient

Let's start reading through the Hive class bit by bit:

private HiveConf conf = null;
private IMetaStoreClient metaStoreClient;
private UserGroupInformation owner;

// metastore calls timing information
private final Map<String, Long> metaCallTimeMap = new HashMap<String, Long>();

private static ThreadLocal<Hive> hiveDB = new ThreadLocal<Hive>() {
  @Override
  protected synchronized Hive initialValue() {
    return null;
  }

  @Override
  public synchronized void remove() {
    if (this.get() != null) {
      this.get().close();
    }
    super.remove();
  }
};

The fields declared here are the HiveConf object, the metaStoreClient, the owning UserGroupInformation, and a call-timing map used to record how long each metastore operation takes. The class also maintains a thread-local Hive instance, hiveDB; if the per-thread db is null (or needs refreshing), a new Hive object is created, as the following code shows:

public static Hive get(HiveConf c, boolean needsRefresh) throws HiveException {
  Hive db = hiveDB.get();
  if (db == null || needsRefresh || !db.isCurrentUserOwner()) {
    if (db != null) {
      LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + needsRefresh +
          ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
    }
    closeCurrent();
    c.set("fs.scheme.class", "dfs");
    Hive newdb = new Hive(c);
    hiveDB.set(newdb);
    return newdb;
  }
  db.conf = c;
  return db;
}

Next you will notice that, when the Hive object is created, all functions get registered. What does "function" mean here? From the table-structure analysis in the previous post, you can think of it as the metadata for all UDF jars and similar resources. The code is as follows:

// register all permanent functions. need improvement
static {
  try {
    reloadFunctions();
  } catch (Exception e) {
    LOG.warn("Failed to access metastore. This class should not accessed in runtime.",e);
  }
}

public static void reloadFunctions() throws HiveException {
  // get the Hive object used for the subsequent calls
  Hive db = Hive.get();
  // iterate over every database name
  for (String dbName : db.getAllDatabases()) {
    // query the information of every function registered under this database
    for (String functionName : db.getFunctions(dbName, "*")) {
      Function function = db.getFunction(dbName, functionName);
      try {
        // register the function metadata fetched above into a Map inside the Registry class,
        // so the execution engine does not have to query the database again when it is invoked
        FunctionRegistry.registerPermanentFunction(
            FunctionUtils.qualifyFunctionName(functionName, dbName), function.getClassName(),
            false, FunctionTask.toFunctionResource(function.getResourceUris()));
      } catch (Exception e) {
        LOG.warn("Failed to register persistent function " +
            functionName + ":" + function.getClassName() + ". Ignore and continue.");
      }
    }
  }
}

The getMSC() method is then called to create the metastore client. The code is as follows:

private IMetaStoreClient createMetaStoreClient() throws MetaException {

  // implement the HiveMetaHookLoader interface inline
  HiveMetaHookLoader hookLoader = new HiveMetaHookLoader() {
    @Override
    public HiveMetaHook getHook(
        org.apache.hadoop.hive.metastore.api.Table tbl)
        throws MetaException {

      try {
        if (tbl == null) {
          return null;
        }
        // load the proper storage handler instance based on the table's key/value properties,
        // e.g. HBase, Redis or other external storage backing an external table
        HiveStorageHandler storageHandler =
            HiveUtils.getStorageHandler(conf,
                tbl.getParameters().get(META_TABLE_STORAGE));
        if (storageHandler == null) {
          return null;
        }
        return storageHandler.getMetaHook();
      } catch (HiveException ex) {
        LOG.error(StringUtils.stringifyException(ex));
        throw new MetaException(
            "Failed to load storage handler: " + ex.getMessage());
      }
    }
  };
  return RetryingMetaStoreClient.getProxy(conf, hookLoader, metaCallTimeMap,
      SessionHiveMetaStoreClient.class.getName());
}
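To see where this hook actually plugs in, below is a minimal sketch of a custom HiveMetaHook, the kind of object a storage handler's getMetaHook() would return. The ExampleMetaHook class and the comment bodies are made up for illustration; the callback methods are the ones I understand the org.apache.hadoop.hive.metastore.HiveMetaHook interface to declare (verify against your Hive version).

import org.apache.hadoop.hive.metastore.HiveMetaHook;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.Table;

// Hypothetical hook a storage handler could hand back from getMetaHook().
// The metastore client invokes these callbacks around create/drop table calls,
// which is how external systems (e.g. HBase) keep their side in sync.
public class ExampleMetaHook implements HiveMetaHook {
  @Override
  public void preCreateTable(Table table) throws MetaException {
    // validate table parameters / reserve resources in the external store
  }
  @Override
  public void rollbackCreateTable(Table table) throws MetaException {
    // undo whatever preCreateTable did if the metastore call failed
  }
  @Override
  public void commitCreateTable(Table table) throws MetaException {
    // create the backing structure in the external store
  }
  @Override
  public void preDropTable(Table table) throws MetaException {
    // sanity checks before the metastore drops the table
  }
  @Override
  public void rollbackDropTable(Table table) throws MetaException {
    // restore state if the drop did not go through
  }
  @Override
  public void commitDropTable(Table table, boolean deleteData) throws MetaException {
    // optionally delete the underlying data in the external store
  }
}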

2. Creation and loading of the HiveMetaStore server

When the HiveMetaStoreClient is constructed, it sets up the connection to the HiveMetaStore (embedded or remote). The code is as follows:

public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader)
    throws MetaException {

  this.hookLoader = hookLoader;
  if (conf == null) {
    conf = new HiveConf(HiveMetaStoreClient.class);
  }
  this.conf = conf;
  filterHook = loadFilterHooks();
  // hive.metastore.uris from hive-site.xml decides the mode: if it is set,
  // this is a remote connection, otherwise an embedded (local) metastore
  String msUri = conf.getVar(HiveConf.ConfVars.METASTOREURIS);
  localMetaStore = HiveConfUtil.isEmbeddedMetaStore(msUri);
  if (localMetaStore) {
    // a local (embedded) connection talks to HiveMetaStore directly
    client = HiveMetaStore.newRetryingHMSHandler("hive client", conf, true);
    isConnected = true;
    snapshotActiveConf();
    return;
  }

  // read the retry count and retry delay from the configuration
  retries = HiveConf.getIntVar(conf, HiveConf.ConfVars.METASTORETHRIFTCONNECTIONRETRIES);
  retryDelaySeconds = conf.getTimeVar(
      ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY, TimeUnit.SECONDS);

  // assemble the metastore URIs
  if (conf.getVar(HiveConf.ConfVars.METASTOREURIS) != null) {
    String metastoreUrisString[] = conf.getVar(
        HiveConf.ConfVars.METASTOREURIS).split(",");
    metastoreUris = new URI[metastoreUrisString.length];
    try {
      int i = 0;
      for (String s : metastoreUrisString) {
        URI tmpUri = new URI(s);
        if (tmpUri.getScheme() == null) {
          throw new IllegalArgumentException("URI: " + s
              + " does not have a scheme");
        }
        metastoreUris[i++] = tmpUri;
      }
    } catch (IllegalArgumentException e) {
      throw (e);
    } catch (Exception e) {
      MetaStoreUtils.logAndThrowMetaException(e);
    }
  } else {
    LOG.error("NOT getting uris from conf");
    throw new MetaException("MetaStoreURIs not found in conf file");
  }
  // call open() to establish the connection
  open();
}
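Before moving on: the branch above is driven entirely by hive.metastore.uris. For reference, a typical remote-metastore entry in hive-site.xml looks like the snippet below; the host names are placeholders, 9083 is the conventional metastore port, and several URIs can be listed comma separated, which open() later iterates over.

<property>
  <name>hive.metastore.uris</name>
  <!-- comma-separated list; the client tries each URI in turn -->
  <value>thrift://metastore-host-1:9083,thrift://metastore-host-2:9083</value>
</property>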

As the constructor shows, a remote connection hinges on hive.metastore.uris in hive-site.xml, exactly as in the snippet above. Sound familiar? If your client and server do not run on the same machine, this is the setting you need for a remote connection. Now let's move on to the open() method, which actually establishes the connection:

private void open() throws MetaException {
  isConnected = false;
  TTransportException tte = null;
  // whether to use SASL
  boolean useSasl = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_SASL);
  // if true, the metastore Thrift interface will use TFramedTransport;
  // when false (the default) a standard TTransport is used
  boolean useFramedTransport = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_FRAMED_TRANSPORT);
  // if true, the metastore Thrift interface will use TCompactProtocol; when false (the default)
  // TBinaryProtocol is used -- we will discuss the difference between them later
  boolean useCompactProtocol = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_COMPACT_PROTOCOL);
  // read the client socket timeout
  int clientSocketTimeout = (int) conf.getTimeVar(
      ConfVars.METASTORE_CLIENT_SOCKET_TIMEOUT, TimeUnit.MILLISECONDS);

  for (int attempt = 0; !isConnected && attempt < retries; ++attempt) {
    for (URI store : metastoreUris) {
      LOG.info("Trying to connect to metastore with URI " + store);
      try {
        transport = new TSocket(store.getHost(), store.getPort(), clientSocketTimeout);
        if (useSasl) {
          // wrap thrift connection with SASL for secure connection
          try {
            // create the HadoopThriftAuthBridge client
            HadoopThriftAuthBridge.Client authBridge =
                ShimLoader.getHadoopThriftAuthBridge().createClient();

            // authentication-related handling:
            // check if we should use delegation tokens to authenticate
            // the call below gets hold of the tokens if they are set up by hadoop
            // this should happen on the map/reduce tasks if the client added the
            // tokens into hadoop's credential store in the front end during job
            // submission.
            String tokenSig = conf.get("hive.metastore.token.signature");
            // tokenSig could be null
            tokenStrForm = Utils.getTokenStrForm(tokenSig);
            if (tokenStrForm != null) {
              // authenticate using delegation tokens via the "DIGEST" mechanism
              transport = authBridge.createClientTransport(null, store.getHost(),
                  "DIGEST", tokenStrForm, transport,
                  MetaStoreUtils.getMetaStoreSaslProperties(conf));
            } else {
              String principalConfig =
                  conf.getVar(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL);
              transport = authBridge.createClientTransport(
                  principalConfig, store.getHost(), "KERBEROS", null,
                  transport, MetaStoreUtils.getMetaStoreSaslProperties(conf));
            }
          } catch (IOException ioe) {
            LOG.error("Couldn't create client transport", ioe);
            throw new MetaException(ioe.toString());
          }
        } else if (useFramedTransport) {
          transport = new TFramedTransport(transport);
        }
        final TProtocol protocol;
        // the difference between the two protocols will be covered later (I have not dug into it yet)
        if (useCompactProtocol) {
          protocol = new TCompactProtocol(transport);
        } else {
          protocol = new TBinaryProtocol(transport);
        }
        // create the ThriftHiveMetastore client
        client = new ThriftHiveMetastore.Client(protocol);
        try {
          transport.open();
          isConnected = true;
        } catch (TTransportException e) {
          tte = e;
          if (LOG.isDebugEnabled()) {
            LOG.warn("Failed to connect to the MetaStore Server...", e);
          } else {
            // don't print full exception trace if DEBUG is not on
            LOG.warn("Failed to connect to the MetaStore Server...");
          }
        }
        // load the user and group information
        if (isConnected && !useSasl && conf.getBoolVar(ConfVars.METASTORE_EXECUTE_SET_UGI)) {
          // call set_ugi, only in unsecure mode
          try {
            UserGroupInformation ugi = Utils.getUGI();
            client.set_ugi(ugi.getUserName(), Arrays.asList(ugi.getGroupNames()));
          } catch (LoginException e) {
            LOG.warn("Failed to do login. set_ugi() is not successful, " +
                "Continuing without it.", e);
          } catch (IOException e) {
            LOG.warn("Failed to find ugi of client set_ugi() is not successful, " +
                "Continuing without it.", e);
          } catch (TException e) {
            LOG.warn("set_ugi() not successful, Likely cause: new client talking to old server. "
                + "Continuing without it.", e);
          }
        }
      } catch (MetaException e) {
        LOG.error("Unable to connect to metastore with URI " + store
            + " in attempt " + attempt, e);
      }
      if (isConnected) {
        break;
      }
    }
    // wait before launching the next round of connection retries
    if (!isConnected && retryDelaySeconds > 0) {
      try {
        LOG.info("Waiting " + retryDelaySeconds + " seconds before next connection attempt.");
        Thread.sleep(retryDelaySeconds * 1000);
      } catch (InterruptedException ignore) {}
    }
  }

  if (!isConnected) {
    throw new MetaException("Could not connect to meta store using any of the URIs provided." +
        " Most recent failure: " + StringUtils.stringifyException(tte));
  }

  snapshotActiveConf();

  LOG.info("Connected to metastore.");
}

We will set the details of the two protocols aside for this post. From the code you can see that the HiveMetaStore server side is exposed through ThriftHiveMetastore: it is itself a class, but it defines the Iface and AsyncIface interfaces inside it, which makes it convenient to implement and extend. Next, let's look at how the HMSHandler is initialized. In the local (embedded) case, calling newRetryingHMSHandler directly triggers the initialization of an HMSHandler. The code is as follows:

public HMSHandler(String name, HiveConf conf, boolean init) throws MetaException {
  super(name);
  hiveConf = conf;
  if (init) {
    init();
  }
}

Next, let's continue with its init() method:

public void init() throws MetaException {
  // get the class name of the implementation that talks to the database; by default this is
  // ObjectStore, the RawStore implementation responsible for the JDO interaction with the database
  rawStoreClassName = hiveConf.getVar(HiveConf.ConfVars.METASTORE_RAW_STORE_IMPL);
  // load init listeners from hive.metastore.init.hooks; you can implement and plug in your own
  initListeners = MetaStoreUtils.getMetaStoreListeners(
      MetaStoreInitListener.class, hiveConf,
      hiveConf.getVar(HiveConf.ConfVars.METASTORE_INIT_HOOKS));
  for (MetaStoreInitListener singleInitListener : initListeners) {
    MetaStoreInitContext context = new MetaStoreInitContext();
    singleInitListener.onInit(context);
  }
  // initialize the alter handler implementation
  String alterHandlerName = hiveConf.get("hive.metastore.alter.impl",
      HiveAlterHandler.class.getName());
  alterHandler = (AlterHandler) ReflectionUtils.newInstance(MetaStoreUtils.getClass(
      alterHandlerName), hiveConf);
  // initialize the warehouse
  wh = new Warehouse(hiveConf);
  // create the default database, default roles and admin users, and record currentUrl
  synchronized (HMSHandler.class) {
    if (currentUrl == null || !currentUrl.equals(MetaStoreInit.getConnectionURL(hiveConf))) {
      createDefaultDB();
      createDefaultRoles();
      addAdminUsers();
      currentUrl = MetaStoreInit.getConnectionURL(hiveConf);
    }
  }
  // initialize metrics
  if (hiveConf.getBoolean("hive.metastore.metrics.enabled", false)) {
    try {
      Metrics.init();
    } catch (Exception e) {
      // log exception, but ignore inability to start
      LOG.error("error in Metrics init: " + e.getClass().getName() + " "
          + e.getMessage(), e);
    }
  }
  // the pre-event listener list
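Since init() instantiates whatever classes are listed under hive.metastore.init.hooks, writing your own hook means subclassing MetaStoreInitListener. The sketch below shows the rough shape, assuming the abstract class takes a Configuration in its constructor and exposes an onInit(MetaStoreInitContext) callback; the LoggingInitListener name is made up, so check the exact signatures in your Hive version.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreInitContext;
import org.apache.hadoop.hive.metastore.MetaStoreInitListener;
import org.apache.hadoop.hive.metastore.api.MetaException;

// Hypothetical init hook; list it in hive.metastore.init.hooks so that
// HMSHandler.init() instantiates it and calls onInit() at startup.
public class LoggingInitListener extends MetaStoreInitListener {

  public LoggingInitListener(Configuration conf) {
    super(conf);
  }

  @Override
  public void onInit(MetaStoreInitContext context) throws MetaException {
    // runs once while the HMSHandler is being initialized,
    // before the metastore starts serving requests
    System.out.println("metastore HMSHandler is initializing");
  }
}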
