Hive metastore (1)

An overall analysis and detailed walkthrough of the Hive metastore code

Directory structure of the metadata module:
  The hivemeta source tree contains the packages metastore (the client/server invocation logic), events (listener implementations for lifecycle checks and authorization on tables), hooks (currently only the JDO-connection-related interfaces), parser (parsing of expression trees), spec (proxy classes around partitions), tools (JDO execute helpers), plus txn and model.
  Next, we analyze and annotate the metadata code category by category, starting with the Hive class, since it is the entry point for metastore metadata calls. The lifecycle we walk through is: creation and loading of the HiveMetaStoreClient, creation and loading of the HiveMetaStore server, createTable, dropTable, alterTable, createPartition, dropPartition, and alterPartition. Of course, this covers only a small part of the complete metadata code.

1. Creation and loading of the HiveMetaStoreClient
We begin with the Hive class:

  private HiveConf conf = null;
  private IMetaStoreClient metaStoreClient;
  private UserGroupInformation owner;

  // metastore calls timing information
  private final Map<String, Long> metaCallTimeMap = new HashMap<String, Long>();

  private static ThreadLocal<Hive> hiveDB = new ThreadLocal<Hive>() {
    @Override
    protected synchronized Hive initialValue() {
      return null;
    }

    @Override
    public synchronized void remove() {
      if (this.get() != null) {
        this.get().close();
      }
      super.remove();
    }
  };

  The fields declared here are the HiveConf object, the metaStoreClient, the owning UserGroupInformation, and a map of call timings, kept so that the duration of each metastore call can be recorded. The class also maintains a thread-local Hive instance, hiveDB; when the cached db is null (or stale), a new Hive object is created, as the following code shows:

public static Hive get(HiveConf c, boolean needsRefresh) throws HiveException {
    Hive db = hiveDB.get();
    if (db == null || needsRefresh || !db.isCurrentUserOwner()) {
      if (db != null) {
        LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + needsRefresh +
          ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
      }
      closeCurrent();
      c.set("fs.scheme.class", "dfs");
      Hive newdb = new Hive(c);
      hiveDB.set(newdb);
      return newdb;
    }
    db.conf = c;
    return db;
  }
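As a small usage sketch (the bare HiveConf construction is an assumption about the environment), each thread gets its own cached Hive handle:

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;

public class ThreadLocalHiveExample {
  public static void main(String[] args) throws HiveException {
    HiveConf conf = new HiveConf();
    Hive db1 = Hive.get(conf, false); // first call creates and caches the instance
    Hive db2 = Hive.get(conf, false); // later calls on the same thread reuse it
    System.out.println(db1 == db2);   // true: same thread-local instance
    Hive.closeCurrent();              // close and clear the thread-local handle
  }
}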

  We then find that the Hive class already registers all permanent functions when it is created. What is a function? From the earlier analysis of the table schema, it can be understood as the stored metadata of UDFs and similar jar-backed resources. The code is as follows:

// register all permanent functions. need improvement
  static {
    try {
      reloadFunctions();
    } catch (Exception e) {
      LOG.warn("Failed to access metastore. This class should not accessed in runtime.", e);
    }
  }

  public static void reloadFunctions() throws HiveException {
    // Obtain the Hive object used for the calls below.
    Hive db = Hive.get();
    // Iterate over every database name...
    for (String dbName : db.getAllDatabases()) {
      // ...and fetch every function registered under that database.
      for (String functionName : db.getFunctions(dbName, "*")) {
        Function function = db.getFunction(dbName, functionName);
        try {
          // Register the function into the Map<String, FunctionInfo> held by the
          // Registry class, so the execution engine does not have to query the
          // database again when the function is invoked.
          FunctionRegistry.registerPermanentFunction(
              FunctionUtils.qualifyFunctionName(functionName, dbName), function.getClassName(),
              false, FunctionTask.toFunctionResource(function.getResourceUris()));
        } catch (Exception e) {
          LOG.warn("Failed to register persistent function " +
              functionName + ":" + function.getClassName() + ". Ignore and continue.");
        }
      }
    }
  }
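Once registered, the engine resolves functions from this in-memory registry rather than from the database. A minimal lookup sketch, with a hypothetical function name:

import org.apache.hadoop.hive.ql.exec.FunctionInfo;
import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
import org.apache.hadoop.hive.ql.parse.SemanticException;

public class FunctionLookupExample {
  public static void main(String[] args) throws SemanticException {
    // "mydb.my_udf" is a hypothetical permanent function; the lookup hits the
    // in-memory registry populated by reloadFunctions(), not the metastore DB.
    FunctionInfo info = FunctionRegistry.getFunctionInfo("mydb.my_udf");
    if (info != null) {
      System.out.println(info.getDisplayName());
    }
  }
}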

The getMSC() method is then called to create the metastore client, as follows:

   private IMetaStoreClient createMetaStoreClient() throws MetaException {
     // An inline implementation of the HiveMetaHookLoader interface.
     HiveMetaHookLoader hookLoader = new HiveMetaHookLoader() {
       @Override
       public HiveMetaHook getHook(org.apache.hadoop.hive.metastore.api.Table tbl) throws MetaException {
         try {
           if (tbl == null) {
             return null;
           }
           // Based on the table's storage property, load the handler instance for the
           // backing storage (e.g. HBase or other external storage used for external tables).
           HiveStorageHandler storageHandler =
               HiveUtils.getStorageHandler(conf, tbl.getParameters().get(META_TABLE_STORAGE));
           if (storageHandler == null) {
             return null;
           }
           return storageHandler.getMetaHook();
         } catch (HiveException ex) {
           LOG.error(StringUtils.stringifyException(ex));
           throw new MetaException(
             "Failed to load storage handler:  " + ex.getMessage());
         }
       }
     };
     return RetryingMetaStoreClient.getProxy(conf, hookLoader, metaCallTimeMap,
         SessionHiveMetaStoreClient.class.getName());
   }

As we can see, createMetaStoreClient sets up a HiveMetaHook loader. The point of the hook is that on each metadata operation, for example createTable, if the table is not stored as plain files, say an HBase-backed table, HiveMetaStoreClient calls the hook's preCreateTable method to prepare the creation and distinguish external from managed tables; if the operation fails midway, the hook's rollbackCreateTable is called to roll everything back.
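What such a hook can look like is sketched below. The class name and log lines are hypothetical; the callback methods are the ones declared by the HiveMetaHook interface.

import org.apache.hadoop.hive.metastore.HiveMetaHook;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.Table;

// Hypothetical hook: a real storage handler would create/drop resources in the
// external system (an HBase table, for instance) inside these callbacks.
public class LoggingMetaHook implements HiveMetaHook {
  @Override
  public void preCreateTable(Table table) throws MetaException {
    // Validate or provision external storage before the metastore record exists.
    System.out.println("preCreateTable: " + table.getTableName());
  }

  @Override
  public void rollbackCreateTable(Table table) throws MetaException {
    // Undo preCreateTable work if the metastore insert failed.
    System.out.println("rollbackCreateTable: " + table.getTableName());
  }

  @Override
  public void commitCreateTable(Table table) throws MetaException { }

  @Override
  public void preDropTable(Table table) throws MetaException { }

  @Override
  public void rollbackDropTable(Table table) throws MetaException { }

  @Override
  public void commitDropTable(Table table, boolean deleteData) throws MetaException { }
}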
2. Creation and loading of the HiveMetaStore server

  public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) throws MetaException {
    this.hookLoader = hookLoader;
    if (conf == null) {
      conf = new HiveConf(HiveMetaStoreClient.class);
    }
    this.conf = conf;
    filterHook = loadFilterHooks();
    // hive.metastore.uris in hive-site.xml decides the mode: if it is set, this is
    // a remote connection; otherwise the metastore is embedded (local).
    String msUri = conf.getVar(HiveConf.ConfVars.METASTOREURIS);
    localMetaStore = HiveConfUtil.isEmbeddedMetaStore(msUri);
    if (localMetaStore) {
      // A local connection talks to HiveMetaStore directly.
      client = HiveMetaStore.newRetryingHMSHandler("hive client", conf, true);
      isConnected = true;
      snapshotActiveConf();
      return;
    }
    // Read the retry count and retry delay from the configuration.
    retries = HiveConf.getIntVar(conf, HiveConf.ConfVars.METASTORETHRIFTCONNECTIONRETRIES);
    retryDelaySeconds = conf.getTimeVar(
        ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY, TimeUnit.SECONDS);
    // Assemble the metastore URIs.
    if (conf.getVar(HiveConf.ConfVars.METASTOREURIS) != null) {
      String metastoreUrisString[] = conf.getVar(HiveConf.ConfVars.METASTOREURIS).split(",");
      metastoreUris = new URI[metastoreUrisString.length];
      try {
        int i = 0;
        for (String s : metastoreUrisString) {
          URI tmpUri = new URI(s);
          if (tmpUri.getScheme() == null) {
            throw new IllegalArgumentException("URI: " + s + " does not have a scheme");
          }
          metastoreUris[i++] = tmpUri;
        }
      } catch (IllegalArgumentException e) {
        throw (e);
      } catch (Exception e) {
        MetaStoreUtils.logAndThrowMetaException(e);
      }
    } else {
      LOG.error("NOT getting uris from conf");
      throw new MetaException("MetaStoreURIs not found in conf file");
    }
    // Call open() to establish the connection.
    open();
  }
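For reference, a minimal hive-site.xml fragment for the remote mode could look like this (the host is a placeholder; 9083 is the conventional metastore port):

<!-- Minimal sketch of the remote-metastore configuration. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>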

  The code above shows that a remote connection requires hive.metastore.uris to be set in hive-site.xml; whenever the client and the server do not run on the same machine, this remote mode is what you need. Let us continue with the open() method, which establishes the connection:

private void open() throws MetaException {
    isConnected = false;
    TTransportException tte = null;
    // Whether to use SASL.
    boolean useSasl = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_SASL);
    // If true, the metastore Thrift interface will use TFramedTransport. When false
    // (default) a standard TTransport is used.
    boolean useFramedTransport = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_FRAMED_TRANSPORT);
    // If true, the metastore Thrift interface will use TCompactProtocol. When false
    // (default) TBinaryProtocol will be used.
    boolean useCompactProtocol = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_COMPACT_PROTOCOL);
    // Read the socket timeout.
    int clientSocketTimeout = (int) conf.getTimeVar(ConfVars.METASTORE_CLIENT_SOCKET_TIMEOUT, TimeUnit.MILLISECONDS);
    for (int attempt = 0; !isConnected && attempt < retries; ++attempt) {
      for (URI store : metastoreUris) {
        LOG.info("Trying to connect to metastore with URI " + store);
        try {
          transport = new TSocket(store.getHost(), store.getPort(), clientSocketTimeout);
          if (useSasl) {
            // Wrap thrift connection with SASL for secure connection.
            try {
              // Create the HadoopThriftAuthBridge client.
              HadoopThriftAuthBridge.Client authBridge =
                  ShimLoader.getHadoopThriftAuthBridge().createClient();
              // Authentication:
              // check if we should use delegation tokens to authenticate
              // the call below gets hold of the tokens if they are set up by hadoop
              // this should happen on the map/reduce tasks if the client added the
              // tokens into hadoop's credential store in the front end during job
              // submission.
              String tokenSig = conf.get("hive.metastore.token.signature");
              // tokenSig could be null
              tokenStrForm = Utils.getTokenStrForm(tokenSig);
              if (tokenStrForm != null) {
                // authenticate using delegation tokens via the "DIGEST" mechanism
                transport = authBridge.createClientTransport(null, store.getHost(),
                    "DIGEST", tokenStrForm, transport,
                    MetaStoreUtils.getMetaStoreSaslProperties(conf));
              } else {
                String principalConfig =
                    conf.getVar(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL);
                transport = authBridge.createClientTransport(principalConfig,
                    store.getHost(), "KERBEROS", null, transport,
                    MetaStoreUtils.getMetaStoreSaslProperties(conf));
              }
            } catch (IOException ioe) {
              LOG.error("Couldn't create client transport", ioe);
              throw new MetaException(ioe.toString());
            }
          } else if (useFramedTransport) {
            transport = new TFramedTransport(transport);
          }
          final TProtocol protocol;
          if (useCompactProtocol) {
            protocol = new TCompactProtocol(transport);
          } else {
            protocol = new TBinaryProtocol(transport);
          }
          // Create the ThriftHiveMetastore client.
          client = new ThriftHiveMetastore.Client(protocol);
          try {
            transport.open();
            isConnected = true;
          } catch (TTransportException e) {
            tte = e;
            if (LOG.isDebugEnabled()) {
              LOG.warn("Failed to connect to the MetaStore Server...", e);
            } else {
              // Don't print full exception trace if DEBUG is not on.
              LOG.warn("Failed to connect to the MetaStore Server...");
            }
          }
          // Load the user and group information.
          if (isConnected && !useSasl && conf.getBoolVar(ConfVars.METASTORE_EXECUTE_SET_UGI)) {
            // Call set_ugi, only in unsecure mode.
            try {
              UserGroupInformation ugi = Utils.getUGI();
              client.set_ugi(ugi.getUserName(), Arrays.asList(ugi.getGroupNames()));
            } catch (LoginException e) {
              LOG.warn("Failed to do login. set_ugi() is not successful, " +
                       "Continuing without it.", e);
            } catch (IOException e) {
              LOG.warn("Failed to find ugi of client set_ugi() is not successful, " +
                  "Continuing without it.", e);
            } catch (TException e) {
              LOG.warn("set_ugi() not successful, Likely cause: new client talking to old server. "
                  + "Continuing without it.", e);
            }
          }
        } catch (MetaException e) {
          LOG.error("Unable to connect to metastore with URI " + store
                    + " in attempt " + attempt, e);
        }
        if (isConnected) {
          break;
        }
      }
      // Wait before launching the next round of connection retries.
      if (!isConnected && retryDelaySeconds > 0) {
        try {
          LOG.info("Waiting " + retryDelaySeconds + " seconds before next connection attempt.");
          Thread.sleep(retryDelaySeconds * 1000);
        } catch (InterruptedException ignore) {
        }
      }
    }
    if (!isConnected) {
      throw new MetaException("Could not connect to meta store using any of the URIs provided." +
        " Most recent failure: " + StringUtils.stringifyException(tte));
    }
    snapshotActiveConf();
    LOG.info("Connected to metastore.");
  }
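To make the transport and protocol choices concrete, here is a minimal standalone sketch that talks to a metastore over Thrift directly, mirroring the non-SASL, non-framed, binary-protocol branch of open(); the host, port, and timeout are placeholders:

import java.util.List;

import org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class RawThriftMetastoreExample {
  public static void main(String[] args) throws Exception {
    // Placeholder endpoint; plain TSocket + TBinaryProtocol, as in the default branch.
    TTransport transport = new TSocket("metastore-host", 9083, 20000);
    ThriftHiveMetastore.Client client =
        new ThriftHiveMetastore.Client(new TBinaryProtocol(transport));
    transport.open();
    try {
      List<String> dbs = client.get_all_databases(); // one of the generated Iface methods
      System.out.println(dbs);
    } finally {
      transport.close();
    }
  }
}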

  The code shows that the metastore server side is driven through ThriftHiveMetastore. It is itself a class, but it defines the nested interfaces Iface and AsyncIface, which makes implementing and extending the service straightforward. Next we look at the initialization of HMSHandler: in the local (embedded) case, calling newRetryingHMSHandler directly performs the HMSHandler initialization. The code is as follows:

public HMSHandler(String name, HiveConf conf, boolean init) throws MetaException {
      super(name);
      hiveConf = conf;
      if (init) {
        init();
      }
    }

Next, let us continue with its init() method:

public void init() throws MetaException {
      // The class name of the persistence implementation. The default is ObjectStore,
      // the RawStore implementation responsible for talking to the database via JDO.
      rawStoreClassName = hiveConf.getVar(HiveConf.ConfVars.METASTORE_RAW_STORE_IMPL);
      // Load the init listeners from hive.metastore.init.hooks; you can implement
      // and plug in your own.
      initListeners = MetaStoreUtils.getMetaStoreListeners(
          MetaStoreInitListener.class, hiveConf,
          hiveConf.getVar(HiveConf.ConfVars.METASTORE_INIT_HOOKS));
      for (MetaStoreInitListener singleInitListener : initListeners) {
        MetaStoreInitContext context = new MetaStoreInitContext();
        singleInitListener.onInit(context);
      }
      // Instantiate the alter handler implementation.
      String alterHandlerName = hiveConf.get("hive.metastore.alter.impl",
          HiveAlterHandler.class.getName());
      alterHandler = (AlterHandler) ReflectionUtils.newInstance(
          MetaStoreUtils.getClass(alterHandlerName), hiveConf);
      // Initialize the warehouse.
      wh = new Warehouse(hiveConf);
      // Create the default database and roles, add the admin users, and record currentUrl.
      synchronized (HMSHandler.class) {
        if (currentUrl == null || !currentUrl.equals(MetaStoreInit.getConnectionURL(hiveConf))) {
          createDefaultDB();
          createDefaultRoles();
          addAdminUsers();
          currentUrl = MetaStoreInit.getConnectionURL(hiveConf);
        }
      }
      // Initialize the metrics counters.
      if (hiveConf.getBoolean("hive.metastore.metrics.enabled", false)) {
        try {
          Metrics.init();
        } catch (Exception e) {
          // log exception, but ignore inability to start
          LOG.error("error in Metrics init: " + e.getClass().getName() + " " + e.getMessage(), e);
        }
      }
      // Initialize the pre-event, event, and end-function listeners.
      preListeners = MetaStoreUtils.getMetaStoreListeners(
          MetaStorePreEventListener.class, hiveConf,
          hiveConf.getVar(HiveConf.ConfVars.METASTORE_PRE_EVENT_LISTENERS));
      listeners = MetaStoreUtils.getMetaStoreListeners(
          MetaStoreEventListener.class, hiveConf,
          hiveConf.getVar(HiveConf.ConfVars.METASTORE_EVENT_LISTENERS));
      listeners.add(new SessionPropertiesListener(hiveConf));
      endFunctionListeners = MetaStoreUtils.getMetaStoreListeners(
          MetaStoreEndFunctionListener.class, hiveConf,
          hiveConf.getVar(HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS));
      // Regex validation for partition names, configurable via
      // hive.metastore.partition.name.whitelist.pattern.
      String partitionValidationRegex =
          hiveConf.getVar(HiveConf.ConfVars.METASTORE_PARTITION_NAME_WHITELIST_PATTERN);
      if (partitionValidationRegex != null && !partitionValidationRegex.isEmpty()) {
        partitionValidationPattern = Pattern.compile(partitionValidationRegex);
      } else {
        partitionValidationPattern = null;
      }
      long cleanFreq = hiveConf.getTimeVar(ConfVars.METASTORE_EVENT_CLEAN_FREQ, TimeUnit.MILLISECONDS);
      if (cleanFreq > 0) {
        // In default config, there is no timer.
        Timer cleaner = new Timer("Metastore Events Cleaner Thread", true);
        cleaner.schedule(new EventCleanerTask(this), cleanFreq, cleanFreq);
      }
    }

init() wires up the RawStore implementation that talks to the database, the Warehouse that performs the physical file-system operations, and the events and listeners. Through the interface, the metadata lifecycle methods can then operate on tables via these components.
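To make the listener mechanism concrete, a minimal sketch of a custom MetaStoreEventListener follows (the class name and log output are made up); registering its class name under hive.metastore.event.listeners lets the server invoke it after each table creation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.events.CreateTableEvent;

// Hypothetical listener: logs every table created through this metastore.
public class TableAuditListener extends MetaStoreEventListener {
  public TableAuditListener(Configuration config) {
    super(config);
  }

  @Override
  public void onCreateTable(CreateTableEvent tableEvent) throws MetaException {
    System.out.println("table created: " + tableEvent.getTable().getTableName());
  }
}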

3. createTable

public void createTable(String tableName, List<String> columns, List<String> partCols,
                          Class<? extends InputFormat> fileInputFormat,
                          Class<?> fileOutputFormat, int bucketCount, List<String> bucketCols,
                          Map<String, String> parameters) throws HiveException {
    if (columns == null) {
      throw new HiveException("columns not specified for table " + tableName);
    }
    // ... (the listing is cut off here; the method goes on to build the Table
    // object and hand it to the metastore client)
  }
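Finally, a hedged usage sketch of this createTable overload; the table name, columns, and partition column below are illustrative (TextInputFormat and HiveIgnoreKeyTextOutputFormat are the usual text-table defaults):

import java.util.Arrays;
import java.util.HashMap;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.mapred.TextInputFormat;

public class CreateTableExample {
  public static void main(String[] args) throws Exception {
    Hive db = Hive.get(new HiveConf());
    // Hypothetical table: two columns, partitioned by dt, no bucketing.
    db.createTable("demo_table",
        Arrays.asList("id", "name"),           // columns
        Arrays.asList("dt"),                   // partition columns
        TextInputFormat.class,                 // file input format
        HiveIgnoreKeyTextOutputFormat.class,   // file output format
        -1,                                    // bucket count (-1 = none)
        null,                                  // bucket columns
        new HashMap<String, String>());        // table parameters
  }
}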