Hive Metastore: Overall Code Analysis and Walkthrough

      Background: our HiveServer2 deployment currently has two nodes, each running both a metastore and a HiveServer2 service. For the past few days I have been investigating a Hive metastore alert. Since the metastore is built on Thrift, I read up on Thrift; the alert log contained errors like "timed out wait request for id 11202249. Server Stacktrace: java.util.concurrent.TimeoutException", so I suspected a Thrift timeout. The metastore has a hive.metastore.client.socket.timeout parameter, and since the alerts happened during the early-morning load peak, my guess was that this value is too small; because the failures are intermittent, this is hard to verify by testing. While digging in, I also noticed something else: we run two independent metastores, but the first one produces a lot of logs while the second produces almost none, so I suspected that both HiveServer2 instances connect to the first metastore. That was only a guess, and I wanted evidence. These two questions led me to read some of the Hive source, and I finally found the answers in the code covered in this post. The CDH distribution differs slightly from open source: its Hive configuration files are generated automatically, and by default every metastore node is added to the hive.metastore.uris parameter in hive-site.xml, which HiveServer2 loads as its configuration at startup. The answer turned out to be in HiveMetaStoreClient and its open() method, shown below. Thanks also to the detailed material shared by others online.

Building on the earlier brief analysis of the Hive metastore table structure, and the entity objects derived from that data design, let's now go through the overall code structure. Open the metadata directory; its layout:

  As you can see, the whole hivemeta directory contains metastore (the client/server call logic), events (listener implementations for the table lifecycle, such as checks and authorization), hooks (here only the JDO-connection-related interfaces), parser (parsing of expression trees), spec (partition-related delegate classes), tools (JDO execute helpers), plus txn and model. Next we go through the metadata code package by package, with analysis and comments:

  Opened the packages yet? So many classes — feeling overwhelmed? Me too; let's push on. At first it all looks like a tangle, but stay calm: we start from the big Hive class, because it is the entry point for metastore metadata calls. The lifecycle we will follow: creation and loading of HiveMetaStoreClient, creation and loading of HiveMetaStore, then createTable, dropTable, alterTable, createPartition, dropPartition, alterPartition. Of course, this is only a small part of the full metadata code.

  1. Creation and loading of the HiveMetaStoreClient

  Let's start working through the Hive class bit by bit:

 
  private HiveConf conf = null;
  private IMetaStoreClient metaStoreClient;
  private UserGroupInformation owner;

  // metastore calls timing information
  private final Map<String, Long> metaCallTimeMap = new HashMap<String, Long>();

  private static ThreadLocal<Hive> hiveDB = new ThreadLocal<Hive>() {
    @Override
    protected synchronized Hive initialValue() {
      return null;
    }

    @Override
    public synchronized void remove() {
      if (this.get() != null) {
        this.get().close();
      }
      super.remove();
    }
  };

  The fields declared here are the HiveConf object, the metaStoreClient, the owning UserGroupInformation, and the call-timing map metaCallTimeMap, which records how long each metastore call takes. The class also maintains a thread-local Hive instance, hiveDB; if the cached db is null (or stale), a new Hive object is created, as follows:

 
  public static Hive get(HiveConf c, boolean needsRefresh) throws HiveException {
    Hive db = hiveDB.get();
    if (db == null || needsRefresh || !db.isCurrentUserOwner()) {
      if (db != null) {
        LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + needsRefresh +
            ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
      }
      closeCurrent();
      c.set("fs.scheme.class", "dfs");
      Hive newdb = new Hive(c);
      hiveDB.set(newdb);
      return newdb;
    }
    db.conf = c;
    return db;
  }
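  The per-thread caching trick used by hiveDB can be reduced to a small sketch (class and member names here are illustrative, not Hive's):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the pattern behind Hive.get(): each thread lazily
// creates its own instance, caches it in a ThreadLocal, and reuses it on
// subsequent calls unless a refresh is requested.
public class ThreadLocalCache {
    static final AtomicInteger created = new AtomicInteger();

    static final ThreadLocal<ThreadLocalCache> CACHE = new ThreadLocal<>();

    static ThreadLocalCache get(boolean needsRefresh) {
        ThreadLocalCache db = CACHE.get();
        if (db == null || needsRefresh) {
            db = new ThreadLocalCache();       // rebuild the per-thread object
            created.incrementAndGet();
            CACHE.set(db);
        }
        return db;                             // cached instance for this thread
    }
}
```

  The point of the pattern is that each HiveServer2 handler thread gets its own Hive object (and therefore its own metastore client) without locking on every call.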

  Next we find that when the Hive object is created, all permanent functions are registered. What is a function here? From the earlier table-structure analysis, it can be understood as the metadata for UDF jars and the like. The code:

 
  // register all permanent functions. need improvement
  static {
    try {
      reloadFunctions();
    } catch (Exception e) {
      LOG.warn("Failed to access metastore. This class should not accessed in runtime.",e);
    }
  }

  public static void reloadFunctions() throws HiveException {
    // obtain a Hive object for the calls below
    Hive db = Hive.get();
    // iterate over every database name
    for (String dbName : db.getAllDatabases()) {
      // query all functions registered under this database
      for (String functionName : db.getFunctions(dbName, "*")) {
        Function function = db.getFunction(dbName, functionName);
        try {
          // "register" here caches the queried function metadata in a
          // Map<String, FunctionInfo> inside the Registry class, so the
          // execution engine does not have to query the database again
          FunctionRegistry.registerPermanentFunction(
              FunctionUtils.qualifyFunctionName(functionName, dbName), function.getClassName(),
              false, FunctionTask.toFunctionResource(function.getResourceUris()));
        } catch (Exception e) {
          LOG.warn("Failed to register persistent function " +
              functionName + ":" + function.getClassName() + ". Ignore and continue.");
        }
      }
    }
  }
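  The effect of registerPermanentFunction is essentially to cache each function's metadata under its db-qualified name so later lookups skip the metastore. A minimal sketch (names are illustrative; the real FunctionRegistry stores a richer FunctionInfo, and lower-casing the qualified name is an assumption borrowed from Hive's case-insensitive function names):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an in-memory function registry keyed by "dbName.functionName",
// mirroring what FunctionRegistry.registerPermanentFunction caches.
public class FunctionRegistrySketch {
    private static final Map<String, String> FUNCTIONS = new HashMap<>();

    // Mirrors the idea of FunctionUtils.qualifyFunctionName: db-qualified, lower-case.
    static String qualify(String functionName, String dbName) {
        return (dbName + "." + functionName).toLowerCase();
    }

    static void register(String dbName, String functionName, String className) {
        FUNCTIONS.put(qualify(functionName, dbName), className);
    }

    // Lookup never touches the database again; it hits the in-memory map.
    static String lookup(String dbName, String functionName) {
        return FUNCTIONS.get(qualify(functionName, dbName));
    }
}
```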

  The getMSC() method is then called to create the metastore client, as follows:

 
  private IMetaStoreClient createMetaStoreClient() throws MetaException {

    // implement the HiveMetaHookLoader interface inline
    HiveMetaHookLoader hookLoader = new HiveMetaHookLoader() {
      @Override
      public HiveMetaHook getHook(
          org.apache.hadoop.hive.metastore.api.Table tbl)
          throws MetaException {

        try {
          if (tbl == null) {
            return null;
          }
          // based on the table's key/value properties, load the storage handler
          // instance for extended storage such as HBase, Redis, etc., used for
          // external tables
          HiveStorageHandler storageHandler =
              HiveUtils.getStorageHandler(conf,
                  tbl.getParameters().get(META_TABLE_STORAGE));
          if (storageHandler == null) {
            return null;
          }
          return storageHandler.getMetaHook();
        } catch (HiveException ex) {
          LOG.error(StringUtils.stringifyException(ex));
          throw new MetaException(
              "Failed to load storage handler: " + ex.getMessage());
        }
      }
    };
    return RetryingMetaStoreClient.getProxy(conf, hookLoader, metaCallTimeMap,
        SessionHiveMetaStoreClient.class.getName());
  }
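  RetryingMetaStoreClient.getProxy wraps the real client in a java.lang.reflect.Proxy so that every Thrift call can be retried and its elapsed time recorded in metaCallTimeMap. The mechanism, stripped to its core (the Client interface and all names below are invented for illustration, not Hive's API):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;

// Sketch of the dynamic-proxy technique RetryingMetaStoreClient uses:
// intercept each interface call, time it, and record the duration in a
// shared map keyed by method name.
public class TimingProxy {
    public interface Client {        // stand-in for IMetaStoreClient
        String getTable(String name);
    }

    public static Client wrap(Client real, Map<String, Long> callTimes) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(real, args);   // delegate to the real client
            } finally {
                // record how long this metastore call took
                callTimes.put(method.getName(), System.nanoTime() - start);
            }
        };
        return (Client) Proxy.newProxyInstance(
                Client.class.getClassLoader(),
                new Class<?>[] { Client.class },
                handler);
    }
}
```

  The real proxy also re-opens the connection and retries on transient Thrift failures; the timing map is the same metaCallTimeMap declared in Hive above.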

  2. Creation and loading of the HiveMetaStore server

  When HiveMetaStoreClient is initialized, it decides between an embedded metastore and a remote connection, as follows:

 
  public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader)
      throws MetaException {

    this.hookLoader = hookLoader;
    if (conf == null) {
      conf = new HiveConf(HiveMetaStoreClient.class);
    }
    this.conf = conf;
    filterHook = loadFilterHooks();
    // hive.metastore.uris in hive-site.xml decides the mode: if it is set,
    // this is a remote connection, otherwise an embedded (local) metastore
    String msUri = conf.getVar(HiveConf.ConfVars.METASTOREURIS);
    localMetaStore = HiveConfUtil.isEmbeddedMetaStore(msUri);
    if (localMetaStore) {
      // embedded mode talks to HiveMetaStore directly
      client = HiveMetaStore.newRetryingHMSHandler("hive client", conf, true);
      isConnected = true;
      snapshotActiveConf();
      return;
    }

    // read the configured retry count and retry delay
    retries = HiveConf.getIntVar(conf, HiveConf.ConfVars.METASTORETHRIFTCONNECTIONRETRIES);
    retryDelaySeconds = conf.getTimeVar(
        ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY, TimeUnit.SECONDS);

    // assemble the metastore URIs
    if (conf.getVar(HiveConf.ConfVars.METASTOREURIS) != null) {
      String metastoreUrisString[] = conf.getVar(
          HiveConf.ConfVars.METASTOREURIS).split(",");
      metastoreUris = new URI[metastoreUrisString.length];
      try {
        int i = 0;
        for (String s : metastoreUrisString) {
          URI tmpUri = new URI(s);
          if (tmpUri.getScheme() == null) {
            throw new IllegalArgumentException("URI: " + s
                + " does not have a scheme");
          }
          metastoreUris[i++] = tmpUri;
        }
      } catch (IllegalArgumentException e) {
        throw (e);
      } catch (Exception e) {
        MetaStoreUtils.logAndThrowMetaException(e);
      }
    } else {
      LOG.error("NOT getting uris from conf");
      throw new MetaException("MetaStoreURIs not found in conf file");
    }
    // call open() to establish the connection
    open();
  }

  From the code above we can see that a remote connection requires hive.metastore.uris in hive-site.xml. Looks familiar? If your client and server are not on the same machine, this is the parameter you set for the remote connection. Now let's look at the open() method that establishes it:
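  For reference, a remote-metastore deployment carries something like the following in hive-site.xml (the hostnames here are placeholders; the timeout value is just an example). With more than one URI listed, the open() loop tries them in the order given:

```xml
<!-- Placeholder hostnames; comma-separated list of all metastore instances -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore1.example.com:9083,thrift://metastore2.example.com:9083</value>
</property>
<!-- The socket timeout suspected in the alert investigation at the top of this post -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>600s</value>
</property>
```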

 
  private void open() throws MetaException {
    isConnected = false;
    TTransportException tte = null;
    // whether SASL is used
    boolean useSasl = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_SASL);
    // if true, the metastore Thrift interface will use TFramedTransport;
    // when false (default) a standard TTransport is used
    boolean useFramedTransport = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_FRAMED_TRANSPORT);
    // if true, the metastore Thrift interface will use TCompactProtocol;
    // when false (default) TBinaryProtocol will be used. The differences
    // between the two are a topic for later
    boolean useCompactProtocol = conf.getBoolVar(ConfVars.METASTORE_USE_THRIFT_COMPACT_PROTOCOL);
    // read the socket timeout
    int clientSocketTimeout = (int) conf.getTimeVar(
        ConfVars.METASTORE_CLIENT_SOCKET_TIMEOUT, TimeUnit.MILLISECONDS);

    for (int attempt = 0; !isConnected && attempt < retries; ++attempt) {
      for (URI store : metastoreUris) {
        LOG.info("Trying to connect to metastore with URI " + store);
        try {
          transport = new TSocket(store.getHost(), store.getPort(), clientSocketTimeout);
          if (useSasl) {
            // Wrap thrift connection with SASL for secure connection.
            try {
              // create the HadoopThriftAuthBridge client
              HadoopThriftAuthBridge.Client authBridge =
                  ShimLoader.getHadoopThriftAuthBridge().createClient();
              // authentication:
              // check if we should use delegation tokens to authenticate
              // the call below gets hold of the tokens if they are set up by hadoop
              // this should happen on the map/reduce tasks if the client added the
              // tokens into hadoop's credential store in the front end during job
              // submission.
              String tokenSig = conf.get("hive.metastore.token.signature");
              // tokenSig could be null
              tokenStrForm = Utils.getTokenStrForm(tokenSig);
              if(tokenStrForm != null) {
                // authenticate using delegation tokens via the "DIGEST" mechanism
                transport = authBridge.createClientTransport(null, store.getHost(),
                    "DIGEST", tokenStrForm, transport,
                    MetaStoreUtils.getMetaStoreSaslProperties(conf));
              } else {
                String principalConfig =
                    conf.getVar(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL);
                transport = authBridge.createClientTransport(
                    principalConfig, store.getHost(), "KERBEROS", null,
                    transport, MetaStoreUtils.getMetaStoreSaslProperties(conf));
              }
            } catch (IOException ioe) {
              LOG.error("Couldn't create client transport", ioe);
              throw new MetaException(ioe.toString());
            }
          } else if (useFramedTransport) {
            transport = new TFramedTransport(transport);
          }
          final TProtocol protocol;
          // the two protocols will be compared in detail later (haven't read
          // that part yet, ha)
          if (useCompactProtocol) {
            protocol = new TCompactProtocol(transport);
          } else {
            protocol = new TBinaryProtocol(transport);
          }
          // create the ThriftHiveMetastore client
          client = new ThriftHiveMetastore.Client(protocol);
          try {
            transport.open();
            isConnected = true;
          } catch (TTransportException e) {
            tte = e;
            if (LOG.isDebugEnabled()) {
              LOG.warn("Failed to connect to the MetaStore Server...", e);
            } else {
              // Don't print full exception trace if DEBUG is not on.
              LOG.warn("Failed to connect to the MetaStore Server...");
            }
          }
          // load the user and group information
          if (isConnected && !useSasl && conf.getBoolVar(ConfVars.METASTORE_EXECUTE_SET_UGI)){
            // Call set_ugi, only in unsecure mode.
            try {
              UserGroupInformation ugi = Utils.getUGI();
              client.set_ugi(ugi.getUserName(), Arrays.asList(ugi.getGroupNames()));
            } catch (LoginException e) {
              LOG.warn("Failed to do login. set_ugi() is not successful, " +
                  "Continuing without it.", e);
            } catch (IOException e) {
              LOG.warn("Failed to find ugi of client set_ugi() is not successful, " +
                  "Continuing without it.", e);
            } catch (TException e) {
              LOG.warn("set_ugi() not successful, Likely cause: new client talking to old server. "
                  + "Continuing without it.", e);
            }
          }
        } catch (MetaException e) {
          LOG.error("Unable to connect to metastore with URI " + store
              + " in attempt " + attempt, e);
        }
        if (isConnected) {
          break;
        }
      }
      // Wait before launching the next round of connection retries.
      if (!isConnected && retryDelaySeconds > 0) {
        try {
          LOG.info("Waiting " + retryDelaySeconds + " seconds before next connection attempt.");
          Thread.sleep(retryDelaySeconds * 1000);
        } catch (InterruptedException ignore) {}
      }
    }

    if (!isConnected) {
      throw new MetaException("Could not connect to meta store using any of the URIs provided." +
          " Most recent failure: " + StringUtils.stringifyException(tte));
    }

    snapshotActiveConf();

    LOG.info("Connected to metastore.");
  }
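  Note how the two loops in open() are nested: on every retry attempt the URIs are walked in hive.metastore.uris order, and the loop stops at the first successful connection. So as long as the first metastore accepts connections, every client lands on it and the second instance sits almost idle — which matches the log imbalance described at the top of this post. The selection logic, reduced to a standalone sketch (types and names invented for illustration):

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch of open()'s URI selection: walk the configured URIs in order and
// keep the first one that connects. Later URIs are only tried on failure.
public class MetastoreSelector {
    static String pickFirst(List<String> uris, Predicate<String> canConnect) {
        for (int attempt = 0; attempt < 3; attempt++) {   // retry rounds
            for (String uri : uris) {                     // ordered walk
                if (canConnect.test(uri)) {
                    return uri;                           // first reachable wins
                }
            }
        }
        throw new RuntimeException(
            "Could not connect to meta store using any of the URIs provided.");
    }
}
```

  This is load *failover*, not load balancing: the second URI is a standby that only receives traffic when the first is down.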

  Let's set the protocol internals aside for now. From the code we can see that the server side is built around ThriftHiveMetastore: it is a class, but it defines the interfaces Iface and AsyncIface internally, which makes it convenient to extend and implement. Next, the initialization of HMSHandler. In the embedded (local) case, newRetryingHMSHandler is called directly, which immediately initializes an HMSHandler:

 
  public HMSHandler(String name, HiveConf conf, boolean init) throws MetaException {
    super(name);
    hiveConf = conf;
    if (init) {
      init();
    }
  }

  Next, its init() method:

 
  public void init() throws MetaException {
    // class name of the persistence implementation: ObjectStore, the RawStore
    // implementation responsible for the JDO interaction with the database
    rawStoreClassName = hiveConf.getVar(HiveConf.ConfVars.METASTORE_RAW_STORE_IMPL);
    // load the init listeners from hive.metastore.init.hooks; you can
    // implement and plug in your own
    initListeners = MetaStoreUtils.getMetaStoreListeners(
        MetaStoreInitListener.class, hiveConf,
        hiveConf.getVar(HiveConf.ConfVars.METASTORE_INIT_HOOKS));
    for (MetaStoreInitListener singleInitListener: initListeners) {
      MetaStoreInitContext context = new MetaStoreInitContext();
      singleInitListener.onInit(context);
    }
    // initialize the alter handler implementation
    String alterHandlerName = hiveConf.get("hive.metastore.alter.impl",
        HiveAlterHandler.class.getName());
    alterHandler = (AlterHandler) ReflectionUtils.newInstance(MetaStoreUtils.getClass(
        alterHandlerName), hiveConf);
    // initialize the warehouse
    wh = new Warehouse(hiveConf);
    // create the default database and roles, add the admin users, and record currentUrl
    synchronized (HMSHandler.class) {
      if (currentUrl == null || !currentUrl.equals(MetaStoreInit.getConnectionURL(hiveConf))) {
        createDefaultDB();
        createDefaultRoles();
        addAdminUsers();
        currentUrl = MetaStoreInit.getConnectionURL(hiveConf);
      }
    }
    // initialize the metrics
    if (hiveConf.getBoolean("hive.metastore.metrics.enabled", false)) {
      try {
        Metrics.init();
      } catch (Exception e) {
        // log exception, but ignore inability to start
        LOG.error("error in Metrics init: " + e.getClass().getName() + " "
            + e.getMessage(), e);
      }
    }
    // initialize the pre-event listeners and the event listeners
    preListeners = MetaStoreUtils.getMetaStoreListeners(MetaStorePreEventListener.class,
        hiveConf,
        hiveConf.getVar(HiveConf.ConfVars.METASTORE_PRE_EVENT_LISTENERS));
    listeners = MetaStoreUtils.getMetaStoreListeners(MetaStoreEventListener.class, hiveConf,
        hiveConf.getVar(HiveConf.ConfVars.METASTORE_EVENT_LISTENERS));
    listeners.add(new SessionPropertiesListener(hiveConf));
    endFunctionListeners = MetaStoreUtils.getMetaStoreListeners(
        MetaStoreEndFunctionListener.class, hiveConf,
        hiveConf.getVar(HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS));
    // regex validation for partition names, configurable via
    // hive.metastore.partition.name.whitelist.pattern
    String partitionValidationRegex =
        hiveConf.getVar(HiveConf.ConfVars.METASTORE_PARTITION_NAME_WHITELIST_PATTERN);
    if (partitionValidationRegex != null && !partitionValidationRegex.isEmpty()) {
      partitionValidationPattern = Pattern.compile(partitionValidationRegex);
    } else {
      partitionValidationPattern = null;
    }

    long cleanFreq = hiveConf.getTimeVar(ConfVars.METASTORE_EVENT_CLEAN_FREQ, TimeUnit.MILLISECONDS);
    if (cleanFreq > 0) {
      // In default config, there is no timer.
      Timer cleaner = new Timer("Metastore Events Cleaner Thread", true);
      cleaner.schedule(new EventCleanerTask(this), cleanFreq, cleanFreq);
    }
  }

  So init() sets up the RawStore implementation that talks to the database, the Warehouse that performs the physical file-system operations, and the events and listeners. Table operations then go through the interface methods that cover the metadata lifecycle.

  3. createTable

  Let's start with the createTable method. On to the code:

 
  public void createTable(String tableName, List<String> columns, List<String> partCols,
      Class<? extends InputFormat> fileInputFormat,
      Class<?> fileOutputFormat, int bucketCount, List<String> bucketCols,
      Map<String, String> parameters) throws HiveException {
    if (columns == null) {
      throw new HiveException("columns not specified for table " + tableName);
    }

    Table tbl = newTable(tableName);
    // SD properties: record the input and output format class names for this
    // table; at query time the execution engine pulls these class names and
    // loads the classes by reflection
    tbl.setInputFormatClass(fileInputFormat.getName());
    tbl.setOutputFormatClass(fileOutputFormat.getName());

    // wrap each column's name and type in a FieldSchema and add it to the
    // SD object's column list
    for (String col : columns) {
      FieldSchema field = new FieldSchema(col, STRING_TYPE_NAME, "default");
      tbl.getCols().add(field);
    }

    // if partition columns were specified at create time (e.g. a dt column),
    // record them; they eventually end up in the partition tables
    if (partCols != null) {
      for (String partCol : partCols) {
        FieldSchema part = new FieldSchema();
        part.setName(partCol);
        part.setType(STRING_TYPE_NAME); // default partition key
        tbl.getPartCols().add(part);
      }
    }
    // set the serialization library
    tbl.setSerializationLib(LazySimpleSerDe.class.getName());
    // set the bucketing information
    tbl.setNumBuckets(bucketCount);
    tbl.setBucketCols(bucketCols);
    // set any extra key/value parameters on the table
    if (parameters != null) {
      tbl.setParamters(parameters);
    }
    createTable(tbl);
  }

  As the code shows, Hive builds a Table object, which can be thought of as a model: it carries nearly all the properties of the tables that hang off TBLS by the table_id foreign key (see the earlier post on the metastore table structure). Once populated, the overloaded createTable is called:

 
  public void createTable(Table tbl, boolean ifNotExists) throws HiveException {
    try {
      // if no database was set, fall back to the current database of the
      // SessionState (defensive)
      if (tbl.getDbName() == null || "".equals(tbl.getDbName().trim())) {
        tbl.setDbName(SessionState.get().getCurrentDatabase());
      }
      // if the column metadata is empty, populate it from the deserializer
      if (tbl.getCols().size() == 0 || tbl.getSd().getColsSize() == 0) {
        tbl.setFields(MetaStoreUtils.getFieldsFromDeserializer(tbl.getTableName(),
            tbl.getDeserializer()));
      }
      // validate the table's input/output formats and column properties,
      // e.g. illegal characters
      tbl.checkValidity();
      if (tbl.getParameters() != null) {
        tbl.getParameters().remove(hive_metastoreConstants.DDL_TIME);
      }
      org.apache.hadoop.hive.metastore.api.Table tTbl = tbl.getTTable();
      // authorization starts here; this involves the
      // hive.security.authorization.createtable.user.grants,
      // hive.security.authorization.createtable.group.grants and
      // hive.security.authorization.createtable.role.grants parameters,
      // built on Hive's own notion of users, roles and groups
      PrincipalPrivilegeSet principalPrivs = new PrincipalPrivilegeSet();
      SessionState ss = SessionState.get();
      if (ss != null) {
        CreateTableAutomaticGrant grants = ss.getCreateTableGrants();
        if (grants != null) {
          principalPrivs.setUserPrivileges(grants.getUserGrants());
          principalPrivs.setGroupPrivileges(grants.getGroupGrants());
          principalPrivs.setRolePrivileges(grants.getRoleGrants());
          tTbl.setPrivileges(principalPrivs);
        }
      }
      // create the table on the server through the client connection
      getMSC().createTable(tTbl);
    } catch (AlreadyExistsException e) {
      if (!ifNotExists) {
        throw new HiveException(e);
      }
    } catch (Exception e) {
      throw new HiveException(e);
    }
  }

  Next, the createTable method in HiveMetaStoreClient that receives the call:

 
  public void createTable(Table tbl, EnvironmentContext envContext) throws AlreadyExistsException,
      InvalidObjectException, MetaException, NoSuchObjectException, TException {
    // obtain the HiveMetaHook for this table's storage type, which performs
    // pre-creation loading and validation
    HiveMetaHook hook = getHook(tbl);
    if (hook != null) {
      hook.preCreateTable(tbl);
    }
    boolean success = false;
    try {
      // then call HiveMetaStore, where the server interacts with the database
      create_table_with_environment_context(tbl, envContext);
      if (hook != null) {
        hook.commitCreateTable(tbl);
      }
      success = true;
    } finally {
      // if the create failed, roll back
      if (!success && (hook != null)) {
        hook.rollbackCreateTable(tbl);
      }
    }
  }

  A brief word on what the hook does: HiveMetaHook is an interface whose methods include preCreateTable, rollbackCreateTable, preDropTable and so on. Its implementations handle the pre-creation loading and validation for the different storage types, as well as actions like rollback on failure. The code:

 
  public interface HiveMetaHook {
    /**
     * Called before a new table definition is added to the metastore
     * during CREATE TABLE.
     *
     * @param table new table definition
     */
    public void preCreateTable(Table table)
      throws MetaException;

    /**
     * Called after failure adding a new table definition to the metastore
     * during CREATE TABLE.
     *
     * @param table new table definition
     */
    public void rollbackCreateTable(Table table)
      throws MetaException;

    public void preDropTable(Table table)
      throws MetaException;
    ...............................
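  The hook contract is a classic pre/commit/rollback template: preCreateTable runs before the metastore write, commitCreateTable after success, and rollbackCreateTable on failure. A minimal counting implementation (purely illustrative, not a real Hive hook) makes the call order the client drives explicit:

```java
// Illustrative hook showing the pre -> action -> commit/rollback call order
// that HiveMetaStoreClient.createTable drives around the server call.
public class CountingHook {
    int pre, commit, rollback;

    void preCreateTable()      { pre++; }
    void commitCreateTable()   { commit++; }
    void rollbackCreateTable() { rollback++; }

    // Mirrors the success-flag pattern in createTable(Table, EnvironmentContext).
    void runCreate(Runnable createTable) {
        preCreateTable();
        boolean success = false;
        try {
            createTable.run();        // stand-in for the Thrift call to the server
            commitCreateTable();
            success = true;
        } finally {
            if (!success) {
                rollbackCreateTable();  // compensate, e.g. drop the HBase table
            }
        }
    }
}
```

  Note that commit is inside the try and rollback in the finally, so exactly one of the two runs for every preCreateTable.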

  Now let's look at the createTable path on the HiveMetaStore server side:

 
  private void create_table_core(final RawStore ms, final Table tbl,
      final EnvironmentContext envContext)
      throws AlreadyExistsException, MetaException,
      InvalidObjectException, NoSuchObjectException {
    // regex check of the table name for illegal characters
    if (!MetaStoreUtils.validateName(tbl.getTableName())) {
      throw new InvalidObjectException(tbl.getTableName()
          + " is not a valid object name");
    }
    // validation of the column names, column types and partition key names
    String validate = MetaStoreUtils.validateTblColumns(tbl.getSd().getCols());
    if (validate != null) {
      throw new InvalidObjectException("Invalid column " + validate);
    }
    if (tbl.getPartitionKeys() != null) {
      validate = MetaStoreUtils.validateTblColumns(tbl.getPartitionKeys());
      if (validate != null) {
        throw new InvalidObjectException("Invalid partition column " + validate);
      }
    }
    SkewedInfo skew = tbl.getSd().getSkewedInfo();
    if (skew != null) {
      validate = MetaStoreUtils.validateSkewedColNames(skew.getSkewedColNames());
      if (validate != null) {
        throw new InvalidObjectException("Invalid skew column " + validate);
      }
      validate = MetaStoreUtils.validateSkewedColNamesSubsetCol(
          skew.getSkewedColNames(), tbl.getSd().getCols());
      if (validate != null) {
        throw new InvalidObjectException("Invalid skew column " + validate);
      }
    }

    Path tblPath = null;
    boolean success = false, madeDir = false;
    try {
      // fire the pre-create events; the listener implementations shipped with
      // the metastore include DummyPreListener, AuthorizationPreEventListener,
      // AlternateFailurePreListener and MetaDataExportListener. What these
      // listeners are for will be explained in the design-pattern analysis later.
      firePreEvent(new PreCreateTableEvent(tbl, this));

      // open the transaction
      ms.openTransaction();

      // if the database does not exist, throw
      Database db = ms.getDatabase(tbl.getDbName());
      if (db == null) {
        throw new NoSuchObjectException("The database " + tbl.getDbName() + " does not exist");
      }

      // check whether the table already exists in this database
      if (is_table_exists(ms, tbl.getDbName(), tbl.getTableName())) {
        throw new AlreadyExistsException("Table " + tbl.getTableName()
            + " already exists");
      }
      // if the table is not a view, assemble the full table path:
      // fs.getUri().getScheme() + fs.getUri().getAuthority() + path.toUri().getPath()
      if (!TableType.VIRTUAL_VIEW.toString().equals(tbl.getTableType())) {
        if (tbl.getSd().getLocation() == null
            || tbl.getSd().getLocation().isEmpty()) {
          tblPath = wh.getTablePath(
              ms.getDatabase(tbl.getDbName()), tbl.getTableName());
        } else {
          // an explicit location on a non-external table without a
          // storage_handler parameter only produces a warning
          if (!isExternal(tbl) && !MetaStoreUtils.isNonNativeTable(tbl)) {
            LOG.warn("Location: " + tbl.getSd().getLocation()
                + " specified for non-external table:" + tbl.getTableName());
          }
          tblPath = wh.getDnsPath(new Path(tbl.getSd().getLocation()));
        }
        // store the assembled tblPath in the SD's location
        tbl.getSd().setLocation(tblPath.toString());
      }
      // create the table's directory
      if (tblPath != null) {
        if (!wh.isDir(tblPath)) {
          if (!wh.mkdirs(tblPath, true)) {
            throw new MetaException(tblPath
                + " is not a directory or unable to create one");
          }
          madeDir = true;
        }
      }
      // check the hive.stats.autogather setting
      if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
          !MetaStoreUtils.isView(tbl)) {
        if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
          MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
        } else { // Partitioned table with no partitions.
          MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
        }
      }

      // set create time
      long time = System.currentTimeMillis() / 1000;
      tbl.setCreateTime((int) time);
      if (tbl.getParameters() == null ||
          tbl.getParameters().get(hive_metastoreConstants.DDL_TIME) == null) {
        tbl.putToParameters(hive_metastoreConstants.DDL_TIME, Long.toString(time));
      }
      // perform the createTable database operation
      ms.createTable(tbl);
      success = ms.commitTransaction();

    } finally {
      if (!success) {
        ms.rollbackTransaction();
        // if the table was not created for some reason, delete the directory
        // that was already made for it
        if (madeDir) {
          wh.deleteDir(tblPath, true);
        }
      }
      // fire the post-create listeners, e.g. notify notifications
      for (MetaStoreEventListener listener : listeners) {
        CreateTableEvent createTableEvent =
            new CreateTableEvent(tbl, success, this);
        createTableEvent.setEnvironmentContext(envContext);
        listener.onCreateTable(createTableEvent);
      }
    }
  }

  The listeners here will be covered in detail later. Continuing straight down to the ms.createTable call: ms is a RawStore, the interface that gathers all the lifecycle methods into a single place. Part of it:

 
  public abstract Database getDatabase(String name)
    throws NoSuchObjectException;

  public abstract boolean dropDatabase(String dbname) throws NoSuchObjectException, MetaException;

  public abstract boolean alterDatabase(String dbname, Database db) throws NoSuchObjectException, MetaException;

  public abstract List<String> getDatabases(String pattern) throws MetaException;

  public abstract List<String> getAllDatabases() throws MetaException;

  public abstract boolean createType(Type type);

  public abstract Type getType(String typeName);

  public abstract boolean dropType(String typeName);

  public abstract void createTable(Table tbl) throws InvalidObjectException,
    MetaException;

  public abstract boolean dropTable(String dbName, String tableName)
    throws MetaException, NoSuchObjectException, InvalidObjectException, InvalidInputException;

  public abstract Table getTable(String dbName, String tableName)
    throws MetaException;
  ..................

  So how is this implemented concretely? The metastore first calls getMS() to fetch the thread-local RawStore implementation:

 
  public RawStore getMS() throws MetaException {
    // fetch the RawStore already cached on this thread
    RawStore ms = threadLocalMS.get();
    // if there is none, create the implementation object and cache it on the thread
    if (ms == null) {
      ms = newRawStore();
      ms.verifySchema();
      threadLocalMS.set(ms);
      ms = threadLocalMS.get();
    }
    return ms;
  }

  At this point, don't you want to see what newRawStore does? Let's keep going:

 
  public static RawStore getProxy(HiveConf hiveConf, Configuration conf, String rawStoreClassName,
      int id) throws MetaException {
    // load the base class by reflection; the implementation object is created next
    Class<? extends RawStore> baseClass = (Class<? extends RawStore>) MetaStoreUtils.getClass(
        rawStoreClassName);

    RawStoreProxy handler = new RawStoreProxy(hiveConf, conf, baseClass, id);

    // Look for interfaces on both the class and all base classes.
    return (RawStore) Proxy.newProxyInstance(RawStoreProxy.class.getClassLoader(),
        getAllInterfaces(baseClass), handler);
  }

  Where does rawStoreClassName come from? It is loaded when HiveMetaStore initializes, from the METASTORE_RAW_STORE_IMPL configuration parameter in HiveConf, i.e. the RawStore implementation class ObjectStore. With the implementation created, let's go deeper into ObjectStore:
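  So the concrete store is chosen purely by class name from configuration, instantiated reflectively, and then wrapped in a JDK proxy. The reflective half of that can be sketched like this (the Store interface and MemStore class are invented stand-ins for RawStore and ObjectStore):

```java
// Sketch of configuration-driven instantiation as done for
// hive.metastore.rawstore.impl: load the class by name, verify it
// implements the expected interface, and construct it reflectively.
public class ReflectiveFactory {
    public interface Store {                         // stand-in for RawStore
        String name();
    }

    public static class MemStore implements Store {  // stand-in for ObjectStore
        public String name() { return "mem"; }
    }

    static Store create(String className) throws Exception {
        Class<?> clazz = Class.forName(className);   // load by configured name
        if (!Store.class.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(className + " is not a Store");
        }
        return (Store) clazz.getDeclaredConstructor().newInstance();
    }
}
```

  The benefit of the design is that swapping the persistence layer (e.g. a caching or testing store) only requires changing one configuration value.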

  

 
  @Override
  public void createTable(Table tbl) throws InvalidObjectException, MetaException {
    boolean commited = false;
    try {
      // open the transaction
      openTransaction();
      // convertToMTable validates the db and table once more (the code is not
      // shown here; why a second validation is needed at this layer deserves
      // more thought)
      MTable mtbl = convertToMTable(tbl);
      // pm is the JDO PersistenceManager created when ObjectStore was
      // initialized; this is where the Table object is persisted. The JDO
      // model/database interaction is worth studying on its own.
      pm.makePersistent(mtbl);
      // assemble and persist the user, role and group privilege objects
      PrincipalPrivilegeSet principalPrivs = tbl.getPrivileges();
      List<Object> toPersistPrivObjs = new ArrayList<Object>();
      if (principalPrivs != null) {
        int now = (int)(System.currentTimeMillis()/1000);

        Map<String, List<PrivilegeGrantInfo>> userPrivs = principalPrivs.getUserPrivileges();
        putPersistentPrivObjects(mtbl, toPersistPrivObjs, now, userPrivs, PrincipalType.USER);

        Map<String, List<PrivilegeGrantInfo>> groupPrivs = principalPrivs.getGroupPrivileges();
        putPersistentPrivObjects(mtbl, toPersistPrivObjs, now, groupPrivs, PrincipalType.GROUP);

        Map<String, List<PrivilegeGrantInfo>> rolePrivs = principalPrivs.getRolePrivileges();
        putPersistentPrivObjects(mtbl, toPersistPrivObjs, now, rolePrivs, PrincipalType.ROLE);
      }
      pm.makePersistentAll(toPersistPrivObjs);
      commited = commitTransaction();
    } finally {
      // roll back on failure
      if (!commited) {
        rollbackTransaction();
      }
    }
  }


  4. dropTable

  Without further ado, the code from the Hive class:

 
  public void dropTable(String tableName, boolean ifPurge) throws HiveException {
    // Hive combines dbName and tableName into a single array
    String[] names = Utilities.getDbTableName(tableName);
    dropTable(names[0], names[1], true, true, ifPurge);
  }

  Why this handling? Because the SQL statement can be either drop table dbName.tableName or drop table tableName. Here the tableName and dbName are assembled: if the statement is drop table tableName, the dbName of the current session is used, as follows:

 
  public static String[] getDbTableName(String dbtable) throws SemanticException {
    // use the current database of the session
    return getDbTableName(SessionState.get().getCurrentDatabase(), dbtable);
  }

  public static String[] getDbTableName(String defaultDb, String dbtable) throws SemanticException {
    if (dbtable == null) {
      return new String[2];
    }
    String[] names = dbtable.split("\\.");
    switch (names.length) {
      case 2:
        return names;
      // if there is only one part, prepend the default database
      case 1:
        return new String [] {defaultDb, dbtable};
      default:
        throw new SemanticException(ErrorMsg.INVALID_TABLE_NAME, dbtable);
    }
  }
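  The splitting rule can be checked with a standalone version (simplified: Hive's real method throws SemanticException and pulls the default database from the SessionState):

```java
// Simplified stand-alone version of Utilities.getDbTableName's split rule:
// "db.tbl" keeps both parts, a bare "tbl" gets the default database.
public class DbTableName {
    static String[] split(String defaultDb, String dbtable) {
        String[] names = dbtable.split("\\.");
        switch (names.length) {
            case 2:  return names;                              // db.tbl
            case 1:  return new String[] { defaultDb, dbtable }; // bare tbl
            default: throw new IllegalArgumentException("Invalid table name: " + dbtable);
        }
    }
}
```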

  Then getMSC() is used to call dropTable in HiveMetaStoreClient:

 
  public void dropTable(String dbname, String name, boolean deleteData,
      boolean ignoreUnknownTab, EnvironmentContext envContext) throws MetaException, TException,
      NoSuchObjectException, UnsupportedOperationException {
    Table tbl;
    try {
      // fetch the whole Table object by dbName and tableName, i.e. all the
      // metadata stored for this table
      tbl = getTable(dbname, name);
    } catch (NoSuchObjectException e) {
      if (!ignoreUnknownTab) {
        throw e;
      }
      return;
    }
    // judging by the table type, index tables may not be dropped directly
    if (isIndexTable(tbl)) {
      throw new UnsupportedOperationException("Cannot drop index tables");
    }
    // the same getHook as at create time: fetch the hook for this table's storage
    HiveMetaHook hook = getHook(tbl);
    if (hook != null) {
      hook.preDropTable(tbl);
    }
    boolean success = false;
    try {
      // call the HiveMetaStore server's dropTable
      drop_table_with_environment_context(dbname, name, deleteData, envContext);
      if (hook != null) {
        hook.commitDropTable(tbl, deleteData);
      }
      success=true;
    } catch (NoSuchObjectException e) {
      if (!ignoreUnknownTab) {
        throw e;
      }
    } finally {
      if (!success && (hook != null)) {
        hook.rollbackDropTable(tbl);
      }
    }
  }

  Now the important part: what the HiveMetaStore server does. The code:

 
  1. private boolean drop_table_core(final RawStore ms, final String dbname, final String name,
  2. final boolean deleteData, final EnvironmentContext envContext,
  3. final String indexName) throws NoSuchObjectException,
  4. MetaException, IOException, InvalidObjectException, InvalidInputException {
  5. boolean success = false;
  6. boolean isExternal = false;
  7. Path tblPath = null;
  8. List<Path> partPaths = null;
  9. Table tbl = null;
  10. boolean ifPurge = false;
  11. try {
  12. ms.openTransaction();
  13. // 获取整个Table的对象属性
  14. tbl = get_table_core(dbname, name);
  15. if (tbl == null) {
  16. throw new NoSuchObjectException(name + " doesn't exist");
  17. }
           //如果sd数据为空,则认为该表数据损坏
  18. if (tbl.getSd() == null) {
  19. throw new MetaException("Table metadata is corrupted");
  20. }
  21. ifPurge = isMustPurge(envContext, tbl);
  22.  
  23. firePreEvent(new PreDropTableEvent(tbl, deleteData, this));
  24.        //如果该表本身是索引表,则不允许通过drop table直接删除,需要走drop index
  25. boolean isIndexTable = isIndexTable(tbl);
  26. if (indexName == null && isIndexTable) {
  27. throw new RuntimeException(
  28. "The table " + name + " is an index table. Please do drop index instead.");
  29. }
  30.        //如果不是索引表,则删除索引元数据
  31. if (!isIndexTable) {
  32. try {
  33. List<Index> indexes = ms.getIndexes(dbname, name, Short.MAX_VALUE);
  34. while (indexes != null && indexes.size() > 0) {
  35. for (Index idx : indexes) {
  36. this.drop_index_by_name(dbname, name, idx.getIndexName(), true);
  37. }
  38. indexes = ms.getIndexes(dbname, name, Short.MAX_VALUE);
  39. }
  40. } catch (TException e) {
  41. throw new MetaException(e.getMessage());
  42. }
  43. }
           //判断是否为外部表
  44. isExternal = isExternal(tbl);
  45. if (tbl.getSd().getLocation() != null) {
  46. tblPath = new Path(tbl.getSd().getLocation());
  47. if (!wh.isWritable(tblPath.getParent())) {
  48. String target = indexName == null ? "Table" : "Index table";
  49. throw new MetaException(target + " metadata not deleted since " +
  50. tblPath.getParent() + " is not writable by " +
  51. hiveConf.getUser());
  52. }
  53. }
  54.  
  55. checkTrashPurgeCombination(tblPath, dbname + "." + name, ifPurge);
  56. //获取所有partition的location path。这里有个奇怪的地方:为什么不将Table对象直接传入,而是又在该方法中重新getTable?同时会校验上级目录的读写权限
  57. partPaths = dropPartitionsAndGetLocations(ms, dbname, name, tblPath,
  58. tbl.getPartitionKeys(), deleteData && !isExternal);
  59.      //调用ObjectStore进行meta数据的删除
  60. if (!ms.dropTable(dbname, name)) {
  61. String tableName = dbname + "." + name;
  62. throw new MetaException(indexName == null ? "Unable to drop table " + tableName:
  63. "Unable to drop index table " + tableName + " for index " + indexName);
  64. }
  65. success = ms.commitTransaction();
  66. } finally {
  67. if (!success) {
  68. ms.rollbackTransaction();
  69. } else if (deleteData && !isExternal) {
  70.         //删除物理partition
  71. deletePartitionData(partPaths, ifPurge);
  72. //删除Table路径
  73. deleteTableData(tblPath, ifPurge);
  74. // ok even if the data is not deleted

  75.        //Listener 处理
  76. for (MetaStoreEventListener listener : listeners) {
  77. DropTableEvent dropTableEvent = new DropTableEvent(tbl, success, deleteData, this);
  78. dropTableEvent.setEnvironmentContext(envContext);
  79. listener.onDropTable(dropTableEvent);
  80. }
  81. }
  82. return success;
  83. }

  我们继续深入ObjectStore中的dropTable,会发现再一次通过dbName与tableName获取了整个Table对象,随后逐一删除相关数据。也许代码并不是同一个人写的,也可能是出于安全性考虑?很多本可以通过接口直接传入的Table对象,都被重新获取了一次,这样会不会加重数据库的负担呢?ObjectStore代码如下:

 
  1. public boolean dropTable(String dbName, String tableName) throws MetaException,
  2. NoSuchObjectException, InvalidObjectException, InvalidInputException {
  3. boolean success = false;
  4. try {
  5. openTransaction();
          //重新获取Table对象
  6. MTable tbl = getMTable(dbName, tableName);
  7. pm.retrieve(tbl);
  8. if (tbl != null) {
  9. //下列代码查询并删除所有的权限
  10. List<MTablePrivilege> tabGrants = listAllTableGrants(dbName, tableName);
  11. if (tabGrants != null && tabGrants.size() > 0) {
  12. pm.deletePersistentAll(tabGrants);
  13. }
          
  14. List<MTableColumnPrivilege> tblColGrants = listTableAllColumnGrants(dbName,
  15. tableName);
  16. if (tblColGrants != null && tblColGrants.size() > 0) {
  17. pm.deletePersistentAll(tblColGrants);
  18. }
  19.  
  20. List<MPartitionPrivilege> partGrants = this.listTableAllPartitionGrants(dbName, tableName);
  21. if (partGrants != null && partGrants.size() > 0) {
  22. pm.deletePersistentAll(partGrants);
  23. }
  24.  
  25. List<MPartitionColumnPrivilege> partColGrants = listTableAllPartitionColumnGrants(dbName,
  26. tableName);
  27. if (partColGrants != null && partColGrants.size() > 0) {
  28. pm.deletePersistentAll(partColGrants);
  29. }
  30. // delete column statistics if present
  31. try {
            //删除column统计表数据
  32. deleteTableColumnStatistics(dbName, tableName, null);
  33. } catch (NoSuchObjectException e) {
  34. LOG.info("Found no table level column statistics associated with db " + dbName +
  35. " table " + tableName + " record to delete");
  36. }
  37.      //删除mcd表数据
  38. preDropStorageDescriptor(tbl.getSd());
  39. //删除整个Table对象相关表数据
  40. pm.deletePersistentAll(tbl);
  41. }
  42. success = commitTransaction();
  43. } finally {
  44. if (!success) {
  45. rollbackTransaction();
  46. }
  47. }
  48. return success;
  49. }
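  ObjectStore里这种openTransaction、commitTransaction、finally中rollbackTransaction的事务模板贯穿全类,骨架大致如下(纯演示代码,open/commit/rollback均为内存模拟,非JDO真实调用):

```java
// 演示 ObjectStore 式的事务模板:commit 未成功或中途抛异常都会走 rollback。
public class TxnTemplateDemo {
    static String lastAction = "";

    static void openTransaction()      { lastAction = "open"; }
    static boolean commitTransaction() { lastAction = "commit"; return true; }
    static void rollbackTransaction()  { lastAction = "rollback"; }

    static boolean runInTxn(Runnable body) {
        boolean success = false;
        try {
            openTransaction();
            body.run(); // 对应删权限、删统计、删SD、删表对象等一串动作
            success = commitTransaction();
        } finally {
            if (!success) rollbackTransaction();
        }
        return success;
    }

    public static void main(String[] args) {
        System.out.println(runInTxn(() -> {})); // true,走 commit
    }
}
```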

  总结:

  5、AlterTable

  接下来我们看下AlterTable。AlterTable包含的逻辑较多,因为牵扯到物理存储上的路径修改等,那么我们来一点点查看。还是从Hive类中开始,上代码:

 
  1. public void alterTable(String tblName, Table newTbl, boolean cascade)
  2. throws InvalidOperationException, HiveException {
  3. String[] names = Utilities.getDbTableName(tblName);
  4. try {
  5. //删除table参数中的DDL_TIME,因为alterTable之后该时间会被更新
  6. if (newTbl.getParameters() != null) {
  7. newTbl.getParameters().remove(hive_metastoreConstants.DDL_TIME);
  8. }
          //进行相关校验,包含dbName、tableName、column、inputOutClass、outputClass的校验等,如果校验不通过则抛出HiveException
  9. newTbl.checkValidity();
          //调用alterTable
  10. getMSC().alter_table(names[0], names[1], newTbl.getTTable(), cascade);
  11. } catch (MetaException e) {
  12. throw new HiveException("Unable to alter table. " + e.getMessage(), e);
  13. } catch (TException e) {
  14. throw new HiveException("Unable to alter table. " + e.getMessage(), e);
  15. }
  16. }

  HiveMetaStoreClient这一层并没有做额外处理,所以我们直接来看HiveMetaStore服务端做了些什么:

 
  1. private void alter_table_core(final String dbname, final String name, final Table newTable,
  2. final EnvironmentContext envContext, final boolean cascade)
  3. throws InvalidOperationException, MetaException {
  4. startFunction("alter_table", ": db=" + dbname + " tbl=" + name
  5. + " newtbl=" + newTable.getTableName());
  6.  
  7. //更新DDL_Time
  8. if (newTable.getParameters() == null ||
  9. newTable.getParameters().get(hive_metastoreConstants.DDL_TIME) == null) {
  10. newTable.putToParameters(hive_metastoreConstants.DDL_TIME, Long.toString(System
  11. .currentTimeMillis() / 1000));
  12. }
  13. boolean success = false;
  14. Exception ex = null;
  15. try {
           //获取已有Table的整个对象
  16. Table oldt = get_table_core(dbname, name);
           //进行Event处理
  17. firePreEvent(new PreAlterTableEvent(oldt, newTable, this));
           //进行alterTable处理,后面详细说明
  18. alterHandler.alterTable(getMS(), wh, dbname, name, newTable, cascade);
  19. success = true;
  20.     
           //进行Listener处理
  21. for (MetaStoreEventListener listener : listeners) {
  22.  
  23. AlterTableEvent alterTableEvent =
  24. new AlterTableEvent(oldt, newTable, success, this);
  25. alterTableEvent.setEnvironmentContext(envContext);
  26. listener.onAlterTable(alterTableEvent);
  27. }
  28. } catch (NoSuchObjectException e) {
  29. // thrown when the table to be altered does not exist
  30. ex = e;
  31. throw new InvalidOperationException(e.getMessage());
  32. } catch (Exception e) {
  33. ex = e;
  34. if (e instanceof MetaException) {
  35. throw (MetaException) e;
  36. } else if (e instanceof InvalidOperationException) {
  37. throw (InvalidOperationException) e;
  38. } else {
  39. throw newMetaException(e);
  40. }
  41. } finally {
  42. endFunction("alter_table", success, ex, name);
  43. }
  44. }

  那么,我们重点看下alterHandler具体所做的事情。在这之前简要说下alterHandler的初始化:它是在HiveMetaStore init时根据hive.metastore.alter.impl参数取得的className,默认即HiveAlterHandler。那么具体,我们来看下它alterTable时的实现,前方高能,小心火烛:)

 
  1. public void alterTable(RawStore msdb, Warehouse wh, String dbname,
  2. String name, Table newt, boolean cascade) throws InvalidOperationException, MetaException {
  3. if (newt == null) {
  4. throw new InvalidOperationException("New table is invalid: " + newt);
  5. }
  6.    //校验新的tableName是否合法
  7. if (!MetaStoreUtils.validateName(newt.getTableName())) {
  8. throw new InvalidOperationException(newt.getTableName()
  9. + " is not a valid object name");
  10. }
         //校验新的column Name type是否合法
  11. String validate = MetaStoreUtils.validateTblColumns(newt.getSd().getCols());
  12. if (validate != null) {
  13. throw new InvalidOperationException("Invalid column " + validate);
  14. }
  15.  
  16. Path srcPath = null;
  17. FileSystem srcFs = null;
  18. Path destPath = null;
  19. FileSystem destFs = null;
  20.  
  21. boolean success = false;
  22. boolean moveData = false;
  23. boolean rename = false;
  24. Table oldt = null;
  25. List<ObjectPair<Partition, String>> altps = new ArrayList<ObjectPair<Partition, String>>();
  26.  
  27. try {
  28. msdb.openTransaction();
           //这里直接转换为小写,可以看出代码风格并不统一
  29. name = name.toLowerCase();
  30. dbname = dbname.toLowerCase();
  31.  
  32. //校验新的tableName是否存在
  33. if (!newt.getTableName().equalsIgnoreCase(name)
  34. || !newt.getDbName().equalsIgnoreCase(dbname)) {
  35. if (msdb.getTable(newt.getDbName(), newt.getTableName()) != null) {
  36. throw new InvalidOperationException("new table " + newt.getDbName()
  37. + "." + newt.getTableName() + " already exists");
  38. }
  39. rename = true;
  40. }
  41.  
  42. //获取老的table对象
  43. oldt = msdb.getTable(dbname, name);
  44. if (oldt == null) {
  45. throw new InvalidOperationException("table " + newt.getDbName() + "."
  46. + newt.getTableName() + " doesn't exist");
  47. }
  48.     //alterTable时获取METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES配置项,如果为true,则不允许不兼容的column type变更,校验不通过会抛出异常,这里默认取false
  49. if (HiveConf.getBoolVar(hiveConf,
  50. HiveConf.ConfVars.METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES,
  51. false)) {
  52. // Throws InvalidOperationException if the new column types are not
  53. // compatible with the current column types.
  54. MetaStoreUtils.throwExceptionIfIncompatibleColTypeChange(
  55. oldt.getSd().getCols(), newt.getSd().getCols());
  56. }
  57.     //cascade参数由调用Hive的alterTable方法时传过来,也就是引擎调用时设置的参数,这里用来判断是否需要级联修改partition信息
  58. if (cascade) {
  59. //校验新的column是否与老的column一致,如不一致,说明进行了column的添加或删除操作
  60. if(MetaStoreUtils.isCascadeNeededInAlterTable(oldt, newt)) {
            //根据dbName与tableName获取整个partition的信息
  61. List<Partition> parts = msdb.getPartitions(dbname, name, -1);
  62. for (Partition part : parts) {
  63. List<FieldSchema> oldCols = part.getSd().getCols();
  64. part.getSd().setCols(newt.getSd().getCols());
  65. String oldPartName = Warehouse.makePartName(oldt.getPartitionKeys(), part.getValues());
              //如果columns不一致,则删除已有的column统计信息
  66. updatePartColumnStatsForAlterColumns(msdb, part, oldPartName, part.getValues(), oldCols, part);
              //更新整个Partition的信息
  67. msdb.alterPartition(dbname, name, part.getValues(), part);
  68. }
  69. } else {
  70. LOG.warn("Alter table does not cascade changes to its partitions.");
  71. }
  72. }
  73.  
  74. //判断partitionKey是否改变,也就是dt或hour等分区字段是否改变
  75. boolean partKeysPartiallyEqual = checkPartialPartKeysEqual(oldt.getPartitionKeys(),
  76. newt.getPartitionKeys());
  77.     
          //如果该表不是视图表,且老的partKey与新的partKey不一致,则报错
  78. if(!oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString())){
  79. if (oldt.getPartitionKeys().size() != newt.getPartitionKeys().size()
  80. || !partKeysPartiallyEqual) {
  81. throw new InvalidOperationException(
  82. "partition keys can not be changed.");
  83. }
  84. }
  85.  
  86.       //如果是rename操作,且该表不是视图表,同时新的location与老的相同或为空,并且该表不是外部表,说明用户是想让metastore把数据移动到新表名对应的新路径,那么该操作
            // 为alter table rename操作
  87. if (rename
  88. && !oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString())
  89. && (oldt.getSd().getLocation().compareTo(newt.getSd().getLocation()) == 0
  90. || StringUtils.isEmpty(newt.getSd().getLocation()))
  91. && !MetaStoreUtils.isExternalTable(oldt)) {
  92.      //获取老表的location信息作为源路径
  93. srcPath = new Path(oldt.getSd().getLocation());
  94. srcFs = wh.getFs(srcPath);
  95.  
  96. // that means user is asking metastore to move data to new location
  97. // corresponding to the new name
  98. // get new location
  99. Database db = msdb.getDatabase(newt.getDbName());
  100. Path databasePath = constructRenamedPath(wh.getDatabasePath(db), srcPath);
  101. destPath = new Path(databasePath, newt.getTableName());
  102. destFs = wh.getFs(destPath);
  103.      //设置新的table location信息 用于后续更新动作
  104. newt.getSd().setLocation(destPath.toString());
  105. moveData = true;
  106.  
  107.        //校验源路径与目标路径是否位于同一文件系统,跨文件系统的rename是不支持的
  108. if (!FileUtils.equalsFileSystem(srcFs, destFs)) {
  109. throw new InvalidOperationException("table new location " + destPath
  110. + " is on a different file system than the old location "
  111. + srcPath + ". This operation is not supported");
  112. }
  113. try {
  114. srcFs.exists(srcPath); // check that src exists and also checks
  115. // permissions necessary
  116. if (destFs.exists(destPath)) {
  117. throw new InvalidOperationException("New location for this table "
  118. + newt.getDbName() + "." + newt.getTableName()
  119. + " already exists : " + destPath);
  120. }
  121. } catch (IOException e) {
  122. throw new InvalidOperationException("Unable to access new location "
  123. + destPath + " for table " + newt.getDbName() + "."
  124. + newt.getTableName());
  125. }
  126. String oldTblLocPath = srcPath.toUri().getPath();
  127. String newTblLocPath = destPath.toUri().getPath();
  128.     
  129. //获取old table中的所有partition信息
  130. List<Partition> parts = msdb.getPartitions(dbname, name, -1);
  131. for (Partition part : parts) {
  132. String oldPartLoc = part.getSd().getLocation();
            //这里,便开始新老partition地址的变换,修改partition元数据信息
  133. if (oldPartLoc.contains(oldTblLocPath)) {
  134. URI oldUri = new Path(oldPartLoc).toUri();
  135. String newPath = oldUri.getPath().replace(oldTblLocPath, newTblLocPath);
  136. Path newPartLocPath = new Path(oldUri.getScheme(), oldUri.getAuthority(), newPath);
  137. altps.add(ObjectPair.create(part, part.getSd().getLocation()));
  138. part.getSd().setLocation(newPartLocPath.toString());
  139. String oldPartName = Warehouse.makePartName(oldt.getPartitionKeys(), part.getValues());
  140. try {
  141. //existing partition column stats is no longer valid, remove them
  142. msdb.deletePartitionColumnStatistics(dbname, name, oldPartName, part.getValues(), null);
  143. } catch (InvalidInputException iie) {
  144. throw new InvalidOperationException("Unable to update partition stats in table rename." + iie);
  145. }
  146. msdb.alterPartition(dbname, name, part.getValues(), part);
  147. }
  148. }
           //更新stats相关信息
  149. } else if (MetaStoreUtils.requireCalStats(hiveConf, null, null, newt) &&
  150. (newt.getPartitionKeysSize() == 0)) {
  151. Database db = msdb.getDatabase(newt.getDbName());
  152. // Update table stats. For partitioned table, we update stats in
  153. // alterPartition()
  154. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, newt, wh, false, true);
  155. }
  156. updateTableColumnStatsForAlterTable(msdb, oldt, newt);
  157. // now finally call alter table
  158. msdb.alterTable(dbname, name, newt);
  159. // commit the changes
  160. success = msdb.commitTransaction();
  161. } catch (InvalidObjectException e) {
  162. LOG.debug(e);
  163. throw new InvalidOperationException(
  164. "Unable to change partition or table."
  165. + " Check metastore logs for detailed stack." + e.getMessage());
  166. } catch (NoSuchObjectException e) {
  167. LOG.debug(e);
  168. throw new InvalidOperationException(
  169. "Unable to change partition or table. Database " + dbname + " does not exist"
  170. + " Check metastore logs for detailed stack." + e.getMessage());
  171. } finally {
  172. if (!success) {
  173. msdb.rollbackTransaction();
  174. }
  175. if (success && moveData) {
  176.        //开始更新hdfs路径,进行老路径的rename到新路径 ,调用fileSystem的rename操作
  177. try {
  178. if (srcFs.exists(srcPath) && !srcFs.rename(srcPath, destPath)) {
  179. throw new IOException("Renaming " + srcPath + " to " + destPath + " failed");
  180. }
  181. } catch (IOException e) {
  182. LOG.error("Alter Table operation for " + dbname + "." + name + " failed.", e);
  183. boolean revertMetaDataTransaction = false;
  184. try {
  185. msdb.openTransaction();
              //HDFS rename失败后,这里再次调用alterTable,把元数据回滚为旧的Table信息
  186. msdb.alterTable(newt.getDbName(), newt.getTableName(), oldt);
  187. for (ObjectPair<Partition, String> pair : altps) {
  188. Partition part = pair.getFirst();
  189. part.getSd().setLocation(pair.getSecond());
  190. msdb.alterPartition(newt.getDbName(), name, part.getValues(), part);
  191. }
  192. revertMetaDataTransaction = msdb.commitTransaction();
  193. } catch (Exception e1) {
  194. // we should log this for manual rollback by administrator
  195. LOG.error("Reverting metadata by HDFS operation failure failed During HDFS operation failed", e1);
  196. LOG.error("Table " + Warehouse.getQualifiedName(newt) +
  197. " should be renamed to " + Warehouse.getQualifiedName(oldt));
  198. LOG.error("Table " + Warehouse.getQualifiedName(newt) +
  199. " should have path " + srcPath);
  200. for (ObjectPair<Partition, String> pair : altps) {
  201. LOG.error("Partition " + Warehouse.getQualifiedName(pair.getFirst()) +
  202. " should have path " + pair.getSecond());
  203. }
  204. if (!revertMetaDataTransaction) {
  205. msdb.rollbackTransaction();
  206. }
  207. }
  208. throw new InvalidOperationException("Alter Table operation for " + dbname + "." + name +
  209. " failed to move data due to: '" + getSimpleMessage(e) + "' See hive log file for details.");
  210. }
  211. }
  212. }
  213. if (!success) {
  214. throw new MetaException("Committing the alter table transaction was not successful.");
  215. }
  216. }
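  上面rename分支里分区location的改写,核心就是把分区路径中旧表目录前缀替换为新表目录,用纯字符串模拟如下(示意代码,真实实现基于Hadoop的Path/URI,并且还要同步删除旧的分区统计信息):

```java
// 演示 rename 表时分区 location 的改写:
// 把分区路径中旧表目录前缀替换为新表目录,前缀不匹配则保持原样。
public class PartLocationRenameDemo {
    static String renamePartLocation(String partLoc, String oldTblLoc, String newTblLoc) {
        if (!partLoc.contains(oldTblLoc)) {
            return partLoc; // 分区使用自定义location,不在表目录下,保持不变
        }
        return partLoc.replace(oldTblLoc, newTblLoc);
    }

    public static void main(String[] args) {
        System.out.println(renamePartLocation(
            "/user/hive/warehouse/db.db/old_t/dt=2020-01-01",
            "/user/hive/warehouse/db.db/old_t",
            "/user/hive/warehouse/db.db/new_t"));
    }
}
```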

  总结:

  6、createPartition
  在分区数据写入之前,会先进行partition的元数据注册及物理文件路径的创建(内部表),Hive类代码如下:

 
  1. public Partition createPartition(Table tbl, Map<String, String> partSpec) throws HiveException {
  2. try {
        //new出来一个Partition对象,传入Table对象,调用Partition的构造方法来initialize Partition的信息
  3. return new Partition(tbl, getMSC().add_partition(
  4. Partition.createMetaPartitionObject(tbl, partSpec, null)));
  5. } catch (Exception e) {
  6. LOG.error(StringUtils.stringifyException(e));
  7. throw new HiveException(e);
  8. }
  9. }

  这里的createMetaPartitionObject的作用在于对传入的Partition对象进行校验及封装,代码如下:

 
  1. public static org.apache.hadoop.hive.metastore.api.Partition createMetaPartitionObject(
  2. Table tbl, Map<String, String> partSpec, Path location) throws HiveException {
  3. List<String> pvals = new ArrayList<String>();
        //遍历整个PartCols,并且校验partMap中是否一一对应
  4. for (FieldSchema field : tbl.getPartCols()) {
  5. String val = partSpec.get(field.getName());
  6. if (val == null || val.isEmpty()) {
  7. throw new HiveException("partition spec is invalid; field "
  8. + field.getName() + " does not exist or is empty");
  9. }
  10. pvals.add(val);
  11. }
  12.   //set相关的属性信息,包括DbName、TableName、PartValues、以及sd信息
  13. org.apache.hadoop.hive.metastore.api.Partition tpart =
  14. new org.apache.hadoop.hive.metastore.api.Partition();
  15. tpart.setDbName(tbl.getDbName());
  16. tpart.setTableName(tbl.getTableName());
  17. tpart.setValues(pvals);
  18.  
  19. if (!tbl.isView()) {
  20. tpart.setSd(cloneSd(tbl));
  21. tpart.getSd().setLocation((location != null) ? location.toString() : null);
  22. }
  23. return tpart;
  24. }
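  createMetaPartitionObject对partSpec的校验逻辑可以抽出来单独演示:按表分区列定义的顺序逐一取值,缺失或为空即报错(示意代码,分区列用字符串列表代替了FieldSchema,异常类型也做了简化):

```java
// 演示分区 spec 的校验:按分区列顺序取值,保证 pvals 的顺序与分区列定义一致。
import java.util.*;

public class PartSpecDemo {
    static List<String> toPartVals(List<String> partCols, Map<String, String> partSpec) {
        List<String> pvals = new ArrayList<>();
        for (String col : partCols) {
            String val = partSpec.get(col);
            if (val == null || val.isEmpty()) {
                throw new IllegalArgumentException(
                    "partition spec is invalid; field " + col + " does not exist or is empty");
            }
            pvals.add(val);
        }
        return pvals; // 值的顺序与分区列定义顺序一致
    }

    public static void main(String[] args) {
        Map<String, String> spec = new HashMap<>();
        spec.put("dt", "2020-01-01");
        spec.put("hour", "00");
        System.out.println(toPartVals(Arrays.asList("dt", "hour"), spec)); // [2020-01-01, 00]
    }
}
```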

  随后HiveMetaStoreClient对该对象进行深拷贝,并调用HiveMetaStore服务端的add_partition,这里不再详细说明,我们直接看下服务端干了什么:

 
  1. private Partition add_partition_core(final RawStore ms,
  2. final Partition part, final EnvironmentContext envContext)
  3. throws InvalidObjectException, AlreadyExistsException, MetaException, TException {
  4. boolean success = false;
  5. Table tbl = null;
  6. try {
  7. ms.openTransaction();
           //根据DbName、TableName获取整个Table对象信息
  8. tbl = ms.getTable(part.getDbName(), part.getTableName());
  9. if (tbl == null) {
  10. throw new InvalidObjectException(
  11. "Unable to add partition because table or database do not exist");
  12. }
  13.      //事件处理
  14. firePreEvent(new PreAddPartitionEvent(tbl, part, this));
  15.      //在创建Partition之前,首先会校验元数据中该partition是否存在
  16. boolean shouldAdd = startAddPartition(ms, part, false);
  17. assert shouldAdd; // start would throw if it already existed here
           //创建Partition路径
  18. boolean madeDir = createLocationForAddedPartition(tbl, part);
  19. try {
            //加载一些kv信息
  20. initializeAddedPartition(tbl, part, madeDir);
            //写入元数据
  21. success = ms.addPartition(part);
  22. } finally {
  23. if (!success && madeDir) {
             //如果没有成功,便删除物理路径
  24. wh.deleteDir(new Path(part.getSd().getLocation()), true);
  25. }
  26. }
  27. // we proceed only if we'd actually succeeded anyway, otherwise,
  28. // we'd have thrown an exception
  29. success = success && ms.commitTransaction();
  30. } finally {
  31. if (!success) {
  32. ms.rollbackTransaction();
  33. }
  34. fireMetaStoreAddPartitionEvent(tbl, Arrays.asList(part), envContext, success);
  35. }
  36. return part;
  37. }

  这里提及一个设计上的点:从之前的表结构设计上看,并没有直接存储PartName,而是将key与value单独存储在kv表中,这里我们看下createLocationForAddedPartition:

 
  1. private boolean createLocationForAddedPartition(
  2. final Table tbl, final Partition part) throws MetaException {
  3. Path partLocation = null;
  4. String partLocationStr = null;
          //如果sd不为null,则将sd中的location信息赋给partLocationStr
  5. if (part.getSd() != null) {
  6. partLocationStr = part.getSd().getLocation();
  7. }
  8.     //如果为null,则重新拼接part Location
  9. if (partLocationStr == null || partLocationStr.isEmpty()) {
  10. // set default location if not specified and this is
  11. // a physical table partition (not a view)
  12. if (tbl.getSd().getLocation() != null) {
             //如果不为null,则继续拼接表路径及part的路径,组成完整的Partition location
  13. partLocation = new Path(tbl.getSd().getLocation(), Warehouse
  14. .makePartName(tbl.getPartitionKeys(), part.getValues()));
  15. }
  16. } else {
  17. if (tbl.getSd().getLocation() == null) {
  18. throw new MetaException("Cannot specify location for a view partition");
  19. }
  20. partLocation = wh.getDnsPath(new Path(partLocationStr));
  21. }
  22.  
  23. boolean result = false;
         //将location信息写入sd表
  24. if (partLocation != null) {
  25. part.getSd().setLocation(partLocation.toString());
  26.  
  27. // Check to see if the directory already exists before calling
  28. // mkdirs() because if the file system is read-only, mkdirs will
  29. // throw an exception even if the directory already exists.
  30. if (!wh.isDir(partLocation)) {
  31. if (!wh.mkdirs(partLocation, true)) {
  32. throw new MetaException(partLocation
  33. + " is not a directory or unable to create one");
  34. }
  35. result = true;
  36. }
  37. }
  38. return result;
  39. }
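  默认分区路径的拼法可以用下面的例子演示:表location加上k1=v1/k2=v2形式的partName(示意代码,真实的Warehouse.makePartName还会对特殊字符做转义,这里从简):

```java
// 演示默认分区路径的拼接:表location + "/" + k1=v1/k2=v2。
import java.util.*;

public class PartPathDemo {
    static String makePartName(List<String> partKeys, List<String> partVals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partKeys.size(); i++) {
            if (i > 0) sb.append('/');
            sb.append(partKeys.get(i)).append('=').append(partVals.get(i));
        }
        return sb.toString();
    }

    static String defaultPartLocation(String tblLocation, List<String> keys, List<String> vals) {
        return tblLocation + "/" + makePartName(keys, vals);
    }

    public static void main(String[] args) {
        System.out.println(defaultPartLocation("/user/hive/warehouse/t",
            Arrays.asList("dt", "hour"), Arrays.asList("2020-01-01", "00")));
    }
}
```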

  总结:

  7、dropPartition

  删除partition就不再从Hive开始了,我们直接看HiveMetaStore服务端做了什么:

 
  1. private boolean drop_partition_common(RawStore ms, String db_name, String tbl_name,
  2. List<String> part_vals, final boolean deleteData, final EnvironmentContext envContext)
  3. throws MetaException, NoSuchObjectException, IOException, InvalidObjectException,
  4. InvalidInputException {
  5. boolean success = false;
  6. Path partPath = null;
  7. Table tbl = null;
  8. Partition part = null;
  9. boolean isArchived = false;
  10. Path archiveParentDir = null;
  11. boolean mustPurge = false;
  12.  
  13. try {
  14. ms.openTransaction();
           //根据dbName、tableName、part_values获取整个part信息
  15. part = ms.getPartition(db_name, tbl_name, part_vals);
           //获取整个Table对象
  16. tbl = get_table_core(db_name, tbl_name);
  17. firePreEvent(new PreDropPartitionEvent(tbl, part, deleteData, this));
  18. mustPurge = isMustPurge(envContext, tbl);
  19.  
  20. if (part == null) {
  21. throw new NoSuchObjectException("Partition doesn't exist. "
  22. + part_vals);
  23. }
  24.      //这一块还没有深入看Archived partition的逻辑
  25. isArchived = MetaStoreUtils.isArchived(part);
  26. if (isArchived) {
  27. archiveParentDir = MetaStoreUtils.getOriginalLocation(part);
  28. verifyIsWritablePath(archiveParentDir);
  29. checkTrashPurgeCombination(archiveParentDir, db_name + "." + tbl_name + "." + part_vals, mustPurge);
  30. }
  31. if (!ms.dropPartition(db_name, tbl_name, part_vals)) {
  32. throw new MetaException("Unable to drop partition");
  33. }
  34. success = ms.commitTransaction();
  35. if ((part.getSd() != null) && (part.getSd().getLocation() != null)) {
  36. partPath = new Path(part.getSd().getLocation());
  37. verifyIsWritablePath(partPath);
  38. checkTrashPurgeCombination(partPath, db_name + "." + tbl_name + "." + part_vals, mustPurge);
  39. }
  40. } finally {
  41. if (!success) {
  42. ms.rollbackTransaction();
  43. } else if (deleteData && ((partPath != null) || (archiveParentDir != null))) {
  44. if (tbl != null && !isExternal(tbl)) {
  45. if (mustPurge) {
  46. LOG.info("dropPartition() will purge " + partPath + " directly, skipping trash.");
  47. }
  48. else {
  49. LOG.info("dropPartition() will move " + partPath + " to trash-directory.");
  50. }
             //删除partition
  51. // Archived partitions have har:/to_har_file as their location.
  52. // The original directory was saved in params
  53. if (isArchived) {
  54. assert (archiveParentDir != null);
  55. wh.deleteDir(archiveParentDir, true, mustPurge);
  56. } else {
  57. assert (partPath != null);
  58. wh.deleteDir(partPath, true, mustPurge);
  59. deleteParentRecursive(partPath.getParent(), part_vals.size() - 1, mustPurge);
  60. }
  61. // ok even if the data is not deleted
  62. }
  63. }
  64. for (MetaStoreEventListener listener : listeners) {
  65. DropPartitionEvent dropPartitionEvent =
  66. new DropPartitionEvent(tbl, part, success, deleteData, this);
  67. dropPartitionEvent.setEnvironmentContext(envContext);
  68. listener.onDropPartition(dropPartitionEvent);
  69. }
  70. }
  71. return true;
  72. }
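  上面deleteParentRecursive向上清理父级分区目录的思路,可以用内存模拟演示:删掉叶子分区目录后,再向上回溯part_vals.size() - 1层删除上级目录(示意代码,真实实现操作HDFS,且会先判断目录是否为空且可写):

```java
// 演示 dropPartition 后向上清理父级分区目录的回溯逻辑,
// 例如 dt=.../hour=... 两级分区,删掉 hour 目录后还要清理 dt 目录。
import java.util.*;

public class DropParentDemo {
    static List<String> deleteParentRecursive(String partPath, int levels) {
        List<String> deleted = new ArrayList<>();
        String cur = partPath;
        deleted.add(cur); // 先删叶子分区目录本身
        for (int i = 0; i < levels; i++) {
            cur = cur.substring(0, cur.lastIndexOf('/'));
            deleted.add(cur); // 再逐级删父目录(真实代码中需目录为空才删)
        }
        return deleted;
    }

    public static void main(String[] args) {
        System.out.println(deleteParentRecursive("/wh/t/dt=2020-01-01/hour=00", 1));
    }
}
```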

  总结:

  8、alterPartition

  alterPartition牵扯到较多的校验及文件目录的修改,我们直接从HiveMetaStore中的rename_partition查看:

 
  1. private void rename_partition(final String db_name, final String tbl_name,
  2. final List<String> part_vals, final Partition new_part,
  3. final EnvironmentContext envContext)
  4. throws InvalidOperationException, MetaException,
  5. TException {
          //日志记录
  6. startTableFunction("alter_partition", db_name, tbl_name);
  7.  
  8. if (LOG.isInfoEnabled()) {
  9. LOG.info("New partition values:" + new_part.getValues());
  10. if (part_vals != null && part_vals.size() > 0) {
  11. LOG.info("Old Partition values:" + part_vals);
  12. }
  13. }
  14.  
  15. Partition oldPart = null;
  16. Exception ex = null;
  17. try {
  18. firePreEvent(new PreAlterPartitionEvent(db_name, tbl_name, part_vals, new_part, this));
  19.      //校验PartName的规范性
  20. if (part_vals != null && !part_vals.isEmpty()) {
  21. MetaStoreUtils.validatePartitionNameCharacters(new_part.getValues(),
  22. partitionValidationPattern);
  23. }
  24.      //调用alterHandler的alterPartition进行partition物理上的rename,以及元数据修改
  25. oldPart = alterHandler.alterPartition(getMS(), wh, db_name, tbl_name, part_vals, new_part);
  26.  
  27. // Only fetch the table if we actually have a listener
  28. Table table = null;
  29. for (MetaStoreEventListener listener : listeners) {
  30. if (table == null) {
  31. table = getMS().getTable(db_name, tbl_name);
  32. }
  33. AlterPartitionEvent alterPartitionEvent =
  34. new AlterPartitionEvent(oldPart, new_part, table, true, this);
  35. alterPartitionEvent.setEnvironmentContext(envContext);
  36. listener.onAlterPartition(alterPartitionEvent);
  37. }
  38. } catch (InvalidObjectException e) {
  39. ex = e;
  40. throw new InvalidOperationException(e.getMessage());
  41. } catch (AlreadyExistsException e) {
  42. ex = e;
  43. throw new InvalidOperationException(e.getMessage());
  44. } catch (Exception e) {
  45. ex = e;
  46. if (e instanceof MetaException) {
  47. throw (MetaException) e;
  48. } else if (e instanceof InvalidOperationException) {
  49. throw (InvalidOperationException) e;
  50. } else if (e instanceof TException) {
  51. throw (TException) e;
  52. } else {
  53. throw newMetaException(e);
  54. }
  55. } finally {
  56. endFunction("alter_partition", oldPart != null, ex, tbl_name);
  57. }
  58. return;
  59. }

  这里我们着重看一下alterHandler.alterPartition方法,前方高能:

 
  1. public Partition alterPartition(final RawStore msdb, Warehouse wh, final String dbname,
  2. final String name, final List<String> part_vals, final Partition new_part)
  3. throws InvalidOperationException, InvalidObjectException, AlreadyExistsException,
  4. MetaException {
  5. boolean success = false;
  6.  
  7. Path srcPath = null;
  8. Path destPath = null;
  9. FileSystem srcFs = null;
  10. FileSystem destFs = null;
  11. Partition oldPart = null;
  12. String oldPartLoc = null;
  13. String newPartLoc = null;
  14.  
  15. //修改新的partition的DDL时间
  16. if (new_part.getParameters() == null ||
  17. new_part.getParameters().get(hive_metastoreConstants.DDL_TIME) == null ||
  18. Integer.parseInt(new_part.getParameters().get(hive_metastoreConstants.DDL_TIME)) == 0) {
  19. new_part.putToParameters(hive_metastoreConstants.DDL_TIME, Long.toString(System
  20. .currentTimeMillis() / 1000));
  21. }
  22.    //根据dbName、tableName获取整个Table对象
  23. Table tbl = msdb.getTable(dbname, name);
  24. //如果传入的part_vals为空,说明修改的只是partition的其他元数据信息而不牵扯到分区key/value,直接调用msdb.alterPartition更新元数据即可
  25. if (part_vals == null || part_vals.size() == 0) {
  26. try {
  27. oldPart = msdb.getPartition(dbname, name, new_part.getValues());
  28. if (MetaStoreUtils.requireCalStats(hiveConf, oldPart, new_part, tbl)) {
  29. MetaStoreUtils.updatePartitionStatsFast(new_part, wh, false, true);
  30. }
  31. updatePartColumnStats(msdb, dbname, name, new_part.getValues(), new_part);
  32. msdb.alterPartition(dbname, name, new_part.getValues(), new_part);
  33. } catch (InvalidObjectException e) {
  34. throw new InvalidOperationException("alter is not possible");
  35. } catch (NoSuchObjectException e){
  36. //old partition does not exist
  37. throw new InvalidOperationException("alter is not possible");
  38. }
  39. return oldPart;
  40. }
  41. //rename partition
  42. try {
  43. msdb.openTransaction();
  44. try {
           //获取oldPart对象信息
  45. oldPart = msdb.getPartition(dbname, name, part_vals);
  46. } catch (NoSuchObjectException e) {
  47. // this means there is no existing partition
  48. throw new InvalidObjectException(
  49. "Unable to rename partition because old partition does not exist");
  50. }
  51. Partition check_part = null;
  52. try {
            //尝试按newPart的partValues获取Partition,用于检查新分区是否已存在
  53. check_part = msdb.getPartition(dbname, name, new_part.getValues());
  54. } catch(NoSuchObjectException e) {
  55. // this means there is no existing partition
  56. check_part = null;
  57. }
          //如果check_part组装成功,说明该part已经存在,则报already exists
  58. if (check_part != null) {
  59. throw new AlreadyExistsException("Partition already exists:" + dbname + "." + name + "." +
  60. new_part.getValues());
  61. }
          //table的信息校验
  62. if (tbl == null) {
  63. throw new InvalidObjectException(
  64. "Unable to rename partition because table or database do not exist");
  65. }
  66.  
  67. //for an external table, a partition change needs no filesystem operation; just update the metadata
  68. if (tbl.getTableType().equals(TableType.EXTERNAL_TABLE.toString())) {
  69. new_part.getSd().setLocation(oldPart.getSd().getLocation());
  70. String oldPartName = Warehouse.makePartName(tbl.getPartitionKeys(), oldPart.getValues());
  71. try {
  72. //existing partition column stats is no longer valid, remove
  73. msdb.deletePartitionColumnStatistics(dbname, name, oldPartName, oldPart.getValues(), null);
  74. } catch (NoSuchObjectException nsoe) {
  75. //ignore
  76. } catch (InvalidInputException iie) {
  77. throw new InvalidOperationException("Unable to update partition stats in table rename." + iie);
  78. }
  79. msdb.alterPartition(dbname, name, part_vals, new_part);
  80. } else {
  81. try {
             //resolve the table's base path
  82. destPath = new Path(wh.getTablePath(msdb.getDatabase(dbname), name),
  83. Warehouse.makePartName(tbl.getPartitionKeys(), new_part.getValues()));
             //build the new partition's destination path
  84. destPath = constructRenamedPath(destPath, new Path(new_part.getSd().getLocation()));
  85. } catch (NoSuchObjectException e) {
  86. LOG.debug(e);
  87. throw new InvalidOperationException(
  88. "Unable to change partition or table. Database " + dbname + " does not exist"
  89. + " Check metastore logs for detailed stack." + e.getMessage());
  90. }
           //a non-null destPath means the data location changes
  91. if (destPath != null) {
  92. newPartLoc = destPath.toString();
  93. oldPartLoc = oldPart.getSd().getLocation();
  94.       //derive the old partition path from the original sd's location
  95. srcPath = new Path(oldPartLoc);
  96.  
  97. LOG.info("srcPath:" + oldPartLoc);
  98. LOG.info("descPath:" + newPartLoc);
  99. srcFs = wh.getFs(srcPath);
  100. destFs = wh.getFs(destPath);
  101. //check whether srcFs and destFs are the same filesystem
  102. if (!FileUtils.equalsFileSystem(srcFs, destFs)) {
  103. throw new InvalidOperationException("table new location " + destPath
  104. + " is on a different file system than the old location "
  105. + srcPath + ". This operation is not supported");
  106. }
  107. try {
              //verify whether the old and new partition paths differ, and whether the new partition path already exists
  108. srcFs.exists(srcPath); // check that src exists and also checks
  109. if (newPartLoc.compareTo(oldPartLoc) != 0 && destFs.exists(destPath)) {
  110. throw new InvalidOperationException("New location for this table "
  111. + tbl.getDbName() + "." + tbl.getTableName()
  112. + " already exists : " + destPath);
  113. }
  114. } catch (IOException e) {
  115. throw new InvalidOperationException("Unable to access new location "
  116. + destPath + " for partition " + tbl.getDbName() + "."
  117. + tbl.getTableName() + " " + new_part.getValues());
  118. }
  119. new_part.getSd().setLocation(newPartLoc);
  120. if (MetaStoreUtils.requireCalStats(hiveConf, oldPart, new_part, tbl)) {
  121. MetaStoreUtils.updatePartitionStatsFast(new_part, wh, false, true);
  122. }
             //build oldPartName, drop the old partition's column stats, then write the new partition metadata
  123. String oldPartName = Warehouse.makePartName(tbl.getPartitionKeys(), oldPart.getValues());
  124. try {
  125. //existing partition column stats is no longer valid, remove
  126. msdb.deletePartitionColumnStatistics(dbname, name, oldPartName, oldPart.getValues(), null);
  127. } catch (NoSuchObjectException nsoe) {
  128. //ignore
  129. } catch (InvalidInputException iie) {
  130. throw new InvalidOperationException("Unable to update partition stats in table rename." + iie);
  131. }
  132. msdb.alterPartition(dbname, name, part_vals, new_part);
  133. }
  134. }
  135.  
  136. success = msdb.commitTransaction();
  137. } finally {
  138. if (!success) {
  139. msdb.rollbackTransaction();
  140. }
  141. if (success && newPartLoc != null && newPartLoc.compareTo(oldPartLoc) != 0) {
  142. //rename the data directory
  143. try{
  144. if (srcFs.exists(srcPath)) {
  145. //if the parent path has not been created yet, create it first -- e.g. a compute engine may call alterTable and then alterPartition before the partition's parent directory exists
  146. Path destParentPath = destPath.getParent();
  147. if (!wh.mkdirs(destParentPath, true)) {
  148. throw new IOException("Unable to create path " + destParentPath);
  149. }
              //rename the source path to the destination path
  150. wh.renameDir(srcPath, destPath, true);
  151. LOG.info("rename done!");
  152. }
  153. } catch (IOException e) {
  154. boolean revertMetaDataTransaction = false;
  155. try {
  156. msdb.openTransaction();
  157. msdb.alterPartition(dbname, name, new_part.getValues(), oldPart);
  158. revertMetaDataTransaction = msdb.commitTransaction();
  159. } catch (Exception e1) {
  160. LOG.error("Reverting metadata opeation failed During HDFS operation failed", e1);
  161. if (!revertMetaDataTransaction) {
  162. msdb.rollbackTransaction();
  163. }
  164. }
  165. throw new InvalidOperationException("Unable to access old location "
  166. + srcPath + " for partition " + tbl.getDbName() + "."
  167. + tbl.getTableName() + " " + part_vals);
  168. }
  169. }
  170. }
  171. return oldPart;
  172. }
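The destination path above is assembled from the table's base path plus Warehouse.makePartName, which joins partition keys and values into a key=value directory chain. Here is a minimal Python sketch of that idea; note the real Java method also escapes characters that are unsafe in HDFS paths (via FileUtils.escapePathName), which this simplified version omits:

```python
def make_part_name(part_keys, part_values):
    # Simplified sketch of Warehouse.makePartName: join key=value
    # pairs into a relative partition path. The real implementation
    # additionally escapes special characters in keys and values.
    if len(part_keys) != len(part_values):
        raise ValueError("partition keys and values must align")
    return "/".join(f"{k}={v}" for k, v in zip(part_keys, part_values))

# For a table at /warehouse/db.db/t partitioned by (dt, hour),
# renaming a partition to (2015-08-01, 09) yields this destPath:
dest_path = "/warehouse/db.db/t/" + make_part_name(
    ["dt", "hour"], ["2015-08-01", "09"])
print(dest_path)  # /warehouse/db.db/t/dt=2015-08-01/hour=09
```

This is why the code only needs msdb.getDatabase and the table's partition keys to compute where the renamed partition's data should live.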

  Summary:

  alter_partition covers two paths. If part_vals is empty, only the partition's auxiliary metadata changes: stats are recalculated when required and msdb.alterPartition updates the record directly. Otherwise it is a rename: the handler verifies that the old partition exists and the target does not, rebuilds the location (external tables get a metadata-only update), checks that source and destination are on the same filesystem, commits the metadata transaction first, and only afterwards renames the data directory; if the HDFS rename fails, the metadata change is reverted in a fresh transaction.
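The commit-then-rename ordering in the finally block can be distilled into a small compensation-pattern sketch. This is only an illustration: FakeStore, rename_partition, and rename_dir are hypothetical stand-ins, not Hive APIs.

```python
class FakeStore:
    # Hypothetical stand-in for the metastore's ObjectStore (msdb),
    # tracking just the partition location for the demo.
    def __init__(self):
        self.location = "old"
    def open_transaction(self):
        pass
    def alter_partition(self, location):
        self.location = location
    def commit_transaction(self):
        return True

def rename_partition(msdb, rename_dir, src, dest):
    # Sketch of the pattern above: commit the metadata change first,
    # then move the data directory; if the filesystem rename fails,
    # revert the metadata in a fresh transaction and re-raise.
    msdb.open_transaction()
    msdb.alter_partition(dest)
    success = msdb.commit_transaction()
    if success and src != dest:
        try:
            rename_dir(src, dest)
        except IOError:
            msdb.open_transaction()
            msdb.alter_partition(src)  # revert to the old location
            msdb.commit_transaction()
            raise
```

The design choice mirrors the source: metadata and HDFS cannot be changed atomically together, so the code commits metadata first and compensates on filesystem failure rather than holding a transaction open across an HDFS operation.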
