Article link: https://blog.csdn.net/gaoshui87/article/details/52300293

Phoenix local index split: source code analysis

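Phoenix tables sometimes need secondary indexes, which come in two kinds: local and global. This post analyzes how the corresponding local index table is split when the data table is split.
(The local index region and the data table region that cover the same key range are guaranteed to be hosted on the same regionserver.)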

Analysis of the LocalIndexSplitter class

public void preSplitBeforePONR(ObserverContext<RegionCoprocessorEnvironment> ctx,
        byte[] splitKey, List<Mutation> metaEntries) throws IOException {
    RegionCoprocessorEnvironment environment = ctx.getEnvironment();
    HTableDescriptor tableDesc = ctx.getEnvironment().getRegion().getTableDesc();
    if (SchemaUtil.isSystemTable(tableDesc.getName())) {
        return;
    }
    final RegionServerServices rss = ctx.getEnvironment().getRegionServerServices();
    // If the table does not carry the local-index flag, i.e. it is a data table rather than a local index table
    if (tableDesc.getValue(MetaDataUtil.IS_LOCAL_INDEX_TABLE_PROP_BYTES) == null
            || !Boolean.TRUE.equals(PBoolean.INSTANCE.toObject(tableDesc
                    .getValue(MetaDataUtil.IS_LOCAL_INDEX_TABLE_PROP_BYTES)))) {
        TableName indexTable =
                TableName.valueOf(MetaDataUtil.getLocalIndexPhysicalName(tableDesc.getName()));
        // If the index table does not exist, return immediately
        if (!MetaTableAccessor.tableExists(rss.getConnection(), indexTable)) return;
        // Get the index table region that corresponds to the current data region
        Region indexRegion = IndexUtil.getIndexRegion(environment);
        if (indexRegion == null) {
            LOG.warn("Index region corresponindg to data region " + environment.getRegion()
                    + " not in the same server. So skipping the split.");
            ctx.bypass();
            return;
        }
        // FIXME: Uses private type
        try {
            int encodedVersion = VersionUtil.encodeVersion(environment.getHBaseVersion());
            if(encodedVersion >= SPLIT_TXN_MINIMUM_SUPPORTED_VERSION) {
                // Split the index region at the same splitKey as the data region
                st = new SplitTransactionImpl(indexRegion, splitKey);
                st.useZKForAssignment =
                        environment.getConfiguration().getBoolean("hbase.assignment.usezk",
                            true);
            } else {
                st = new IndexSplitTransaction(indexRegion, splitKey);
            }

            if (!st.prepare()) {
                LOG.error("Prepare for the table " + indexRegion.getTableDesc().getNameAsString()
                    + " failed. So returning null. ");
                ctx.bypass();
                return;
            }
            // Request a forced split of the index region at splitKey
            ((HRegion)indexRegion).forceSplit(splitKey);
            User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
              @Override
              public Void run() throws Exception {            
                // Splits the store files at splitKey on a thread pool; the returned first and second are the two new daughter Region objects
                daughterRegions = st.stepsBeforePONR(rss, rss, false);
                return null;
              }
            });
            HRegionInfo copyOfParent = new HRegionInfo(indexRegion.getRegionInfo());
            copyOfParent.setOffline(true);
            copyOfParent.setSplit(true);
            // Put for parent
            Put putParent = MetaTableAccessor.makePutFromRegionInfo(copyOfParent);
            MetaTableAccessor.addDaughtersToPut(putParent,
                    daughterRegions.getFirst().getRegionInfo(),
                    daughterRegions.getSecond().getRegionInfo());
            metaEntries.add(putParent);
            // Puts for daughters
            Put putA = MetaTableAccessor.makePutFromRegionInfo(
                    daughterRegions.getFirst().getRegionInfo());
            Put putB = MetaTableAccessor.makePutFromRegionInfo(
                    daughterRegions.getSecond().getRegionInfo());
            st.addLocation(putA, rss.getServerName(), 1);
            st.addLocation(putB, rss.getServerName(), 1);
            metaEntries.add(putA);
            metaEntries.add(putB);
        } catch (Exception e) {
            ctx.bypass();
            LOG.warn("index region splitting failed with the exception ", e);
            if (st != null){
                st.rollback(rss, rss);
                st = null;
                daughterRegions = null;
            }
        }
    }
}

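As the code above shows, the first step is to check whether this table is a local index table. (At first I suspected a bug here, because the logic below runs only when the local-index property is absent or false. It appears to be intended: this hook fires when a data table region splits, so the index region should be split only when the current table is a data table rather than an index table.)
Next it works out which index table belongs to this physical table and fetches the index region that corresponds to the current region. (All of this runs inside a coprocessor, so each splitting region is handled independently and several regions can be processed in parallel.)
It then creates a SplitTransactionImpl object from the incoming split key splitKey; this object is dedicated to splitting a single region. Once st.prepare() succeeds, it calls forceSplit on the index region to request a split at splitKey,
and finally calls stepsBeforePONR, which splits the store files on a thread pool and returns once the two daughter regions have been created.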
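
The guard at the top of preSplitBeforePONR can be modeled on its own to make the direction of the check easier to see. This is only a sketch under assumptions: LocalIndexSplitGuard, shouldSplitIndexRegion and the string property key are hypothetical stand-ins, whereas the real code reads MetaDataUtil.IS_LOCAL_INDEX_TABLE_PROP_BYTES from the HTableDescriptor.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the guard at the top of preSplitBeforePONR (not Phoenix code). */
class LocalIndexSplitGuard {
    // Hypothetical property key standing in for MetaDataUtil.IS_LOCAL_INDEX_TABLE_PROP_BYTES.
    static final String IS_LOCAL_INDEX_TABLE = "IS_LOCAL_INDEX_TABLE";

    /** Returns true when the splitting table is a data table whose index region must be split too. */
    static boolean shouldSplitIndexRegion(boolean isSystemTable, Map<String, String> tableProps) {
        if (isSystemTable) {
            return false; // system tables are skipped entirely
        }
        String flag = tableProps.get(IS_LOCAL_INDEX_TABLE);
        // Flag missing or not "true": this is a data table, not a local index table.
        return flag == null || !Boolean.parseBoolean(flag);
    }

    public static void main(String[] args) {
        Map<String, String> dataTable = new HashMap<>();
        Map<String, String> indexTable = new HashMap<>();
        indexTable.put(IS_LOCAL_INDEX_TABLE, "true");
        System.out.println("data table:  " + shouldSplitIndexRegion(false, dataTable));
        System.out.println("index table: " + shouldSplitIndexRegion(false, indexTable));
    }
}
```

In this model the index-region split is driven only from the data table side, which matches the observation that the index table never splits on its own.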

Now let's look at the SplitTransactionImpl source code

public boolean prepare() throws IOException {
    if (!this.parent.isSplittable()) return false;
    // Split key can be null if this region is unsplittable; i.e. has refs.
    if (this.splitrow == null) return false;
    HRegionInfo hri = this.parent.getRegionInfo();
    parent.prepareToSplit();
    // Check splitrow.
    byte [] startKey = hri.getStartKey();
    byte [] endKey = hri.getEndKey();
    if (Bytes.equals(startKey, splitrow) ||
        !this.parent.getRegionInfo().containsRow(splitrow)) {
      LOG.info("Split row is not inside region key range or is equal to " +
          "startkey: " + Bytes.toStringBinary(this.splitrow));
      return false;
    }
    long rid = getDaughterRegionIdTimestamp(hri);
    this.hri_a = new HRegionInfo(hri.getTable(), startKey, this.splitrow, false, rid);
    this.hri_b = new HRegionInfo(hri.getTable(), this.splitrow, endKey, false, rid);

    transition(SplitTransactionPhase.PREPARED);

    return true;
}

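As the method above shows, prepare() validates the passed-in splitrow and then uses it as the boundary key for two daughter HRegionInfo objects carved out of the current region: hri_a covers [startKey, splitrow) and hri_b covers [splitrow, endKey).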
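
The split row validation in prepare() can be sketched in isolation. The following is a minimal model, not HBase code: SplitRowCheck, KeyRange and daughters are hypothetical names, and Arrays.compareUnsigned (Java 9+) is assumed as a stand-in for the unsigned lexicographic byte comparison that HBase performs through its Bytes utility.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Simplified model of the split row check and daughter range creation (not HBase code). */
class SplitRowCheck {
    /** Half-open key range [start, end); an empty end key means "no upper bound". */
    static final class KeyRange {
        final byte[] start;
        final byte[] end;

        KeyRange(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }

        boolean containsRow(byte[] row) {
            boolean atOrAfterStart = Arrays.compareUnsigned(row, start) >= 0;
            boolean beforeEnd = end.length == 0 || Arrays.compareUnsigned(row, end) < 0;
            return atOrAfterStart && beforeEnd;
        }
    }

    /** Returns the two daughter ranges, or null when the split row is rejected. */
    static KeyRange[] daughters(KeyRange parent, byte[] splitRow) {
        if (splitRow == null) {
            return null; // no split row available
        }
        // Reject a split row equal to the start key or outside the parent range.
        if (Arrays.equals(parent.start, splitRow) || !parent.containsRow(splitRow)) {
            return null;
        }
        return new KeyRange[] {
            new KeyRange(parent.start, splitRow), // daughter A: [start, splitRow)
            new KeyRange(splitRow, parent.end)    // daughter B: [splitRow, end)
        };
    }

    public static void main(String[] args) {
        KeyRange parent = new KeyRange(bytes("a"), bytes("m"));
        KeyRange[] d = daughters(parent, bytes("g"));
        System.out.println("split at g accepted: " + (d != null));
        System.out.println("split at a accepted: " + (daughters(parent, bytes("a")) != null));
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```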
Next, in the SplitTransactionImpl.stepsBeforePONR method:

// Close the parent region and collect its store files
hstoreFilesToSplit = this.parent.close(false);
// TODO: If splitStoreFiles were multithreaded would we complete steps in
// less elapsed time?  St.Ack 20100920
//
// splitStoreFiles creates daughter region dirs under the parent splits dir
// Nothing to unroll here if failure -- clean up of CREATE_SPLIT_DIR will
// clean this up.
Pair<Integer, Integer> expectedReferences = splitStoreFiles(hstoreFilesToSplit);

// Log to the journal that we are creating region A, the first daughter
// region.  We could fail halfway through.  If we do, we could have left
// stuff in fs that needs cleanup -- a storefile or two.  Thats why we
// add entry to journal BEFORE rather than AFTER the change.
transition(SplitTransactionPhase.STARTED_REGION_A_CREATION);

assertReferenceFileCount(expectedReferences.getFirst(),
    this.parent.getRegionFileSystem().getSplitsDir(this.hri_a));
Region a = this.parent.createDaughterRegionFromSplits(this.hri_a);
assertReferenceFileCount(expectedReferences.getFirst(),
    new Path(this.parent.getRegionFileSystem().getTableDir(), this.hri_a.getEncodedName()));

// Ditto
transition(SplitTransactionPhase.STARTED_REGION_B_CREATION);

assertReferenceFileCount(expectedReferences.getSecond(),
    this.parent.getRegionFileSystem().getSplitsDir(this.hri_b));
Region b = this.parent.createDaughterRegionFromSplits(this.hri_b);
assertReferenceFileCount(expectedReferences.getSecond(),
    new Path(this.parent.getRegionFileSystem().getTableDir(), this.hri_b.getEncodedName()));

return new PairOfSameType<Region>(a, b);

The code above starts splitting the region's hstoreFilesToSplit; the splitStoreFiles method it calls looks like this:

private Pair<Integer, Integer> splitStoreFiles(
  final Map<byte[], List<StoreFile>> hstoreFilesToSplit)
  throws IOException {
if (hstoreFilesToSplit == null) {
  // Could be null because close didn't succeed -- for now consider it fatal
  throw new IOException("Close returned empty list of StoreFiles");
}
// The following code sets up a thread pool executor with as many slots as
// there's files to split. It then fires up everything, waits for
// completion and finally checks for any exception
int nbFiles = 0;
for (Map.Entry<byte[], List<StoreFile>> entry: hstoreFilesToSplit.entrySet()) {
    nbFiles += entry.getValue().size();
}
if (nbFiles == 0) {
  // no file needs to be splitted.
  return new Pair<Integer, Integer>(0,0);
}
// Default max #threads to use is the smaller of table's configured number of blocking store
// files or the available number of logical cores.
int defMaxThreads = Math.min(parent.conf.getInt(HStore.BLOCKING_STOREFILES_KEY,
            HStore.DEFAULT_BLOCKING_STOREFILE_COUNT),
        Runtime.getRuntime().availableProcessors());
// Max #threads is the smaller of the number of storefiles or the default max determined above.
int maxThreads = Math.min(parent.conf.getInt(HConstants.REGION_SPLIT_THREADS_MAX,
            defMaxThreads), nbFiles);
LOG.info("Preparing to split " + nbFiles + " storefiles for region " + this.parent +
        " using " + maxThreads + " threads");
ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
builder.setNameFormat("StoreFileSplitter-%1$d");
ThreadFactory factory = builder.build();
ThreadPoolExecutor threadPool =
  (ThreadPoolExecutor) Executors.newFixedThreadPool(maxThreads, factory);
List<Future<Pair<Path,Path>>> futures = new ArrayList<Future<Pair<Path,Path>>> (nbFiles);

// Split each store file.
for (Map.Entry<byte[], List<StoreFile>> entry: hstoreFilesToSplit.entrySet()) {
  for (StoreFile sf: entry.getValue()) {
    StoreFileSplitter sfs = new StoreFileSplitter(entry.getKey(), sf);
    futures.add(threadPool.submit(sfs));
  }
}
// Shutdown the pool
threadPool.shutdown();

// Wait for all the tasks to finish
try {
  boolean stillRunning = !threadPool.awaitTermination(
      this.fileSplitTimeout, TimeUnit.MILLISECONDS);
  if (stillRunning) {
    threadPool.shutdownNow();
    // wait for the thread to shutdown completely.
    while (!threadPool.isTerminated()) {
      Thread.sleep(50);
    }
    throw new IOException("Took too long to split the" +
        " files and create the references, aborting split");
  }
} catch (InterruptedException e) {
  throw (InterruptedIOException)new InterruptedIOException().initCause(e);
}

int created_a = 0;
int created_b = 0;
// Look for any exception
for (Future<Pair<Path, Path>> future : futures) {
  try {
    Pair<Path, Path> p = future.get();
    created_a += p.getFirst() != null ? 1 : 0;
    created_b += p.getSecond() != null ? 1 : 0;
  } catch (InterruptedException e) {
    throw (InterruptedIOException) new InterruptedIOException().initCause(e);
  } catch (ExecutionException e) {
    throw new IOException(e);
  }
}

if (LOG.isDebugEnabled()) {
  LOG.debug("Split storefiles for region " + this.parent + " Daughter A: " + created_a
      + " storefiles, Daughter B: " + created_b + " storefiles.");
}
return new Pair<Integer, Integer>(created_a, created_b);
}

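This method drives the splitting of the store files: it submits one StoreFileSplitter task per store file to a fixed-size thread pool, waits up to fileSplitTimeout for them to finish, and then counts how many files were created for each daughter.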
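
The concurrency pattern used by splitStoreFiles (size the pool, submit everything, shut down, wait with a timeout, then collect the results) can be reproduced with the plain JDK. The sketch below is an illustration under assumptions, not the HBase implementation: ParallelSplitSketch, maxThreads and runAll are hypothetical names, and generic Callable tasks stand in for StoreFileSplitter.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Submit-all, shutdown, await-with-timeout pattern modeled on splitStoreFiles (not HBase code). */
class ParallelSplitSketch {
    /** Pool size: the configured max (or min(blocking store files, cores) by default), capped by the file count. */
    static int maxThreads(int blockingStoreFiles, int cores, Integer configuredMax, int nbFiles) {
        int defMaxThreads = Math.min(blockingStoreFiles, cores);
        int limit = configuredMax != null ? configuredMax : defMaxThreads;
        return Math.min(limit, nbFiles);
    }

    /** Runs all tasks on a fixed pool and fails if they do not finish within the timeout. */
    static <T> List<T> runAll(List<Callable<T>> tasks, int threads, long timeoutMillis)
            throws IOException {
        List<T> results = new ArrayList<>();
        if (tasks.isEmpty()) {
            return results; // nothing to split
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<T>> futures = new ArrayList<>(tasks.size());
        for (Callable<T> task : tasks) {
            futures.add(pool.submit(task));
        }
        pool.shutdown(); // accept no new tasks; already submitted ones keep running
        try {
            if (!pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                pool.shutdownNow();
                throw new IOException("Took too long to run the tasks, aborting");
            }
            for (Future<T> future : futures) {
                results.add(future.get()); // surfaces any exception thrown by a task
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
    }

    public static void main(String[] args) throws IOException {
        List<Callable<String>> tasks = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int n = i;
            tasks.add(() -> "storefile-" + n);
        }
        int threads = maxThreads(10, Runtime.getRuntime().availableProcessors(), null, tasks.size());
        System.out.println(runAll(tasks, threads, 5000L));
    }
}
```

Collecting the futures only after awaitTermination returns means a timeout aborts the whole batch, which mirrors how the original treats a slow split as fatal rather than partially successful.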

private Pair<Path, Path> splitStoreFile(final byte[] family, final StoreFile sf)
  throws IOException {
if (LOG.isDebugEnabled()) {
    LOG.debug("Splitting started for store file: " + sf.getPath() + " for region: " +
              this.parent);
}
HRegionFileSystem fs = this.parent.getRegionFileSystem();
String familyName = Bytes.toString(family);

Path path_a =
    fs.splitStoreFile(this.hri_a, familyName, sf, this.splitrow, false,
      this.parent.getSplitPolicy());
Path path_b =
    fs.splitStoreFile(this.hri_b, familyName, sf, this.splitrow, true,
      this.parent.getSplitPolicy());
if (LOG.isDebugEnabled()) {
    LOG.debug("Splitting complete for store file: " + sf.getPath() + " for region: " +
              this.parent);
}
return new Pair<Path,Path>(path_a, path_b);
}

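The method above is where each store file is actually split. Note that, as the "create the references" log message and the assertReferenceFileCount calls suggest, the split produces a reference file for each daughter that points at one half of the parent store file; the data itself is not copied at this stage.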
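
To illustrate why splitStoreFile may return null for one daughter (which is why the caller counts only non-null paths), here is a simplified row-level model of creating a top or bottom reference. Everything in it is an assumption made for illustration: ReferenceSketch and Reference are hypothetical, keys are compared as plain rows, and the skipRangeCheck flag is a guess at how the split policy passed via getSplitPolicy() could influence the range check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Row-level model of creating top/bottom references for one store file (not HBase code). */
class ReferenceSketch {
    /** Hypothetical stand-in for a reference file: points at one half of the parent store file. */
    static final class Reference {
        final byte[] splitRow;
        final boolean top; // true: rows >= splitRow, false: rows < splitRow

        Reference(byte[] splitRow, boolean top) {
            this.splitRow = splitRow;
            this.top = top;
        }
    }

    /**
     * Returns a reference to the requested half, or null when that half of the
     * file cannot contain any rows and the range check is not skipped.
     */
    static Reference splitStoreFile(byte[] firstRow, byte[] lastRow, byte[] splitRow,
                                    boolean top, boolean skipRangeCheck) {
        if (!skipRangeCheck) {
            // Top half is empty if the split row sorts after the file's last row.
            if (top && Arrays.compareUnsigned(splitRow, lastRow) > 0) {
                return null;
            }
            // Bottom half is empty if the split row sorts before the file's first row.
            if (!top && Arrays.compareUnsigned(splitRow, firstRow) < 0) {
                return null;
            }
        }
        return new Reference(splitRow, top);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] first = bytes("c");
        byte[] last = bytes("k");
        System.out.println("top for z:    " + (splitStoreFile(first, last, bytes("z"), true, false) != null));
        System.out.println("bottom for z: " + (splitStoreFile(first, last, bytes("z"), false, false) != null));
    }
}
```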

Put putA = MetaTableAccessor.makePutFromRegionInfo(
        daughterRegions.getFirst().getRegionInfo());
Put putB = MetaTableAccessor.makePutFromRegionInfo(
        daughterRegions.getSecond().getRegionInfo());
st.addLocation(putA, rss.getServerName(), 1);
st.addLocation(putB, rss.getServerName(), 1);
metaEntries.add(putA);
metaEntries.add(putB);

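In the end, the index region has been split into two daughters at the passed-in splitKey, and the Puts describing both daughters are appended to metaEntries.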

The next piece of code marks the index as unavailable (INACTIVE) while the split above is in progress

public void preSplitAfterPONR(ObserverContext<RegionCoprocessorEnvironment> ctx)
        throws IOException {
    if (st == null || daughterRegions == null) return;
    RegionCoprocessorEnvironment environment = ctx.getEnvironment();
    PhoenixConnection conn = null;
    try {
        conn = QueryUtil.getConnection(ctx.getEnvironment().getConfiguration()).unwrap(
            PhoenixConnection.class);
        MetaDataClient client = new MetaDataClient(conn);
        String userTableName = ctx.getEnvironment().getRegion().getTableDesc().getNameAsString();
        // Load all of the table's metadata from the Phoenix catalog, including the index tables defined on this physical table
        PTable dataTable = PhoenixRuntime.getTable(conn, userTableName);
        List<PTable> indexes = dataTable.getIndexes();
        for (PTable index : indexes) {
            if (index.getIndexType() == IndexType.LOCAL) {
                // For a local index, build the statement that alters its metadata
                AlterIndexStatement indexStatement = FACTORY.alterIndex(FACTORY.namedTable(null,
                    org.apache.phoenix.parse.TableName.create(index.getSchemaName().getString(), index.getTableName().getString())),
                    dataTable.getTableName().getString(), false, PIndexState.INACTIVE);
                // Apply the metadata change through the Phoenix client
                client.alterIndex(indexStatement);
            }
        }
        conn.commit();
    } catch (ClassNotFoundException ex) {
    } catch (SQLException ex) {
    } finally {
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException ex) {
            }
        }
    }

    HRegionServer rs = (HRegionServer) environment.getRegionServerServices();
    st.stepsAfterPONR(rs, rs, daughterRegions);
}

As the code above shows, the state change is ultimately written through a Phoenix connection to the SYSTEM.CATALOG metadata table.

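To summarize: the index table is not split on its own; its split is triggered by the split of the physical data table. This ensures that corresponding regions of the data table and the index table always live on the same regionserver.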