Phoenix local index split: source code analysis
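Phoenix tables sometimes need secondary indexes, which come in two kinds: local and global. Below we analyze how the corresponding local index table is split when the data table splits. (The local index region and the data region that cover the same key range are guaranteed to be hosted on the same RegionServer.)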
LocalIndexSplitter class analysis
```java
// st and daughterRegions are fields of LocalIndexSplitter; preSplitAfterPONR uses them too
public void preSplitBeforePONR(ObserverContext<RegionCoprocessorEnvironment> ctx,
        byte[] splitKey, List<Mutation> metaEntries) throws IOException {
    RegionCoprocessorEnvironment environment = ctx.getEnvironment();
    HTableDescriptor tableDesc = ctx.getEnvironment().getRegion().getTableDesc();
    if (SchemaUtil.isSystemTable(tableDesc.getName())) {
        return;
    }
    final RegionServerServices rss = ctx.getEnvironment().getRegionServerServices();
    // Continue only if this table is NOT flagged as a local index table (i.e. it is a data table)
    if (tableDesc.getValue(MetaDataUtil.IS_LOCAL_INDEX_TABLE_PROP_BYTES) == null
            || !Boolean.TRUE.equals(PBoolean.INSTANCE.toObject(tableDesc
                    .getValue(MetaDataUtil.IS_LOCAL_INDEX_TABLE_PROP_BYTES)))) {
        TableName indexTable =
                TableName.valueOf(MetaDataUtil.getLocalIndexPhysicalName(tableDesc.getName()));
        // Return immediately if the local index table does not exist
        if (!MetaTableAccessor.tableExists(rss.getConnection(), indexTable)) return;
        // Get the index region that corresponds to the current data region
        Region indexRegion = IndexUtil.getIndexRegion(environment);
        if (indexRegion == null) {
            LOG.warn("Index region corresponindg to data region " + environment.getRegion()
                    + " not in the same server. So skipping the split.");
            ctx.bypass();
            return;
        }
        // FIXME: Uses private type
        try {
            int encodedVersion = VersionUtil.encodeVersion(environment.getHBaseVersion());
            if (encodedVersion >= SPLIT_TXN_MINIMUM_SUPPORTED_VERSION) {
                // Split the index region at the same splitKey as the data region
                st = new SplitTransactionImpl(indexRegion, splitKey);
                st.useZKForAssignment =
                        environment.getConfiguration().getBoolean("hbase.assignment.usezk",
                                true);
            } else {
                st = new IndexSplitTransaction(indexRegion, splitKey);
            }
            if (!st.prepare()) {
                LOG.error("Prepare for the table " + indexRegion.getTableDesc().getNameAsString()
                        + " failed. So returning null. ");
                ctx.bypass();
                return;
            }
            // Request a forced split of the index region at splitKey
            ((HRegion) indexRegion).forceSplit(splitKey);
            User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
                @Override
                public Void run() throws Exception {
                    // Run the split steps up to the point of no return (PONR) for splitKey;
                    // the returned pair (first, second) holds the two new daughter regions
                    daughterRegions = st.stepsBeforePONR(rss, rss, false);
                    return null;
                }
            });
            HRegionInfo copyOfParent = new HRegionInfo(indexRegion.getRegionInfo());
            copyOfParent.setOffline(true);
            copyOfParent.setSplit(true);
            // Put for parent
            Put putParent = MetaTableAccessor.makePutFromRegionInfo(copyOfParent);
            MetaTableAccessor.addDaughtersToPut(putParent,
                    daughterRegions.getFirst().getRegionInfo(),
                    daughterRegions.getSecond().getRegionInfo());
            metaEntries.add(putParent);
            // Puts for daughters
            Put putA = MetaTableAccessor.makePutFromRegionInfo(
                    daughterRegions.getFirst().getRegionInfo());
            Put putB = MetaTableAccessor.makePutFromRegionInfo(
                    daughterRegions.getSecond().getRegionInfo());
            st.addLocation(putA, rss.getServerName(), 1);
            st.addLocation(putB, rss.getServerName(), 1);
            metaEntries.add(putA);
            metaEntries.add(putB);
        } catch (Exception e) {
            ctx.bypass();
            LOG.warn("index region splitting failed with the exception ", e);
            if (st != null) {
                st.rollback(rss, rss);
                st = null;
                daughterRegions = null;
            }
        }
    }
}
```
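From the code above: first, it checks whether this table is a local index table. At first glance the condition looks inverted (the main logic runs only when the local-index property is absent or false; is that a bug?), but it is intentional: the hook fires on the data table's region, so a region whose table is itself flagged as a local index table is skipped, and the index split is driven from the data table side.
Next, it resolves which physical local index table belongs to this data table and gets the index region corresponding to the current data region. Because this runs inside a region coprocessor, it happens once per splitting region, so many regions may be handled in parallel. If no matching index region is found on the same RegionServer, the hook calls ctx.bypass() and skips the split.
Then it uses the incoming splitKey to create a SplitTransactionImpl (or an IndexSplitTransaction on older HBase versions), an object dedicated to splitting a single region. After prepare() succeeds, it calls forceSplit on the index region and finally calls stepsBeforePONR, which performs the split steps up to the point of no return (PONR) and returns the two daughter regions.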
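To make the guard conditions above concrete, here is a simplified sketch of the decision logic in preSplitBeforePONR. It is a hypothetical model: LocalIndexSplitGuard, Decision and the boolean inputs are illustrative stand-ins for the real calls (SchemaUtil.isSystemTable, the IS_LOCAL_INDEX_TABLE property, MetaTableAccessor.tableExists, IndexUtil.getIndexRegion), not Phoenix APIs.

```java
// Hypothetical, simplified model of the guard checks in
// LocalIndexSplitter.preSplitBeforePONR. None of these types exist in Phoenix.
class LocalIndexSplitGuard {

    enum Decision {
        SKIP,        // return without touching the index table
        BYPASS,      // ctx.bypass(): the split is skipped
        SPLIT_INDEX  // split the co-located index region at splitKey
    }

    static Decision decide(boolean isSystemTable,
                           Boolean isLocalIndexTableFlag,
                           boolean indexTableExists,
                           boolean indexRegionOnSameServer) {
        if (isSystemTable) {
            return Decision.SKIP;
        }
        // Mirrors: prop == null || !Boolean.TRUE.equals(prop).
        // Only a data table (flag absent or false) drives the index split;
        // a region of a table flagged as a local index table is skipped.
        boolean isDataTable = isLocalIndexTableFlag == null
                || !Boolean.TRUE.equals(isLocalIndexTableFlag);
        if (!isDataTable || !indexTableExists) {
            return Decision.SKIP;
        }
        if (!indexRegionOnSameServer) {
            return Decision.BYPASS;
        }
        return Decision.SPLIT_INDEX;
    }
}
```

For example, decide(false, null, true, true) returns SPLIT_INDEX, while a region whose table has the local-index flag set to true returns SKIP, which is exactly what the condition is meant to produce.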
Now let's look at the SplitTransactionImpl source, starting with prepare():
```java
public boolean prepare() throws IOException {
    if (!this.parent.isSplittable()) return false;
    // Split key can be null if this region is unsplittable; i.e. has refs.
    if (this.splitrow == null) return false;
    HRegionInfo hri = this.parent.getRegionInfo();
    parent.prepareToSplit();
    // Check splitrow.
    byte [] startKey = hri.getStartKey();
    byte [] endKey = hri.getEndKey();
    if (Bytes.equals(startKey, splitrow) ||
            !this.parent.getRegionInfo().containsRow(splitrow)) {
        LOG.info("Split row is not inside region key range or is equal to " +
                "startkey: " + Bytes.toStringBinary(this.splitrow));
        return false;
    }
    long rid = getDaughterRegionIdTimestamp(hri);
    this.hri_a = new HRegionInfo(hri.getTable(), startKey, this.splitrow, false, rid);
    this.hri_b = new HRegionInfo(hri.getTable(), this.splitrow, endKey, false, rid);
    transition(SplitTransactionPhase.PREPARED);
    return true;
}
```
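As shown above, prepare() first checks that the region is splittable and that splitrow lies inside the current region's key range without being equal to its start key. It then uses splitrow as the boundary of two daughter HRegionInfo objects: A covers [startKey, splitrow) and B covers [splitrow, endKey).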
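The range check in prepare() can be illustrated with a small self-contained sketch. KeyRange is a hypothetical class; its comparison mimics HBase's unsigned lexicographic row-key ordering (as in Bytes.compareTo), and an empty end key means "up to the end of the table", as in HRegionInfo.containsRow.

```java
import java.util.Arrays;

// Simplified model of the checks in SplitTransactionImpl.prepare().
// KeyRange is hypothetical; HBase uses HRegionInfo and Bytes instead.
class KeyRange {
    final byte[] start; // inclusive; empty = start of table
    final byte[] end;   // exclusive; empty = end of table

    KeyRange(byte[] start, byte[] end) {
        this.start = start;
        this.end = end;
    }

    // Unsigned lexicographic comparison, the ordering HBase row keys use.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    boolean containsRow(byte[] row) {
        return compare(row, start) >= 0
                && (end.length == 0 || compare(row, end) < 0);
    }

    // Returns {daughterA, daughterB}, or null where prepare() would return false.
    KeyRange[] splitAt(byte[] splitRow) {
        if (splitRow == null || Arrays.equals(start, splitRow) || !containsRow(splitRow)) {
            return null;
        }
        return new KeyRange[] {
            new KeyRange(start, splitRow), // A: [start, splitRow)
            new KeyRange(splitRow, end)    // B: [splitRow, end)
        };
    }
}
```

Splitting the range [b, m) at f yields [b, f) and [f, m); splitting at b (the start key) or at z (outside the range) is rejected.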
Next, the core of SplitTransactionImpl.stepsBeforePONR:
```java
// Close the parent region; this returns its store files grouped by column family
hstoreFilesToSplit = this.parent.close(false);
// TODO: If splitStoreFiles were multithreaded would we complete steps in
// less elapsed time? St.Ack 20100920
//
// splitStoreFiles creates daughter region dirs under the parent splits dir
// Nothing to unroll here if failure -- clean up of CREATE_SPLIT_DIR will
// clean this up.
Pair<Integer, Integer> expectedReferences = splitStoreFiles(hstoreFilesToSplit);
// Log to the journal that we are creating region A, the first daughter
// region. We could fail halfway through. If we do, we could have left
// stuff in fs that needs cleanup -- a storefile or two. Thats why we
// add entry to journal BEFORE rather than AFTER the change.
transition(SplitTransactionPhase.STARTED_REGION_A_CREATION);
assertReferenceFileCount(expectedReferences.getFirst(),
        this.parent.getRegionFileSystem().getSplitsDir(this.hri_a));
Region a = this.parent.createDaughterRegionFromSplits(this.hri_a);
assertReferenceFileCount(expectedReferences.getFirst(),
        new Path(this.parent.getRegionFileSystem().getTableDir(), this.hri_a.getEncodedName()));
// Ditto
transition(SplitTransactionPhase.STARTED_REGION_B_CREATION);
assertReferenceFileCount(expectedReferences.getSecond(),
        this.parent.getRegionFileSystem().getSplitsDir(this.hri_b));
Region b = this.parent.createDaughterRegionFromSplits(this.hri_b);
assertReferenceFileCount(expectedReferences.getSecond(),
        new Path(this.parent.getRegionFileSystem().getTableDir(), this.hri_b.getEncodedName()));
return new PairOfSameType<Region>(a, b);
```
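The comments above describe writing the journal entry before the change it guards. A minimal sketch of that pattern, assuming a simple in-memory journal (JournaledSplit and its phase names are hypothetical, loosely echoing SplitTransactionPhase): each step is journaled first, and rollback undoes the journaled phases in reverse order, including the phase that failed halfway.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the "journal first, then act" pattern.
// Phase names echo SplitTransactionPhase, but this is not HBase code.
class JournaledSplit {
    enum Phase { CREATE_SPLIT_DIR, STARTED_REGION_A_CREATION, STARTED_REGION_B_CREATION }

    final Deque<Phase> journal = new ArrayDeque<>();
    final List<String> undone = new ArrayList<>();

    // Record the phase BEFORE doing the work, so a failure midway still
    // leaves a journal entry telling rollback what may need cleaning up.
    void step(Phase phase, Runnable work) {
        journal.push(phase);
        work.run();
    }

    // Undo in reverse order of the journal (most recent phase first).
    void rollback() {
        while (!journal.isEmpty()) {
            undone.add("cleanup " + journal.pop());
        }
    }
}
```

If creating region A fails, rollback first cleans up STARTED_REGION_A_CREATION and then CREATE_SPLIT_DIR; had the entry been written after the change, the partial files from the failed step would be missed.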
The store files collected above (hstoreFilesToSplit) are then split by splitStoreFiles:
```java
private Pair<Integer, Integer> splitStoreFiles(
        final Map<byte[], List<StoreFile>> hstoreFilesToSplit)
        throws IOException {
    if (hstoreFilesToSplit == null) {
        // Could be null because close didn't succeed -- for now consider it fatal
        throw new IOException("Close returned empty list of StoreFiles");
    }
    // The following code sets up a thread pool executor with as many slots as
    // there's files to split. It then fires up everything, waits for
    // completion and finally checks for any exception
    int nbFiles = 0;
    for (Map.Entry<byte[], List<StoreFile>> entry : hstoreFilesToSplit.entrySet()) {
        nbFiles += entry.getValue().size();
    }
    if (nbFiles == 0) {
        // no file needs to be splitted.
        return new Pair<Integer, Integer>(0, 0);
    }
    // Default max #threads to use is the smaller of table's configured number of blocking store
    // files or the available number of logical cores.
    int defMaxThreads = Math.min(parent.conf.getInt(HStore.BLOCKING_STOREFILES_KEY,
            HStore.DEFAULT_BLOCKING_STOREFILE_COUNT),
            Runtime.getRuntime().availableProcessors());
    // Max #threads is the smaller of the number of storefiles or the default max determined above.
    int maxThreads = Math.min(parent.conf.getInt(HConstants.REGION_SPLIT_THREADS_MAX,
            defMaxThreads), nbFiles);
    LOG.info("Preparing to split " + nbFiles + " storefiles for region " + this.parent +
            " using " + maxThreads + " threads");
    ThreadFactoryBuilder builder = new ThreadFactoryBuilder();
    builder.setNameFormat("StoreFileSplitter-%1$d");
    ThreadFactory factory = builder.build();
    ThreadPoolExecutor threadPool =
            (ThreadPoolExecutor) Executors.newFixedThreadPool(maxThreads, factory);
    List<Future<Pair<Path, Path>>> futures = new ArrayList<Future<Pair<Path, Path>>>(nbFiles);
    // Split each store file.
    for (Map.Entry<byte[], List<StoreFile>> entry : hstoreFilesToSplit.entrySet()) {
        for (StoreFile sf : entry.getValue()) {
            StoreFileSplitter sfs = new StoreFileSplitter(entry.getKey(), sf);
            futures.add(threadPool.submit(sfs));
        }
    }
    // Shutdown the pool
    threadPool.shutdown();
    // Wait for all the tasks to finish
    try {
        boolean stillRunning = !threadPool.awaitTermination(
                this.fileSplitTimeout, TimeUnit.MILLISECONDS);
        if (stillRunning) {
            threadPool.shutdownNow();
            // wait for the thread to shutdown completely.
            while (!threadPool.isTerminated()) {
                Thread.sleep(50);
            }
            throw new IOException("Took too long to split the" +
                    " files and create the references, aborting split");
        }
    } catch (InterruptedException e) {
        throw (InterruptedIOException) new InterruptedIOException().initCause(e);
    }
    int created_a = 0;
    int created_b = 0;
    // Look for any exception
    for (Future<Pair<Path, Path>> future : futures) {
        try {
            Pair<Path, Path> p = future.get();
            created_a += p.getFirst() != null ? 1 : 0;
            created_b += p.getSecond() != null ? 1 : 0;
        } catch (InterruptedException e) {
            throw (InterruptedIOException) new InterruptedIOException().initCause(e);
        } catch (ExecutionException e) {
            throw new IOException(e);
        }
    }
    if (LOG.isDebugEnabled()) {
        LOG.debug("Split storefiles for region " + this.parent + " Daughter A: " + created_a
                + " storefiles, Daughter B: " + created_b + " storefiles.");
    }
    return new Pair<Integer, Integer>(created_a, created_b);
}
```
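The pool-sizing arithmetic in splitStoreFiles can be checked with a small sketch. SplitThreadSizing is hypothetical; its parameters stand in for the configuration lookups of HStore.BLOCKING_STOREFILES_KEY and HConstants.REGION_SPLIT_THREADS_MAX and for Runtime.getRuntime().availableProcessors(), with a null configuredMax meaning "not set".

```java
// Sketch of how splitStoreFiles sizes its thread pool (hypothetical helper).
class SplitThreadSizing {
    static int maxThreads(int blockingStoreFiles, int cores, Integer configuredMax, int nbFiles) {
        // Default: the smaller of the blocking store file count and logical cores.
        int defMaxThreads = Math.min(blockingStoreFiles, cores);
        // An explicit setting overrides the default; either way it is capped by the file count.
        int limit = configuredMax != null ? configuredMax : defMaxThreads;
        return Math.min(limit, nbFiles);
    }
}
```

With 10 blocking store files, 8 cores and no explicit setting, a region with 3 store files gets 3 threads, while one with 20 files gets 8; setting the maximum to 2 caps it at 2 regardless.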
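In short, splitStoreFiles submits one StoreFileSplitter task per store file to a fixed-size thread pool, waits up to fileSplitTimeout for all of them to finish, and counts the reference files created for each daughter. The per-file work is done in splitStoreFile: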
```java
private Pair<Path, Path> splitStoreFile(final byte[] family, final StoreFile sf)
        throws IOException {
    if (LOG.isDebugEnabled()) {
        LOG.debug("Splitting started for store file: " + sf.getPath() + " for region: " +
                this.parent);
    }
    HRegionFileSystem fs = this.parent.getRegionFileSystem();
    String familyName = Bytes.toString(family);
    Path path_a =
            fs.splitStoreFile(this.hri_a, familyName, sf, this.splitrow, false,
                    this.parent.getSplitPolicy());
    Path path_b =
            fs.splitStoreFile(this.hri_b, familyName, sf, this.splitrow, true,
                    this.parent.getSplitPolicy());
    if (LOG.isDebugEnabled()) {
        LOG.debug("Splitting complete for store file: " + sf.getPath() + " for region: " +
                this.parent);
    }
    return new Pair<Path, Path>(path_a, path_b);
}
```
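splitStoreFile is where each store file is actually split. No data is rewritten at this point: for each daughter, HRegionFileSystem.splitStoreFile writes a small reference file pointing at the bottom half (daughter A, top = false) or the top half (daughter B, top = true) of the parent's store file relative to splitrow, and it can return null when that half would be empty. That is why created_a and created_b in splitStoreFiles count only non-null paths.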
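Whether each daughter receives a reference for a given store file can be approximated as follows. This is a simplified, row-level model under stated assumptions: the real HRegionFileSystem.splitStoreFile compares full cell keys against the file's first and last keys, and the region's split policy may skip this range check entirely. ReferenceDecision is a hypothetical class.

```java
// Simplified row-level model (assumption) of when splitStoreFile returns a
// non-null reference path for each daughter. Not HBase code.
class ReferenceDecision {
    // Bottom-half reference (daughter A) is skipped when every row in the
    // file sorts after the split row.
    static boolean needsBottomRef(String firstRow, String splitRow) {
        return firstRow.compareTo(splitRow) <= 0;
    }

    // Top-half reference (daughter B) is skipped when every row in the
    // file sorts before the split row.
    static boolean needsTopRef(String lastRow, String splitRow) {
        return lastRow.compareTo(splitRow) >= 0;
    }
}
```

With split row m, a file spanning rows a..c only gets a bottom reference, a file spanning n..z only a top reference, and a file spanning a..z gets both, so the two daughters can end up with different reference counts.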
```java
// Back in LocalIndexSplitter.preSplitBeforePONR: register both index daughters
// in hbase:meta, located on this RegionServer
Put putA = MetaTableAccessor.makePutFromRegionInfo(
        daughterRegions.getFirst().getRegionInfo());
Put putB = MetaTableAccessor.makePutFromRegionInfo(
        daughterRegions.getSecond().getRegionInfo());
st.addLocation(putA, rss.getServerName(), 1);
st.addLocation(putB, rss.getServerName(), 1);
metaEntries.add(putA);
metaEntries.add(putB);
```
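In the end, the index region is split into two daughters at splitKey, and the Puts for the parent and both daughters are appended to metaEntries, the list of extra hbase:meta mutations handed back to the data region's split transaction.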
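The meta bookkeeping can be summarized with a hypothetical model (IndexMetaEdits and MetaEdit are illustrative, not HBase types; the real code builds Put objects through MetaTableAccessor): one edit for the parent, marked offline and split with both daughters attached, and one edit per daughter carrying this RegionServer as its location.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the three hbase:meta edits appended to metaEntries.
class IndexMetaEdits {
    static final class MetaEdit {
        final String region;
        final boolean offlineAndSplit; // set on the parent copy only
        final List<String> daughters;  // recorded on the parent row
        final String server;           // location written for daughters, else null

        MetaEdit(String region, boolean offlineAndSplit, List<String> daughters, String server) {
            this.region = region;
            this.offlineAndSplit = offlineAndSplit;
            this.daughters = daughters;
            this.server = server;
        }
    }

    static List<MetaEdit> build(String parent, String daughterA, String daughterB, String server) {
        List<MetaEdit> edits = new ArrayList<>();
        // putParent: offline + split, with both daughters attached.
        edits.add(new MetaEdit(parent, true, Arrays.asList(daughterA, daughterB), null));
        // putA / putB: located on the RegionServer that also hosts the data region.
        edits.add(new MetaEdit(daughterA, false, Collections.<String>emptyList(), server));
        edits.add(new MetaEdit(daughterB, false, Collections.<String>emptyList(), server));
        return edits;
    }
}
```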
Next, preSplitAfterPONR marks the table's local indexes as INACTIVE (unusable) once the split has passed the point of no return:
```java
public void preSplitAfterPONR(ObserverContext<RegionCoprocessorEnvironment> ctx)
        throws IOException {
    if (st == null || daughterRegions == null) return;
    RegionCoprocessorEnvironment environment = ctx.getEnvironment();
    PhoenixConnection conn = null;
    try {
        conn = QueryUtil.getConnection(ctx.getEnvironment().getConfiguration()).unwrap(
                PhoenixConnection.class);
        MetaDataClient client = new MetaDataClient(conn);
        String userTableName = ctx.getEnvironment().getRegion().getTableDesc().getNameAsString();
        // Load the table's metadata from Phoenix's catalog, including its index tables
        PTable dataTable = PhoenixRuntime.getTable(conn, userTableName);
        List<PTable> indexes = dataTable.getIndexes();
        for (PTable index : indexes) {
            if (index.getIndexType() == IndexType.LOCAL) {
                // For each local index, build an ALTER INDEX statement that sets it INACTIVE
                AlterIndexStatement indexStatement = FACTORY.alterIndex(FACTORY.namedTable(null,
                        org.apache.phoenix.parse.TableName.create(index.getSchemaName().getString(),
                                index.getTableName().getString())),
                        dataTable.getTableName().getString(), false, PIndexState.INACTIVE);
                // Apply the metadata change through the Phoenix client
                client.alterIndex(indexStatement);
            }
        }
        conn.commit();
    } catch (ClassNotFoundException ex) {
        // ignored
    } catch (SQLException ex) {
        // ignored
    } finally {
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException ex) {
                // ignored
            }
        }
    }
    HRegionServer rs = (HRegionServer) environment.getRegionServerServices();
    st.stepsAfterPONR(rs, rs, daughterRegions);
}
```
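As the code shows, the state change is ultimately applied through a Phoenix connection, which updates the SYSTEM.CATALOG metadata table. Note that ClassNotFoundException and SQLException are silently ignored, so a failed state update goes unreported. Finally, stepsAfterPONR completes the index region split, including opening the two daughter regions.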
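The index-state loop in preSplitAfterPONR reduces to a filter over the table's indexes. A minimal sketch with hypothetical stand-in types (IndexInfo, IndexType and IndexState mimic Phoenix's PTable, IndexType and PIndexState); the real code calls MetaDataClient.alterIndex, which persists the state in SYSTEM.CATALOG, rather than mutating objects in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the loop in preSplitAfterPONR (hypothetical types).
class LocalIndexDeactivator {
    enum IndexType { GLOBAL, LOCAL }
    enum IndexState { ACTIVE, INACTIVE }

    static final class IndexInfo {
        final String name;
        final IndexType type;
        IndexState state;

        IndexInfo(String name, IndexType type, IndexState state) {
            this.name = name;
            this.type = type;
            this.state = state;
        }
    }

    // Marks every LOCAL index INACTIVE and returns the names that changed;
    // GLOBAL indexes are left untouched.
    static List<String> deactivateLocalIndexes(List<IndexInfo> indexes) {
        List<String> changed = new ArrayList<>();
        for (IndexInfo index : indexes) {
            if (index.type == IndexType.LOCAL) {
                index.state = IndexState.INACTIVE;
                changed.add(index.name);
            }
        }
        return changed;
    }
}
```

Only local indexes are touched because only their regions are split together with the data region; global indexes live in independently split tables.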
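In summary, splitting the index table is triggered by splitting the physical data table. This guarantees that the corresponding data and index regions always stay on the same RegionServer.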