phoenix 如何优化成使用索引进行查询源码分析

原创 2016年08月31日 16:11:57

phoenix 如何优化成使用索引进行查询源码分析

在 phoenix 生成执行计划时,最后会进行执行计划的优化,在优化过程中,会根据当前的sql中有没有hint的
提示使用索引,或者查询中用到了索引字段,也会自动转换成用索引表进行关联查询。

在下面的方法当中,会根据已有的执行计划,进行优化

QueryOptimizer.getApplicablePlans

//根据当前的数据执行计划,优化成用索引进行数据的查询
private List<QueryPlan> getApplicablePlans(QueryPlan dataPlan, PhoenixStatement statement, List<? extends PDatum> targetColumns, ParallelIteratorFactory parallelIteratorFactory, boolean stopAtBestPlan) throws SQLException {
    SelectStatement select = (SelectStatement)dataPlan.getStatement();
    // Exit early if we have a point lookup as we can't get better than that
    if (!useIndexes 
            || (dataPlan.getContext().getScanRanges().isPointLookup() && stopAtBestPlan)) {
        return Collections.singletonList(dataPlan);
    }
    // For single query tuple projection, indexes are inherited from the original table to the projected
    // table; otherwise not. So we pass projected table here, which is enough to tell if this is from a
    // single query or a part of join query.
    //判断是否使用索引进行处理
    List<PTable>indexes = Lists.newArrayList(dataPlan.getContext().getResolver().getTables().get(0).getTable().getIndexes());
    if (indexes.isEmpty() || dataPlan.isDegenerate() || dataPlan.getTableRef().hasDynamicCols() || select.getHint().hasHint(Hint.NO_INDEX)) {
        return Collections.singletonList(dataPlan);
    }

    // The targetColumns is set for UPSERT SELECT to ensure that the proper type conversion takes place.
    // For a SELECT, it is empty. In this case, we want to set the targetColumns to match the projection
    // from the dataPlan to ensure that the metadata for when an index is used matches the metadata for
    // when the data table is used.
    if (targetColumns.isEmpty()) {
        List<? extends ColumnProjector> projectors = dataPlan.getProjector().getColumnProjectors();
        List<PDatum> targetDatums = Lists.newArrayListWithExpectedSize(projectors.size());
        for (ColumnProjector projector : projectors) {
            targetDatums.add(projector.getExpression());
        }
        targetColumns = targetDatums;
    }
    //在里面转换一下方式
    SelectStatement translatedIndexSelect = IndexStatementRewriter.translate(select, FromCompiler.getResolver(dataPlan.getTableRef()));
    List<QueryPlan> plans = Lists.newArrayListWithExpectedSize(1 + indexes.size());
    plans.add(dataPlan);
    //拿出指定的hint的查询方式
    QueryPlan hintedPlan = getHintedQueryPlan(statement, translatedIndexSelect, indexes, targetColumns, parallelIteratorFactory, plans);
    if (hintedPlan != null) {
        if (stopAtBestPlan) {
            return Collections.singletonList(hintedPlan);
        }
        plans.add(0, hintedPlan);
    }

    for (PTable index : indexes) {
        //当没有指定hint的方式时
        QueryPlan plan = addPlan(statement, translatedIndexSelect, index, targetColumns, parallelIteratorFactory, dataPlan, false);
        if (plan != null) {
            // Query can't possibly return anything so just return this plan.
            if (plan.isDegenerate()) {
                return Collections.singletonList(plan);
            }
            plans.add(plan);
        }
    }

    return hintedPlan == null ? orderPlansBestToWorst(select, plans) : plans;
}

可以看到,上面根据当前是否具有hints提示和物理表是否创建了索引进行执行计划的优化
当使用hints提示时,会调用下面的方法

private static QueryPlan getHintedQueryPlan(PhoenixStatement statement, SelectStatement select, List<PTable> indexes, List<? extends PDatum> targetColumns, ParallelIteratorFactory parallelIteratorFactory, List<QueryPlan> plans) throws SQLException {
    QueryPlan dataPlan = plans.get(0);
    //查询语句中是否有hints
    String indexHint = select.getHint().getHint(Hint.INDEX);
    if (indexHint == null) {
        return null;
    }
    int startIndex = 0;
    String alias = dataPlan.getTableRef().getTableAlias();
    //对hint的解释 (mytable myindex)
    String prefix = HintNode.PREFIX + (alias == null ? dataPlan.getTableRef().getTable().getName().getString() : alias) + HintNode.SEPARATOR;
    while (startIndex < indexHint.length()) {
        startIndex = indexHint.indexOf(prefix, startIndex);
        if (startIndex < 0) {
            return null;
        }
        startIndex += prefix.length();
        boolean done = false; // true when SUFFIX found
        while (startIndex < indexHint.length() && !done) {
            int endIndex;
            int endIndex1 = indexHint.indexOf(HintNode.SEPARATOR, startIndex);
            int endIndex2 = indexHint.indexOf(HintNode.SUFFIX, startIndex);
            if (endIndex1 < 0 && endIndex2 < 0) { // Missing SUFFIX shouldn't happen
                endIndex = indexHint.length();
            } else if (endIndex1 < 0) {
                done = true;
                endIndex = endIndex2;
            } else if (endIndex2 < 0) {
                endIndex = endIndex1;
            } else {
                endIndex = Math.min(endIndex1, endIndex2);
                done = endIndex2 == endIndex;
            }
            String indexName = indexHint.substring(startIndex, endIndex);
            int indexPos = getIndexPosition(indexes, indexName);
            if (indexPos >= 0) {
                // Hinted index is applicable, so return it's index
                PTable index = indexes.get(indexPos);
                indexes.remove(indexPos);
                //对hints解释成功了
                QueryPlan plan = addPlan(statement, select, index, targetColumns, parallelIteratorFactory, dataPlan, true);
                if (plan != null) {
                    return plan;
                }
            }
            startIndex = endIndex + 1;
        }
    }
    return null;
}

上面解释出Sql中的查询的hints的提示,使用指定的索引进行查询,生成执行计划

//增加索引执行计划
private static QueryPlan addPlan(PhoenixStatement statement, SelectStatement select, PTable index, List<? extends PDatum> targetColumns, ParallelIteratorFactory parallelIteratorFactory, QueryPlan dataPlan, boolean isHinted) throws SQLException {
    int nColumns = dataPlan.getProjector().getColumnCount();
    String tableAlias = dataPlan.getTableRef().getTableAlias();
    String alias = tableAlias==null ? null : '"' + tableAlias + '"'; // double quote in case it's case sensitive
    String schemaName = index.getParentSchemaName().getString();
    schemaName = schemaName.length() == 0 ? null :  '"' + schemaName + '"';

    String tableName = '"' + index.getTableName().getString() + '"';
    TableNode table = FACTORY.namedTable(alias, FACTORY.table(schemaName, tableName));
    //生成对新索引表的查询计划
    SelectStatement indexSelect = FACTORY.select(select, table);
    ColumnResolver resolver = FromCompiler.getResolverForQuery(indexSelect, statement.getConnection());
    // We will or will not do tuple projection according to the data plan.
    boolean isProjected = dataPlan.getContext().getResolver().getTables().get(0).getTable().getType() == PTableType.PROJECTED;
    // Check index state of now potentially updated index table to make sure it's active
    //当前索引状态是正常的
    if (PIndexState.ACTIVE.equals(resolver.getTables().get(0).getTable().getIndexState())) {
        try {
            // translate nodes that match expressions that are indexed to the associated column parse node
            indexSelect = ParseNodeRewriter.rewrite(indexSelect, new  IndexExpressionParseNodeRewriter(index, statement.getConnection(), indexSelect.getUdfParseNodes()));
            QueryCompiler compiler = new QueryCompiler(statement, indexSelect, resolver, targetColumns, parallelIteratorFactory, dataPlan.getContext().getSequenceManager(), isProjected);
            //生成新的查询计划
            QueryPlan plan = compiler.compile();
            // If query doesn't have where clause and some of columns to project are missing
            // in the index then we need to get missing columns from main table for each row in
            // local index. It's like full scan of both local index and data table which is inefficient.
            // Then we don't use the index. If all the columns to project are present in the index 
            // then we can use the index even the query doesn't have where clause. 
            if (index.getIndexType() == IndexType.LOCAL && indexSelect.getWhere() == null
                    && !plan.getContext().getDataColumns().isEmpty()) {
                return null;
            }
            // Checking number of columns handles the wildcard cases correctly, as in that case the index
            // must contain all columns from the data table to be able to be used.
            //用到的索引是活动状态,并且要查询的字段全在索引表,就返回
            if (plan.getTableRef().getTable().getIndexState() == PIndexState.ACTIVE) {
                if (plan.getProjector().getColumnCount() == nColumns) {
                    //最后生成新的索引表的查询计划是可以的
                    return plan;
                } else if (index.getIndexType() == IndexType.GLOBAL) {
                    //当依赖于全局索引表,但是全局表中又找不到要查询的字段,就出错,到下面通过子查询转换进行查询
                    throw new ColumnNotFoundException("*");
                }
            }
        } catch (ColumnNotFoundException e) {
            /* Means that a column is being used that's not in our index.
             * Since we currently don't keep stats, we don't know the selectivity of the index.
             * For now, if this is a hinted plan, we will try rewriting the query as a subquery;
             * otherwise we just don't use this index (as opposed to trying to join back from
             * the index table to the data table.
             */
            //当查询的字段不在索引表时,要关联到数据表进行数据的查询,通过子查询的方式进行数据的查询
            SelectStatement dataSelect = (SelectStatement)dataPlan.getStatement();
            ParseNode where = dataSelect.getWhere();
            if (isHinted && where != null) {
                StatementContext context = new StatementContext(statement, resolver);
                WhereConditionRewriter whereRewriter = new WhereConditionRewriter(FromCompiler.getResolver(dataPlan.getTableRef()), context);
                where = where.accept(whereRewriter);
                if (where != null) {
                    //这个是原始的数据表
                    PTable dataTable = dataPlan.getTableRef().getTable();
                    List<PColumn> pkColumns = dataTable.getPKColumns();
                    //要进行索引查询的字段
                    List<AliasedNode> aliasedNodes = Lists.<AliasedNode>newArrayListWithExpectedSize(pkColumns.size());
                    List<ParseNode> nodes = Lists.<ParseNode>newArrayListWithExpectedSize(pkColumns.size());
                    boolean isSalted = dataTable.getBucketNum() != null;
                    boolean isTenantSpecific = dataTable.isMultiTenant() && statement.getConnection().getTenantId() != null;
                    int posOffset = (isSalted ? 1 : 0) + (isTenantSpecific ? 1 : 0);
                    for (int i = posOffset; i < pkColumns.size(); i++) {
                        PColumn column = pkColumns.get(i);
                        String indexColName = IndexUtil.getIndexColumnName(column);
                        ParseNode indexColNode = new ColumnParseNode(null, '"' + indexColName + '"', indexColName);
                        PDataType indexColType = IndexUtil.getIndexColumnDataType(column);
                        PDataType dataColType = column.getDataType();
                        if (indexColType != dataColType) {
                            indexColNode = FACTORY.cast(indexColNode, dataColType, null, null);
                        }
                        aliasedNodes.add(FACTORY.aliasedNode(null, indexColNode));
                        nodes.add(new ColumnParseNode(null, '"' + column.getName().getString() + '"'));
                    }
                    //生成一个内部子查询
                    SelectStatement innerSelect = FACTORY.select(indexSelect.getFrom(), indexSelect.getHint(), false, aliasedNodes, where, null, null, null, null, indexSelect.getBindCount(), false, indexSelect.hasSequence(), Collections.<SelectStatement>emptyList(), indexSelect.getUdfParseNodes());
                    ParseNode outerWhere = FACTORY.in(nodes.size() == 1 ? nodes.get(0) : FACTORY.rowValueConstructor(nodes), FACTORY.subquery(innerSelect, false), false, true);
                    ParseNode extractedCondition = whereRewriter.getExtractedCondition();
                    if (extractedCondition != null) {
                        outerWhere = FACTORY.and(Lists.newArrayList(outerWhere, extractedCondition));
                    }
                    HintNode hint = HintNode.combine(HintNode.subtract(indexSelect.getHint(), new Hint[] {Hint.INDEX, Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION}), FACTORY.hint("NO_INDEX"));
                    SelectStatement query = FACTORY.select(dataSelect, hint, outerWhere);
                    ColumnResolver queryResolver = FromCompiler.getResolverForQuery(query, statement.getConnection());
                    //生成索引的子查询
                    query = SubqueryRewriter.transform(query, queryResolver, statement.getConnection());
                    queryResolver = FromCompiler.getResolverForQuery(query, statement.getConnection());
                    query = StatementNormalizer.normalize(query, queryResolver);
                    QueryPlan plan = new QueryCompiler(statement, query, queryResolver, targetColumns, parallelIteratorFactory, dataPlan.getContext().getSequenceManager(), isProjected).compile();
                    return plan;
                }
            }
        }
    }
    return null;
}

假如没有进行hints的提示,但是物理表建立了索引,则下面根据查询的物理表的字段中是否全部在索引表当中,如果是
则直接查询索引表返回数据,否则使用索引表作为子查询的方式,进行关联查询,从而使用索引表进行数据的快速查询

总结

  1. 根据当前是否具有hints提示,通过hints提示中使用的索引表进行查询
  2. 根据物理表建立的索引,当查询字段全在索引表时,使用索引表进行直接查询,否则生成子查询对索引表进行查询

相关文章推荐

Apache Phoenix基本操作(2)

1.      如何映射一个Phoenix的表到一个HBase的表?你可以通过Create table/create view DDL语句在一个已经存在的HBase表上创建一个Phoenix表或者视图...

Phoenix 索引初探

创建一个表: 0: jdbc:phoenix:localhost> create table test ( id varchar not null primary key, cf1.a varch...

Phoenix使用指南

1 使用概述        Phoenix是基于HBase的SQL中间件产品,由Salesforce.com公司开源并托管于Github上。对于熟悉关系型数据库的开发人员来说,通过Phoenix可以...

phoenix 写二级索引的触发机制

phoenix 写二级索引的触发机制查看org.apache.phoenix.hbase.index.Indexer 类当中 在该类中覆盖了 preBatchMutate方法,实现预先写数据到hba...

phoenix select 查询源码分析

phoenix select 查询源码分析如下执行一个Sql的查询语句ResultSet rst = conn.createStatement().executeQuery("select * fro...

Phoenix(十一)二级索引之— —性能优化篇

本篇主要介绍phoenix二级索引调优

Phoenix Tips (4) 提升效率的方法

1、加盐: 2、使用多个列族 3、

Apache Phoenix的Join操作和优化

估计Phoenix中支持Joins,对很多使用HBase的朋友来说,还是比较好的。下面我们就来演示一下。首先看一下几张表的数据:Orders表: OrderID CustomerID Item...

phoenix预创建表拆分key的源码分析

phoenix预创建表拆分key的源码分析在 ConnectionQueryServicesImpl.createTableInternal 方法当中,要分析表是否需要预先分区, if (pare...

phoenix select 查询源码分析

phoenix select 查询源码分析如下执行一个Sql的查询语句ResultSet rst = conn.createStatement().executeQuery("select * fro...
内容举报
返回顶部
收藏助手
不良信息举报
您举报文章:phoenix 如何优化成使用索引进行查询源码分析
举报原因:
原因补充:

(最多只允许输入30个字)