An analysis of how Kylin queries work

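This article walks through the query entry point and source implementation of Apache Kylin, focusing on the Realization selection strategy. Kylin decides which Cube serves a query using a cost rule: a weighted sum of the number of dimensions, measures, and join tables. Understanding this process helps with data modeling, because avoiding unnecessary dimensions and join tables improves query efficiency.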


Preface

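Recently, colleagues on our team who do data modeling complained about Kylin's realization selection strategy: a query within a project was expected to hit one particular cube, but the system chose a different one. I had skimmed this part of the Kylin source before and knew that when several realizations in the same project satisfy a query, they are sorted by cost and the cheapest one is chosen. I had not looked closely at how that cost is measured, so with this question in mind I went through the relevant source code.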
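To make the "sort by cost, pick the cheapest" idea concrete, here is a minimal sketch. It assumes the cost is a weighted sum of dimension, measure, and join table counts; the class `CubeCandidate` and the weight values are hypothetical and chosen only for illustration, not taken from Kylin's source.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a candidate cube; not Kylin's actual class.
class CubeCandidate {
    // Illustrative weights only (assumption); the real values live in Kylin's source.
    static final int WEIGHT_DIMENSION = 10;
    static final int WEIGHT_MEASURE = 1;
    static final int WEIGHT_JOIN_TABLE = 100;

    final String name;
    final int dimensions;
    final int measures;
    final int joinTables;

    CubeCandidate(String name, int dimensions, int measures, int joinTables) {
        this.name = name;
        this.dimensions = dimensions;
        this.measures = measures;
        this.joinTables = joinTables;
    }

    // Weighted sum of the three counts.
    int cost() {
        return dimensions * WEIGHT_DIMENSION
                + measures * WEIGHT_MEASURE
                + joinTables * WEIGHT_JOIN_TABLE;
    }

    // Among all candidates that can answer the query, pick the one with the lowest cost.
    static Optional<CubeCandidate> cheapest(List<CubeCandidate> candidates) {
        return candidates.stream().min(Comparator.comparingInt(CubeCandidate::cost));
    }

    public static void main(String[] args) {
        List<CubeCandidate> candidates = Arrays.asList(
                new CubeCandidate("wide_cube", 20, 5, 3),
                new CubeCandidate("narrow_cube", 8, 5, 1));
        cheapest(candidates)
                .ifPresent(c -> System.out.println(c.name + " cost=" + c.cost()));
    }
}
```

Under these assumed weights, a cube with fewer dimensions and join tables wins even if both cubes can answer the query, which is the kind of surprise described above.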

Source code walkthrough

To keep the post concise, only the core parts of each implementation are shown. In what follows, a Realization corresponds to a built Cube.

Query entry point

QueryService.doQueryWithCache()

    // If kylin.query.cache-enabled is on, try to read the result from the cache first
    if (queryCacheEnabled) {
        sqlResponse = searchQueryInCache(sqlRequest);
    }
    try {
        if (null == sqlResponse) {
            if (isSelect) {
                // Query entry point
                sqlResponse = query(sqlRequest);
            } else if (kylinConfig.isPushDownEnabled() && kylinConfig.isPushDownUpdateEnabled()) {
                // With pushdown enabled, non-SELECT statements such as UPDATE are allowed
                sqlResponse = update(sqlRequest);
            } else {
                logger.debug("Directly return exception as the sql is unsupported, and query pushdown is disabled");
                throw new BadRequestException(msg.getNOT_SUPPORTED_SQL());
            }
        ...
    } catch (...) {
        ...
    }

Here we set aside the cache lookup (searchQueryInCache) and the non-SELECT case, and follow a single normal query into the query method.
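Before moving on, the cache-first branching in doQueryWithCache() can be sketched in isolation. This is a simplified illustration, not Kylin's code: the class `CacheFirstQuery`, the string-based request and response, the prefix-based `isSelect` check, and the step that stores a fresh result back into the cache are all assumptions made for the example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the cache-then-query flow; not Kylin's actual classes.
class CacheFirstQuery {
    private final boolean cacheEnabled;
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> engine; // plays the role of query(sqlRequest)
    int engineCalls = 0;                           // counts how often the engine really ran

    CacheFirstQuery(boolean cacheEnabled, Function<String, String> engine) {
        this.cacheEnabled = cacheEnabled;
        this.engine = engine;
    }

    String doQueryWithCache(String sql) {
        String response = null;
        // Step 1: consult the cache only when caching is switched on.
        if (cacheEnabled) {
            response = cache.get(sql);
        }
        // Step 2: on a miss, run the query, or reject unsupported statements.
        if (response == null) {
            if (!isSelect(sql)) {
                throw new UnsupportedOperationException("Only SELECT is supported in this sketch");
            }
            engineCalls++;
            response = engine.apply(sql);
            // Assumption: a fresh result is stored for later requests.
            if (cacheEnabled) {
                cache.put(sql, response);
            }
        }
        return response;
    }

    // Simplified check (assumption): treat anything starting with "select" as a query.
    static boolean isSelect(String sql) {
        return sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
    }

    public static void main(String[] args) {
        CacheFirstQuery service = new CacheFirstQuery(true, s -> "rows for " + s);
        System.out.println(service.doQueryWithCache("select 1"));
        System.out.println(service.doQueryWithCache("select 1"));
        System.out.println("engine calls: " + service.engineCalls);
    }
}
```

The second identical request is answered from the cache, so the engine runs only once. The pushdown branch for non-SELECT statements is left out of the sketch on purpose.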

QueryService.query()

The query method is fairly simple: it records the start and end of the query, acting much like an aspect around the real work.

    public SQLResponse query(SQLRequest sqlRequest) throws Exception {
        SQLResponse ret = null;
        try {
            final String user = SecurityContextHolder.getContext().getAuthentication().getName();
            badQueryDetector.queryStart(Thread.currentThread(), sqlRequest, user);
            ret = queryWithSqlMassage(sqlRequest);
            return ret;
        } finally {
            String badReason = (ret != null && ret.isPushDown()) ? BadQueryEntry.ADJ_PUSHDOWN : null;
            badQueryDetector.queryEnd(Thread.currentThread(), badReason);
        }
    }

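Here badQueryDetector is a separate thread that tracks and monitors bad queries. When a bad query is found, it notifies the registered observers, which then take some action such as printing a log entry or recording the bad query. Many event notifications in Kylin are published and subscribed to through a producer-consumer pattern. Next, we step into queryWithSqlMassage().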
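Before stepping in, the detector pattern just described can be sketched roughly as follows. This is a sketch under stated assumptions, not Kylin's BadQueryDetector: the class `SlowQueryDetector`, the `Notifier` interface, the "Slow" reason string, and the explicit timestamp parameters (added so the logic can be exercised deterministically) are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical detector thread; an illustration of the pattern, not Kylin's class.
class SlowQueryDetector extends Thread {
    // Observers implement this to log or persist a bad query.
    interface Notifier {
        void badQueryFound(String reason, String sql, String user, long runningMillis);
    }

    private static final class Entry {
        final String sql;
        final String user;
        final long startMillis;

        Entry(String sql, String user, long startMillis) {
            this.sql = sql;
            this.user = user;
            this.startMillis = startMillis;
        }
    }

    private final Map<Thread, Entry> running = new ConcurrentHashMap<>();
    private final List<Notifier> notifiers = new CopyOnWriteArrayList<>();
    private final long thresholdMillis;
    private final long intervalMillis;

    SlowQueryDetector(long thresholdMillis, long intervalMillis) {
        super("SlowQueryDetector");
        setDaemon(true); // do not keep the JVM alive just for monitoring
        this.thresholdMillis = thresholdMillis;
        this.intervalMillis = intervalMillis;
    }

    void registerNotifier(Notifier notifier) {
        notifiers.add(notifier);
    }

    // Called by the query thread before the real work starts.
    void queryStart(Thread thread, String sql, String user, long nowMillis) {
        running.put(thread, new Entry(sql, user, nowMillis));
    }

    // Called from a finally block; a non-null reason is reported to observers.
    void queryEnd(Thread thread, String badReason, long nowMillis) {
        Entry entry = running.remove(thread);
        if (entry != null && badReason != null) {
            publish(badReason, entry, nowMillis - entry.startMillis);
        }
    }

    // One monitoring pass: report every query running longer than the threshold.
    void detect(long nowMillis) {
        for (Entry entry : running.values()) {
            long elapsed = nowMillis - entry.startMillis;
            if (elapsed >= thresholdMillis) {
                publish("Slow", entry, elapsed);
            }
        }
    }

    private void publish(String reason, Entry entry, long elapsed) {
        for (Notifier notifier : notifiers) {
            notifier.badQueryFound(reason, entry.sql, entry.user, elapsed);
        }
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                return; // stop quietly when interrupted
            }
            detect(System.currentTimeMillis());
        }
    }
}
```

In a running service the detector would be started once with start(), and each query would wrap its work in try/finally with queryStart and queryEnd, mirroring the query() method shown earlier. Keeping detect() separate from run() makes a single monitoring pass easy to test without waiting on real time.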