Kylin是预计算引擎,如果有N个维度,则会计算出2^N个结果;
示例数据
源表: kylintest
2个维度(dimensions): year,city
1个指标(measure): price
year | city | price |
---|---|---|
1993 | beijing | 10 |
1993 | beijing | 30 |
1994 | shanghai | 20 |
1994 | beijing | 40 |
预聚合表(构建Cube)
维度组合(year,city),预计算会计算2^N = 4个组合:(year,city),(year),(city),();
一个组合有一个Cuboid;
(year,city), 全维度组合的结果称为Base Cuboid;
year | city | sum(price) |
---|---|---|
1993 | beijing | 40 |
1994 | shanghai | 20 |
1994 | beijing | 40 |
1993 | * | 40 |
1994 | * | 60 |
* | beijing | 80 |
* | shanghai | 20 |
* | * | 100 |
相应的SQL:
//# (year,city),Base Cuboid
select year,city,sum(price) from kylintest group by year,city;
//# (year)
select year,sum(price) from kylintest group by year;
//# (city)
select city,sum(price) from kylintest group by city;
//# ()
select sum(price) from kylintest;
维度编码:
依据各列的基数性
维度值 | 编码 |
---|---|
1993 | 0 |
1994 | 1 |
beijing | 0 |
shanghai | 1 |
生成cuboid
cuboid是8位,后几位是维度组合的相应bool值,表示是否有此维度;前几位填充0
维度组合为(yaer,city),两个维度都有,cuboid 为[000000(11)]
维度组合为(year),有一个维度,cuboid 为[000000(10)]
生成HBase Rowkey
Rowkey的格式: Shard ID(2字节) + Cuboid ID(8字节) + 维度编码;
这里为了简便,没有加入Shard ID;
维度 | Rowkey | Value |
---|---|---|
(1993,beijing) | 000000 11 00 | 40 |
(1994,shanghai) | 000000 11 11 | 20 |
(1994,beijing) | 000000 11 10 | 40 |
(1993,*) | 000000 10 0 | 40 |
(1994,*) | 000000 10 1 | 60 |
(*,beijing) | 000000 01 0 | 80 |
(*,shanghai) | 000000 01 1 | 20 |
(,) | 000000 00 | 100 |
SQL查询转化Hbase的Scan操作
假设查询的SQL为
select year, sum(price) from table where city = "beijing" group by year
查询结果为
year | sum(price) |
---|---|
1993 | 40 |
1994 | 40 |
SQL查询维度year和city,对应的cuboid是00000011
city维度的值为"beijing", 对应的HBase rowkey就是:0000001100和0000001110
HBase时就会Scan Rowkey,取到对应指标sum(price)的值,返回给用户
参考:
https://blog.bcmeng.com/post/kylin-cube.html