For the sample data, see: "kylin 用实例说明原理" (Kylin: explaining the principles with an example)
Environment
- CentOS 6.5
- CDH 5.15
- Apache Kylin 2.1.0
Create the source table in Hive
create table if not exists chenzl.kylintest (
year int,
city string,
price int
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as textfile;
The table's path on HDFS is
/user/hive/warehouse/chenzl.db/kylintest
Data file
$ vi kylintest.txt
1993|beijing|10
1993|beijing|30
1994|shanghai|20
1994|beijing|40
Upload to HDFS
sudo -u hdfs hadoop fs -put kylintest.txt /user/hive/warehouse/chenzl.db/kylintest/
Query in Hive
hive> select * from chenzl.kylintest;
year city price
1993 beijing 10
1993 beijing 30
1994 shanghai 20
1994 beijing 40
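To make the mapping between the delimited file and the table columns concrete, here is a minimal Python sketch that parses the four sample rows the way the table definition describes them (fields split on '|', rows split on newlines). It is only a local illustration: the names SAMPLE and parse_rows are made up for this example and are not part of Hive.

```python
# Hypothetical local illustration; this is not how Hive itself reads the file.
SAMPLE = """1993|beijing|10
1993|beijing|30
1994|shanghai|20
1994|beijing|40
"""


def parse_rows(text):
    """Split text into (year, city, price) tuples using the table's delimiters."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        year, city, price = line.split("|")
        rows.append((int(year), city, int(price)))
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(*row)
```

If a line does not split into exactly three fields, the unpacking raises a ValueError, which is a quick way to catch a wrong delimiter before uploading the file.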
Kylin operations
Reload metadata: System->Reload Metadata
Create a project: Model->(+)Add Project->test
Add a data source: Model->Data Source->(↓) Load Table
Table Names: chenzl.kylintest
Create the model
New model: Model->(+ New)->New Model
Model Info
Model Name: M_test
Data Model
Fact Table: chenzl.kylintest
Dimensions
Columns: year,city
Measures
Columns: price
Click Next through the remaining steps, then Save
Create the cube
New cube: Model->(+ New)->New Cube
Cube Info
Model Name: M_test
Cube Name: C_test
Dimensions
Add Dimensions->Select All
Measures
(+)Measure
Name: sum(price)
Expression: sum
Param Type: column
Param Value: kylintest.price
Refresh Setting: skip
Advanced Setting
Aggregation Groups section
Includes kylintest.year,kylintest.city
Advanced ColumnFamily section
F1 _COUNT_,sum(price)
Then click Next, and finally Save
Build the cube
Start the build: C_test->Actions->Build
Open Monitor to view the job; click (>) to see the build steps
Build steps
- Create Intermediate Flat Hive Table
- Redistribute Flat Hive Table
- Extract Fact Table Distinct Columns
- Build Dimension Dictionary
- Save Cuboid Statistics
- Create HTable
- Build Base Cuboid
- Build N-Dimension Cuboid : level 1
- Build Cube In-Mem
- Convert Cuboid Data to HFile
- Load HFile to HBase Table
- Update Cube Info
- Hive Cleanup
- Garbage Collection on HDFS
In step 11 you can see the name of the generated HBase table, e.g. "KYLIN_CBSVR3S7FK"
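The "Build Base Cuboid" and "Build N-Dimension Cuboid" steps can be pictured with a small Python sketch. It pre-aggregates the four sample rows for every non-empty subset of the two dimensions (year, city), keeping a row count and sum(price) per group. This is a conceptual illustration under simplifying assumptions, not Kylin's implementation: Kylin runs these steps as distributed jobs, encodes dimension values with dictionaries, and which cuboids it actually materializes depends on the aggregation group settings. ROWS, build_cuboid and build_all_cuboids are names invented for this example.

```python
from collections import defaultdict
from itertools import combinations

# The four sample rows: (year, city, price).
ROWS = [
    (1993, "beijing", 10),
    (1993, "beijing", 30),
    (1994, "shanghai", 20),
    (1994, "beijing", 40),
]
DIMENSIONS = ("year", "city")


def build_cuboid(rows, dims):
    """Group rows by the given dimensions; keep (row count, sum(price)) per group."""
    idx = [DIMENSIONS.index(d) for d in dims]
    cuboid = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[i] for i in idx)
        cuboid[key][0] += 1        # row count
        cuboid[key][1] += row[2]   # sum(price)
    return {k: tuple(v) for k, v in cuboid.items()}


def build_all_cuboids(rows):
    """One cuboid per non-empty subset of dimensions, largest subset first."""
    cuboids = {}
    for size in range(len(DIMENSIONS), 0, -1):
        for dims in combinations(DIMENSIONS, size):
            cuboids[dims] = build_cuboid(rows, dims)
    return cuboids


if __name__ == "__main__":
    for dims, cuboid in build_all_cuboids(ROWS).items():
        print(dims)
        for key, (count, total) in sorted(cuboid.items()):
            print("  ", key, "count =", count, "sum(price) =", total)
```

In this sketch the ("year", "city") entry plays the role of the base cuboid, and the ("year",) and ("city",) entries correspond to the one-dimension level.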
Query
Once the build finishes, click Insight and query the cube
select "YEAR", sum(price) from kylintest where city = 'beijing' group by "YEAR"
The result should be the same as running the query against the original Hive table.
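As a cross-check, the expected answer can be computed by hand from the four sample rows. The Python sketch below applies the same filter and grouping as the query (city = 'beijing', group by year, sum of price); sum_price_by_year is a helper name invented for this example. For the sample data it yields 40 for 1993 (10 + 30) and 40 for 1994.

```python
# The four sample rows: (year, city, price).
ROWS = [
    (1993, "beijing", 10),
    (1993, "beijing", 30),
    (1994, "shanghai", 20),
    (1994, "beijing", 40),
]


def sum_price_by_year(rows, city):
    """Return {year: sum(price)} for the rows whose city matches."""
    totals = {}
    for year, row_city, price in rows:
        if row_city == city:
            totals[year] = totals.get(year, 0) + price
    return totals


if __name__ == "__main__":
    for year, total in sorted(sum_price_by_year(ROWS, "beijing").items()):
        print(year, total)
```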