Kylin Basic Usage

Kylin

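1. Background

Traditional data warehouse architectures effectively support only vertical scaling (scale-up).
The integration between Hadoop and BI platforms was immature and could not provide efficient interactive queries.
Against this background, eBay started its BI-on-Hadoop effort in 2013, which became Kylin.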

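Core design ideas:
SQL-on-Hadoop frameworks such as Hive and Spark SQL rely on massively parallel processing and columnar storage.
For most tables, the dimensions and measures can largely be determined in advance.
Precomputation: compute the results ahead of time and store them, so that a query becomes a lookup.
The join and group by work is done up front, so a query reduces to a simple select, which saves computation at query time.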
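
The precomputation idea above can be sketched in a few lines of Python. This is only an illustration of "aggregate once, look up later", not how Kylin stores data; the sample rows and the names `sales`, `precompute`, and `store` are made up for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, city, amount).
sales = [
    ("2020-05-01", "Beijing", 10),
    ("2020-05-01", "Shanghai", 20),
    ("2020-05-02", "Beijing", 5),
    ("2020-05-02", "Beijing", 7),
]


def precompute(rows):
    """Do the GROUP BY (date, city) with SUM(amount) once, ahead of time."""
    totals = defaultdict(int)
    for date, city, amount in rows:
        totals[(date, city)] += amount
    return dict(totals)


# Build step: runs once, offline.
store = precompute(sales)

# Query step: the aggregate query is now a plain key lookup.
print(store[("2020-05-02", "Beijing")])  # 12
```

The trade-off is the usual one for precomputation: more work and storage at build time in exchange for cheap queries.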

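2. Architecture

Data sources: real-time (Kafka) / offline (Hadoop, Hive)
Core modules
Cube build engine: MapReduce / Spark
Metadata management module/tools: where are the results of cube build jobs stored? In HBase
Routing module
Kylin turns a SQL query into a lookup against HBase
Key question: how to define the cube, that is, which dimensions and measures to choose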
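
As a rough illustration of what a routing step has to do, the toy sketch below picks the smallest precomputed dimension combination that covers the dimensions a query uses. It is an assumption-laden simplification, not Kylin's actual routing code; the function `route` and the dimension names are hypothetical.

```python
def route(query_dims, available):
    """Pick the smallest precomputed dimension set that covers the query."""
    needed = frozenset(query_dims)
    candidates = [dims for dims in available if needed <= dims]
    if not candidates:
        return None  # nothing precomputed can answer this query
    return min(candidates, key=len)


# Hypothetical precomputed dimension combinations.
available = [
    frozenset({"dt"}),
    frozenset({"dt", "city"}),
    frozenset({"dt", "city", "item"}),
]

print(sorted(route(["city"], available)))  # ['city', 'dt']
print(route(["seller"], available))        # None
```

Choosing the smallest covering combination keeps the amount of already-aggregated data that must be scanned as low as possible.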

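3. Core Concepts

Dimensions and measures: the two most basic concepts in data analysis
Dimension: an angle from which to view the data, i.e. an attribute of a record, such as time or location
Measure: a concrete numeric value computed from the data
Cube: with n dimensions, an aggregation is computed for every combination of dimensions
2^n combinations in total
Cuboid: the aggregated result for one combination of dimensions; the cuboids together make up the cube
Segment: in Kylin, once a cube has been designed it still has to be built
Each build selects a time range, e.g. one week of data; a segment is comparable to one partition of a partitioned table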
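
The cube and cuboid definitions above can be made concrete with a small sketch that enumerates every dimension combination. The dimension names are hypothetical, and the code only counts combinations; it does not compute any aggregates.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every combination of dimensions, from none to all of them."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


dims = ["time", "location", "item"]
all_cuboids = cuboids(dims)

print(len(all_cuboids))  # 8, i.e. 2 ** 3
for combo in all_cuboids:
    print(combo)
```

Because the count doubles with every added dimension, a cube with many dimensions quickly becomes expensive to build, which is why dimension pruning matters.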

4. Installation

Prerequisites:
1. Install Node.js
$> wget https://nodejs.org/dist/v10.9.0/node-v10.9.0-linux-x64.tar.xz
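$> tar xf node-v10.9.0-linux-x64.tar.xz
$> mv node-v10.9.0-linux-x64 /usr/software/nodejs ## assumed step: the symlinks below expect Node.js at this path
$> ln -s /usr/software/nodejs/bin/npm /usr/local/bin/
$> ln -s /usr/software/nodejs/bin/node /usr/local/bin/
$> node -v ## check the version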

2. Install bower
$> npm install -g bower

3. Install PhantomJS
Install and deploy PhantomJS manually and configure the environment variables. Download URL:
https://github.com/Medium/phantomjs/releases/download/v1.9.19/phantomjs-1.9.8-linux-x86_64.tar.bz2

4. Edit pom.xml: change cdh5.7 to the CDH version you are compiling against
$> sed -i "s/2.6.0-cdh5.7.0/2.6.0-cdh5.16.2/g" `grep "cdh" -rl pom.xml`
$> sed -i "s/1.1.0-cdh5.7.0/1.1.0-cdh5.16.2/g" `grep "cdh" -rl pom.xml`
$> sed -i "s/1.2.0-cdh5.7.0/1.2.0-cdh5.16.2/g" `grep "cdh" -rl pom.xml`
$> sed -i "s/3.4.5-cdh5.7.0/3.4.5-cdh5.16.2/g" `grep "cdh" -rl pom.xml`
$> sed -i "s/cdh5.7/cdh5.16/g" `grep "cdh" -rl pom.xml`
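
For readers who prefer to preview the substitutions before touching the file, the same replacements can be sketched in Python. The replacement pairs are the ones from the sed commands above; the `sample` string is a made-up fragment, not a quote from Kylin's pom.xml. Note that `str.replace` matches literally, whereas the unescaped `.` in the sed patterns matches any character.

```python
# Replacement pairs taken from the sed commands, applied in the same order.
REPLACEMENTS = [
    ("2.6.0-cdh5.7.0", "2.6.0-cdh5.16.2"),
    ("1.1.0-cdh5.7.0", "1.1.0-cdh5.16.2"),
    ("1.2.0-cdh5.7.0", "1.2.0-cdh5.16.2"),
    ("3.4.5-cdh5.7.0", "3.4.5-cdh5.16.2"),
    ("cdh5.7", "cdh5.16"),
]


def retarget_cdh(pom_text):
    """Apply every version substitution to the given pom.xml text."""
    for old, new in REPLACEMENTS:
        pom_text = pom_text.replace(old, new)
    return pom_text


# Hypothetical fragment, only to show the effect.
sample = "<version>2.6.0-cdh5.7.0</version><id>cdh5.7</id>"
print(retarget_cdh(sample))
# <version>2.6.0-cdh5.16.2</version><id>cdh5.16</id>
```

The order matters: the full version strings are rewritten first, and the generic `cdh5.7` rule then only catches what is left, such as profile names.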

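5. Disable DocLint [optional]
Java 8 introduced the DocLint feature. It checks Javadoc comments for errors during development, before the Javadoc is generated, and links each error to the source code. If the Javadoc comments contain errors, no Javadoc is generated. To turn the check off, pass this option to javadoc:
-Xdoclint:none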

6. Troubleshooting build errors
Building with build/script/package.sh -DskipTests -Pcdh5.16 -Papache-release -Dcheckstyle.skip kept failing with:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-gpg-plugin:1.6:sign (sign-release-artifacts) on project kylin: Exit code: 2 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Building with build/script/package.sh -DskipTests -Pcdh5.16 (that is, without -Papache-release) does not hit this error.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (process-resource-bundles) on project kylin-engine-spark: Failed to resolve dependencies for one or more projects in the reactor. Reason: No versions are present in the repository for the artifact with a range [1.8,2.0)
[ERROR] commons-codec:commons-codec:jar:null
[ERROR]
[ERROR] from the specified remote repositories:
[ERROR] central (http://repo.maven.apache.org/maven2, releases=true, snapshots=false),
[ERROR] conjars (http://conjars.org/repo/, releases=true, snapshots=true),
[ERROR] cloudera (https://repository.cloudera.com/artifactory/cloudera-repos/, releases=true, snapshots=true),
[ERROR] shibboleth (https://build.shibboleth.net/nexus/content/repositories/releases/, releases=true, snapshots=true),
[ERROR] nexus (http://repository.kyligence.io:8081/repository/maven-public/, releases=true, snapshots=true),
[ERROR] apache.snapshots (https://repository.apache.org/snapshots, releases=false, snapshots=true),
[ERROR] sonatype-nexus-snapshots (https://oss.sonatype.org/content/repositories/snapshots, releases=false, snapshots=true)
[ERROR] Path to dependency:
[ERROR] 1) org.apache.kylin:kylin-engine-spark:jar:2.6.0
[ERROR] 2) org.apache.spark:spark-core_2.11:jar:2.3.2
[ERROR] 3) net.java.dev.jets3t:jets3t:jar:0.9.4
[ERROR] -> [Help 1]
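Cause: the dependency could not be downloaded because the corresponding file was not found in the configured Maven repositories. Fix: add the Aliyun Maven repository to settings.xml.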

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile (scala-compile-first) on project kylin-engine-spark: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile failed. CompileFailed -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile (scala-compile-first) on project kylin-engine-spark: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile failed.
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.PluginExecutionException: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile failed.
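Fix: find net.alchim31.maven in pom.xml and uncomment the version element (the pom marks it as something to uncomment on release), so that version 3.4.1 is used:

<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>

<version>3.4.1</version>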

After this fix the build still failed:
Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.4.1:add-source (scala-compile-first) on project kylin-engine-spark: The plugin net.alchim31.maven:scala-maven-plugin:3.4.1 requires Maven version 3.5.3
The Maven version needs to be changed to 3.5.3.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.5.1:testCompile (default-testCompile) on project kylin-source-jdbc: Compilation failure
[ERROR] /opt/source/apache-kylin-2.6.0/source-jdbc/src/test/java/org/apache/kylin/source/jdbc/JdbcExplorerTest.java:[89,26] error: incompatible types: inferred type does not conform to upper bound(s)
[ERROR]
[ERROR] -> [Help 1]
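Cause:
The wrong command was used: build/script/package.sh -DskipTests -Dcheckstyle.skip -Pcdh5.16
-Dmaven.test.skip=true skips running the unit tests and also skips compiling the test code
-DskipTests skips running the unit tests but still compiles the test code
The correct command is build/script/package.sh -Dmaven.test.skip=true -Dcheckstyle.skip -Pcdh5.16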

[ERROR] Failed to execute goal on project kylin-server-base: Could not resolve dependencies for project org.apache.kylin:kylin-server-base:jar:2.6.0: Failed to collect dependencies at org.springframework.security.extensions:spring-security-saml2-core:jar:1.0.2.RELEASE -> org.opensaml:opensaml:jar:2.6.6: Failed to read artifact descriptor for org.opensaml:opensaml:jar:2.6.6: Could not transfer artifact org.opensaml:opensaml:pom:2.6.6 from/to spring-snapshots (http://repo.spring.io/libs-snapshot): Access denied to: http://repo.spring.io/libs-snapshot/org/opensaml/opensaml/2.6.6/opensaml-2.6.6.pom , ReasonPhrase:Forbidden. -> [Help 1]

[ERROR] Failed to execute goal on project kylin-server-base: Could not resolve dependencies for project org.apache.kylin:kylin-server-base:jar:2.6.0: Failed to collect dependencies at org.springframework.security.extensions:spring-security-saml2-core:jar:1.0.2.RELEASE -> org.opensaml:opensaml:jar:2.6.6: Failed to read artifact descriptor for org.opensaml:opensaml:jar:2.6.6: Could not transfer artifact net.shibboleth:parent-v2:pom:4 from/to spring-snapshots (http://repo.spring.io/libs-snapshot): Access denied to: http://repo.spring.io/libs-snapshot/net/shibboleth/parent-v2/4/parent-v2-4.pom , ReasonPhrase:Forbidden. -> [Help 1]

[ERROR] Failed to execute goal on project kylin-server-base: Could not resolve dependencies for project org.apache.kylin:kylin-server-base:jar:2.6.0: Failed to collect dependencies at org.springframework.security.extensions:spring-security-saml2-core:jar:1.0.2.RELEASE -> org.opensaml:opensaml:jar:2.6.6 -> org.opensaml:openws:jar:1.5.6: Failed to read artifact descriptor for org.opensaml:openws:jar:1.5.6: Could not transfer artifact org.opensaml:openws:pom:1.5.6 from/to spring-snapshots (http://repo.spring.io/libs-snapshot): Access denied to: http://repo.spring.io/libs-snapshot/org/opensaml/openws/1.5.6/openws-1.5.6.pom , ReasonPhrase:Forbidden. -> [Help 1]

[ERROR] Failed to execute goal on project kylin-server-base: Could not resolve dependencies for project org.apache.kylin:kylin-server-base:jar:2.6.0: Failed to collect dependencies at org.springframework.security.extensions:spring-security-saml2-core:jar:1.0.2.RELEASE -> org.opensaml:opensaml:jar:2.6.6 -> org.opensaml:openws:jar:1.5.6 -> org.opensaml:xmltooling:jar:1.4.6: Failed to read artifact descriptor for org.opensaml:xmltooling:jar:1.4.6: Could not transfer artifact org.opensaml:xmltooling:pom:1.4.6 from/to spring-snapshots (http://repo.spring.io/libs-snapshot): Access denied to: http://repo.spring.io/libs-snapshot/org/opensaml/xmltooling/1.4.6/xmltooling-1.4.6.pom , ReasonPhrase:Forbidden. -> [Help 1]
Fix:
Manually download the jar from https://mvnrepository.com/artifact/org.opensaml/opensaml/2.6.6
Manually download https://build.shibboleth.net/nexus/content/repositories/releases/net/shibboleth/parent-v2/4/parent-v2-4.pom
Manually download the jar and pom from https://mvnrepository.com/artifact/org.opensaml/openws/1.5.6
Manually download the jar and pom from https://mvnrepository.com/artifact/org.opensaml/xmltooling/1.4.6

7. The build finally succeeds
[INFO]
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ kylin-tomcat-ext ---
[INFO] Installing /opt/source/apache-kylin-2.6.0/tomcat-ext/target/kylin-tomcat-ext-2.6.0.jar to /home/hadoop/.m2/repository/org/apache/kylin/kylin-tomcat-ext/2.6.0/kylin-tomcat-ext-2.6.0.jar
[INFO] Installing /opt/source/apache-kylin-2.6.0/tomcat-ext/pom.xml to /home/hadoop/.m2/repository/org/apache/kylin/kylin-tomcat-ext/2.6.0/kylin-tomcat-ext-2.6.0.pom
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Kylin 2.6.0 … SUCCESS [ 6.896 s]
[INFO] Apache Kylin - Core Common … SUCCESS [ 9.497 s]
[INFO] Apache Kylin - Core Metadata … SUCCESS [ 11.527 s]
[INFO] Apache Kylin - Core Dictionary … SUCCESS [ 7.266 s]
[INFO] Apache Kylin - Core Cube … SUCCESS [ 9.569 s]
[INFO] Apache Kylin - Core Metrics … SUCCESS [ 4.294 s]
[INFO] Apache Kylin - Core Job … SUCCESS [ 6.162 s]
[INFO] Apache Kylin - Core Storage … SUCCESS [ 4.810 s]
[INFO] Apache Kylin - MapReduce Engine … SUCCESS [ 11.646 s]
[INFO] Apache Kylin - Spark Engine … SUCCESS [ 36.087 s]
[INFO] Apache Kylin - Hive Source … SUCCESS [ 16.175 s]
[INFO] Apache Kylin - DataSource SDK … SUCCESS [ 9.821 s]
[INFO] Apache Kylin - Jdbc Source … SUCCESS [ 6.607 s]
[INFO] Apache Kylin - Kafka Source … SUCCESS [ 7.157 s]
[INFO] Apache Kylin - Cache … SUCCESS [ 5.200 s]
[INFO] Apache Kylin - HBase Storage … SUCCESS [ 21.739 s]
[INFO] Apache Kylin - Query … SUCCESS [ 10.917 s]
[INFO] Apache Kylin - Metrics Reporter Hive … SUCCESS [ 8.855 s]
[INFO] Apache Kylin - Metrics Reporter Kafka … SUCCESS [ 5.503 s]
[INFO] Apache Kylin - REST Server Base … SUCCESS [01:21 min]
[INFO] Apache Kylin - REST Server … SUCCESS [01:47 min]
[INFO] Apache Kylin - JDBC Driver … SUCCESS [01:29 min]
[INFO] Apache Kylin - Assembly … SUCCESS [02:21 min]
[INFO] Apache Kylin - Tool … SUCCESS [ 33.614 s]
[INFO] Apache Kylin - Tool Assembly … SUCCESS [ 22.368 s]
[INFO] Apache Kylin - Integration Test … SUCCESS [01:08 min]
[INFO] Apache Kylin - Tomcat Extension 2.6.0 … SUCCESS [ 6.227 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12:31 min
[INFO] Finished at: 2020-05-17T04:29:36+08:00
[INFO] ------------------------------------------------------------------------

apache-kylin-2.6.0-bin/bin/check-port-availability.sh
apache-kylin-2.6.0-bin/bin/build-incremental-cube.sh
apache-kylin-2.6.0-bin/bin/find-spark-dependency.sh
apache-kylin-2.6.0-bin/bin/check-migration-acl.sh
apache-kylin-2.6.0-bin/bin/find-hadoop-conf-dir.sh
apache-kylin-2.6.0-bin/bin/find-hbase-dependency.sh
apache-kylin-2.6.0-bin/bin/check-env.sh
apache-kylin-2.6.0-bin/bin/set-java-home.sh
apache-kylin-2.6.0-bin/bin/sample.sh
apache-kylin-2.6.0-bin/bin/find-kafka-dependency.sh
apache-kylin-2.6.0-bin/bin/metastore.sh
apache-kylin-2.6.0-bin/bin/sample-streaming.sh
apache-kylin-2.6.0-bin/bin/diag.sh
apache-kylin-2.6.0-bin/bin/find-hive-dependency.sh
apache-kylin-2.6.0-bin/bin/kylin.sh
apache-kylin-2.6.0-bin/bin/check-hive-usability.sh
apache-kylin-2.6.0-bin/bin/system-cube.sh
Package ready: dist/apache-kylin-2.6.0-bin.tar.gz
At this point the build against CDH 5.16.2 has genuinely succeeded.

5. Configuration

This step needs Kylin compiled from source: version 2.6.0, built against CDH 5.16.2.

kylin.properties
kylin.env.hadoop-conf-dir=/opt/app/hadoop-2.6.0-cdh5.16.2/etc/hadoop

Start Kylin:
[hadoop@hadoop001 bin]$ ./kylin.sh start

If you see the following output, Kylin has started successfully:
A new Kylin instance is started by hadoop. To stop it, run 'kylin.sh stop'
Check the log at /opt/app/apache-kylin-2.6.0/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin
You have new mail in /var/spool/mail/hadoop

[hadoop@hadoop001 bin]$ ps -ef | grep kylin

Web UI: http://hadoop001:7070/kylin/login (default credentials: ADMIN/KYLIN)

6. Usage

./sample.sh
Create a project

Start Kylin
./kylin.sh start

Cardinality (dimension cardinality): the result of count(distinct) on a column, e.g. 200 for item
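
Cardinality as described above is just a distinct count per column. A minimal sketch, using made-up rows and a hypothetical helper named `cardinality`:

```python
# Hypothetical rows; in practice these would come from the source table.
rows = [
    {"item": "apple", "city": "Beijing"},
    {"item": "pear", "city": "Beijing"},
    {"item": "apple", "city": "Shanghai"},
]


def cardinality(rows, column):
    """Number of distinct values in a column, i.e. COUNT(DISTINCT column)."""
    return len({row[column] for row in rows})


print(cardinality(rows, "item"))  # 2
print(cardinality(rows, "city"))  # 2
```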

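Loading data:
Load Table: database_name.table_name
Load Table Metadata From Tree
Aggregation Groups
Aggregation groups are an important step in cube optimization, i.e. dimension pruning
Includes: the dimensions to include
Mandatory Dimensions: dimensions that every query carries; a time column is the usual choice
Hierarchy Dimensions: hierarchical dimensions, e.g. province/city/district, or levels one/two/three/four/five
Joint Dimensions: dimensions that always appear together, e.g. id and name
Rowkeys
How well they are designed affects the efficiency of querying data in HBase
Cube Engine
MapReduce / Spark
Advanced Dictionaries
Relevant to measures that need precise count distinct; approximate count distinct, by contrast, has some error but needs less storage
Advanced Snapshot Table
Snapshots
Advanced ColumnFamily
Sets the column families; relevant to count distinct / sum measures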
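
To see why the aggregation group settings above prune so much, the sketch below filters the full set of dimension combinations with three simplified rules: mandatory dimensions must always be present, a hierarchy level may only appear together with all levels above it, and joint dimensions appear all together or not at all. This is a simplified model for illustration, not Kylin's implementation, and the dimension names are hypothetical.

```python
from itertools import combinations


def valid_cuboids(dims, mandatory=(), hierarchy=(), joint=()):
    """Enumerate the dimension combinations that survive the pruning rules."""
    result = []
    for size in range(len(dims) + 1):
        for combo in combinations(dims, size):
            chosen = set(combo)
            # Mandatory: every combination must contain these dimensions.
            if not set(mandatory) <= chosen:
                continue
            # Hierarchy: a lower level needs the level directly above it.
            if any(
                hierarchy[i] in chosen and hierarchy[i - 1] not in chosen
                for i in range(1, len(hierarchy))
            ):
                continue
            # Joint: either all joint dimensions are present, or none.
            present = chosen & set(joint)
            if present and present != set(joint):
                continue
            result.append(combo)
    return result


dims = ["dt", "province", "city", "district", "id", "name"]
pruned = valid_cuboids(
    dims,
    mandatory=["dt"],
    hierarchy=["province", "city", "district"],
    joint=["id", "name"],
)

print(2 ** len(dims))  # 64 combinations without pruning
print(len(pruned))     # 8 combinations after pruning
```

In this example the mandatory dimension contributes one choice, the three-level hierarchy four, and the joint pair two, which gives 1 x 4 x 2 = 8 instead of 64.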
