kylin4.0.1安装与填坑

最新推荐文章于 2023-01-19 16:25:01 发布

Java页大数据

最新推荐文章于 2023-01-19 16:25:01 发布

阅读量1.1k

点赞数

分类专栏： kylin spark 文章标签： spark kylin

本文链接：https://blog.csdn.net/liguanhaoyonghu/article/details/124918803

版权

spark 同时被 2 个专栏收录

3 篇文章 0 订阅

订阅专栏

kylin

1 篇文章 0 订阅

订阅专栏

kylin4.0.1安装

解压安装包
将hdfs-site.xml,core-site.xml,hive-site.xml,spark-defaults.conf文件添加到/$KYLIN_HOME/conf目录下，软连接也可以（建议）
将自身的HADOO_HOME,HIVE_HOME,HBASE_HOME,SPARK_HOME的值添加到/$KYLIN_HOME/bin/kylin.sh文件中；
内容如下：

export HADOOP_HOME=/opt/hadoop-3.3.2
export HIVE_HOME=/opt/hive-3.1.3
export HBASE_HOME=/opt/hbase-2.4.11
export SPARK_HOME=/opt/spark-3.2.1-bin-hadoop3.2

修改conf目录下的kylin.properties文件(原来的都是带有注释！！我自身的是非注释)

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#




# The below commented values will effect as default settings
# Uncomment and override them if necessary



#
#### METADATA | ENV ###
#
## The metadata store has two implementations(RDBMS/HBase), while RDBMS is recommended in Kylin 4.X
## Please check https://kylin.apache.org/docs/tutorial/mysql_metastore.html
kylin.metadata.url=kylin_metadata@jdbc,url=jdbc:mysql://only:3306/kylin,username=root,password=xxxx,maxActive=10,maxIdle=10
#
## metadata cache sync retry times
#kylin.metadata.sync-retries=3
#
## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
#kylin.env.hdfs-working-dir=/kylin
#
## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
#kylin.env=QA
#
## kylin zk base path
#kylin.env.zookeeper-base-path=/kylin
#
## Run a TestingServer for curator locally
#kylin.env.zookeeper-is-local=false
#
## Connect to a remote zookeeper with the url, should set kylin.env.zookeeper-is-local to false
#kylin.env.zookeeper-connect-string=sandbox.hortonworks.com
#
#### SERVER | WEB | RESTCLIENT ###
#
## Kylin server mode, valid value [all, query, job]
#kylin.server.mode=all
#
## List of web servers in use, this enables one web server instance to sync up with other servers.
#kylin.server.cluster-servers=localhost:7070
#
## Display timezone on UI,format like[GMT+N or GMT-N]
#kylin.web.timezone=
#
## Timeout value for the queries submitted through the Web UI, in milliseconds
#kylin.web.query-timeout=300000
#
#kylin.web.cross-domain-enabled=true
#
##allow user to export query result
#kylin.web.export-allow-admin=true
#kylin.web.export-allow-other=true
#
## Hide measures in measure list of cube designer, separate by comma
#kylin.web.hide-measures=RAW
#
##max connections of one route
#kylin.restclient.connection.default-max-per-route=20
#
##max connections of one rest-client
#kylin.restclient.connection.max-total=200
#
#### PUBLIC CONFIG ###
#kylin.engine.default=6
#kylin.storage.default=4
#kylin.web.hive-limit=20
#kylin.web.help.length=4
#kylin.web.help.0=start|Getting Started|http://kylin.apache.org/docs/tutorial/kylin_sample.html
#kylin.web.help.1=odbc|ODBC Driver|http://kylin.apache.org/docs/tutorial/odbc.html
#kylin.web.help.2=tableau|Tableau Guide|http://kylin.apache.org/docs/tutorial/tableau_91.html
#kylin.web.help.3=onboard|Cube Design Tutorial|http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
#kylin.web.link-streaming-guide=http://kylin.apache.org/
#kylin.htrace.show-gui-trace-toggle=false
#kylin.web.link-hadoop=
#kylin.web.link-diagnostic=
#kylin.web.contact-mail=
#kylin.server.external-acl-provider=
#kylin.source.hive.database-for-flat-table=default
#
## Default time filter for job list, 0->current day, 1->last one day, 2->last one week, 3->last one year, 4->all
#kylin.web.default-time-filter=1
#
#### STORAGE ###
#
## clean real storage after purge operation
## if you want to delete the real storage like htable of deleting segment, you can set it to true
#kylin.storage.clean-after-delete-operation=false
#
#### JOB ###
#
## Max job retry on error, default 0: no retry
#kylin.job.retry=0
#
## Max count of concurrent jobs running
#kylin.job.max-concurrent-jobs=10
#
## The percentage of the sampling, default 100%
#kylin.job.sampling-percentage=100
#
## If true, will send email notification on job complete
##kylin.job.notification-enabled=true
##kylin.job.notification-mail-enable-starttls=true
##kylin.job.notification-mail-host=smtp.office365.com
##kylin.job.notification-mail-port=587
##kylin.job.notification-mail-username=kylin@example.com
##kylin.job.notification-mail-password=mypassword
##kylin.job.notification-mail-sender=kylin@example.com
#kylin.job.scheduler.provider.100=org.apache.kylin.job.impl.curator.CuratorScheduler
#kylin.job.scheduler.default=0
#
#### CUBE | DICTIONARY ###
#
#kylin.cube.cuboid-scheduler=org.apache.kylin.cube.cuboid.DefaultCuboidScheduler
#kylin.cube.segment-advisor=org.apache.kylin.cube.CubeSegmentAdvisor
#kylin.cube.aggrgroup.max-combination=32768
#
#kylin.cube.cubeplanner.enabled=false
#kylin.cube.cubeplanner.enabled-for-existing-cube=false
#
## In kylin3, the default value of 'kylin.cube.cubeplanner.expansion-threshold' is 15.
## Because the storage engine in kylin4 changes, the storage size of the same cuboid will become smaller.
## Under the same expansion rate, the number of cuboids in kylin4 will be greater than that in kylin3.
## In order to keep the pruning result of the phase1 of the cube planner in kylin4 roughly the same as that in kylin3,
## the default value of this parameter is adjusted to 2.5.
#kylin.cube.cubeplanner.expansion-threshold=2.5
#
#kylin.cube.cubeplanner.recommend-cache-max-size=200
#kylin.cube.cubeplanner.mandatory-rollup-threshold=1000
#kylin.cube.cubeplanner.algorithm-threshold-greedy=8
#kylin.cube.cubeplanner.algorithm-threshold-genetic=23
#
#### QUERY ###
#
#kylin.query.cache-enabled=true
#kylin.query.cache-threshold-scan-count=10240
#kylin.query.cache-threshold-duration=2000
#kylin.query.cache-threshold-scan-bytes=1048576
#kylin.query.large-query-threshold=1000000
#
## Controls extras properties for Calcite jdbc driver
## all extras properties should undder prefix "kylin.query.calcite.extras-props."
## case sensitive, default: true, to enable case insensitive set it to false
## @see org.apache.calcite.config.CalciteConnectionProperty.CASE_SENSITIVE
#kylin.query.calcite.extras-props.caseSensitive=true
## how to handle unquoted identity, defualt: TO_UPPER, available options: UNCHANGED, TO_UPPER, TO_LOWER
## @see org.apache.calcite.config.CalciteConnectionProperty.UNQUOTED_CASING
#kylin.query.calcite.extras-props.unquotedCasing=TO_UPPER
## quoting method, default: DOUBLE_QUOTE, available options: DOUBLE_QUOTE, BACK_TICK, BRACKET
## @see org.apache.calcite.config.CalciteConnectionProperty.QUOTING
#kylin.query.calcite.extras-props.quoting=DOUBLE_QUOTE
## change SqlConformance from DEFAULT to LENIENT to enable group by ordinal
## @see org.apache.calcite.sql.validate.SqlConformance.SqlConformanceEnum
#kylin.query.calcite.extras-props.conformance=LENIENT
#
## TABLE ACL
#kylin.query.security.table-acl-enabled=true
#
## Usually should not modify this
#kylin.query.interceptors=org.apache.kylin.rest.security.TableInterceptor
#
#kylin.query.escape-default-keyword=false
#
## Usually should not modify this
#kylin.query.transformers=org.apache.kylin.query.util.DefaultQueryTransformer,org.apache.kylin.query.util.KeywordDefaultDirtyHack
#
#### SECURITY ###
#
## Spring security profile, options: testing, ldap, saml
## with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
#kylin.security.profile=testing
#
## Admin roles in LDAP, for ldap and saml
#kylin.security.acl.admin-role=admin
#
## LDAP authentication configuration
#kylin.security.ldap.connection-server=ldap://ldap_server:389
#kylin.security.ldap.connection-username=
#kylin.security.ldap.connection-password=
## When you use the customized CA certificate library for user authentication based on LDAPs, you need to configure this item.
## The value of this item will be added to the JVM parameter javax.net.ssl.trustStore.
#kylin.security.ldap.connection-truststore=
#
## LDAP user account directory;
#kylin.security.ldap.user-search-base=
#kylin.security.ldap.user-search-pattern=
#kylin.security.ldap.user-group-search-base=
#kylin.security.ldap.user-group-search-filter=(|(member={0})(memberUid={1}))
#
## LDAP service account directory
#kylin.security.ldap.service-search-base=
#kylin.security.ldap.service-search-pattern=
#kylin.security.ldap.service-group-search-base=
#
### SAML configurations for SSO
## SAML IDP metadata file location
#kylin.security.saml.metadata-file=classpath:sso_metadata.xml
#kylin.security.saml.metadata-entity-base-url=https://hostname/kylin
#kylin.security.saml.keystore-file=classpath:samlKeystore.jks
#kylin.security.saml.context-scheme=https
#kylin.security.saml.context-server-name=hostname
#kylin.security.saml.context-server-port=443
#kylin.security.saml.context-path=/kylin
#
###################################
#### SPARK BUILD ENGINE CONFIGS ###
#
## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
## This must contain site xmls of core, yarn, hive, and hbase in one folder
kylin.env.hadoop-conf-dir=/opt/hadoop-3.3.2/etc/hadoop
#
## Switch to spark resources automatic adjustment strategy
#kylin.spark-conf.auto.prior=true
#
## Read-Write separation deployment for Kylin 4, please check https://cwiki.apache.org/confluence/display/KYLIN/Read-Write+Separation+Deployment+for+Kylin+4.0
##kylin.engine.submit-hadoop-conf-dir=
#
## Spark conf (default is in spark/conf/spark-defaults.conf)
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.executor.cores=2
kylin.engine.spark-conf.spark.executor.memory=2G
kylin.engine.spark-conf.spark.executor.instances=2
kylin.engine.spark-conf.spark.executor.memoryOverhead=512M
kylin.engine.spark-conf.spark.driver.cores=1
kylin.engine.spark-conf.spark.driver.memory=1G
kylin.engine.spark-conf.spark.driver.memoryOverhead=512M
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=hdfs://only:9000/spark_eventlog
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs://only:9000/spark_history
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8 -Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties -Dlog4j.debug -Dkylin.hdfs.working.dir=${hdfs.working.dir} -Dkylin.metadata.identifier=${kylin.metadata.url.identifier} -Dkylin.spark.category=job -Dkylin.spark.project=${job.project} -Dkylin.spark.identifier=${job.id} -Dkylin.spark.jobName=${job.stepId} -Duser.timezone=${user.timezone}
##kylin.engine.spark-conf.spark.sql.shuffle.partitions=1
#
## manually upload spark-assembly jar to HDFS and then set this property will avoid repeatedly uploading jar at runtime
kylin.engine.spark-conf.spark.yarn.jars=hdfs://only:9000/spark_jars/*
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
##kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-XX:+CrashOnOutOfMemoryError
#
###################################
#### SPARK QUERY ENGINE CONFIGS ###
#
## Enlarge cores and memory to improve query performance in production env, please check https://cwiki.apache.org/confluence/display/KYLIN/User+Manual+4.X
#
##Whether or not to start SparderContext when query server start
#kylin.query.auto-sparder-context-enabled-enabled=false
##kylin.query.sparder-context.app-name=
#
#kylin.query.spark-conf.spark.master=yarn
#kylin.query.spark-conf.spark.driver.cores=1
#kylin.query.spark-conf.spark.driver.memory=4G
#kylin.query.spark-conf.spark.driver.memoryOverhead=1G
#kylin.query.spark-conf.spark.executor.cores=1
#kylin.query.spark-conf.spark.executor.instances=1
#kylin.query.spark-conf.spark.executor.memory=4G
#kylin.query.spark-conf.spark.executor.memoryOverhead=1G
#kylin.query.spark-conf.spark.serializer=org.apache.spark.serializer.JavaSerializer
##kylin.query.spark-conf.spark.sql.shuffle.partitions=40
##kylin.query.spark-conf.spark.yarn.jars=hdfs://localhost:9000/spark2_jars/*
#kylin.query.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
#
#kylin.query.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties -Dlog4j.debug -Dkylin.hdfs.working.dir=${kylin.env.hdfs-working-dir} -Dkylin.metadata.identifier=${kylin.metadata.url.identifier} -Dkylin.spark.category=sparder -Dkylin.spark.identifier={{APP_ID}}
## uncomment for HDP
##kylin.query.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
##kylin.query.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#
#### QUERY PUSH DOWN ###
#
##kylin.query.pushdown.runner-class-name=org.apache.kylin.query.pushdown.PushDownRunnerSparkImpl
##kylin.query.pushdown.update-enabled=false

启动kylin

[root@only bin]# ./kylin.sh start
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /opt/kylin-4.0.1-spark3
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
...................................................[PASS]
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.

Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.
Retrieving hadoop conf dir...
Retrieving Spark dependency...
Skip spark which not owned by kylin. SPARK_HOME is /opt/spark-3.2.1-bin-hadoop3.2 and KYLIN_HOME is /opt/kylin-4.0.1-spark3.
  Please download the correct version of Apache Spark, unzip it, rename it to 'spark' and put it in /opt/kylin-4.0.1-spark3 directory.
  Do not use the spark that comes with your hadoop environment.
  If your hadoop environment is cdh6.x, you need to do some additional operations in advance.
  Please refer to the link: https://cwiki.apache.org/confluence/display/KYLIN/Deploy+Kylin+4+on+CDH+6.
Start to check whether we need to migrate acl tables
Not HBase metadata. Skip check.

A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /opt/kylin-4.0.1-spark3/logs/kylin.log
Web UI is at http://only:7070/kylin

填坑

坑一

看到这里，我去浏览器访问，但是访问不了，然后看进程ps，也存在，最终到日志下看，存在报错：

06:43:59.891 [localhost-startStop-1] ERROR org.springframework.web.context.ContextLoader - Context initialization failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerMapping': Invocation of init method failed; nested exception is java.lang.NoClassDefFoundError: org/apache/commons/configuration/ConfigurationException

解决：这里指的是缺少commons-collections-3.2.2.jar和commons-configuration-1.3.jar依赖；下载这两个jar包即可处理该问题；

坑二

再次重启，结果还是报错：
Unable to resolve address: sandbox.hortonworks.com:2181

处理:在/etc/hosts文件添加映射即可(但是我没弄懂，那位大佬看到可以提点下，感谢)

127.0.0.1 sandbox.hortonworks.com

坑三

Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

解决：添加mysql-connector-java-5.1.8.jar到$KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib目录下即可

浏览器访问
处理完上面的问题，即浏览器上即可正常访问

http://192.168.52.19:7070/kylin
username:ADMIN   (大写)
password:KYLIN   (大写)

问题以为就完了？天真

坑四

配置元数据的时候，执行脚本报错：

org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class 'com.mysql.jdbc.Driver'

这个问题查了百度很久，说什么commons-dbcp-1.4.jar还要有对应的mysql驱动包，可我上面不是已经添加了吗？那我重新下载jar包试试，结果无论怎么弄都搞不好；
解决：最后看了官网，才知道需要将myql的驱动包也添加到$KYLIN_HOME/ext目录下。就这么简单，我都想打死自己了，不看官网。。。。。。
官网截图：

Java页大数据

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
1
评论
kylin4.0.1安装与填坑

kylin4.0.1安装解压安装包将hdfs-site.xml,core-site.xml,hive-site.xml,spark-defaults.conf文件添加到/$KYLIN_HOME/conf目录下，软连接也可以（建议）将自身的HADOO_HOME,HIVE_HOME,HBASE_HOME,SPARK_HOME的值添加到/$KYLIN_HOME/bin/kylin.sh文件中；内容如下：export HADOOP_HOME=/opt/hadoop-3.3.2export HIVE_
复制链接

扫一扫

专栏目录