1. apache-kylin-3.1.3-bin-hadoop3: Introduction, Deployment, and Verification

apache-kylin-3.1.3-bin-hadoop3 series:

1. apache-kylin-3.1.3-bin-hadoop3: introduction, deployment, and verification
2. apache-kylin-3.1.3-bin-hadoop3: cluster deployment
3. apache-kylin-3.1.3-bin-hadoop3: basic operations (creating models and cubes, querying data)
4. apache-kylin-3.1.3-bin-hadoop3: incremental and full builds, with detailed examples
5. apache-kylin-3.1.3-bin-hadoop3: segment management and JDBC operations
6. apache-kylin-3.1.3-bin-hadoop3: cube optimization, reducing cuboids, and lowering the expansion rate



This article introduces Kylin and walks through a standalone deployment and its verification.
It depends on working ZooKeeper, Hadoop, HBase, and Hive environments; see my related series for details on setting those up.
The article is in two parts: an introduction to Kylin, and the deployment itself.

I. Kylin Introduction

1. Overview

Apache Kylin™ is an open source, distributed analytical data warehouse that provides a SQL query interface and multi-dimensional analysis (OLAP) on top of Hadoop, supporting extremely large datasets. It was originally developed by eBay Inc. and contributed to the open source community.
Official site: https://kylin.apache.org/cn/
Apache Kylin™ lets users achieve sub-second queries over huge datasets in just three steps:

  • Define a star or snowflake model on your dataset
  • Build a cube on the defined tables
  • Query with standard SQL over ODBC, JDBC, or the RESTful API and get results at sub-second latency (a REST sketch follows below)
    Kylin also integrates with data visualization tools such as Tableau and PowerBI, so users can analyze Hadoop data from their BI tool of choice.
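As an illustration of the third step, here is a minimal sketch of issuing a SQL query through Kylin's REST API with curl. The host server4:7070 matches the deployment later in this article; the project learn_kylin and table KYLIN_SALES come from the bundled sample cube and are assumptions here:

# Query Kylin's REST API; ADMIN:KYLIN are the default credentials
curl -X POST -u ADMIN:KYLIN \
  -H "Content-Type: application/json" \
  -d '{"sql": "select count(*) from KYLIN_SALES", "project": "learn_kylin"}' \
  http://server4:7070/kylin/api/query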

2. Features

Kylin is a MOLAP system in the Hadoop ecosystem: a distributed OLAP analysis engine for TB- to PB-scale data that eBay's big data team began developing in 2014. Its features include:

  • A scalable, ultra-fast analytical data warehouse for big data: Kylin is designed to cut query latency on Hadoop/Spark for datasets of tens of billions of rows
  • ANSI SQL interface for Hadoop: as an analytical data warehouse (and OLAP engine), Kylin provides standard SQL on Hadoop and supports most query functions
  • Interactive queries: users can interact with Hadoop data at sub-second latency, with better performance than Hive on the same dataset
  • Multi-dimensional cube (MOLAP Cube): users can define data models in Kylin and build cubes over datasets of ten billion rows and more
  • Real-time OLAP: Kylin can process data as it is produced, enabling multi-dimensional analysis of real-time data at second-level latency
  • Seamless BI tool integration: Kylin integrates with BI tools such as Tableau, PowerBI/Excel, MSTR, QlikSense, Hue, and Superset
  • Other features:
    Job management and monitoring
    Compression and encoding
    Incremental refresh
    Use of HBase coprocessors
    Approximate distinct counting based on HyperLogLog
    A friendly web UI for managing, monitoring, and using cubes
    Project- and table-level access control
    LDAP and SSO support

3. Ecosystem

  • Kylin core: the base framework, including the metadata engine, query engine, job engine, and storage engine, plus a REST server that answers client requests
  • Extensions: plugins that add extra features and capabilities
  • Integration: hooks into lifecycle-management systems such as schedulers, ETL, and monitoring
  • User interfaces: third-party UIs built on top of the Kylin core
  • Drivers: ODBC and JDBC drivers that support different tools and products, such as Tableau

II. Kylin Standalone Deployment

1. Software Requirements

  • Hadoop: 2.7+, 3.1+ (since v2.5); this article uses 3.1.4
  • Hive: 0.13 - 1.2.1+; this article uses 3.1.2
  • HBase: 1.1+, 2.0 (since v2.5); this article uses 2.1.0
  • Spark (optional): 2.3.0+
  • Kafka (optional): 1.0.0+ (since v2.5)
  • JDK: 1.8+ (since v2.5); this article uses 1.8
  • OS: Linux only, CentOS 6.5+ or Ubuntu 16.0.4+; this article uses CentOS 6.10

2. Hadoop Environment

Kylin relies on a Hadoop cluster to process large datasets. You need a Hadoop cluster with HDFS, YARN, MapReduce, Hive, HBase, ZooKeeper, and the other required services configured and available to Kylin.

Kylin can be started on any node of the Hadoop cluster. For convenience you can run it on the master node, but for better stability it is recommended to deploy Kylin on a clean Hadoop client node where the Hive, HBase, and HDFS command-line tools are installed, and where the client configurations (core-site.xml, hive-site.xml, hbase-site.xml, and so on) are properly set up and kept in sync with the rest of the cluster.

The Linux account that runs Kylin must have access to the Hadoop cluster, including permission to create and write HDFS directories, Hive tables, and HBase tables, and to submit MapReduce jobs.
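You can sanity-check these permissions from the node that will run Kylin before going further. A minimal sketch, assuming the hdfs, hive, and hbase commands are already on PATH:

# Can we create and delete an HDFS directory?
hdfs dfs -mkdir -p /tmp/kylin-perm-check && hdfs dfs -rm -r /tmp/kylin-perm-check
# Can we talk to the Hive metastore?
hive -e "show databases;"
# Can we reach HBase?
echo "list" | hbase shell -n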

3. Installation

Since Hive is installed on server4, this example installs Kylin on server4 (192.168.10.44) as well.
Download the installation package from the official site; that is not covered again here. The version used in this article is apache-kylin-3.1.3-bin-hadoop3.tar.gz.

1) Extract

[alanchan@server4 ~]$ tar -zxf /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3.tar.gz -C /usr/local/bigdata
[alanchan@server4 apache-kylin-3.1.3-bin-hadoop3]$ pwd
/usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3
[alanchan@server4 apache-kylin-3.1.3-bin-hadoop3]$ ll
total 56
drwxr-xr-x 2 alanchan root  4096 Dec 29  2021 bin
-rw-r--r-- 1 alanchan root   823 Dec 29  2021 commit_SHA1
drwxr-xr-x 2 alanchan root  4096 Dec 29  2021 conf
drwxr-xr-x 3 alanchan root  4096 Dec 29  2021 lib
-rw-r--r-- 1 alanchan root 14725 Dec 29  2021 LICENSE
-rw-r--r-- 1 alanchan root  1279 Dec 29  2021 NOTICE
-rw-r--r-- 1 alanchan root  3222 Dec 29  2021 README.md
drwxr-xr-x 4 alanchan root  4096 Dec 29  2021 sample_cube
drwxr-xr-x 9 alanchan root  4096 Dec 29  2021 tomcat
drwxr-xr-x 2 alanchan root  4096 Dec 29  2021 tool
-rw-r--r-- 1 alanchan root    19 Dec 29  2021 VERSION

2) Kylin Tarball Layout

  • bin: shell scripts for starting/stopping Kylin, backing up/restoring Kylin metadata, checking ports, locating Hive/HBase dependencies, and so on
  • conf: XML configuration files for Hadoop jobs; see the configuration page for what each file does
  • lib: jar files for use by external applications, such as the Hadoop job jar, the JDBC driver, and the HBase coprocessor
  • meta_backups: the default backup directory produced by running bin/metastore.sh backup
  • sample_cube: files for creating the sample cube and its tables
  • spark: the bundled Spark
  • tomcat: the bundled Tomcat that serves the Kylin web application
  • tool: jar files for command-line utilities
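As a note on meta_backups, here is a minimal sketch of backing up and restoring Kylin metadata with the bundled script; the timestamped directory name below is illustrative:

# Back up metadata; a timestamped folder appears under $KYLIN_HOME/meta_backups
$KYLIN_HOME/bin/metastore.sh backup
# Restore from a chosen backup directory (name is illustrative)
$KYLIN_HOME/bin/metastore.sh restore $KYLIN_HOME/meta_backups/meta_2021_12_29_17_00_00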

3) Link the Configurations of Kylin's Dependencies

[alanchan@server4 ~]$ cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/conf
[alanchan@server4 conf]$ ll
total 64
-rw-r--r-- 1 alanchan root  3605 Dec 29  2021 kylin_hive_conf.xml
-rw-r--r-- 1 alanchan root  3690 Dec 29  2021 kylin_job_conf_cube_merge.xml
-rw-r--r-- 1 alanchan root  3807 Dec 29  2021 kylin_job_conf_inmem.xml
-rw-r--r-- 1 alanchan root  3159 Dec 29  2021 kylin_job_conf.xml
-rw-r--r-- 1 alanchan root  1156 Dec 29  2021 kylin-kafka-consumer.xml
-rw-r--r-- 1 alanchan root 17062 Dec 29  2021 kylin.properties
-rw-r--r-- 1 alanchan root  2257 Dec 29  2021 kylin-server-log4j.properties
-rw-r--r-- 1 alanchan root  2011 Dec 29  2021 kylin-spark-log4j.properties
-rw-r--r-- 1 alanchan root  1655 Dec 29  2021 kylin-tools-log4j.properties
-rwxr-xr-x 1 alanchan root  4120 Dec 29  2021 setenv.sh
-rwxr-xr-x 1 alanchan root  3870 Dec 29  2021 setenv-tool.sh

First confirm that the HADOOP_HOME, HBASE_HOME, and HIVE_HOME environment variables are set (a quick check is sketched below), then create symlinks to the Hadoop, HBase, and Hive site files inside $KYLIN_HOME/conf:
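A small sketch to confirm the three variables are set in the current shell before creating the links:

# Print each variable, or "<not set>" if it is missing
for v in HADOOP_HOME HBASE_HOME HIVE_HOME; do
    echo "$v=${!v:-<not set>}"
done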

ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml hdfs-site.xml
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml core-site.xml
ln -s $HBASE_HOME/conf/hbase-site.xml hbase-site.xml
ln -s $HIVE_HOME/conf/hive-site.xml hive-site.xml

When the links are created correctly, the listing looks like this:

[alanchan@server4 conf]$ ll
total 64
lrwxrwxrwx 1 alanchan root    56 Jan  4 17:27 core-site.xml -> /usr/local/bigdata/hadoop-3.1.4/etc/hadoop/core-site.xml
lrwxrwxrwx 1 alanchan root    50 Jan  4 17:27 hbase-site.xml -> /usr/local/bigdata/hbase-2.1.0/conf/hbase-site.xml
lrwxrwxrwx 1 alanchan root    56 Jan  4 17:26 hdfs-site.xml -> /usr/local/bigdata/hadoop-3.1.4/etc/hadoop/hdfs-site.xml
lrwxrwxrwx 1 alanchan root    59 Jan  4 17:33 hive-site.xml -> /usr/local/bigdata/apache-hive-3.1.2-bin/conf/hive-site.xml
-rw-r--r-- 1 alanchan root  3605 Dec 29  2021 kylin_hive_conf.xml
-rw-r--r-- 1 alanchan root  3690 Dec 29  2021 kylin_job_conf_cube_merge.xml
-rw-r--r-- 1 alanchan root  3807 Dec 29  2021 kylin_job_conf_inmem.xml
-rw-r--r-- 1 alanchan root  3159 Dec 29  2021 kylin_job_conf.xml
-rw-r--r-- 1 alanchan root  1156 Dec 29  2021 kylin-kafka-consumer.xml
-rw-r--r-- 1 alanchan root 17062 Dec 29  2021 kylin.properties
-rw-r--r-- 1 alanchan root  2257 Dec 29  2021 kylin-server-log4j.properties
-rw-r--r-- 1 alanchan root  2011 Dec 29  2021 kylin-spark-log4j.properties
-rw-r--r-- 1 alanchan root  1655 Dec 29  2021 kylin-tools-log4j.properties
-rwxr-xr-x 1 alanchan root  4120 Dec 29  2021 setenv.sh
-rwxr-xr-x 1 alanchan root  3870 Dec 29  2021 setenv-tool.sh

4) Configure the Kylin Environment Variables

vim /etc/profile

# append:
# kylin
export KYLIN_HOME=/usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3
export PATH=$PATH:${KYLIN_HOME}/bin
    
source /etc/profile
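To confirm the variables took effect in the current shell:

# KYLIN_HOME should print the install path, and kylin.sh should resolve from PATH
echo $KYLIN_HOME
which kylin.sh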

5) Configure kylin.sh

cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/bin
vim kylin.sh

Add the following lines to kylin.sh:
export HADOOP_HOME=/usr/local/bigdata/hadoop-3.1.4
export HIVE_HOME=/usr/local/bigdata/apache-hive-3.1.2-bin
export HBASE_HOME=/usr/local/bigdata/hbase-2.1.0

The complete kylin.sh after the change is as follows:

#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# set verbose=true to print more logs during start up

export HADOOP_HOME=/usr/local/bigdata/hadoop-3.1.4
export HIVE_HOME=/usr/local/bigdata/apache-hive-3.1.2-bin
export HBASE_HOME=/usr/local/bigdata/hbase-2.1.0


source ${KYLIN_HOME:-"$(cd -P -- "$(dirname -- "$0")" && pwd -P)/../"}/bin/header.sh $@
if [ "$verbose" = true ]; then
    shift
fi

mkdir -p ${KYLIN_HOME}/logs
mkdir -p ${KYLIN_HOME}/ext

source ${dir}/set-java-home.sh

function retrieveDependency() {
    #retrive $hive_dependency and $hbase_dependency
    if [[ -z $reload_dependency && `ls -1 ${dir}/cached-* 2>/dev/null | wc -l` -eq 6 ]]
    then
        echo "Using cached dependency..."

        source ${dir}/cached-hive-dependency.sh
        if [ -z "${hive_warehouse_dir}" ] || [ -z "${hive_dependency}" ] || [ -z "${hive_conf_path}" ]; then
          echo "WARNING: Using ${dir}/cached-hive-dependency.sh failed,will be use ${dir}/find-hive-dependency.sh"
          source ${dir}/find-hive-dependency.sh
        fi

        source ${dir}/cached-hbase-dependency.sh
        if [ -z "${hbase_dependency}" ]; then
          echo "WARNING: Using ${dir}/cached-hbase-dependency.sh failed,will be use ${dir}/find-hbase-dependency.sh"
          source ${dir}/find-hbase-dependency.sh
        fi

        source ${dir}/cached-hadoop-conf-dir.sh
        if [ -z "${kylin_hadoop_conf_dir}" ]; then
          echo "WARNING: Using ${dir}/cached-hadoop-conf-dir.sh failed,will be use ${dir}/find-hadoop-conf-dir.sh"
          source ${dir}/find-hadoop-conf-dir.sh
        fi

        source ${dir}/cached-kafka-dependency.sh
        if [ -z "${kafka_dependency}" ]; then
          echo "WARNING: Using ${dir}/cached-kafka-dependency.sh failed,will be use ${dir}/find-kafka-dependency.sh"
          source ${dir}/find-kafka-dependency.sh
        fi

        source ${dir}/cached-spark-dependency.sh
        if [ -z "${spark_dependency}" ]; then
          echo "WARNING: Using ${dir}/cached-spark-dependency.sh failed,will be use ${dir}/find-spark-dependency.sh"
          source ${dir}/find-spark-dependency.sh
        fi

        source ${dir}/cached-flink-dependency.sh
        if [ -z "${flink_dependency}" ]; then
          echo "WARNING: Using ${dir}/cached-flink-dependency.sh failed,will be use ${dir}/find-flink-dependency.sh"
          source ${dir}/find-flink-dependency.sh
        fi
    else
        source ${dir}/find-hive-dependency.sh
        source ${dir}/find-hbase-dependency.sh
        source ${dir}/find-hadoop-conf-dir.sh
        source ${dir}/find-kafka-dependency.sh
        source ${dir}/find-spark-dependency.sh
        source ${dir}/find-flink-dependency.sh
    fi

    #retrive $KYLIN_EXTRA_START_OPTS
    if [ -f "${dir}/setenv.sh" ]; then
        echo "WARNING: ${dir}/setenv.sh is deprecated and ignored, please remove it and use ${KYLIN_HOME}/conf/setenv.sh instead"
        source ${dir}/setenv.sh
    fi

    if [ -f "${KYLIN_HOME}/conf/setenv.sh" ]; then
        source ${KYLIN_HOME}/conf/setenv.sh
    fi

    export HBASE_CLASSPATH_PREFIX=${KYLIN_HOME}/conf:${KYLIN_HOME}/lib/*:${KYLIN_HOME}/ext/*:${HBASE_CLASSPATH_PREFIX}
    export HBASE_CLASSPATH=${HBASE_CLASSPATH}:${hive_dependency}:${kafka_dependency}:${spark_dependency}:${flink_dependency}

    verbose "HBASE_CLASSPATH: ${HBASE_CLASSPATH}"
}

function retrieveStartCommand() {
    if [ -f "${KYLIN_HOME}/pid" ]
    then
        PID=`cat $KYLIN_HOME/pid`
        if ps -p $PID > /dev/null
        then
          quit "Kylin is running, stop it first"
        fi
    fi

    lockfile=$KYLIN_HOME/LOCK
    if [ ! -e $lockfile ]; then
        trap "rm -f $lockfile; exit" INT TERM EXIT
        touch $lockfile
    else
        quit "Kylin is starting, wait for it"
    fi

    source ${dir}/check-env.sh

    tomcat_root=${dir}/../tomcat
    export tomcat_root


    #The location of all hadoop/hbase configurations are difficult to get.
    #Plus, some of the system properties are secretly set in hadoop/hbase shell command.
    #For example, in hdp 2.2, there is a system property called hdp.version,
    #which we cannot get until running hbase or hadoop shell command.
    #
    #To save all these troubles, we use hbase runjar to start tomcat.
    #In this way we no longer need to explicitly configure hadoop/hbase related classpath for tomcat,
    #hbase command will do all the dirty tasks for us:

    spring_profile=`bash ${dir}/get-properties.sh kylin.security.profile`
    if [ -z "$spring_profile" ]
    then
        quit 'Please set kylin.security.profile in kylin.properties, options are: testing, ldap, saml.'
    else
        verbose "kylin.security.profile is set to $spring_profile"
    fi
    # the number of Spring active profiles can be greater than 1. Additional profiles
    # can be added by setting kylin.security.additional-profiles
    additional_security_profiles=`bash ${dir}/get-properties.sh kylin.security.additional-profiles`
    if [[ "x${additional_security_profiles}" != "x" ]]; then
        spring_profile="${spring_profile},${additional_security_profiles}"
    fi

    retrieveDependency

    #additionally add tomcat libs to HBASE_CLASSPATH_PREFIX
    export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:${HBASE_CLASSPATH_PREFIX}

    kylin_rest_address=`hostname -f`":"`grep "<Connector port=" ${tomcat_root}/conf/server.xml |grep protocol=\"HTTP/1.1\" | cut -d '=' -f 2 | cut -d \" -f 2`
    kylin_rest_address_arr=(${kylin_rest_address//;/ })
    nc -z -w 5 ${kylin_rest_address_arr[0]} ${kylin_rest_address_arr[1]} 1>/dev/null 2>&1; nc_result=$?
    if [ $nc_result -eq 0 ]; then
        quit "Port ${kylin_rest_address} is not available, could not start Kylin."
    fi

    ${KYLIN_HOME}/bin/check-migration-acl.sh || { exit 1; }
    #debug if encounter NoClassDefError
    verbose "kylin classpath is: $(hbase classpath)"

    security_ldap_truststore=`bash ${dir}/get-properties.sh kylin.security.ldap.connection-truststore`
    if [ -f "${security_ldap_truststore}" ]; then
        KYLIN_EXTRA_START_OPTS="$KYLIN_EXTRA_START_OPTS -Djavax.net.ssl.trustStore=$security_ldap_truststore"
    fi

    # KYLIN_EXTRA_START_OPTS is for customized settings, checkout bin/setenv.sh
    start_command="hbase ${KYLIN_EXTRA_START_OPTS} \
    -Djava.util.logging.config.file=${tomcat_root}/conf/logging.properties \
    -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
    -Dlog4j.configuration=file:${KYLIN_HOME}/conf/kylin-server-log4j.properties \
    -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true \
    -Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true \
    -Djava.endorsed.dirs=${tomcat_root}/endorsed  \
    -Dcatalina.base=${tomcat_root} \
    -Dcatalina.home=${tomcat_root} \
    -Djava.io.tmpdir=${tomcat_root}/temp  \
    -Dkylin.hive.dependency=${hive_dependency} \
    -Dkylin.hbase.dependency=${hbase_dependency} \
    -Dkylin.kafka.dependency=${kafka_dependency} \
    -Dkylin.spark.dependency=${spark_dependency} \
    -Dkylin.flink.dependency=${flink_dependency} \
    -Dkylin.hadoop.conf.dir=${kylin_hadoop_conf_dir} \
    -Dkylin.server.host-address=${kylin_rest_address} \
    -Dkylin.source.hive.warehouse-dir=${hive_warehouse_dir} \
    -Dspring.profiles.active=${spring_profile} \
    org.apache.hadoop.util.RunJar ${tomcat_root}/bin/bootstrap.jar  org.apache.catalina.startup.Bootstrap start"
}

function retrieveStopCommand() {
    if [ -f "${KYLIN_HOME}/pid" ]
    then
        PID=`cat $KYLIN_HOME/pid`
        WAIT_TIME=2
        LOOP_COUNTER=10
        if ps -p $PID > /dev/null
        then
            echo "Stopping Kylin: $PID"
            kill $PID

            for ((i=0; i<$LOOP_COUNTER; i++))
            do
                # wait to process stopped
                sleep $WAIT_TIME
                if ps -p $PID > /dev/null ; then
                    echo "Stopping in progress. Will check after $WAIT_TIME secs again..."
                    continue;
                else
                    break;
                fi
            done

            # if process is still around, use kill -9
            if ps -p $PID > /dev/null
            then
                echo "Initial kill failed, getting serious now..."
                kill -9 $PID
                sleep 1 #give kill -9  sometime to "kill"
                if ps -p $PID > /dev/null
                then
                   quit "Warning, even kill -9 failed, giving up! Sorry..."
                fi
            fi

            # process is killed , remove pid file
            rm -rf ${KYLIN_HOME}/pid
            echo "Kylin with pid ${PID} has been stopped."
            return 0
        else
           echo "Kylin with pid ${PID} is not running"
           return 1
        fi
    else
        return 1
    fi
}

if [ "$2" == "--reload-dependency" ]
then
    reload_dependency=1
fi

# start command
if [ "$1" == "start" ]
then
    retrieveStartCommand
    ${start_command} >> ${KYLIN_HOME}/logs/kylin.out 2>&1 & echo $! > ${KYLIN_HOME}/pid &
    rm -f $lockfile

    echo ""
    echo "A new Kylin instance is started by $USER. To stop it, run 'kylin.sh stop'"
    echo "Check the log at ${KYLIN_HOME}/logs/kylin.log"
    echo "Web UI is at http://${kylin_rest_address_arr}/kylin"
    exit 0

# run command
elif [ "$1" == "run" ]
then
    retrieveStartCommand
    ${start_command}
    rm -f $lockfile

# stop command
elif [ "$1" == "stop" ]
then
    retrieveStopCommand
    if [[ $? == 0 ]]
    then
        exit 0
    else
        quit "Kylin is not running"
    fi

# restart command
elif [ "$1" == "restart" ]
then
    echo "Restarting kylin..."
    echo "--> Stopping kylin first if it's running..."
    retrieveStopCommand
    if [[ $? != 0 ]]
    then
        echo "Kylin is not running, now start it"
    fi
    echo "--> Start kylin..."
    retrieveStartCommand
    ${start_command} >> ${KYLIN_HOME}/logs/kylin.out 2>&1 & echo $! > ${KYLIN_HOME}/pid &
    rm -f $lockfile

    echo ""
    echo "A new Kylin instance is started by $USER. To stop it, run 'kylin.sh stop'"
    echo "Check the log at ${KYLIN_HOME}/logs/kylin.log"
    echo "Web UI is at http://${kylin_rest_address_arr}/kylin"
    exit 0

# streaming command
elif [ "$1" == "streaming" ]
then
    if [ $# -lt 2 ]
    then
        echo "Invalid input args $@"
        exit -1
    fi
    if [ "$2" == "start" ]
    then
        if [ -f "${KYLIN_HOME}/streaming_receiver_pid" ]
        then
            PID=`cat $KYLIN_HOME/streaming_receiver_pid`
            if ps -p $PID > /dev/null
            then
              echo "Kylin streaming receiver is running, stop it first"
            exit 1
            fi
        fi
        #retrive $hbase_dependency
        source ${dir}/find-hbase-dependency.sh
        #retrive $KYLIN_EXTRA_START_OPTS
        if [ -f "${KYLIN_HOME}/conf/setenv.sh" ]
            then source ${KYLIN_HOME}/conf/setenv.sh
        fi

        mkdir -p ${KYLIN_HOME}/ext
        HBASE_CLASSPATH=`hbase classpath`
        #echo "hbase class path:"$HBASE_CLASSPATH
        STREAM_CLASSPATH=${KYLIN_HOME}/lib/streaming/*:${KYLIN_HOME}/ext/*:${HBASE_CLASSPATH}

        # KYLIN_EXTRA_START_OPTS is for customized settings, checkout bin/setenv.sh
        ${JAVA_HOME}/bin/java -cp $STREAM_CLASSPATH ${KYLIN_EXTRA_START_OPTS} \
        -Dlog4j.configuration=stream-receiver-log4j.properties\
        -DKYLIN_HOME=${KYLIN_HOME}\
        -Dkylin.hbase.dependency=${hbase_dependency} \
        org.apache.kylin.stream.server.StreamingReceiver $@ > ${KYLIN_HOME}/logs/streaming_receiver.out 2>&1 & echo $! > ${KYLIN_HOME}/streaming_receiver_pid &
        exit 0
    elif [ "$2" == "stop" ]
    then
        if [ ! -f "${KYLIN_HOME}/streaming_receiver_pid" ]
        then
            echo "Streaming receiver is not running, please check"
            exit 1
        fi
        PID=`cat ${KYLIN_HOME}/streaming_receiver_pid`
        if [ "$PID" = "" ]
        then
            echo "Streaming receiver is not running, please check"
            exit 1
        else
            echo "Stopping streaming receiver: $PID"
            WAIT_TIME=2
            LOOP_COUNTER=20
            if ps -p $PID > /dev/null
            then
                kill $PID

                for ((i=0; i<$LOOP_COUNTER; i++))
                do
                    # wait to process stopped
                    sleep $WAIT_TIME
                    if ps -p $PID > /dev/null ; then
                        echo "Stopping in progress. Will check after $WAIT_TIME secs again..."
                        continue;
                    else
                        break;
                    fi
                done

                # if process is still around, use kill -9
                if ps -p $PID > /dev/null
                then
                    echo "Initial kill failed, getting serious now..."
                    kill -9 $PID
                    sleep 1 #give kill -9  sometime to "kill"
                    if ps -p $PID > /dev/null
                    then
                       quit "Warning, even kill -9 failed, giving up! Sorry..."
                    fi
                fi

                # process is killed , remove pid file
                rm -rf ${KYLIN_HOME}/streaming_receiver_pid
                echo "Kylin streaming receiver with pid ${PID} has been stopped."
                exit 0
            else
               quit "Kylin streaming receiver with pid ${PID} is not running"
            fi
        fi
    elif [[ "$2" = org.apache.kylin.* ]]
    then
        source ${KYLIN_HOME}/conf/setenv.sh
        HBASE_CLASSPATH=`hbase classpath`
        #echo "hbase class path:"$HBASE_CLASSPATH
        STREAM_CLASSPATH=${KYLIN_HOME}/lib/streaming/*:${KYLIN_HOME}/ext/*:${HBASE_CLASSPATH}

        shift
        # KYLIN_EXTRA_START_OPTS is for customized settings, checkout bin/setenv.sh
        ${JAVA_HOME}/bin/java -cp $STREAM_CLASSPATH ${KYLIN_EXTRA_START_OPTS} \
        -Dlog4j.configuration=stream-receiver-log4j.properties\
        -DKYLIN_HOME=${KYLIN_HOME}\
        -Dkylin.hbase.dependency=${hbase_dependency} \
        "$@"
        exit 0
    fi

elif [ "$1" = "version" ]
then
    retrieveDependency
    exec hbase -Dlog4j.configuration=file:${KYLIN_HOME}/conf/kylin-tools-log4j.properties org.apache.kylin.common.KylinVersion
    exit 0

elif [ "$1" = "diag" ]
then
    echo "'kylin.sh diag' no longer supported, use diag.sh instead"
    exit 0

# tool command
elif [[ "$1" = org.apache.kylin.* ]]
then
    retrieveDependency

    #retrive $KYLIN_EXTRA_START_OPTS from a separate file called setenv-tool.sh
    unset KYLIN_EXTRA_START_OPTS # unset the global server setenv config first
    if [ -f "${dir}/setenv-tool.sh" ]; then
        echo "WARNING: ${dir}/setenv-tool.sh is deprecated and ignored, please remove it and use ${KYLIN_HOME}/conf/setenv-tool.sh instead"
        source ${dir}/setenv-tool.sh
    fi

    if [ -f "${KYLIN_HOME}/conf/setenv-tool.sh" ]; then
        source ${KYLIN_HOME}/conf/setenv-tool.sh
    fi
    hbase_pre_original=${HBASE_CLASSPATH_PREFIX}
    export HBASE_CLASSPATH_PREFIX=${KYLIN_HOME}/tool/*:${HBASE_CLASSPATH_PREFIX}
    exec hbase ${KYLIN_EXTRA_START_OPTS} -Dkylin.hive.dependency=${hive_dependency} -Dkylin.hbase.dependency=${hbase_dependency} -Dlog4j.configuration=file:${KYLIN_HOME}/conf/kylin-tools-log4j.properties "$@"
    export HBASE_CLASSPATH_PREFIX=${hbase_pre_original}
else
    quit "Usage: 'kylin.sh [-v] start' or 'kylin.sh [-v] stop' or 'kylin.sh [-v] restart'"
fi

6) Configure conf/kylin.properties

cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/conf

Set or uncomment the following properties in kylin.properties:

## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
kylin.env.hdfs-working-dir=/kylin

## kylin zk base path
kylin.env.zookeeper-base-path=/kylin

kylin.source.hive.keep-flat-table=false

## Hive database name for putting the intermediate flat tables
kylin.source.hive.database-for-flat-table=default

## Whether redistribute the intermediate flat table before building
kylin.source.hive.redistribute-flat-table=true

## The storage for final cube file in hbase
kylin.storage.url=hbase

## The prefix of hbase table
kylin.storage.hbase.table-name-prefix=KYLIN_

## The namespace for hbase storage
kylin.storage.hbase.namespace=default

## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.storage.hbase.compression-codec=none


## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
## This must contain site xmls of core, yarn, hive, and hbase in one folder
kylin.env.hadoop-conf-dir=/usr/local/bigdata/hadoop-3.1.4/etc/hadoop
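check-env.sh will attempt to create the HDFS working directory, but you can also prepare /kylin manually; a minimal sketch, assuming the alanchan account runs Kylin:

# Create the working directory and give the Kylin account ownership
hdfs dfs -mkdir -p /kylin
hdfs dfs -chown alanchan /kylin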

The complete configuration file is as follows:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#




# The below commented values will effect as default settings
# Uncomment and override them if necessary



#
#### METADATA | ENV ###
#
## The metadata store in hbase
#kylin.metadata.url=kylin_metadata@hbase
#
## metadata cache sync retry times
#kylin.metadata.sync-retries=3
#
## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
kylin.env.hdfs-working-dir=/kylin
#
## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
#kylin.env=QA
#
## kylin zk base path
kylin.env.zookeeper-base-path=/kylin
#
#### SERVER | WEB | RESTCLIENT ###
#
## Kylin server mode, valid value [all, query, job]
#kylin.server.mode=all
#
## List of web servers in use, this enables one web server instance to sync up with other servers.
#kylin.server.cluster-servers=localhost:7070
#
## Display timezone on UI,format like[GMT+N or GMT-N]
#kylin.web.timezone=
#
## Timeout value for the queries submitted through the Web UI, in milliseconds
#kylin.web.query-timeout=300000
#
#kylin.web.cross-domain-enabled=true
#
##allow user to export query result
#kylin.web.export-allow-admin=true
#kylin.web.export-allow-other=true
#
## Hide measures in measure list of cube designer, separate by comma
#kylin.web.hide-measures=RAW
#
##max connections of one route
#kylin.restclient.connection.default-max-per-route=20
#
##max connections of one rest-client
#kylin.restclient.connection.max-total=200
#
#### PUBLIC CONFIG ###
#kylin.engine.default=2
#kylin.storage.default=2
#kylin.web.hive-limit=20
#kylin.web.help.length=4
#kylin.web.help.0=start|Getting Started|http://kylin.apache.org/docs/tutorial/kylin_sample.html
#kylin.web.help.1=odbc|ODBC Driver|http://kylin.apache.org/docs/tutorial/odbc.html
#kylin.web.help.2=tableau|Tableau Guide|http://kylin.apache.org/docs/tutorial/tableau_91.html
#kylin.web.help.3=onboard|Cube Design Tutorial|http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
#kylin.web.link-streaming-guide=http://kylin.apache.org/
#kylin.htrace.show-gui-trace-toggle=false
#kylin.web.link-hadoop=
#kylin.web.link-diagnostic=
#kylin.web.contact-mail=
#kylin.server.external-acl-provider=
#
#### CUBE MIGRATION
#kylin.cube.migration.enabled=false
#
## Default time filter for job list, 0->current day, 1->last one day, 2->last one week, 3->last one year, 4->all
#kylin.web.default-time-filter=1
#
#### SOURCE ###
#
## Hive client, valid value [cli, beeline]
#kylin.source.hive.client=cli
#
## Absolute path to beeline shell, can be set to spark beeline instead of the default hive beeline on PATH
#kylin.source.hive.beeline-shell=beeline
#
## Parameters for beeline client, only necessary if hive client is beeline
##kylin.source.hive.beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
#
## While hive client uses above settings to read hive table metadata,
## table operations can go through a separate SparkSQL command line, given SparkSQL connects to the same Hive metastore.
#kylin.source.hive.enable-sparksql-for-table-ops=false
##kylin.source.hive.sparksql-beeline-shell=/path/to/spark-client/bin/beeline
##kylin.source.hive.sparksql-beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
#
kylin.source.hive.keep-flat-table=false
#
## Hive database name for putting the intermediate flat tables
kylin.source.hive.database-for-flat-table=default
#
## Whether redistribute the intermediate flat table before building
kylin.source.hive.redistribute-flat-table=true
## Define how to access to hive metadata
## When user deploy kylin on AWS EMR and Glue is used as external metadata, use gluecatalog instead
#kylin.source.hive.metadata-type=hcatalog
#
#### STORAGE ###
#
## The storage for final cube file in hbase
kylin.storage.url=hbase
#
## The prefix of hbase table
kylin.storage.hbase.table-name-prefix=KYLIN_
#
## The namespace for hbase storage
kylin.storage.hbase.namespace=default
#
## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.storage.hbase.compression-codec=none
#
## HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020
## Leave empty if hbase running on same cluster with hive and mapreduce
##kylin.storage.hbase.cluster-fs=
#
## The cut size for hbase region, in GB.
#kylin.storage.hbase.region-cut-gb=5
#
## The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster.
## Set 0 to disable this optimization.
#kylin.storage.hbase.hfile-size-gb=2
#
#kylin.storage.hbase.min-region-count=1
#kylin.storage.hbase.max-region-count=500
#
## Optional information for the owner of kylin platform, it can be your team's email
## Currently it will be attached to each kylin's htable attribute
#kylin.storage.hbase.owner-tag=whoami@kylin.apache.org
#
#kylin.storage.hbase.coprocessor-mem-gb=3
#
## By default kylin can spill query's intermediate results to disks when it's consuming too much memory.
## Set it to false if you want query to abort immediately in such condition.
#kylin.storage.partition.aggr-spill-enabled=true
#
## The maximum number of bytes each coprocessor is allowed to scan.
## To allow arbitrary large scan, you can set it to 0.
#kylin.storage.partition.max-scan-bytes=3221225472
#
## The default coprocessor timeout is (hbase.rpc.timeout * 0.9) / 1000 seconds,
## You can set it to a smaller value. 0 means use default.
## kylin.storage.hbase.coprocessor-timeout-seconds=0
#
## clean real storage after delete operation
## if you want to delete the real storage like htable of deleting segment, you can set it to true
#kylin.storage.clean-after-delete-operation=false
#
#### JOB ###
#
## Max job retry on error, default 0: no retry
#kylin.job.retry=0
#
## Max count of concurrent jobs running
#kylin.job.max-concurrent-jobs=10
#
## The percentage of the sampling, default 100%
#kylin.job.sampling-percentage=100
#
## If true, will send email notification on job complete
##kylin.job.notification-enabled=true
##kylin.job.notification-mail-enable-starttls=true
##kylin.job.notification-mail-host=smtp.office365.com
##kylin.job.notification-mail-port=587
##kylin.job.notification-mail-username=kylin@example.com
##kylin.job.notification-mail-password=mypassword
##kylin.job.notification-mail-sender=kylin@example.com
#kylin.job.scheduler.provider.100=org.apache.kylin.job.impl.curator.CuratorScheduler
#kylin.job.scheduler.default=0
#
#### ENGINE ###
#
## Time interval to check hadoop job status
#kylin.engine.mr.yarn-check-interval-seconds=10
#
#kylin.engine.mr.reduce-input-mb=500
#
#kylin.engine.mr.max-reducer-number=500
#
#kylin.engine.mr.mapper-input-rows=1000000
#
## Enable dictionary building in MR reducer
#kylin.engine.mr.build-dict-in-reducer=true
#
## Number of reducers for fetching UHC column distinct values
#kylin.engine.mr.uhc-reducer-count=3
#
## Whether using an additional step to build UHC dictionary
#kylin.engine.mr.build-uhc-dict-in-additional-step=false
#
#
#### CUBE | DICTIONARY ###
#
#kylin.cube.cuboid-scheduler=org.apache.kylin.cube.cuboid.DefaultCuboidScheduler
#kylin.cube.segment-advisor=org.apache.kylin.cube.CubeSegmentAdvisor
#
## 'auto', 'inmem', 'layer' or 'random' for testing 
#kylin.cube.algorithm=layer
#
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
#
## auto use inmem algorithm:
## 1, cube planner optimize job
## 2, no source record
#kylin.cube.algorithm.inmem-auto-optimize=true
#
#kylin.cube.aggrgroup.max-combination=32768
#
#kylin.snapshot.max-mb=300
#
#kylin.cube.cubeplanner.enabled=true
#kylin.cube.cubeplanner.enabled-for-existing-cube=true
#kylin.cube.cubeplanner.expansion-threshold=15.0
#kylin.cube.cubeplanner.recommend-cache-max-size=200
#kylin.cube.cubeplanner.mandatory-rollup-threshold=1000
#kylin.cube.cubeplanner.algorithm-threshold-greedy=8
#kylin.cube.cubeplanner.algorithm-threshold-genetic=23
#
#
#### QUERY ###
#
## Controls the maximum number of bytes a query is allowed to scan storage.
## The default value 0 means no limit.
## The counterpart kylin.storage.partition.max-scan-bytes sets the maximum per coprocessor.
#kylin.query.max-scan-bytes=0
#
#kylin.query.cache-enabled=true
#
## Controls extras properties for Calcite jdbc driver
## all extras properties should undder prefix "kylin.query.calcite.extras-props."
## case sensitive, default: true, to enable case insensitive set it to false
## @see org.apache.calcite.config.CalciteConnectionProperty.CASE_SENSITIVE
#kylin.query.calcite.extras-props.caseSensitive=true
## how to handle unquoted identity, defualt: TO_UPPER, available options: UNCHANGED, TO_UPPER, TO_LOWER
## @see org.apache.calcite.config.CalciteConnectionProperty.UNQUOTED_CASING
#kylin.query.calcite.extras-props.unquotedCasing=TO_UPPER
## quoting method, default: DOUBLE_QUOTE, available options: DOUBLE_QUOTE, BACK_TICK, BRACKET
## @see org.apache.calcite.config.CalciteConnectionProperty.QUOTING
#kylin.query.calcite.extras-props.quoting=DOUBLE_QUOTE
## change SqlConformance from DEFAULT to LENIENT to enable group by ordinal
## @see org.apache.calcite.sql.validate.SqlConformance.SqlConformanceEnum
#kylin.query.calcite.extras-props.conformance=LENIENT
#
## TABLE ACL
#kylin.query.security.table-acl-enabled=true
#
## Usually should not modify this
#kylin.query.interceptors=org.apache.kylin.rest.security.TableInterceptor
#
#kylin.query.escape-default-keyword=false
#
## Usually should not modify this
#kylin.query.transformers=org.apache.kylin.query.util.DefaultQueryTransformer,org.apache.kylin.query.util.KeywordDefaultDirtyHack
#
#### SECURITY ###
#
## Spring security profile, options: testing, ldap, saml
## with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
#kylin.security.profile=testing
#
## Admin roles in LDAP, for ldap and saml
#kylin.security.acl.admin-role=admin
#
## LDAP authentication configuration
#kylin.security.ldap.connection-server=ldap://ldap_server:389
#kylin.security.ldap.connection-username=
#kylin.security.ldap.connection-password=
## When you use the customized CA certificate library for user authentication based on LDAPs, you need to configure this item.
## The value of this item will be added to the JVM parameter javax.net.ssl.trustStore.
#kylin.security.ldap.connection-truststore=
#
## LDAP user account directory;
#kylin.security.ldap.user-search-base=
#kylin.security.ldap.user-search-pattern=
#kylin.security.ldap.user-group-search-base=
#kylin.security.ldap.user-group-search-filter=(|(member={0})(memberUid={1}))
#
## LDAP service account directory
#kylin.security.ldap.service-search-base=
#kylin.security.ldap.service-search-pattern=
#kylin.security.ldap.service-group-search-base=
#
### SAML configurations for SSO
## SAML IDP metadata file location
#kylin.security.saml.metadata-file=classpath:sso_metadata.xml
#kylin.security.saml.metadata-entity-base-url=https://hostname/kylin
#kylin.security.saml.keystore-file=classpath:samlKeystore.jks
#kylin.security.saml.context-scheme=https
#kylin.security.saml.context-server-name=hostname
#kylin.security.saml.context-server-port=443
#kylin.security.saml.context-path=/kylin
#
#### SPARK ENGINE CONFIGS ###
#
## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
## This must contain site xmls of core, yarn, hive, and hbase in one folder
kylin.env.hadoop-conf-dir=/usr/local/bigdata/hadoop-3.1.4/etc/hadoop
#
## Estimate the RDD partition numbers
#kylin.engine.spark.rdd-partition-cut-mb=10
#
## Minimal partition numbers of rdd
#kylin.engine.spark.min-partition=1
#
## Max partition numbers of rdd
#kylin.engine.spark.max-partition=5000
#
## Spark conf (default is in spark/conf/spark-defaults.conf)
#kylin.engine.spark-conf.spark.master=yarn
##kylin.engine.spark-conf.spark.submit.deployMode=cluster
#kylin.engine.spark-conf.spark.yarn.queue=default
#kylin.engine.spark-conf.spark.driver.memory=2G
#kylin.engine.spark-conf.spark.executor.memory=4G
#kylin.engine.spark-conf.spark.executor.instances=40
#kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
#kylin.engine.spark-conf.spark.shuffle.service.enabled=true
#kylin.engine.spark-conf.spark.eventLog.enabled=true
#kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
#kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
#kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
#
#### Spark conf for specific job
#kylin.engine.spark-conf-mergedict.spark.executor.memory=6G
#kylin.engine.spark-conf-mergedict.spark.memory.fraction=0.2
#
## manually upload spark-assembly jar to HDFS and then set this property will avoid repeatedly uploading jar at runtime
##kylin.engine.spark-conf.spark.yarn.archive=hdfs://namenode:8020/kylin/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
##kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
#
#
#### FLINK ENGINE CONFIGS ###
#
### Flink conf (default is in flink/conf/flink-conf.yaml)
#kylin.engine.flink-conf.jobmanager.heap.size=2G
#kylin.engine.flink-conf.taskmanager.heap.size=4G
#kylin.engine.flink-conf.taskmanager.numberOfTaskSlots=1
#kylin.engine.flink-conf.taskmanager.memory.preallocate=false
#kylin.engine.flink-conf.job.parallelism=1
#kylin.engine.flink-conf.program.enableObjectReuse=false
#kylin.engine.flink-conf.yarn.queue=
#kylin.engine.flink-conf.yarn.nodelabel=
#
#### QUERY PUSH DOWN ###
#
##kylin.query.pushdown.runner-class-name=org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl
#
##kylin.query.pushdown.update-enabled=false
##kylin.query.pushdown.jdbc.url=jdbc:hive2://sandbox:10000/default
##kylin.query.pushdown.jdbc.driver=org.apache.hive.jdbc.HiveDriver
##kylin.query.pushdown.jdbc.username=hive
##kylin.query.pushdown.jdbc.password=
#
##kylin.query.pushdown.jdbc.pool-max-total=8
##kylin.query.pushdown.jdbc.pool-max-idle=8
##kylin.query.pushdown.jdbc.pool-min-idle=0
#
#### JDBC Data Source
##kylin.source.jdbc.connection-url=
##kylin.source.jdbc.driver=
##kylin.source.jdbc.dialect=
##kylin.source.jdbc.user=
##kylin.source.jdbc.pass=
##kylin.source.jdbc.sqoop-home=
##kylin.source.jdbc.filed-delimiter=|
#
#### Livy with Kylin
##kylin.engine.livy-conf.livy-enabled=false
##kylin.engine.livy-conf.livy-url=http://LivyHost:8998
##kylin.engine.livy-conf.livy-key.file=hdfs:///path-to-kylin-job-jar
##kylin.engine.livy-conf.livy-arr.jars=hdfs:///path-to-hadoop-dependency-jar
#
#
#### Realtime OLAP ###
#
## Where should local segment cache located, for absolute path, the real path will be ${KYLIN_HOME}/${kylin.stream.index.path}
#kylin.stream.index.path=stream_index
#
## The timezone for Derived Time Column like hour_start, try set to GMT+N, please check detail at KYLIN-4010
#kylin.stream.event.timezone=
#
## Debug switch for print realtime global dict encode information, please check detail at KYLIN-4141
#kylin.stream.print-realtime-dict-enabled=false
#
## Should enable latest coordinator, please check detail at KYLIN-4167
#kylin.stream.new.coordinator-enabled=true
#
## In which way should we collect receiver's metrics info
##kylin.stream.metrics.option=console/csv/jmx
#
## When enable a streaming cube, should cousme from earliest offset or least offset
#kylin.stream.consume.offsets.latest=true
#
## The parallelism of scan in receiver side
#kylin.stream.receiver.use-threads-per-query=8
#
## How coordinator/receiver register itself into StreamMetadata, there are three option:
## 1. hostname:port, then kylin will set the config ip and port as the currentNode;
## 2. port, then kylin will get the node's hostname and append port as the currentNode;
## 3. not set, then kylin will get the node hostname address and set the hostname and defaultPort(7070 for coordinator or 9090 for receiver) as the currentNode.
##kylin.stream.node=
#
## Auto resubmit after job be discarded
#kylin.stream.auto-resubmit-after-discard-enabled=true

4. Check the Runtime Environment

Kylin runs on a Hadoop cluster and has requirements on the versions, access permissions, CLASSPATH, and so on of the individual components. To avoid environment problems, run the $KYLIN_HOME/bin/check-env.sh script: if anything in your environment is wrong, the script prints a detailed error message; if nothing is reported, the environment is suitable for running Kylin.

cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/bin

[alanchan@server4 bin]$ check-env.sh 
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3
Checking HBase
...................................................[PASS]
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
...................................................[PASS]
Retrieving Spark dependency...
Optional dependency spark not found, if you need this; set SPARK_HOME, or run bin/download-spark.sh
...................................................[PASS]
Retrieving Flink dependency...
Optional dependency flink not found, if you need this; set FLINK_HOME, or run bin/download-flink.sh
...................................................[PASS]
Retrieving kafka dependency...
Couldn't find kafka home. If you want to enable streaming processing, Please set KAFKA_HOME to the path which contains kafka dependencies.
...................................................[PASS]

Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.

5. Start the Dependency Clusters

1) Start ZooKeeper

cd /usr/local/bigdata/apache-zookeeper-3.7.1/bin
zkServer.sh start

# verify
zkServer.sh status

2) Start the Hadoop Cluster

start-all.sh

mapred --daemon start historyserver

yarn --daemon start timelineserver
# verify
HDFS: http://server1:9870/dfshealth.html#tab-overview
YARN: http://server1:8088/cluster/apps
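You can also confirm the daemons with jps; which processes appear on which host depends on your cluster layout, so the list below is illustrative:

jps
# Expect to see, spread across the cluster: NameNode, DataNode, ResourceManager,
# NodeManager, JobHistoryServer, and the timeline server (ApplicationHistoryServer)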

3) Start the HBase Cluster

start-hbase.sh
# verify
http://server1:16010/master-status

4) Start Hive

nohup /usr/local/bigdata/apache-hive-3.1.2-bin/bin/hive --service metastore --hiveconf hive.root.logger=WARN,console > /usr/local/bigdata/apache-hive-3.1.2-bin/logs/metastore.log 2>&1 &
nohup /usr/local/bigdata/apache-hive-3.1.2-bin/bin/hive --service hiveserver2 --hiveconf hive.root.logger=WARN,console > /usr/local/bigdata/apache-hive-3.1.2-bin/logs/hiveserver2.log 2>&1 &

# verify
[alanchan@server4 bin]$ beeline

Beeline version 3.1.2 by Apache Hive
beeline> ! connect jdbc:hive2://server4:10000
Connecting to jdbc:hive2://server4:10000
Enter username for jdbc:hive2://server4:10000: alanchan
Enter password for jdbc:hive2://server4:10000: ********(rootroot)
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://server4:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| test           |
| testhive       |
+----------------+
3 rows selected (1.388 seconds)

5) Start Kylin

cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/bin

kylin.sh start
# Exception 1: the first start produced the following errors
Retrieving hive dependency...

/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh: line 2360: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh: line 2455: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution

Hive Session ID = e55203bc-e634-4cd6-84f7-60e52c570730

Logging initialized using configuration in jar:file:/usr/local/bigdata/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 37aba532-4fb3-4818-900e-c13aa05e0d72
export hive_warehouse_dir=/user/hive/warehouse
Retrieving hbase dependency...
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh: line 2360: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh: line 2455: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
hbase-common lib not found

# Fix: append the HBase lib directory to the CLASSPATH in the hbase launcher script
# Stop the HBase cluster first

cd /usr/local/bigdata/hbase-2.1.0/bin
vi hbase

# Find this line (around line 158):
# CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
# and change it to (adjust the path to your installation):
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar:/usr/local/bigdata/hbase-2.1.0/lib/*

# Save and exit, then scp the file to every node of the HBase cluster (see the sketch below)
# Restart the HBase cluster
# Start Kylin again
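A minimal sketch of pushing the patched script to the other HBase nodes; the hostnames server1 to server3 are assumptions for illustration:

# Copy the patched hbase launcher to every other node of the HBase cluster
for h in server1 server2 server3; do
    scp /usr/local/bigdata/hbase-2.1.0/bin/hbase $h:/usr/local/bigdata/hbase-2.1.0/bin/
done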

# The following output indicates that Kylin started successfully:
A new Kylin instance is started by alanchan. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/logs/kylin.log
Web UI is at http://server4:7070/kylin

6. Verify Kylin

http://server4:7070/kylin/login
Username: ADMIN
Password: KYLIN
Both the username and the password must be entered in uppercase.
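You can also verify the login from the command line through the REST authentication endpoint; a minimal sketch:

# A JSON response describing the ADMIN user means the server and credentials are fine
curl -s -X POST -u ADMIN:KYLIN http://server4:7070/kylin/api/user/authentication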
After a successful login, the Kylin web UI home page is displayed.
This completes the introduction to Kylin and the walkthrough of its standalone deployment and verification.
