目录
2.1 为ukylin用户设置KYLIN_HOME的环境变量
5. 在kylin UI查看刚才创建的 learn_kylin project
1. 准备工作
1.1 创建kylin用户
在每台服务器上创建启动kylin的用户,执行命令
useradd kylin
passwd kylin
例如设置密码为kylin_PWD245847
1.2 创建kylin二级制文件存放目录
mkdir /data4/kylin
设置所有者为kylin用户
chown kylin:kylin -R kylin/
1.3 上传kylin文件到服务器
下载kylin文件 https://archive.apache.org/dist/kylin/apache-kylin-3.1.1/apache-kylin-3.1.1-bin-cdh57.tar.gz
注释:kylin下载版本,即release note说明,请参考链接:
https://archive.apache.org/dist/kylin/
https://kylin.apache.org/download/
https://apache.website-solution.net/kylin/
将下载好的 apache-kylin-3.1.1-bin-cdh57.tar.gz上传至每个服务器的/data4/kylin目录下,上传好的文件如下图所示
解压文件:
tar -zxvf apache-kylin-3.1.1-bin-cdh57.tar.gz
2. 开始安装kylin
2.1 为ukylin用户设置KYLIN_HOME的环境变量
切换liunx用户到kylin,执行命令
vi ~/.bash_profile
#添加环境变量
KYLIN_HOME=/data4/kylin/apache-kylin-3.1.1-bin-cdh57
PATH=$PATH:$KYLIN_HOME/bin
SPARK_HOME=/opt/cloudera/parcels/SPARK2
使环境变量立即生效
source ~/.bash_profile
2.2 检查环境
2.2.1 执行check-env.sh,检查结果如下:
发现用户kylin没有写权限,因为kylin启动后会自动在hdfs集群上创建/kylin目录,所以应该确保用户kylin具有hdfs集群上的/kylin目录的写权限
hadoop shell命令,请参考:http://hadoop.apache.org/docs/r1.0.4/cn/hdfs_shell.html#mkdir
为用户kylin赋写权限,执行命令:
在服务上切换成hdfs用户,su - hdfs,
然后执行: hadoop fs -mkdir /kylin ,在hdfs集群上创建/kylin目录
修改/kylin目录的用户权限 ,执行 hadoop fs -chown -R kylin:kylin /kylin
同事也设置/user/kylin目录的所以者为kylin
su- hdfs
hadoop fs -chown -R kylin:supergroup /user/kylin
创建好的 /kyln如下所示
再次执行环境检查命令
根据上述提示,环境检查是成功的。注意:上面提示SPARK_HOME、FLINK_HOME和KAKF_HOME没找到,如果要用到这些组件的功能,需要设置一下。
2.2.2 接下来我们设置一下这些可选的环境变量
- 设置SPARK_HOME
手动下载spark
spark_version="2.4.5"
wget http://archive.apache.org/dist/spark/spark-${spark_version}/spark-${spark_version}-bin-hadoop2.7.tgz
参考链接:https://blog.csdn.net/weixin_47303949/article/details/108102215
3. 启动kylin
上面2.2.2节,可以不设置那些可选的环境变量(SPARK_HOME、FLINK_HOME和KAKF_HOME),不设置只是这些功能不用用而已,但是并不影响我们正常启动kylin。
由于上述已经配置了kylin的环境变量,因此这里可以直接执行kylin.sh start命令,启动成功后的日志如下所示
在浏览器中访问kylin的UI页面
默认账号 ADMIN/KYLIN,登陆页面如下所示
登陆成功后的主页面如下所示:
初始用户名和密码是 ADMIN/KYLIN。 服务器启动后,可以通过查看 $KYLIN_HOME/logs/kylin.log 获得运行时日志。
4. 创建Sample Cube
在linux服务器上,使用ukylin,执行命令
$KYLIN_HOME/bin/sample.sh
执行该文件时,其实是执行了hive -e 或者 beeline -f (如果配置为beeline模式)命令,来创建表和加载数据:
如hive client 设置为beeline模式(参考https://www.imooc.com/article/300142?block_id=tuijian_wz)
## Hive client, valid value [cli, beeline]
kylin.source.hive.client=beeline
#
## Absolute path to beeline shell, can be set to spark beeline instead of the default hive beeline on PATH
kylin.source.hive.beeline-shell=beeline
#
## Parameters for beeline client, only necessary if hive client is beeline
kylin.source.hive.beeline-params=-n kylin -p kylin_PWD245847 --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://hiveserver2Ip:10000
#
当执行kylin.sh start时,如果hive client设置为了beeline模式,则sql语句执行如下图所示,注意观察最后一行,其实就是执行了beeline -f /tmp/XXX.sql文件 (beeline -n kylin -p kylin_PWD245847 --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://hiveserver2IP:10000 --hiveconf hive.merge.mapredfiles=false --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.exec.compress.output=true --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf hive.merge.mapfiles=false --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true -f /tmp/114cd177-e025-483c-bf6e-93a7bc16269c.hql;ret_code=$?;rm -f /tmp/114cd177-e025-483c-bf6e-93a7bc16269c.hql;exit $ret_code):
set mapreduce.job.name=Create Intermediate Flat Hive Table kylin_sales_cube 04aae945-67d3-b67a-1dae-3392f1100b8f-00;
USE default;
DROP TABLE IF EXISTS kylin_intermediate_kylin_sales_cube_fe62441a_c367_8180_4214_654e9aa719dd;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_kylin_sales_cube_fe62441a_c367_8180_4214_654e9aa719dd
(
\`KYLIN_SALES_TRANS_ID\` bigint
,\`KYLIN_SALES_PART_DT\` date
,\`KYLIN_SALES_LEAF_CATEG_ID\` bigint
,\`KYLIN_SALES_LSTG_SITE_ID\` int
,\`KYLIN_CATEGORY_GROUPINGS_META_CATEG_NAME\` string
,\`KYLIN_CATEGORY_GROUPINGS_CATEG_LVL2_NAME\` string
,\`KYLIN_CATEGORY_GROUPINGS_CATEG_LVL3_NAME\` string
,\`KYLIN_SALES_LSTG_FORMAT_NAME\` string
,\`KYLIN_SALES_SELLER_ID\` bigint
,\`KYLIN_SALES_BUYER_ID\` bigint
,\`BUYER_ACCOUNT_ACCOUNT_BUYER_LEVEL\` int
,\`SELLER_ACCOUNT_ACCOUNT_SELLER_LEVEL\` int
,\`BUYER_ACCOUNT_ACCOUNT_COUNTRY\` string
,\`SELLER_ACCOUNT_ACCOUNT_COUNTRY\` string
,\`BUYER_COUNTRY_NAME\` string
,\`SELLER_COUNTRY_NAME\` string
,\`KYLIN_SALES_OPS_USER_ID\` string
,\`KYLIN_SALES_OPS_REGION\` string
,\`KYLIN_SALES_PRICE\` decimal(19,4)
)
STORED AS SEQUENCEFILE
LOCATION 'hdfs://ns1/kylin/kylin_metadata/kylin-04aae945-67d3-b67a-1dae-3392f1100b8f/kylin_intermediate_kylin_sales_cube_fe62441a_c367_8180_4214_654e9aa719dd';
ALTER TABLE kylin_intermediate_kylin_sales_cube_fe62441a_c367_8180_4214_654e9aa719dd SET TBLPROPERTIES('auto.purge'='true');
INSERT OVERWRITE TABLE \`kylin_intermediate_kylin_sales_cube_fe62441a_c367_8180_4214_654e9aa719dd\` SELECT
\`KYLIN_SALES\`.\`TRANS_ID\` as \`KYLIN_SALES_TRANS_ID\`
,\`KYLIN_SALES\`.\`PART_DT\` as \`KYLIN_SALES_PART_DT\`
,\`KYLIN_SALES\`.\`LEAF_CATEG_ID\` as \`KYLIN_SALES_LEAF_CATEG_ID\`
,\`KYLIN_SALES\`.\`LSTG_SITE_ID\` as \`KYLIN_SALES_LSTG_SITE_ID\`
,\`KYLIN_CATEGORY_GROUPINGS\`.\`META_CATEG_NAME\` as \`KYLIN_CATEGORY_GROUPINGS_META_CATEG_NAME\`
,\`KYLIN_CATEGORY_GROUPINGS\`.\`CATEG_LVL2_NAME\` as \`KYLIN_CATEGORY_GROUPINGS_CATEG_LVL2_NAME\`
,\`KYLIN_CATEGORY_GROUPINGS\`.\`CATEG_LVL3_NAME\` as \`KYLIN_CATEGORY_GROUPINGS_CATEG_LVL3_NAME\`
,\`KYLIN_SALES\`.\`LSTG_FORMAT_NAME\` as \`KYLIN_SALES_LSTG_FORMAT_NAME\`
,\`KYLIN_SALES\`.\`SELLER_ID\` as \`KYLIN_SALES_SELLER_ID\`
,\`KYLIN_SALES\`.\`BUYER_ID\` as \`KYLIN_SALES_BUYER_ID\`
,\`BUYER_ACCOUNT\`.\`ACCOUNT_BUYER_LEVEL\` as \`BUYER_ACCOUNT_ACCOUNT_BUYER_LEVEL\`
,\`SELLER_ACCOUNT\`.\`ACCOUNT_SELLER_LEVEL\` as \`SELLER_ACCOUNT_ACCOUNT_SELLER_LEVEL\`
,\`BUYER_ACCOUNT\`.\`ACCOUNT_COUNTRY\` as \`BUYER_ACCOUNT_ACCOUNT_COUNTRY\`
,\`SELLER_ACCOUNT\`.\`ACCOUNT_COUNTRY\` as \`SELLER_ACCOUNT_ACCOUNT_COUNTRY\`
,\`BUYER_COUNTRY\`.\`NAME\` as \`BUYER_COUNTRY_NAME\`
,\`SELLER_COUNTRY\`.\`NAME\` as \`SELLER_COUNTRY_NAME\`
,\`KYLIN_SALES\`.\`OPS_USER_ID\` as \`KYLIN_SALES_OPS_USER_ID\`
,\`KYLIN_SALES\`.\`OPS_REGION\` as \`KYLIN_SALES_OPS_REGION\`
,\`KYLIN_SALES\`.\`PRICE\` as \`KYLIN_SALES_PRICE\`
FROM \`DEFAULT\`.\`KYLIN_SALES\` as \`KYLIN_SALES\`
INNER JOIN \`DEFAULT\`.\`KYLIN_CAL_DT\` as \`KYLIN_CAL_DT\`
ON \`KYLIN_SALES\`.\`PART_DT\` = \`KYLIN_CAL_DT\`.\`CAL_DT\`
INNER JOIN \`DEFAULT\`.\`KYLIN_CATEGORY_GROUPINGS\` as \`KYLIN_CATEGORY_GROUPINGS\`
ON \`KYLIN_SALES\`.\`LEAF_CATEG_ID\` = \`KYLIN_CATEGORY_GROUPINGS\`.\`LEAF_CATEG_ID\` AND \`KYLIN_SALES\`.\`LSTG_SITE_ID\` = \`KYLIN_CATEGORY_GROUPINGS\`.\`SITE_ID\`
INNER JOIN \`DEFAULT\`.\`KYLIN_ACCOUNT\` as \`BUYER_ACCOUNT\`
ON \`KYLIN_SALES\`.\`BUYER_ID\` = \`BUYER_ACCOUNT\`.\`ACCOUNT_ID\`
INNER JOIN \`DEFAULT\`.\`KYLIN_ACCOUNT\` as \`SELLER_ACCOUNT\`
ON \`KYLIN_SALES\`.\`SELLER_ID\` = \`SELLER_ACCOUNT\`.\`ACCOUNT_ID\`
INNER JOIN \`DEFAULT\`.\`KYLIN_COUNTRY\` as \`BUYER_COUNTRY\`
ON \`BUYER_ACCOUNT\`.\`ACCOUNT_COUNTRY\` = \`BUYER_COUNTRY\`.\`COUNTRY\`
INNER JOIN \`DEFAULT\`.\`KYLIN_COUNTRY\` as \`SELLER_COUNTRY\`
ON \`SELLER_ACCOUNT\`.\`ACCOUNT_COUNTRY\` = \`SELLER_COUNTRY\`.\`COUNTRY\`
WHERE 1=1 AND (\`KYLIN_SALES\`.\`PART_DT\` >= '2012-01-01' AND \`KYLIN_SALES\`.\`PART_DT\` < '2020-11-30')
;
EOL
beeline -n kylin -p kylin_PWD245847 --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://hiveserver2Ip:10000 --hiveconf hive.merge.mapredfiles=false --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.exec.compress.output=true --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf hive.merge.mapfiles=false --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true -f /tmp/114cd177-e025-483c-bf6e-93a7bc16269c.hql;ret_code=$?;rm -f /tmp/114cd177-e025-483c-bf6e-93a7bc16269c.hql;exit $ret_code
初次执行时,遇到的权限的问题,如下图
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:User ukylin does not have privileges for CREATETABLE)
即用户ukylin没有创建表的权限,从日志可以看到,kylin是要在默认库创建表,如下提示语句:
Going to create sample tables in hive to database DEFAULT by cli
给用户ukylin添加到hive用户组里:
执行 usermod -G hive ukylin
给用户ukylin添加default库的创建表的权限。然后重新执行,通过执行日志看到,一共创建了以下 5 张表
default.kylin_sales、default.kylin_account、default.kylin_country、default.kylin_cal_dt、default.kylin_category_groupings
这几个表在hdfs上对应的location路径是
/user/hive/warehouse/kylin_account
/user/hive/warehouse/kylin_cal_dt
。
。
如下图所示,而不是hive默认的/user/hive/warehouse/default.db目录下,这里需要注意。
打印的成功日志如下所示:
上述命令执行完,会在kylin创建一个名叫 learn_kylin的project。
5. 在kylin UI查看刚才创建的 learn_kylin project
完成后登陆kylin,点击System->Configuration->Reload Metadata来重载元数据
元数据重载完成后你可以在左上角的Project中看到一个名为learn_kylin的项目,它包含kylin_sales_cube和kylin_streaming_cube, 它们分别为batch cube和streaming cube,你可以直接对kylin_sales_cube进行构建,构建完成后就可以查询。如下图所示:
6. Sample Cube 构建
6.1 构建cube
点击Monitor链接,观察刚才构建的cube,发现此时发生了一个异常,如下图所示:
6.2 排查异常
观察kylin的日志,less /data4/kylin/apache-kylin-3.1.1-bin-cdh57/logs/kylin.log
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:User ukylin does not have privileges for CREATETABLE)
上述报用户启动kylin的ukylin没有权限创建表,继续观察上面的日志,如下图会创建外表,表的数据放在/kylin目录下
在hue中创建外表进行测试
由于cdh集群中使用了sentry权限管理,可见需要给ukylin赋予hdfs该路径下写权限,因此接下来在sentry中赋予hdfs的acl访问权限
6.2.1 赋ACL权限
7. 集群配置
7.1 kylin.properties文件
kylin.metadata.url=kylin_metadata@hbase
kylin.server.cluster-servers=node1:7070,node2:7070,node3:7070
kylin.server.mode=all
#配置CuratorScheculer
进行任务调度
kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true
将node3上修改完成的kylin安装文件分发到其他节点上,依次启动每个kylin实例
7.2 配置nginx负载均衡
# kylin集群实例
upstream kylin {
# 使用ip-hash负载均衡机制,使得来自同一ip的请求,都路由到同一kylin实例,除非这个实例宕机退出
ip_hash;
server node302:7070; #Kylin Server 1
server node303:7070; # Kylin Server 2
server node304:7070; # Kylin Server 2
}
server {
listen 17070;
location / {
# kylin
proxy_pass http://kylin;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}
启动nginx
7.3 访问集群
访问nginx监听的端口,登陆kylin页面,在system的tab页可以看到下面的集群可视化结果
8. 参考链接
- kylin安装配置
http://kylin.apache.org/cn/docs/gettingstarted/kylin-quickstart.html
https://blog.csdn.net/weixin_47303949/article/details/108102215
http://kylin.apache.org/cn/docs/install/kylin_cluster.html
https://www.cnblogs.com/qqflying/p/10918330.html
- sample cube 构建
http://kylin.apache.org/cn/docs/tutorial/kylin_sample.html
https://www.cnblogs.com/qqflying/p/10918330.html
- kylin调度:
https://blog.csdn.net/weixin_39074599/article/details/92383998
https://blog.csdn.net/zhangyaqian96/article/details/98872517
https://cwiki.apache.org/confluence/display/KYLIN/Comparison+of+Kylin+Job+scheduler