Kylin and Hive views

1 Removing a useless column from a Hive table

create external table dim_jd_brand(rowkey string, brand_id string, brand_name string, category_id string, category_name string)
    stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    with serdeproperties("hbase.columns.mapping" = ":key,o:brand_id,o:brand_name,o:category_id,o:category_name")
    tblproperties("hbase.table.name" = "pro:ods_jd_brand");

The brand_id column is entirely null, so it no longer has any value, but the external table has already been created. How do we change it? An external table stores no data of its own, so we can simply drop it and recreate it.

drop table dim_jd_brand;
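After the drop, recreating the table without the dead column just means repeating the DDL minus brand_id. A sketch reusing the original HBase column mapping:

```sql
create external table dim_jd_brand(rowkey string, brand_name string, category_id string, category_name string)
    stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    with serdeproperties("hbase.columns.mapping" = ":key,o:brand_name,o:category_id,o:category_name")
    tblproperties("hbase.table.name" = "pro:ods_jd_brand");
```

Since the table is external, the underlying HBase table `pro:ods_jd_brand` is untouched by the drop-and-recreate.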

2 Kylin and Hive views
Following the approach in "Apache Kylin advanced: using Hive views":

create view v_jd_category_brand
as
select djc.one_category_name,djc.two_category_name,djc.three_category_name,djc.category_id,djc.category_name
    ,djb.brand_name
 from dim_jd_category djc inner join dim_jd_brand djb on djc.category_id=djb.category_id;

When I built this on top of the old model, problems kept appearing, and Kylin's UI does not seem thoroughly tested, so I recommend deleting the old model instead.

java.lang.IllegalArgumentException: bad data type -- , does not match (any|char|varchar|string|boolean|byte|binary|int|short|long|integer|tinyint|smallint|bigint|int4|long8|float|real|double|decimal|numeric|date|time|datetime|timestamp|_literal_type|_derived_type|hllc|bitmap|topn|raw|extendedcolumn|percentile|dim_dc|stddev_sum|bitmap_map)\s*(?:[(]([\d\s,]+)[)])?

In the Cube Designer's Measures step, the return type would not appear. This seems to be a Kylin issue: because I kept building on the original model, all sorts of leftover problems kept surfacing. After I deleted the model and created it again from scratch, none of these messy problems appeared, and all the dimensions became selectable.
3 Hive joins
Hive MapJoin
29.697 seconds with the large table first and the small table second. MapJoin fits the case where, of the two tables being joined, one is large and the other is small enough to hold in memory without hurting performance.
Before Hive v0.7, the hint /*+ mapjoin(table) */ was required to trigger a MapJoin. What about after that?
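From Hive 0.11 onward, hive.auto.convert.join defaults to true, so joins against a table below the size threshold are converted to map joins automatically and the hint is usually unnecessary. A sketch of the relevant session settings (the threshold here is the documented default, not a tuned value):

```sql
-- let Hive convert joins to map joins automatically
set hive.auto.convert.join=true;
-- small-table size threshold (bytes) for the conversion; 25 MB is the default
set hive.mapjoin.smalltable.filesize=25000000;
```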

select /*+mapjoin(djc)*/djc.one_category_name,count(*) cn
 from dim_jd_brand djb
 left join dim_jd_category djc on djc.category_id=djb.category_id
 group by djc.one_category_name;

23.182 seconds with a plain inner join:

select djc.one_category_name,count(*) cn
 from dim_jd_category djc inner join dim_jd_brand djb on djc.category_id=djb.category_id
 group by djc.one_category_name;

23.941 seconds with the small table first, the large table second, and no join mode specified:

select /*+mapjoin(djc)*/djc.one_category_name,count(*) cn
 from dim_jd_category djc join dim_jd_brand djb on djc.category_id=djb.category_id
 group by djc.one_category_name;

23.121 seconds with the small table first, the large table second, and an inner join:

select /*+mapjoin(djc)*/djc.one_category_name,count(*) cn
 from dim_jd_category djc inner join dim_jd_brand djb on djc.category_id=djb.category_id
 group by djc.one_category_name;

However, the hint cannot be used inside a Hive view definition; doing so raises an error.

Error: Error while compiling statement: FAILED: SemanticException line 1:24 missing EOF at '.' near 'djc' in definition of VIEW v_jd_category_brand [
select mapjoin(djc)`djc`.`one_category_name`,`djc`.`two_category_name`,`djc`.`three_category_name`,`djc`.`category_id`,`djc`.`category_name`
,`djb`.`brand_name`
from `pro`.`dim_jd_category` `djc` inner join `pro`.`dim_jd_brand` `djb` on `djc`.`category_id`=`djb`.`category_id`
] used as v_jd_category_brand at Line 4:6 (state=42000,code=40000)
Closing: 0: jdbc:hive2://sh102.shahu.com:2181,sh103.shahu.com:2181,sh104.shahu.com:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
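Since hint comments are rejected inside a view definition, a workaround is to keep the view hint-free (as created above) and rely on automatic map-join conversion, or put the hint in the query that reads the view. A sketch:

```sql
set hive.auto.convert.join=true;
select one_category_name, count(*) cn
  from v_jd_category_brand
 group by one_category_name;
```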

4 HDFS alert
HDFS Storage Capacity Usage (Weekly)
The variance for this alert is 7,965,949,817B which is 14% of the 58,372,960,768B average (5,837,296,077B is the limit)
I tried the reference "Hadoop-The variance for this alert is **MB which is 20% of the **MB average (**MB is the limit", but nothing seemed to change.
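The numbers in the alert are at least internally consistent: the limit is 10% of the weekly average, and the reported variance is about 14% of the average, which is why the alert fired. A quick check of the arithmetic:

```python
average = 58_372_960_768   # bytes, weekly average from the alert
variance = 7_965_949_817   # bytes, reported deviation
limit = 5_837_296_077      # bytes, the alert threshold

# the threshold is 10% of the average
print(round(limit / average * 100))       # -> 10
# the observed variance is ~14% of the average, above the threshold
print(round(variance / average * 100))    # -> 14
print(variance > limit)                   # -> True
```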

[hdfs@sh102 root]$ hadoop fs -expunge
20/12/02 16:06:01 INFO fs.TrashPolicyDefault: TrashPolicyDefault#deleteCheckpoint for trashRoot: hdfs://sh102.shahu.com:8020/user/hdfs/.Trash
20/12/02 16:06:01 INFO fs.TrashPolicyDefault: TrashPolicyDefault#deleteCheckpoint for trashRoot: hdfs://sh102.shahu.com:8020/user/hdfs/.Trash
20/12/02 16:06:01 INFO fs.TrashPolicyDefault: TrashPolicyDefault#createCheckpoint for trashRoot: hdfs://sh102.shahu.com:8020/user/hdfs/.Trash

5 Hive cannot connect to HBase
Below is the log from Ambari; I could not work out what it means.

Connection failed on host sh103.shahu.com:10000 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HIVE/package/alerts/alert_hive_thrift_port.py", line 204, in execute
    ldap_password=ldap_password)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/hive_check.py", line 84, in check_thrift_port_sasl
    timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
    returns=self.resource.returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 308, in _call
    raise ExecuteTimeoutException(err_msg)
ExecuteTimeoutException: Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/'"'"' ; beeline -n hive -u '"'"'jdbc:hive2://sh103.shahu.com:10000/;transportMode=binary'"'"'  -e '"'"';'"'"' 2>&1 | awk '"'"'{print}'"'"' | grep -i -e '"'"'Connected to:'"'"' -e '"'"'Transaction isolation:'"'"''' was killed due timeout after 60 seconds
)

Moving on to HBase, I followed how more experienced people read GC logs: "HBase GC troubleshooting: analyzing G1 garbage collection logs", "JVM performance tuning in practice: G1 collector analysis and tuning", and "JVM performance tuning in practice: introducing the G1 collector". Even so, I did not know where to start, because my GC log does not look like theirs.
Understanding the -XX:SurvivorRatio parameter:
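My reading of the HotSpot docs: -XX:SurvivorRatio=N makes eden N times the size of one survivor space, and the young generation holds eden plus two survivor spaces. A sketch of what ratio 8 (the default) and ratio 2 (the value in the HBase tuning articles cited below) mean, using a hypothetical 10 MiB young generation:

```python
def young_gen_split(young_bytes, survivor_ratio):
    """Split the young generation into eden and two survivor spaces.

    SurvivorRatio = eden / one-survivor, and young = eden + 2 survivors,
    so one survivor space is young / (survivor_ratio + 2).
    """
    survivor = young_bytes // (survivor_ratio + 2)
    eden = young_bytes - 2 * survivor
    return eden, survivor

young = 10 * 1024 * 1024  # hypothetical 10 MiB young generation

# default ratio 8: eden gets 80% of the young generation
eden, survivor = young_gen_split(young, 8)
print(eden / young)   # -> 0.8

# ratio 2: eden shrinks to 50%, each survivor grows to 25%,
# leaving more room to age objects before promoting them to old gen
eden, survivor = young_gen_split(young, 2)
print(eden / young)   # -> 0.5
```

A larger survivor share (ratio 2) is presumably why the HBase articles choose it: more objects can age in the survivor spaces instead of being promoted early into the old generation.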

# youngGC gc log output
2020-12-02T16:33:46.165+0800: 170001.641: [GC pause (young) 170001.641: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 145525, predicted base time: 11.16 ms, remaining time: 88.84 ms, target pause time: 100.00 ms]
# The eden/survivor split of the young generation is controlled by -XX:SurvivorRatio
# [hbase regionserver startup parameters](https://blog.csdn.net/u011098327/article/details/80702925) uses -XX:SurvivorRatio=2, and [hbase gc tuning (CMS and G1) parameters](https://www.jianshu.com/p/c85acaccb2f2) also sets it to 2. Why?
 170001.641: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 299 regions, survivors: 2 regions, predicted young region time: 0.21 ms]
 170001.641: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 299 regions, survivors: 2 regions, old: 0 regions, predicted pause time: 11.37 ms, target pause time: 100.00 ms]
, 0.0126424 secs]
# Parallel GC tasks
# Summary of the parallel phase: 9.3 ms in total, with 8 GC worker threads
   [Parallel Time: 9.3 ms, GC Workers: 8]
   # GC start timestamps
      [GC Worker Start (ms): Min: 170001641.0, Avg: 170001642.1, Max: 170001645.5, Diff: 4.5]
      # The lines below record the parallel-phase GC activities in detail.
      # Diff is the deviation from the average. The smaller the better: a small Diff means the worker threads progressed evenly. If Diff is large, check which of the activities below caused the variance.
      # Avg is the average. Avg close to Min and Max is normal; otherwise, look in detail at the task with the large deviation.
      # External root scanning. External roots live outside the heap: JNI references, the JVM system dictionary, classloaders, and so on. Timing details follow.
      [Ext Root Scanning (ms): Min: 0.0, Avg: 1.1, Max: 2.0, Diff: 2.0, Sum: 8.4]
      # Update RS: time spent updating RSets.
      # -XX:MaxGCPauseMillis bounds G1's pause time; keeping RSet updates under 10% of the pause target is generally desirable. If RSet updating takes too long, adjust its share of the total pause with -XX:G1RSetUpdatingPauseTimePercent (default 10).
      [Update RS (ms): Min: 3.6, Avg: 6.0, Max: 7.2, Diff: 3.6, Sum: 47.6]
      # Processed Buffers: the log buffers, recorded while refinement threads scanned dirty-card regions, that were processed in this phase
         [Processed Buffers: Min: 33, Avg: 74.8, Max: 112, Diff: 79, Sum: 598]
      # On RSet granularity: a coarsened RSet bitmap increases RSet scanning time. Scan times like the ones below indicate the RSets have not been coarsened.
      # If Scan RS time is long, add -XX:+G1SummarizeRSetStats to print detailed RSet statistics after GC (normally for debugging); the companion option G1SummarizeRSetStatsPeriod controls after how many GCs the statistics are printed.
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.3]
      # Code root scanning. CSet in-references are checked only when a region's RSet holds strong code roots, e.g. constant pools
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      # This task evacuates (copies) live objects out of the CSet. Object copying usually takes the bulk of the pause. If the copy time diverges a lot from the predicted pause time, the young generation size can be adjusted.
      [Object Copy (ms): Min: 0.8, Avg: 0.9, Max: 1.0, Diff: 0.2, Sum: 7.0]
      # Termination here means terminating the worker threads. Before finishing, a worker checks the other workers' queues and steals any unfinished work. A long termination time may mean some worker spent too long on a task.
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Termination Attempts: Min: 23, Avg: 38.5, Max: 56, Diff: 33, Sum: 308]
      # Time the worker threads spent outside GC, e.g. GC threads stalled by some other JVM activity. This time is not really spent on GC; it is only recorded as part of the log.
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
      # Parallel-phase total, covering GC work plus GC Worker Other.
      [GC Worker Total (ms): Min: 4.6, Avg: 8.0, Max: 9.1, Diff: 4.5, Sum: 63.8]
      # GC end timestamps
      [GC Worker End (ms): Min: 170001650.1, Avg: 170001650.1, Max: 170001650.2, Diff: 0.1]
      # Serial GC activities, including code-root fixup and purging. Clear CT also clears the card-table entries for cards removed from RSets; G1 marks cards while scanning so the same card is not scanned twice.
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   # The rest are other GC activities, mainly: choosing the CSet, reference processing and enqueueing, card redirtying, reclaiming free humongous regions, and freeing the CSet after collection.
   [Other: 3.0 ms]
   # Choose CSet: all young regions are always collected, so no selection is needed and the time is 0 ms. CSet selection is generally triggered during mixed GC.
      [Choose CSet: 0.0 ms]
      # Ref Proc / Ref Enq: reference processing handles weak, soft, and phantom references, finalizers, and JNI references, enqueueing them onto the corresponding reference queues.
      [Ref Proc: 2.2 ms]
      [Ref Enq: 0.0 ms]
      # Redirty Cards: enqueueing references may update RSets, so the associated cards must be redirtied.
      [Redirty Cards: 0.2 ms]
      # Humongous Register / Reclaim: humongous-object reclamation. Young GC reclaims short-lived humongous objects that RSets reference; humongous objects are reclaimed in place rather than evacuated (evacuation would be costly, and unnecessary).
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.2 ms]
      # Free CSet: freeing the CSet, which also clears the RSets of its regions
      [Free CSet: 0.2 ms]
   [Eden: 299.0M(299.0M)->0.0B(298.0M) Survivors: 2048.0K->3072.0K Heap: 388.3M(502.0M)->21.5M(502.0M)]
# End-of-collection marker with the per-phase wall times; this part, at least, is a useful reference
 [Times: user=0.06 sys=0.00, real=0.01 secs] 
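To see where the time goes in such a record, it helps to pull out the total pause and the parallel-phase share mechanically. A minimal sketch that parses those two figures from the log text above:

```python
import re

# excerpt of the GC log above: total pause and parallel-phase lines
log = """2020-12-02T16:33:46.165+0800: 170001.641: [GC pause (young)
, 0.0126424 secs]
   [Parallel Time: 9.3 ms, GC Workers: 8]
"""

# total pause in seconds, from ", 0.0126424 secs]"
pause = float(re.search(r",\s*([\d.]+)\s*secs\]", log).group(1))
# parallel phase in milliseconds, from "[Parallel Time: 9.3 ms"
parallel = float(re.search(r"\[Parallel Time:\s*([\d.]+) ms", log).group(1))

print(round(pause * 1000, 1))                # -> 12.6 (total pause, ms)
print(round(parallel / (pause * 1000), 2))   # -> 0.74 (parallel share)
```

For this record, roughly three quarters of the 12.6 ms pause is the parallel phase, matching the 9.3 ms Parallel Time line.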