1 Removing a useless column from a Hive table
create external table dim_jd_brand(rowkey string,brand_id string,brand_name string,category_id string, category_name string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,o:brand_id,o:brand_name,o:category_id,o:category_name")
tblproperties("hbase.table.name" = "pro:ods_jd_brand");
The brand_id column is entirely null, so it no longer adds any value. But the external table has already been created, so how do we change it? An external table does not store the data itself, so simply dropping and recreating it is enough.
drop table dim_jd_brand;
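A minimal sketch of the recreated DDL: the original definition with brand_id removed from both the column list and the HBase column mapping (the underlying HBase table pro:ods_jd_brand is untouched):

```sql
create external table dim_jd_brand(
  rowkey string,
  brand_name string,
  category_id string,
  category_name string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,o:brand_name,o:category_id,o:category_name")
tblproperties("hbase.table.name" = "pro:ods_jd_brand");
```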
2 Kylin and Hive views
Following 《Apache Kylin高级部分之使用Hive视图》 (using Hive views, from the advanced part of Apache Kylin), create a view:
create view v_jd_category_brand
as
select djc.one_category_name,djc.two_category_name,djc.three_category_name,djc.category_id,djc.category_name
,djb.brand_name
from dim_jd_category djc inner join dim_jd_brand djb on djc.category_id=djb.category_id;
I had previously kept building on the old model, which always caused lots of problems, and Kylin's UI does not seem thoroughly tested either, so my advice is to delete the old model.
java.lang.IllegalArgumentException: bad data type -- , does not match (any|char|varchar|string|boolean|byte|binary|int|short|long|integer|tinyint|smallint|bigint|int4|long8|float|real|double|decimal|numeric|date|time|datetime|timestamp|_literal_type|_derived_type|hllc|bitmap|topn|raw|extendedcolumn|percentile|dim_dc|stddev_sum|bitmap_map)\s*(?:[(]([\d\s,]+)[)])?
In the Cube Designer, the Measures had no return type. This looks like a Kylin issue: because I kept working on top of the original model, all sorts of leftover problems kept surfacing. After I deleted the model and recreated it from scratch, none of that mess appeared again, and all the dimensions became selectable.
3 Hive join
Hive map join.
29.697 seconds, large table first and small table second. The scenario where MapJoin applies: of the two tables being joined, one is very large and the other small enough to be held in memory without hurting performance.
Before Hive 0.7, the /*+ mapjoin(table) */ hint was required for a MapJoin to be executed. What about later versions? Later releases can convert a common join into a map join automatically; a sketch of the relevant settings follows.
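These session settings control the automatic conversion (shown with what I believe are the usual defaults; adjust for your cluster):

```sql
-- From roughly Hive 0.11 on, hive.auto.convert.join defaults to true, so the
-- optimizer rewrites a common join into a map join on its own and the
-- /*+ mapjoin() */ hint is largely unnecessary.
set hive.auto.convert.join=true;
-- Tables below this size in bytes are considered small enough to broadcast.
set hive.mapjoin.smalltable.filesize=25000000;
```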
select /*+mapjoin(djc)*/djc.one_category_name,count(*) cn
from dim_jd_brand djb
left join dim_jd_category djc on djc.category_id=djb.category_id
group by djc.one_category_name;
23.182 seconds, plain inner join with no hint:
select djc.one_category_name,count(*) cn
from dim_jd_category djc inner join dim_jd_brand djb on djc.category_id=djb.category_id
group by djc.one_category_name;
23.941 seconds, small table first and large table second, bare join with no join type specified:
select /*+mapjoin(djc)*/djc.one_category_name,count(*) cn
from dim_jd_category djc join dim_jd_brand djb on djc.category_id=djb.category_id
group by djc.one_category_name;
23.121 seconds, small table first and large table second, inner join:
select /*+mapjoin(djc)*/djc.one_category_name,count(*) cn
from dim_jd_category djc inner join dim_jd_brand djb on djc.category_id=djb.category_id
group by djc.one_category_name;
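To check which strategy Hive actually picked, the plan can be inspected; a sketch (look for a Map Join Operator, as opposed to a plain Join Operator, in the output):

```sql
explain
select djc.one_category_name, count(*) cn
from dim_jd_category djc
inner join dim_jd_brand djb on djc.category_id=djb.category_id
group by djc.one_category_name;
```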
However, the hint cannot be used like this inside a Hive view; doing so raises an error.
Error: Error while compiling statement: FAILED: SemanticException line 1:24 missing EOF at '.' near 'djc' in definition of VIEW v_jd_category_brand [
select mapjoin(djc)`djc`.`one_category_name`,`djc`.`two_category_name`,`djc`.`three_category_name`,`djc`.`category_id`,`djc`.`category_name`
,`djb`.`brand_name`
from `pro`.`dim_jd_category` `djc` inner join `pro`.`dim_jd_brand` `djb` on `djc`.`category_id`=`djb`.`category_id`
] used as v_jd_category_brand at Line 4:6 (state=42000,code=40000)
Closing: 0: jdbc:hive2://sh102.shahu.com:2181,sh103.shahu.com:2181,sh104.shahu.com:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
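The workaround is simply to leave the hint out of the view definition and rely on the automatic conversion above; a sketch using the view from section 2:

```sql
create view v_jd_category_brand as
select djc.one_category_name, djc.two_category_name, djc.three_category_name,
       djc.category_id, djc.category_name, djb.brand_name
from dim_jd_category djc
inner join dim_jd_brand djb on djc.category_id=djb.category_id;
```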
4 HDFS alert
HDFS Storage Capacity Usage (Weekly)
The variance for this alert is 7,965,949,817B which is 14% of the 58,372,960,768B average (5,837,296,077B is the limit)
Following 《Hadoop-The variance for this alert is **MB which is 20% of the **MB average (**MB is the limit》, I emptied the trash with hadoop fs -expunge, which deletes expired trash checkpoints and creates a new one, but the alert did not seem to change.
[hdfs@sh102 root]$ hadoop fs -expunge
20/12/02 16:06:01 INFO fs.TrashPolicyDefault: TrashPolicyDefault#deleteCheckpoint for trashRoot: hdfs://sh102.shahu.com:8020/user/hdfs/.Trash
20/12/02 16:06:01 INFO fs.TrashPolicyDefault: TrashPolicyDefault#deleteCheckpoint for trashRoot: hdfs://sh102.shahu.com:8020/user/hdfs/.Trash
20/12/02 16:06:01 INFO fs.TrashPolicyDefault: TrashPolicyDefault#createCheckpoint for trashRoot: hdfs://sh102.shahu.com:8020/user/hdfs/.Trash
5 Hive cannot connect to HBase
Below is the alert from Ambari. I could not make sense of it at first; reading to the bottom of the traceback, it is an ExecuteTimeoutException: the beeline probe against HiveServer2 on port 10000 was killed after 60 seconds without connecting.
Connection failed on host sh103.shahu.com:10000 (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HIVE/package/alerts/alert_hive_thrift_port.py", line 204, in execute
ldap_password=ldap_password)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/hive_check.py", line 84, in check_thrift_port_sasl
timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 308, in _call
raise ExecuteTimeoutException(err_msg)
ExecuteTimeoutException: Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/'"'"' ; beeline -n hive -u '"'"'jdbc:hive2://sh103.shahu.com:10000/;transportMode=binary'"'"' -e '"'"';'"'"' 2>&1 | awk '"'"'{print}'"'"' | grep -i -e '"'"'Connected to:'"'"' -e '"'"'Transaction isolation:'"'"''' was killed due timeout after 60 seconds
)
Next, into HBase, learning from how more experienced people read GC logs: 《HBase GC故障排查》, 《G1垃圾回收日志分析》, 《JVM性能调优实践——G1 垃圾收集器分析、调优篇》 and 《JVM性能调优实践——G1 垃圾收集器介绍篇》. Even so, I did not know where to start, because my GC logs did not look like theirs.
Understanding the -XX:SurvivorRatio parameter
# Young GC log output
2020-12-02T16:33:46.165+0800: 170001.641: [GC pause (young) 170001.641: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 145525, predicted base time: 11.16 ms, remaining time: 88.84 ms, target pause time: 100.00 ms]
# The eden-to-survivor ratio of the young generation is set via -XX:SurvivorRatio.
# [hbase regionserver进程启动参数](https://blog.csdn.net/u011098327/article/details/80702925) uses -XX:SurvivorRatio=2, and [hbase gc调优(CMS与G1)参数](https://www.jianshu.com/p/c85acaccb2f2) also sets it to 2. Why?
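# A guess at the arithmetic behind it: with -XX:SurvivorRatio=N, eden is N times
# the size of one survivor space, so each survivor gets 1/(N+2) of the young gen.
# SurvivorRatio=2 therefore makes each survivor a quarter of the young gen,
# presumably so HBase's many short-lived objects can die in the survivor spaces
# instead of being promoted to the old generation early.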
170001.641: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 299 regions, survivors: 2 regions, predicted young region time: 0.21 ms]
170001.641: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 299 regions, survivors: 2 regions, old: 0 regions, predicted pause time: 11.37 ms, target pause time: 100.00 ms]
, 0.0126424 secs]
# GC parallel tasks
# Summary of the parallel phase: 9.3 ms in total, with 8 GC worker threads.
[Parallel Time: 9.3 ms, GC Workers: 8]
# GC worker start timestamps
[GC Worker Start (ms): Min: 170001641.0, Avg: 170001642.1, Max: 170001645.5, Diff: 4.5]
# The lines below record the GC activity of the parallel phase in detail.
# Diff is the spread around the average; smaller is better, meaning the worker threads progress evenly. If Diff is large, look below for the specific activity causing the variation.
# Avg is the average time. Avg staying close to Min and Max is normal; otherwise, analyze the tasks with large deviations in detail.
# External root scanning. External roots live outside the heap: JNI references, JVM system directories, classloaders, and so on. The timing details follow.
[Ext Root Scanning (ms): Min: 0.0, Avg: 1.1, Max: 2.0, Diff: 2.0, Sum: 8.4]
# Update RS: time spent updating the RSets.
# -XX:MaxGCPauseMillis caps G1's pause time; keeping RSet updating below 10% of the target pause time is desirable. If too much time goes into updating RSets, adjust its share of the total pause time with -XX:G1RSetUpdatingPauseTimePercent (default 10).
[Update RS (ms): Min: 3.6, Avg: 6.0, Max: 7.2, Diff: 3.6, Sum: 47.6]
# Processed Buffers: the update-log buffers, recorded by the refinement threads while scanning dirty-card regions, that were processed in this phase.
[Processed Buffers: Min: 33, Avg: 74.8, Max: 112, Diff: 79, Sum: 598]
# On RSet granularity: if an RSet's bitmap has become coarse-grained, RSet scanning time increases. Scan times like those below suggest the RSets have not coarsened yet.
# If Scan RS takes noticeably long, use -XX:+G1SummarizeRSetStats to print detailed RSet statistics after GC (generally for debugging only); the companion parameter -XX:G1SummarizeRSetStatsPeriod controls after how many GCs the statistics are collected.
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.3]
# Code root scanning. Only when a region's RSet contains strong code roots does G1 check for references into the CSet, e.g. from constant pools.
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
# This task evacuates (copies) the live objects in the CSet. Object copying usually makes up the bulk of the pause time. If the copy time differs a lot from the predicted pause time, consider adjusting the young generation size.
[Object Copy (ms): Min: 0.8, Avg: 0.9, Max: 1.0, Diff: 0.2, Sum: 7.0]
# Termination here means stopping the worker threads. Before terminating, a worker checks the other workers' queues and steals any unfinished tasks. A long termination time can mean some worker spent too long on one task.
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Termination Attempts: Min: 23, Avg: 38.5, Max: 56, Diff: 33, Sum: 308]
# Worker-thread time spent outside of GC, e.g. a GC thread being stopped because of some other JVM activity. This time is not really spent on GC; it is only recorded as part of the log.
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
# Total for the parallel phase, covering the GC work plus the GC Worker Other time.
[GC Worker Total (ms): Min: 4.6, Avg: 8.0, Max: 9.1, Diff: 4.5, Sum: 63.8]
# GC worker end timestamps
[GC Worker End (ms): Min: 170001650.1, Avg: 170001650.1, Max: 170001650.2, Diff: 0.1]
# Serial GC activities, including updating and purging code roots. Clear CT cleans the card-table entries for cards removed from the RSets; G1 marks cards as it scans them so the same card is not scanned twice.
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.3 ms]
# What remains are the other GC activities, mainly: choosing the CSet, reference processing and enqueueing, card redirtying, reclaiming free humongous regions, and freeing the CSet after the collection.
[Other: 3.0 ms]
# Choose CSet: because all young regions are always collected, there is nothing to choose and the time is 0 ms. Choose CSet does real work mainly during mixed GCs.
[Choose CSet: 0.0 ms]
# Ref Proc / Ref Enq: reference processing covers weak, soft, phantom, final and JNI references, enqueueing them onto their corresponding reference queues.
[Ref Proc: 2.2 ms]
[Ref Enq: 0.0 ms]
# Redirty Cards: enqueueing references may update RSets, so the associated cards need to be redirtied.
[Redirty Cards: 0.2 ms]
# Humongous Register / Reclaim: information about humongous object reclamation. During young GC, short-lived humongous objects (checked via their RSets) can be reclaimed directly, without evacuation, since evacuating them would be very costly and is unnecessary.
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.2 ms]
# Free CSet: releases the CSet, which also clears the RSets of the CSet regions.
[Free CSet: 0.2 ms]
[Eden: 299.0M(299.0M)->0.0B(298.0M) Survivors: 2048.0K->3072.0K Heap: 388.3M(502.0M)->21.5M(502.0M)]
# End-of-collection marker with the per-category times; this part, at least, is a usable reference.
[Times: user=0.06 sys=0.00, real=0.01 secs]