Troubleshooting a Kylin cube build failure after switching to viewfs

1. Problem Description

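When HDFS was switched to the viewfs protocol, Kylin was modified accordingly to support viewfs. Afterwards, the vast majority of cube build jobs ran successfully, but the following cube build failed.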

Exception:

2019-04-15 04:12:41,178 ERROR [Job 9f0f5715-f7ce-4b2f-be07-b47231883ce1-238] common.HadoopShellExecutable:65 : error execute HadoopShellExecutable{id=9f0f5715-f7ce-4b2f-be07-b47231883ce1-03, name=Build Dimension Dictionary, state=RUNNING}
java.lang.RuntimeException: Failed to create dictionary on BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD.ROBOTSESSIONID
        at org.apache.kylin.dict.DictionaryManager.buildDictFromReadableTable(DictionaryManager.java:308)
        at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:292)
        at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:223)
        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:71)
        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:54)
        at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2256)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3985)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3989)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4873)
        at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:119)
        at org.apache.kylin.dict.DictionaryManager.getDictionary(DictionaryManager.java:113)
        at org.apache.kylin.dict.AppendTrieDictionary$Builder.createNewBuilder(AppendTrieDictionary.java:873)
        at org.apache.kylin.dict.AppendTrieDictionary$Builder.getInstance(AppendTrieDictionary.java:833)
        at org.apache.kylin.dict.AppendTrieDictionary$Builder.getInstance(AppendTrieDictionary.java:827)
        at org.apache.kylin.dict.GlobalDictionaryBuilder.init(GlobalDictionaryBuilder.java:39)
        at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:73)
        at org.apache.kylin.dict.DictionaryManager.buildDictFromReadableTable(DictionaryManager.java:305)
        ... 15 more
Caused by: java.lang.RuntimeException: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID
        at org.apache.kylin.common.util.HadoopUtil.getFileSystem(HadoopUtil.java:90)
        at org.apache.kylin.dict.CachedTreeMap.openLatestIndexInput(CachedTreeMap.java:451)
        at org.apache.kylin.dict.AppendTrieDictionary.readFields(AppendTrieDictionary.java:1197)
        at org.apache.kylin.dict.DictionaryInfoSerializer.deserialize(DictionaryInfoSerializer.java:74)
        at org.apache.kylin.dict.DictionaryInfoSerializer.deserialize(DictionaryInfoSerializer.java:34)
        at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:154)
        at org.apache.kylin.dict.DictionaryManager.load(DictionaryManager.java:445)
        at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:102)
        at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:99)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3584)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2372)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2335)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2250)
        ... 26 more
Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:141)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.kylin.common.util.HadoopUtil.getFileSystem(HadoopUtil.java:88)
        ... 38 more

2. Root Cause Analysis

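When building its dictionary, this cube did not pick up the viewfs scheme. The failing path is hdfs:///..., which has the hdfs scheme but no host. Before the switch, such a path was resolved against the default file system (fs.defaultFS); after the switch the default file system is viewfs://test, the schemes no longer match, and the URI reaches DistributedFileSystem without a NameNode host, which is exactly what "Incomplete HDFS URI, no host" reports.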
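The "no host" message follows from how Hadoop picks a file system for a URI that has a scheme but no authority. Below is a simplified, JDK-only model of that rule, based on the logic in Hadoop 2.x FileSystem.get(URI, Configuration); it is a sketch, not Hadoop code. The class and method names are hypothetical, and hdfs://nameservice1 is a placeholder for the pre-switch default file system, whose real name the article does not give.

```java
import java.net.URI;

public class DefaultFsResolution {

    /**
     * Hypothetical, simplified model of Hadoop's FileSystem.get(URI, Configuration):
     * a URI with a scheme but no authority inherits the default FS only when
     * its scheme matches the default FS scheme.
     * Returns the file system URI that would be used, or null when no host can be
     * filled in (for the hdfs scheme this is the "Incomplete HDFS URI, no host" case).
     */
    static URI resolveEffectiveUri(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        String authority = path.getAuthority();
        if (scheme == null) {
            return defaultFs; // scheme-less path: use the default FS (simplification)
        }
        if (authority == null) {
            if (scheme.equals(defaultFs.getScheme()) && defaultFs.getAuthority() != null) {
                return defaultFs; // e.g. hdfs:/// while fs.defaultFS = hdfs://<nameservice>
            }
            return null; // e.g. hdfs:/// while fs.defaultFS = viewfs://test
        }
        return URI.create(scheme + "://" + authority); // fully qualified URI: used as-is
    }

    public static void main(String[] args) {
        URI dictPath = URI.create("hdfs:///kylin/kylin_metadata/resources/GlobalDict/dict/"
                + "BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/");
        // Before the switch ("nameservice1" is a placeholder name)
        System.out.println(resolveEffectiveUri(dictPath, URI.create("hdfs://nameservice1")));
        // After the switch to viewfs
        System.out.println(resolveEffectiveUri(dictPath, URI.create("viewfs://test")));
    }
}
```

Running main prints hdfs://nameservice1 for the pre-switch setting and null for viewfs://test, the case where an hdfs URI with no host is handed to DistributedFileSystem.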
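Following the stack trace into the source code, and using Arthas to inspect the arguments of several methods, we found the following: the column ROBOTSESSIONID uses a global dictionary, and global dictionary data is not stored in HBase. HBase only holds the HDFS path that the global dictionary refers to.
Details: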
The dictionary metadata is stored in HBase under the rowkey:

/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/2ee6dc87-7842-4bfe-a1fe-cb88a8f91b3b.dict

The corresponding value is:

^B_{
  "uuid" : "2ee6dc87-7842-4bfe-a1fe-cb88a8f91b3b",
  "last_modified" : 0,
  "version" : "2.0.0",
  "source_table" : "BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD",
  "source_column" : "ROBOTSESSIONID",
  "source_column_index" : 6,
  "data_type" : "bigint",
  "input" : {
    "path" : "hdfs:///kylin/kylin_metadata/kylin-6e31d660-8f35-472c-abce-afb2c2bb19e3/IM_Track_Basic_Modeal_Cube/fact_distinct_columns/V_CHATTRACKENTRYRECORD.ROBOTSESSIONID",
    "size" : 40424626,
    "last_modified_time" : 1539833412761
  },
  "dictionary_class" : "org.apache.kylin.dict.AppendTrieDictionary",
  "cardinality" : 2223751
}^@mhdfs:///kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/

So the global dictionary is stored under:

hdfs:///kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/

This path does not use the viewfs protocol. However, simply replacing hdfs:///kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/ with viewfs://test/kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/ does not work. The path was written with DataOutputStream.writeUTF, which stores the length of the data in two bytes in front of it: the hdfs:/// path is 109 bytes long, while the viewfs://test path is 115 bytes.
With only the text changed, the reader still reads 109 bytes and gets viewfs://test/kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESS instead of the full path, which fails with a path-does-not-exist error.
Decoding the bytes ^@m programmatically confirms the value 109: ^@ is caret notation for a NUL (0x00) byte, and m is 0x6D. The prefix therefore has to be replaced with 115 in the same two-byte big-endian form, 0x00 0x73, which displays as ^@s.
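The length-prefix arithmetic can be checked with the JDK alone. This sketch (class and method names are illustrative) encodes both paths with DataOutputStream.writeUTF, decodes the two-byte big-endian prefix, and then simulates a text-only edit: the old prefix is kept in front of the new viewfs path and read back with DataInputStream.readUTF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriteUtfPrefixDemo {

    static final String OLD_PATH = "hdfs:///kylin/kylin_metadata/resources/GlobalDict/dict/"
            + "BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/";
    static final String NEW_PATH = "viewfs://test/kylin/kylin_metadata/resources/GlobalDict/dict/"
            + "BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/";

    /** Encodes s with DataOutputStream.writeUTF: 2-byte length, then modified UTF-8 bytes. */
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Decodes the 2-byte big-endian unsigned length that writeUTF puts in front of the text. */
    static int prefixValue(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    /** Simulates editing only the text: old prefix (109) followed by the 115-byte viewfs path. */
    static String readBackAfterTextOnlyEdit() {
        byte[] stalePrefix = Arrays.copyOf(writeUtf(OLD_PATH), 2);
        byte[] newText = NEW_PATH.getBytes(StandardCharsets.US_ASCII);
        byte[] edited = new byte[stalePrefix.length + newText.length];
        System.arraycopy(stalePrefix, 0, edited, 0, stalePrefix.length);
        System.arraycopy(newText, 0, edited, stalePrefix.length, newText.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(edited))) {
            return in.readUTF(); // reads only as many bytes as the stale prefix says
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] oldBytes = writeUtf(OLD_PATH);
        byte[] newBytes = writeUtf(NEW_PATH);
        System.out.printf("old prefix %02X %02X = %d%n", oldBytes[0], oldBytes[1], prefixValue(oldBytes));
        System.out.printf("new prefix %02X %02X = %d%n", newBytes[0], newBytes[1], prefixValue(newBytes));
        System.out.println(readBackAfterTextOnlyEdit());
    }
}
```

The first two lines print 00 6D = 109 and 00 73 = 115, matching ^@m and ^@s; the last line is the truncated path ending in /ROBOTSESS.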

3. Fix

The steps finally performed were:

# Kylin metadata backup directory
cd /BigData/run/kylin/meta_backups
# Create a directory holding only the metadata entry to be modified
mkdir -p meta_repair_2019_04_15/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/
cd meta_repair_2019_04_15/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/
cp /BigData/run/kylin/meta_backups/meta_2019_04_15_00_00_06/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/2ee6dc87-7842-4bfe-a1fe-cb88a8f91b3b.dict  .

Edit 2ee6dc87-7842-4bfe-a1fe-cb88a8f91b3b.dict so that its content becomes:

^B_{
  "uuid" : "2ee6dc87-7842-4bfe-a1fe-cb88a8f91b3b",
  "last_modified" : 0,
  "version" : "2.0.0",
  "source_table" : "BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD",
  "source_column" : "ROBOTSESSIONID",
  "source_column_index" : 6,
  "data_type" : "bigint",
  "input" : {
    "path" : "hdfs:///kylin/kylin_metadata/kylin-6e31d660-8f35-472c-abce-afb2c2bb19e3/IM_Track_Basic_Modeal_Cube/fact_distinct_columns/V_CHATTRACKENTRYRECORD.ROBOTSESSIONID",
    "size" : 40424626,
    "last_modified_time" : 1539833412761
  },
  "dictionary_class" : "org.apache.kylin.dict.AppendTrieDictionary",
  "cardinality" : 2223751
}^@sviewfs://test/kylin/kylin_metadata/resources/GlobalDict/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/

Save and exit.
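Hand-editing a file that contains NUL bytes in a text editor is error-prone, so the same byte-level change could also be scripted. The sketch below is hypothetical (DictPathPatcher is not part of Kylin): it searches for the old path together with its writeUTF length prefix and replaces both with the writeUTF encoding of the new path, so the prefix and the text always change together. It assumes the old path appears exactly once in that encoded form and that it is run on the copy in the repair directory, not on live metadata.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DictPathPatcher {

    /** Encodes s as DataOutputStream.writeUTF does: 2-byte big-endian length, then modified UTF-8. */
    static byte[] utf(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Returns the first index >= from at which needle occurs in haystack, or -1. */
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = from; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Replaces the writeUTF-encoded oldPath (prefix included) with the writeUTF-encoded newPath. */
    static byte[] patch(byte[] dict, String oldPath, String newPath) {
        byte[] from = utf(oldPath);
        byte[] to = utf(newPath);
        int at = indexOf(dict, from, 0);
        if (at < 0) {
            throw new IllegalArgumentException("old path (with its length prefix) not found");
        }
        if (indexOf(dict, from, at + 1) >= 0) {
            throw new IllegalArgumentException("old path occurs more than once; refusing to guess");
        }
        byte[] result = new byte[dict.length - from.length + to.length];
        System.arraycopy(dict, 0, result, 0, at);           // bytes before the path (the JSON block)
        System.arraycopy(to, 0, result, at, to.length);     // new prefix + new path
        int tail = at + from.length;
        System.arraycopy(dict, tail, result, at + to.length, dict.length - tail); // any trailing bytes
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java DictPathPatcher <dict-file> <old-path> <new-path>");
            System.exit(2);
        }
        Path file = Paths.get(args[0]);
        byte[] patched = patch(Files.readAllBytes(file), args[1], args[2]);
        Files.write(file, patched); // overwrites the copy in the repair directory
    }
}
```

Usage would be to run it inside meta_repair_2019_04_15/dict/BASE_TCLIVECHAT.V_CHATTRACKENTRYRECORD/ROBOTSESSIONID/ against 2ee6dc87-7842-4bfe-a1fe-cb88a8f91b3b.dict, passing the hdfs:/// path and the viewfs://test path as the second and third arguments. The "input" path inside the JSON is not matched, because it is a different path and is not preceded by a writeUTF prefix of its own.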
Then run the metadata restore command:

cd /BigData/run/kylin
metastore.sh restore meta_backups/meta_repair_2019_04_15

Once the restore succeeded, the Kylin build job was resubmitted and completed successfully.
