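- Goal: convert the unstructured lines of a log file into table-structured data. Sample lines are shown below; the key/value pairs in each line need to be parsed.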
2019-10-03 00:53:03.624 INFO [resin-port-9001-42][ContentOperationController.java:367] - Collection events:eventsType=operationPage;mac=88CC4525E50C;sn=1208219040576088CC4525E50C;userId=12725254;userType=vod;parentColumnId=38;columnId=541;nowSpm=38.PAGE_DSJ011008.541.0.1570035183613;afterSpm=38.PAGE_DSJ01100605.537.0.1570035181037;pos=POS_LIST;posName=列表;createTime=2019-10-03 00:53:03:END
2019-10-03 00:54:20.394 INFO [resin-port-9003-50][CommonAuthService.java:162] - Collection events:eventsType=auth_product;mac=C88F26CBDC57;sn=12022161905760C88F26CBDC57;userId=12500868;userType=VOD;contentId=11755;contentType=1;parentColumnId=38;code=S100000;message=鉴通过;operateType=auth_product;createTime=2019-10-03 00:54:20:END
2019-10-03 00:55:28.791 INFO [resin-port-9002-41][ResumePointController.java:161] - Collection events:eventsType=operateResumePoint;mac=AC4AFE820E40;sn=120535005053F0AC4AFE820E40;userId=12106198;userType=vod;parentColumnId=38;columnId=0;contentId=9151;contentType=1;operateType=get;createTime=2019-10-03 00:55:28:END
2019-10-03 00:58:46.958 INFO [resin-port-9001-43][ContentOperationController.java:609] - Collection events:eventsType=operationDetails;mac=AC4AFE820E40;sn=120535005053F0AC4AFE820E40;userId=12106198;userType=vod;parentColumnId=38;columnId=4461;contentId=10769;contentType=1;nowSpm=38.PAGE_ALBUM_DETAILS.4461.10769.1570035526958;afterSpm=38.PAGE_DY01100622.4461.10769.1570035516087;pos=PAGE_ALBUM_DETAILS;posName=专辑详情;createTime=2019-10-03 00:58:46:END
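The sample lines share one shape: a logging prefix, the marker `Collection events:`, a run of `key=value` pairs separated by `;`, and a closing `:END`. As a minimal sketch of the key/value extraction in plain Java (no Hive dependencies), the following may help. The class name `LogLineParser` and its method are hypothetical names for illustration, and the sketch assumes that values never contain `;`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: extracts key/value pairs from one log line.
class LogLineParser {
    private static final String START_MARKER = "Collection events:";
    private static final String END_MARKER = ":END";

    // Returns the pairs in the order they appear; empty map if the marker is missing.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String trimmed = line.trim();
        int start = trimmed.indexOf(START_MARKER);
        if (start < 0) {
            return fields; // not an event line
        }
        String payload = trimmed.substring(start + START_MARKER.length());
        if (payload.endsWith(END_MARKER)) {
            payload = payload.substring(0, payload.length() - END_MARKER.length());
        }
        // Assumption: ';' separates pairs and never occurs inside a value.
        for (String pair : payload.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip fragments without a key
            }
            fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "2019-10-03 00:55:28.791 INFO [resin-port-9002-41]"
                + "[ResumePointController.java:161] - Collection events:"
                + "eventsType=operateResumePoint;mac=AC4AFE820E40;userId=12106198;"
                + "operateType=get;createTime=2019-10-03 00:55:28:END";
        parse(line).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Splitting on the first `=` only keeps any later `=` inside a value intact, and stripping `:END` before splitting keeps the `createTime` value clean.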
- A custom parser can be implemented by following the Hive source code. Source reference path: Hive source SER
package com.ppfuns;

import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.common.type.*;
import org.apache.hadoop.hive.serde2.AbstractSerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeSpec;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
i
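The listing above is cut off after its imports, so the author's actual SerDe body is not available here. As a rough illustration of the step a deserializer usually performs, the sketch below arranges already-parsed key/value pairs into a row that follows the table's column order. It uses plain Java only and no Hive classes; `RowProjector` and `toRow` are hypothetical names. It assumes that every column is treated as a string and that Hive reports column names in lower case, which is why the lookup ignores case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not the author's code: orders parsed fields by table column.
class RowProjector {
    // Builds one row whose i-th value belongs to the i-th column name.
    // Keys missing from the log line become null.
    static List<String> toRow(Map<String, String> fields, List<String> columns) {
        // Lower-case the log keys so that "userId" matches a column named "userid".
        Map<String, String> byLowerKey = new HashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            byLowerKey.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        List<String> row = new ArrayList<>(columns.size());
        for (String column : columns) {
            row.add(byLowerKey.get(column.toLowerCase(Locale.ROOT)));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("eventsType", "operationPage");
        fields.put("userId", "12725254");
        // Column list as a table definition might declare it (assumed for illustration).
        List<String> columns = Arrays.asList("eventstype", "mac", "userid");
        System.out.println(toRow(fields, columns));
    }
}
```

Returning null for absent keys lets one table cover event types that carry different field sets, as the sample log lines do.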
Hive: custom table parsing with AbstractSerDe
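This article shows how to process unstructured log files and turn them into Hive table data. Following AbstractSerDe in the Hive source code, it implements custom parsing logic suited to log content in key-value format, and it provides the Hive SQL definition and how to run it.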