hive处理自定义数据结构日志

1、分析日志结构,想得到“Collection events:”后各个参数的值

2019-08-26 08:42:39.472 INFO [resin-port-9001-42][ContentOperationController.java:90] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=40;nowSpm=38.PAGE_YNGDJSJ.0.0.1566780159472;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:42:39:END
2019-08-26 08:45:28.222 INFO [resin-port-9001-42][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=40;nowSpm=38.PAGE_RECOMMEND.40.0.1566780328220;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:28:END
2019-08-26 08:45:50.822 INFO [resin-port-9001-42][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=41;nowSpm=38.PAGE_SERIES.41.0.1566780350821;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:50:END
2019-08-26 08:46:00.203 INFO [resin-port-9001-41][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=43;nowSpm=38.PAGE_CHILDREN.43.0.1566780360201;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:00:END
2019-08-26 08:45:54.427 INFO [resin-port-9001-41][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=43;nowSpm=38.PAGE_CHILDREN.43.0.1566780354425;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:54:END
2019-08-26 08:46:07.205 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=47;nowSpm=38.PAGE_HEALTHY.47.0.1566780367204;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:07:END
2019-08-26 08:46:11.407 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=44;nowSpm=38.PAGE_COMIC.44.0.1566780371406;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:11:END
2019-08-26 08:45:09.355 INFO [resin-port-9001-43][ContentOperationController.java:90] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=40;nowSpm=38.PAGE_YNGDJSJ.0.0.1566780309355;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:09:END
2019-08-26 08:45:52.431 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=42;nowSpm=38.PAGE_MOVIE.42.0.1566780352430;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:52:END
2019-08-26 08:46:01.853 INFO [resin-port-9001-42][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=45;nowSpm=38.PAGE_VARIETY.45.0.1566780361852;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:01:END
2019-08-26 08:45:26.093 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=41;nowSpm=38.PAGE_SERIES.41.0.1566780326091;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:26:END
2019-08-26 08:45:57.995 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=44;nowSpm=38.PAGE_COMIC.44.0.1566780357993;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:57:END
2019-08-26 08:46:05.055 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=46;nowSpm=38.PAGE_RECORD.46.0.1566780365053;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:05:END

2、创建hive表结构

CREATE TABLE logs
(
 eventsType STRING,
 mac STRING,
 sn STRING,
 userId STRING,
 userType STRING,
 parentColumnId STRING,
 columnId STRING,
 nowSpm STRING,
 afterSpm STRING,
 pos STRING,
 posName STRING,
 createTime STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "^.*eventsType=(.*);mac=(.*);sn=(.*);userId=(.*);userType=(.*);parentColumnId=(.*);columnId=(.*);nowSpm=(.*);afterSpm=(.*);pos=(.*);posName=(.*);createTime=(.*):END",
"output.format.string" = "%1$s %2$s %3$s $4$s %5$s %6$s %7$s $8$s %9$s %10$s %11$s $12$s"
);

3、hive运行结果如下

hive> CREATE TABLE logs
    > (
    >  eventsType STRING,
    >  mac STRING,
    >  sn STRING,
    >  userId STRING,
    >  userType STRING,
    >  parentColumnId STRING,
    >  columnId STRING,
    >  nowSpm STRING,
    >  afterSpm STRING,
    >  pos STRING,
    >  posName STRING,
    >  createTime STRING
    > )
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    > WITH SERDEPROPERTIES (
    > "input.regex" = "^.*eventsType=(.*);mac=(.*);sn=(.*);userId=(.*);userType=(.*);parentColumnId=(.*);columnId=(.*);nowSpm=(.*);afterSpm=(.*);pos=(.*);posName=(.*);createTime=(.*):END",
    > "output.format.string" = "%1$s %2$s %3$s $4$s %5$s %6$s %7$s $8$s %9$s %10$s %11$s $12$s"
    > );
OK
Time taken: 0.075 seconds
hive> 
    > load data local inpath '/usr/local/hive-3.1.2/bin/hadoop.log' into table logs;
Loading data to table default.logs
OK
Time taken: 0.243 seconds
hive> select * from logs;
OK
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	40	38.PAGE_YNGDJSJ.0.0.1566780159472	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:42:39
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	40	38.PAGE_RECOMMEND.40.0.1566780328220	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:45:28
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	41	38.PAGE_SERIES.41.0.1566780350821	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:45:50
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	43	38.PAGE_CHILDREN.43.0.1566780360201	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:46:00
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	43	38.PAGE_CHILDREN.43.0.1566780354425	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:45:54
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	47	38.PAGE_HEALTHY.47.0.1566780367204	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:46:07
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	44	38.PAGE_COMIC.44.0.1566780371406	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:46:11
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	40	38.PAGE_YNGDJSJ.0.0.1566780309355	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:45:09
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	42	38.PAGE_MOVIE.42.0.1566780352430	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:45:52
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	45	38.PAGE_VARIETY.45.0.1566780361852	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:46:01
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	41	38.PAGE_SERIES.41.0.1566780326091	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:45:26
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	44	38.PAGE_COMIC.44.0.1566780357993	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:45:57
operationPage	A089E41D06A0	12042140205760A089E41D06A0	9518971	vod	38	46	38.PAGE_RECORD.46.0.1566780365053	0	POS_RECOMMEND_BIT	推荐位	2019-08-26 08:46:05
Time taken: 0.136 seconds, Fetched: 13 row(s)

上面使用了RegexSerDe的正则表达式拆分功能,使用RegexSerDe类时,所有的字段必须为string

参数说明注意
input.regex输入的正则表达式,大括号表示一个字段数据如以一个匹配组,表示一个字段:([^ ]*)
input.regex.case.insensitive是否忽略字母大小写,默认为false 
output.format.string输出的正则表达式%num表示字段位置

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值