1、分析日志结构,想得到“Collection events:”后各个参数的值
2019-08-26 08:42:39.472 INFO [resin-port-9001-42][ContentOperationController.java:90] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=40;nowSpm=38.PAGE_YNGDJSJ.0.0.1566780159472;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:42:39:END
2019-08-26 08:45:28.222 INFO [resin-port-9001-42][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=40;nowSpm=38.PAGE_RECOMMEND.40.0.1566780328220;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:28:END
2019-08-26 08:45:50.822 INFO [resin-port-9001-42][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=41;nowSpm=38.PAGE_SERIES.41.0.1566780350821;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:50:END
2019-08-26 08:46:00.203 INFO [resin-port-9001-41][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=43;nowSpm=38.PAGE_CHILDREN.43.0.1566780360201;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:00:END
2019-08-26 08:45:54.427 INFO [resin-port-9001-41][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=43;nowSpm=38.PAGE_CHILDREN.43.0.1566780354425;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:54:END
2019-08-26 08:46:07.205 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=47;nowSpm=38.PAGE_HEALTHY.47.0.1566780367204;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:07:END
2019-08-26 08:46:11.407 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=44;nowSpm=38.PAGE_COMIC.44.0.1566780371406;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:11:END
2019-08-26 08:45:09.355 INFO [resin-port-9001-43][ContentOperationController.java:90] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=40;nowSpm=38.PAGE_YNGDJSJ.0.0.1566780309355;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:09:END
2019-08-26 08:45:52.431 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=42;nowSpm=38.PAGE_MOVIE.42.0.1566780352430;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:52:END
2019-08-26 08:46:01.853 INFO [resin-port-9001-42][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=45;nowSpm=38.PAGE_VARIETY.45.0.1566780361852;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:01:END
2019-08-26 08:45:26.093 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=41;nowSpm=38.PAGE_SERIES.41.0.1566780326091;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:26:END
2019-08-26 08:45:57.995 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=44;nowSpm=38.PAGE_COMIC.44.0.1566780357993;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:45:57:END
2019-08-26 08:46:05.055 INFO [resin-port-9001-43][ContentOperationController.java:164] - Collection events:eventsType=operationPage;mac=A089E41D06A0;sn=12042140205760A089E41D06A0;userId=9518971;userType=vod;parentColumnId=38;columnId=46;nowSpm=38.PAGE_RECORD.46.0.1566780365053;afterSpm=0;pos=POS_RECOMMEND_BIT;posName=推荐位;createTime=2019-08-26 08:46:05:END
2、创建hive表结构
CREATE TABLE logs
(
eventsType STRING,
mac STRING,
sn STRING,
userId STRING,
userType STRING,
parentColumnId STRING,
columnId STRING,
nowSpm STRING,
afterSpm STRING,
pos STRING,
posName STRING,
createTime STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "^.*eventsType=(.*);mac=(.*);sn=(.*);userId=(.*);userType=(.*);parentColumnId=(.*);columnId=(.*);nowSpm=(.*);afterSpm=(.*);pos=(.*);posName=(.*);createTime=(.*):END",
"output.format.string" = "%1$s %2$s %3$s $4$s %5$s %6$s %7$s $8$s %9$s %10$s %11$s $12$s"
);
3、hive运行结果如下
hive> CREATE TABLE logs
> (
> eventsType STRING,
> mac STRING,
> sn STRING,
> userId STRING,
> userType STRING,
> parentColumnId STRING,
> columnId STRING,
> nowSpm STRING,
> afterSpm STRING,
> pos STRING,
> posName STRING,
> createTime STRING
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
> "input.regex" = "^.*eventsType=(.*);mac=(.*);sn=(.*);userId=(.*);userType=(.*);parentColumnId=(.*);columnId=(.*);nowSpm=(.*);afterSpm=(.*);pos=(.*);posName=(.*);createTime=(.*):END",
> "output.format.string" = "%1$s %2$s %3$s $4$s %5$s %6$s %7$s $8$s %9$s %10$s %11$s $12$s"
> );
OK
Time taken: 0.075 seconds
hive>
> load data local inpath '/usr/local/hive-3.1.2/bin/hadoop.log' into table logs;
Loading data to table default.logs
OK
Time taken: 0.243 seconds
hive> select * from logs;
OK
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 40 38.PAGE_YNGDJSJ.0.0.1566780159472 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:42:39
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 40 38.PAGE_RECOMMEND.40.0.1566780328220 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:45:28
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 41 38.PAGE_SERIES.41.0.1566780350821 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:45:50
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 43 38.PAGE_CHILDREN.43.0.1566780360201 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:46:00
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 43 38.PAGE_CHILDREN.43.0.1566780354425 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:45:54
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 47 38.PAGE_HEALTHY.47.0.1566780367204 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:46:07
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 44 38.PAGE_COMIC.44.0.1566780371406 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:46:11
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 40 38.PAGE_YNGDJSJ.0.0.1566780309355 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:45:09
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 42 38.PAGE_MOVIE.42.0.1566780352430 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:45:52
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 45 38.PAGE_VARIETY.45.0.1566780361852 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:46:01
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 41 38.PAGE_SERIES.41.0.1566780326091 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:45:26
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 44 38.PAGE_COMIC.44.0.1566780357993 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:45:57
operationPage A089E41D06A0 12042140205760A089E41D06A0 9518971 vod 38 46 38.PAGE_RECORD.46.0.1566780365053 0 POS_RECOMMEND_BIT 推荐位 2019-08-26 08:46:05
Time taken: 0.136 seconds, Fetched: 13 row(s)
上面使用了RegexSerDe的正则表达式拆分功能,使用RegexSerDe类时,所有的字段必须为string
参数 | 说明 | 注意 |
input.regex | 输入的正则表达式,大括号表示一个字段数据 | 如以一个匹配组,表示一个字段:([^ ]*) |
input.regex.case.insensitive | 是否忽略字母大小写,默认为false | |
output.format.string | 输出的正则表达式 | %num表示字段位置 |