Regular Expressions in Hive
For reference, see the Hive official documentation: search the page for "regex" and the example is near the bottom.
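Below is the regular expression used in Hive. It has 9 capture groups (one per field), so the table definition must also declare 9 columns.
Written directly into the table property without escaping (this form does not work, because the inner double quotes end the string early):
"input.regex" = "([^ ]*) ([^ ]*) ([^.]*) \[(.*)\] "(.*)" (-|[0-9]*) (-|[(0-9]*) "(.*)" "(.*)""
The regex itself, as the regex engine should receive it:
([^ ]*) ([^ ]*) ([^.]*) \[(.*)\] "(.*)" (-|[0-9]*) (-|[(0-9]*) "(.*)" "(.*)"
The same regex escaped for a double-quoted HiveQL string (each \ doubled to \\, and each " inside written as \"):
([^ ]*) ([^ ]*) ([^.]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[(0-9]*) \"(.*)\" \"(.*)\"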
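A quick way to see what the escaping does, and to confirm there are 9 groups for the 9 columns, is to put the escaped form into a Java string literal. For the sequences used here (\\ and \"), a Java literal is unescaped the same way as a C-style HiveQL literal, so printing it should show exactly the raw regex the engine receives. This is a minimal sketch; the class name EscapeCheck is made up for illustration. The pattern is copied as-is: ([^.]*) and [(0-9] look like slips for ([^ ]*) and [0-9], but they still compile, and backtracking lets them match ordinary log lines.

```java
import java.util.regex.Pattern;

public class EscapeCheck {
    // Escaped form from above: \\[ and \\] become \[ and \] at runtime,
    // and \" becomes a plain double quote.
    static final String ESCAPED =
        "([^ ]*) ([^ ]*) ([^.]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[(0-9]*) \"(.*)\" \"(.*)\"";

    public static void main(String[] args) {
        // Prints the raw regex, i.e. what the regex engine actually sees.
        System.out.println(ESCAPED);
        // Number of capturing groups; the "(" inside [(0-9] is a literal, not a group.
        System.out.println("groups: " + Pattern.compile(ESCAPED).matcher("").groupCount());
    }
}
```

The first printed line should be identical to the unescaped regex above, and the second should report 9 groups.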
Data source: the nginx access log access.log under /var/log/nginx/; its lines match the regex above.
CREATE TABLE accesslog (
host STRING,
identity STRING,
users STRING,
time STRING,
request STRING,
status STRING,
size STRING,
referer STRING,
agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^.]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[(0-9]*) \"(.*)\" \"(.*)\""
)
STORED AS TEXTFILE;
load data local inpath '/home/hivedata/access' into table accesslog;
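LOAD DATA itself only copies the file into the table's directory; Hive does not transform data while loading, so the regex is applied later, when rows are read. The sketch below mimics that read step: group i of the match goes to the i-th column of the CREATE TABLE statement. It assumes RegexSerDe requires the whole line to match (Matcher.matches()); the class RowSketch, the helper toRow, and the sample line (nginx's default "combined" format) are hypothetical, and returning null for a non-matching line is just this sketch's convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowSketch {
    // Column names in the order of the CREATE TABLE statement above.
    static final String[] COLUMNS = {
        "host", "identity", "users", "time", "request",
        "status", "size", "referer", "agent"
    };

    static final Pattern PATTERN = Pattern.compile(
        "([^ ]*) ([^ ]*) ([^.]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[(0-9]*) \"(.*)\" \"(.*)\"");

    // Hypothetical helper: maps capture group i to column i-1,
    // or returns null when the whole line does not match.
    static Map<String, String> toRow(String line) {
        // One group per column, otherwise the mapping cannot work.
        if (PATTERN.matcher("").groupCount() != COLUMNS.length) {
            throw new IllegalStateException("regex groups and table columns differ");
        }
        Matcher m = PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], m.group(i + 1));
        }
        return row;
    }

    public static void main(String[] args) {
        String line = "192.168.1.10 - - [10/Oct/2023:13:55:36 +0800] "
            + "\"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0\"";
        System.out.println(toRow(line));
        System.out.println(toRow("not an access log line"));
    }
}
```

For the sample line, host gets 192.168.1.10, time gets 10/Oct/2023:13:55:36 +0800, request gets GET /index.html HTTP/1.1, and agent gets Mozilla/5.0; the second call prints null.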
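Problems encountered:
1. The column name user is a reserved keyword and is rejected, so the column is named users instead (quoting the name in backticks, `user`, should also work):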
ParseException line 4:0 Failed to recognize predicate
0: jdbc:hive2://hadoop01:10000> STORED AS TEXTFILE;
Error: Error while compiling statement: FAILED: ParseException line 4:0 Failed to recognize predicate 'user'. Failed rule: 'identifier' in column specification (state=42000,code=40000)
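2. Forgetting the escape character \. Inside a HiveQL string literal the backslash is itself an escape character, so every \ the regex needs must be written as \\, and every double quote inside a double-quoted string as \":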
java.util.regex.PatternSyntaxException: Unclosed character class near index 98
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.util.regex.PatternSyntaxException: Unclosed character class near index 98
([^ ]*) ([^ ]*) ([^.]*) \[(.*)\] "(.*)" (-|[0-9]*) (-|[(0-9]*) "(.*)" "(.*)"
^ (state=08S01,code=1)
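"Unclosed character class" means the regex engine met a [ that opens a character class and never found its closing ]. The sketch below shows one way that can happen, assuming the backslash in front of [ is lost while the one before ] survives: \] is then just a literal ] inside the class, and in Java's syntax [0-9] becomes a nested class, so the outer [ is never closed. It illustrates the error message only and is not a reproduction of the exact Hive failure above; the class UnclosedClassDemo and its check helper are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class UnclosedClassDemo {
    // Tries to compile a regex and reports whether it is valid.
    static String check(String regex) {
        try {
            Pattern.compile(regex);
            return "ok";
        } catch (PatternSyntaxException e) {
            // getDescription() is the short reason, getIndex() the error position.
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        // Backslash before "[" lost: the class opened by "[" never closes.
        System.out.println(check("[(.*)\\] (-|[0-9]*)"));
        // Both brackets escaped: they are plain characters, so this compiles.
        System.out.println(check("\\[(.*)\\] (-|[0-9]*)"));
    }
}
```

The first call reports an "Unclosed character class" error, the second prints ok.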
After these fixes, the result is correct.