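When you create a Hive table, you normally declare delimiters that match the data. There are three ways to specify them.
1. Default delimiters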
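\n | line delimiter |
^A | field delimiter, \001 in octal |
^B | delimiter between elements of an array or struct, and between key-value pairs of a map, \002 in octal |
^C | delimiter between a key and its value inside a map, \003 in octal |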
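The default delimiters are the ones Hive uses when the CREATE TABLE statement does not specify any: ^A is \001, ^B is \002, and ^C is \003. Data in other formats needs other delimiters. In vim, you type \001 by pressing Ctrl+V and then Ctrl+A, \002 by pressing Ctrl+V and then Ctrl+B, and so on.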
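To make the defaults concrete, here is a minimal sketch of two table definitions that should behave the same for text data: one leaves out the ROW FORMAT clause, the other spells the default delimiters out. The table and column names are hypothetical and used only for illustration.

```sql
-- Hypothetical tables; names and columns are illustrative only.

-- No ROW FORMAT clause: Hive falls back to the default delimiters.
CREATE TABLE default_delim_demo (
  id    STRING,
  tags  ARRAY<STRING>,
  attrs MAP<STRING, STRING>
)
STORED AS TEXTFILE;

-- The same defaults written out explicitly.
CREATE TABLE explicit_delim_demo (
  id    STRING,
  tags  ARRAY<STRING>,
  attrs MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'            -- ^A between columns
  COLLECTION ITEMS TERMINATED BY '\002'  -- ^B between array elements and map entries
  MAP KEYS TERMINATED BY '\003'          -- ^C between a map key and its value
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```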
2. Specifying a single special character as the delimiter
create external table an_dimension_area(
area_id string,
county_name string,
city_id string,
city_name string,
province_id string,
province_name string
)
row format delimited
fields terminated by ','
STORED AS TEXTFILE;
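The statement above uses "," as the field delimiter. You can just as well separate fields with \t and rows with \n; pick the field delimiter that fits your data. Note that Hive accepts only \n as the line delimiter.
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'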
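The same clause can also set the delimiters for complex types. The following is a sketch only, assuming a hypothetical table with an array column and a map column; the delimiter characters are an arbitrary choice for the example.

```sql
-- Hypothetical table; name, columns, and delimiter choices are illustrative only.
CREATE TABLE user_profile_demo (
  user_id STRING,
  tags    ARRAY<STRING>,
  attrs   MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'            -- tab between columns
  COLLECTION ITEMS TERMINATED BY ','   -- comma between array elements and map entries
  MAP KEYS TERMINATED BY ':'           -- colon between a map key and its value
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- A matching input line (<TAB> stands for a literal tab character):
-- u001<TAB>red,blue<TAB>age:30,city:Beijing
```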
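3. Using a multi-character string as the delimiter
Our data is delimited by @@@. None of the delimiters above can handle this, because ROW FORMAT DELIMITED only works with a single-character delimiter.
① Using MultiDelimitSerDe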
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="@@@") STORED AS TEXTFILE;
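Put together as a full statement, the MultiDelimitSerDe clause might look like the sketch below. The table and column names are hypothetical. It assumes the hive-contrib jar that ships this class is already on the classpath; the class's package can differ between Hive releases, so check the documentation for your version.

```sql
-- Hypothetical table; name and columns are illustrative only.
-- Assumes the hive-contrib jar providing MultiDelimitSerDe is on the classpath.
CREATE EXTERNAL TABLE multi_delim_demo (
  id   STRING,
  name STRING,
  city STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim" = "@@@")
STORED AS TEXTFILE;

-- A matching input line:
-- 1001@@@Zhang San@@@Beijing
```

LINES TERMINATED BY belongs to ROW FORMAT DELIMITED, so it is left out when ROW FORMAT SERDE is used.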
② Using RegexSerDe
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "^(.*)\\@\\@\\@(.*)$") STORED AS TEXTFILE;
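With RegexSerDe, each capturing group in the regex maps to one column, in order, so the pattern needs as many groups as the table has columns. Below is a minimal sketch for a hypothetical three-column table. The table name, the column names, and the choice of non-greedy groups are assumptions made for the example, and it again assumes the hive-contrib jar is on the classpath.

```sql
-- Hypothetical table; name and columns are illustrative only.
-- The contrib RegexSerDe expects every column to be of type STRING.
CREATE EXTERNAL TABLE regex_delim_demo (
  id   STRING,
  name STRING,
  city STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- One capturing group per column. The first two groups are non-greedy
  -- so that each stops at the next @@@ rather than the last one.
  "input.regex" = "^(.*?)@@@(.*?)@@@(.*)$"
)
STORED AS TEXTFILE;

-- A matching input line:
-- 1001@@@Zhang San@@@Beijing
```

The @ character has no special meaning in a regex, so it is written here without the backslash escapes.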
Reference: https://blog.csdn.net/u013150378/article/details/90766209