在创建Hive表时,一般我们会指定字段间的分隔符,如空格,\t,逗号等,这些是都单字符。那么,如果我们想要按照多个字符分隔,如##,|+|等,这时候就要用到如下两种方式:
MultiDelimitSerDe解决多字符分隔
- 创建表
0: jdbc:hive2://node03:10000> create table test1 (id string, name string)
. . . . . . . . . . . . . . > row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
. . . . . . . . . . . . . . > with serdeproperties ('field.delim'='##');
No rows affected (0.172 seconds)
- 准备测试数据test1.txt
[hadoop@node03 data]$ vi /home/hadoop/hive/data/test1.txt
1##liubei
2##guanyv
3##zhangfei
- 插入测试数据
0: jdbc:hive2://node03:10000> load data local inpath '/home/hadoop/hive/data/test1.txt' overwrite into table test1;
- 查询表
0: jdbc:hive2://node03:10000> select * from test1;
+-----------+-------------+--+
| test1.id | test1.name |
+-----------+-------------+--+
| 1 | liubei |
| 2 | guanyv |
| 3 | zhangfei |
+-----------+-------------+--+
RegexSerDe解决多字符分隔
- 创建表
0: jdbc:hive2://node03:10000> create table test2(id int, name string)
. . . . . . . . . . . . . . > row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
. . . . . . . . . . . . . . > with serdeproperties ('input.regex'='^(.*)\\#\\#(.*)$');
No rows affected (0.117 seconds)
- 准备测试数据test2.txt
[hadoop@node03 data]$ vi /home/hadoop/hive/data/test2.txt
1##liubei
2##guanyv
3##zhangfei
- 插入测试数据
0: jdbc:hive2://node03:10000> load data local inpath '/home/hadoop/hive/data/test2.txt' overwrite into table test2;
- 查询表
0: jdbc:hive2://node03:10000> select * from test2;
+-----------+-------------+--+
| test2.id | test2.name |
+-----------+-------------+--+
| 1 | liubei |
| 2 | guanyv |
| 3 | zhangfei |
+-----------+-------------+--+