A Hive SerDe for Parsing Multi-Byte Delimiters

syssp-F

Article link: https://blog.csdn.net/ReyzeLamp/article/details/118489733

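The data in movies.dat looks like this:

2::Jumanji (1995)::Adventure|Children's|Fantasy

By default, Hive does not support multi-byte field delimiters: a single-character delimiter such as ':' works, but a multi-character one such as '::' does not.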
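The limitation can be reproduced with a plain delimited table. This is only a sketch: the table name t_movie_naive is hypothetical and not part of the original setup, and it assumes a database is already selected and that the default text SerDe honors only the first character of the delimiter, so the columns should come back misaligned rather than as three clean fields.

```sql
-- Hypothetical table for illustration only (not part of the original article).
create table t_movie_naive(
movieid bigint,
moviename string,
movietype string)
row format delimited fields terminated by '::'
stored as textfile;

load data local inpath "/movie/movies.dat" into table t_movie_naive;

-- Inspect a few rows; with '::' treated as ':' the fields will not line up
-- with the three intended columns.
select * from t_movie_naive limit 5;
```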

To load the '::'-delimited data above into a Hive table, you need a SerDe that can parse multi-byte delimiters.

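Using RegexSerDe

It takes two parameters. input.regex is a Java regular expression in which each capture group maps, in order, to one table column:
input.regex = "(.*)::(.*)::(.*)"
output.format.string = "%1$s %2$s %3$s"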
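Before creating the table, the regular expression can be tried against the sample row with Hive's built-in regexp_extract function. This is just a quick sanity check, not something the SerDe requires; the string literal is the sample line from movies.dat, written in double quotes because it contains an apostrophe.

```sql
-- Apply the same pattern as input.regex to the sample row and
-- pull out each capture group by index.
select
  regexp_extract("2::Jumanji (1995)::Adventure|Children's|Fantasy", '(.*)::(.*)::(.*)', 1) as movieid,
  regexp_extract("2::Jumanji (1995)::Adventure|Children's|Fantasy", '(.*)::(.*)::(.*)', 2) as moviename,
  regexp_extract("2::Jumanji (1995)::Adventure|Children's|Fantasy", '(.*)::(.*)::(.*)', 3) as movietype;
-- Groups: 2 | Jumanji (1995) | Adventure|Children's|Fantasy
```

Note that (.*) is greedy, so a row containing more than two '::' separators would be split at the last two of them.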

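-- Recreate the database (the drop fails if the database still contains tables, unless CASCADE is added)
drop database if exists movie;
create database if not exists movie;
use movie;
-- Create the table; each capture group in input.regex maps to one column
create table t_movie(
movieid bigint,
moviename string,
movietype string) 
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe' 
with serdeproperties('input.regex'='(.*)::(.*)::(.*)','output.format.string'='%1$s %2$s %3$s')
stored as textfile;

-- Load the data from the local file system
load data local inpath "/movie/movies.dat" into table t_movie;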
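After loading, a couple of queries can confirm that the rows were parsed as intended. This is a minimal check, assuming (as RegexSerDe is generally understood to behave) that lines which do not match input.regex come back with every column set to NULL.

```sql
-- Peek at a few parsed rows.
select movieid, moviename, movietype from t_movie limit 5;

-- Count rows where all columns are NULL, which is how lines that
-- failed to match input.regex are expected to show up.
select count(*) as unmatched_rows
from t_movie
where movieid is null and moviename is null and movietype is null;
```

A non-zero unmatched_rows count suggests the file contains lines that do not follow the three-field '::' layout.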