Sample data
{"message": "ok", "code": 0, "data": "N,N,中国$四川$南充$顺庆区|中国$四川$南充$南部县,4|4,女,Mid-Age,1,1\t19~21,PC,5,Windows XP,N,N,Chrome50,N\tN,N,N,N,N,N,N,N,N\tN,N,N,N\tN,N,N,N,N\tN,N,N,N,N\t金融,N,N,N,N,N\t金融,N,N,N,N,N\t金融,N,N,N,N\tN,N,N,N\tP,P,P\tP,P,P"}
1. Created a Hive table as follows
create table bdm.profile_gid
(
gid string,
profile string)
partitioned by (l_date string)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
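2. After loading the data, every column came out empty
3. Cause
The raw data contains many \t characters. Because the table splits fields on \t, the two columns were filled from the first two \t-separated fields of each line rather than from the intended ones, so the table effectively contained no data.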
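The effect can be reproduced without Hive. The following is a minimal sketch on a made-up 8-field line; the field values are hypothetical, and only the positions of gid (4th) and profile (8th) are taken from the awk command in step 4.2.

```shell
# Hypothetical 8-field, tab-separated line (values are invented for illustration).
line=$(printf 'a\tb\tc\tgid_001\te\tf\tg\tprofile_json')

# Count how many fields a \t delimiter produces for this line.
field_count=$(printf '%s\n' "$line" | awk -F'\t' '{print NF}')

# A two-column table delimited by \t only receives the first two fields.
first_two=$(printf '%s\n' "$line" | awk -F'\t' '{print $1 "," $2}')

echo "fields in line: $field_count"
echo "what the two columns receive: $first_two"
```

The wanted values sit in fields 4 and 8, so neither of them reaches the table.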
4. Solution
4.1 Change the table definition: switch the field delimiter from \t to \001
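One way to do this is to recreate the table with the same DDL as in step 1, changing only the delimiter. This is a sketch that assumes the old table has already been dropped; the file name create_profile_gid.sql is a hypothetical choice.

```shell
# Write the revised DDL to a file; the quoted 'EOF' keeps '\001' and '\n' literal.
cat > create_profile_gid.sql <<'EOF'
create table bdm.profile_gid
(
gid string,
profile string)
partitioned by (l_date string)
row format delimited
fields terminated by '\001'
lines terminated by '\n'
stored as textfile;
EOF

# Then run it with the Hive CLI, for example:
#   hive -f create_profile_gid.sql
```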
4.2 Reformat the text data so that only the two needed fields remain, separated by \001
awk -F'\t' '{print $4"\001"$8}' fname_orginal > fname_format
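To check what the awk command produces, it can be tried on a small test file first. The sketch below uses a made-up input line and hypothetical file names (sample_original.txt, sample_format.txt); `tr` replaces the invisible \001 byte with `|` only for display.

```shell
# Hypothetical 8-field input line standing in for one row of the original file.
printf 'a\tb\tc\tgid_001\te\tf\tg\tprofile_json\n' > sample_original.txt

# Same awk command as in step 4.2, applied to the sample file.
awk -F'\t' '{print $4"\001"$8}' sample_original.txt > sample_format.txt

# Make the \001 separator visible by translating it to '|'.
result=$(tr '\001' '|' < sample_format.txt)
echo "$result"
```

Each output line should hold exactly two fields, the 4th and 8th of the input, which matches the two-column table.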
4.3 Load the data
Use Hive's load data statement to load fname_format into the table.
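The load step might look like the sketch below. It rests on two assumptions not stated in the original: that fname_format sits on the local filesystem (hence `local inpath`), and that the partition value is known. The l_date value here is a placeholder.

```shell
# Placeholder partition value; replace with the real l_date for this batch.
l_date='2024-01-01'

# Build the LOAD DATA statement for the reformatted file (assumed to be a local file).
sql="load data local inpath 'fname_format' into table bdm.profile_gid partition (l_date='${l_date}');"

# Run it if the Hive CLI is available; otherwise just print the statement.
if command -v hive >/dev/null 2>&1; then
  hive -e "$sql"
else
  echo "$sql"
fi
```

If the file is already on HDFS, the `local` keyword would be dropped and the path would point to the HDFS location.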