Hive: Working with Map Data in Python UDFs, a Detailed and Practical Guide

Latest recommended article published 2024-06-12 14:06:37

陈伦(colby)



Categories: Hadoop, Hue, Hive, Python, SQL. Tags: Hive, Python, UDF, Map

Article link: https://blog.csdn.net/colby_chenlun/article/details/78140033


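This article explains in detail how to work with the Map data type in Hive: creating an external table with a map column, loading data, writing a custom Python UDF that parses IP information, and using that UDF to process the data with dynamic partitioning. Worked examples show how to query and update map-typed data.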

(Summary generated automatically by CSDN.)

#1. Basic Hive operations:

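Running `desc formatted` on dw.full_h_usr_base_user shows the table's detailed metadata, including the HDFS location of its data files. From the Hive CLI you can then list that directory with `dfs -ls`. Because dropping an external table does not delete its data, the data files are removed directly with `dfs -rm -r` (`-rmdir` only removes empty directories):

```sql
-- show detailed table metadata, including the Location of the data files
desc formatted dw.full_h_usr_base_user;
-- list the table's data directory
dfs -ls hdfs://BIGDATA:9000/user/hive/warehouse/dw.db/full_h_usr_base_user;
-- delete the external table's data (the table definition in the metastore is kept)
dfs -rm -r hdfs://BIGDATA:9000/user/hive/warehouse/dw.db/full_h_usr_base_user;
```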

#192.168.1.181 192.168.1.1

#2. Create an external table with a map column

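```sql
create external table dw.full_h_usr_base_user(
  user_id string comment 'user id',
  reg_ip string comment 'ip',
  -- map column, declared as map<key_type,value_type>
  -- (the column comment lists the keys stored in the map)
  reg_ip_geo_map map<string,string>
    comment 'city_id,city_name,isp,province_id,province_name,country_id,country_name,postzip,district,province'
)
comment 'user test table'
partitioned by (ds string comment 'current date, used as the partition column')
row format delimited
  fields terminated by '\t'
  collection items terminated by ','  -- key:value pairs in the map are separated by commas
  map keys terminated by ':'          -- each key is separated from its value by a colon
-- store the data as plain text files
stored as textfile;
```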
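With `row format delimited`, each line of the table's text file holds the columns separated by tabs, and the map column is written as `key:value` pairs joined by commas. The Python sketch below shows that layout; the helper names `format_row` and `parse_row` are hypothetical and not from the original article. The partition column `ds` is not written to the file, because Hive takes it from the partition directory. The sketch assumes no escaping is configured, so it rejects values that contain a delimiter.

```python
# Delimiters declared in the DDL above (row format delimited).
FIELD_SEP = "\t"  # fields terminated by '\t'
ITEM_SEP = ","    # collection items terminated by ','
KEY_SEP = ":"     # map keys terminated by ':'


def _reject(text, forbidden):
    """Raise if text contains a delimiter, since this sketch does no escaping."""
    if any(ch in text for ch in forbidden):
        raise ValueError(f"value contains a delimiter: {text!r}")


def format_row(user_id, reg_ip, geo=None):
    """Build one data-file line for dw.full_h_usr_base_user (hypothetical helper).

    ds is not written: Hive takes it from the partition directory.
    If geo is None, only user_id and reg_ip are written.
    """
    _reject(user_id, FIELD_SEP + "\n")
    _reject(reg_ip, FIELD_SEP + "\n")
    fields = [user_id, reg_ip]
    if geo is not None:
        for key, value in geo.items():
            _reject(key, FIELD_SEP + ITEM_SEP + KEY_SEP + "\n")
            _reject(value, FIELD_SEP + ITEM_SEP + KEY_SEP + "\n")
        fields.append(ITEM_SEP.join(f"{k}{KEY_SEP}{v}" for k, v in geo.items()))
    return FIELD_SEP.join(fields)


def parse_row(line):
    """Split a data-file line back into (user_id, reg_ip, geo_dict)."""
    fields = line.rstrip("\n").split(FIELD_SEP)
    geo = {}
    if len(fields) > 2 and fields[2]:
        for item in fields[2].split(ITEM_SEP):
            key, _, value = item.partition(KEY_SEP)
            geo[key] = value
    return fields[0], fields[1], geo
```

For example, with illustrative values, `format_row("u001", "192.168.1.181", {"city_id": "1", "isp": "isp_a"})` returns `"u001\t192.168.1.181\tcity_id:1,isp:isp_a"`, which is the text Hive would split into the three columns under the delimiters declared above.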

#3. Load the data (only user_id and reg_ip need to be supplied; reg_ip_geo_map can be computed later with the UDF)

```sql
load data local inpath '/opt/data/dw.full_h_usr_base_user.del'
overwrite into table dw.full_h_usr_base_user partition(ds='2017-09-25');
```
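Step #3 loads only user_id and reg_ip; reg_ip_geo_map is meant to be filled in afterwards by a UDF. The author's script is not part of this excerpt, so what follows is only a minimal sketch of a Hive streaming script (the kind invoked through `TRANSFORM`). It assumes a hypothetical in-memory lookup table, `HYPOTHETICAL_GEO_DB`, with made-up values, standing in for a real IP-geolocation database. It reads `user_id<TAB>reg_ip` rows from standard input and writes `user_id<TAB>reg_ip<TAB>k1:v1,k2:v2` rows to standard output.

```python
import sys

# Hypothetical lookup table standing in for a real IP-geolocation source;
# the values are made up for illustration.
HYPOTHETICAL_GEO_DB = {
    "192.168.1.181": {"city_id": "1", "city_name": "city_a", "isp": "isp_a"},
}


def lookup_ip_geo(ip):
    """Return geo attributes for ip as a dict (empty if the ip is unknown)."""
    return HYPOTHETICAL_GEO_DB.get(ip, {})


def to_map_text(geo):
    """Serialize a dict as 'k1:v1,k2:v2', matching the table's map delimiters."""
    return ",".join(f"{k}:{v}" for k, v in geo.items())


def transform_line(line):
    """Turn 'user_id<TAB>reg_ip' into 'user_id<TAB>reg_ip<TAB>map_text'."""
    fields = line.rstrip("\n").split("\t")
    user_id = fields[0]
    reg_ip = fields[1] if len(fields) > 1 else ""
    return "\t".join([user_id, reg_ip, to_map_text(lookup_ip_geo(reg_ip))])


def main():
    # Hive's TRANSFORM streams each input row to stdin as tab-separated text
    # and reads tab-separated rows back from stdout.
    for line in sys.stdin:
        if line.strip():
            print(transform_line(line))


if __name__ == "__main__":
    main()
```

Assuming the script is saved under the hypothetical name `ip_geo_udf.py`, it could be shipped with `add file ip_geo_udf.py;` and called as `select transform(user_id, reg_ip) using 'python ip_geo_udf.py' as (user_id, reg_ip, geo_text) from dw.full_h_usr_base_user where ds='2017-09-25';`, with `str_to_map(geo_text, ',', ':')` converting the text column into a `map<string,string>`. Emitting plain text and converting it in HiveQL avoids relying on the complex-type delimiters of `TRANSFORM` output, which by default are not this table's `,` and `:`.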
