转载:hive学习5:自定义函数之json对象转换hive数据存储

原始链接:https://www.pianshen.com/article/931082274/

 

hive学习5:自定义函数之json对象转换hive数据存储

技术标签: hive

 

0元数据,上传到服务器上:

 
  1. {"movie":"2717","rate":"3","timeStamp":"978298196","uid":"2"}

  2. {"movie":"2571","rate":"4","timeStamp":"978299773","uid":"2"}

  3. {"movie":"1917","rate":"3","timeStamp":"978300174","uid":"2"}

  4. {"movie":"2396","rate":"4","timeStamp":"978299641","uid":"2"}

  5. {"movie":"3735","rate":"3","timeStamp":"978298814","uid":"2"}

  6. {"movie":"1953","rate":"4","timeStamp":"978298775","uid":"2"}

  7. {"movie":"1597","rate":"3","timeStamp":"978300025","uid":"2"}

  8. {"movie":"3809","rate":"3","timeStamp":"978299712","uid":"2"}

  9. {"movie":"1954","rate":"5","timeStamp":"978298841","uid":"2"}

  10. {"movie":"1955","rate":"4","timeStamp":"978299200","uid":"2"}

  11. {"movie":"235","rate":"3","timeStamp":"978299351","uid":"2"}

  12. {"movie":"1124","rate":"5","timeStamp":"978299418","uid":"2"}

  13. {"movie":"1957","rate":"5","timeStamp":"978298750","uid":"2"}

  14. {"movie":"163","rate":"4","timeStamp":"978299809","uid":"2"}

 
  1. #创建t_json表

  2. 0: jdbc:hive2://localhost:10000> create table t_json(line string)

  3. 0: jdbc:hive2://localhost:10000> row format delimited;

  4.  
  5. #导入数据

  6. 0: jdbc:hive2://localhost:10000> load data local inpath '/home/hadoop/data.json' into table t_json;

导入后结果:

以上数据初始化话完成。

1、创建java项目。不介绍,项目创建好后,可以加入pom依赖,或者直接把hive依赖的jar包拷贝项目里,

2、创建java类JsonParser

 
  1. package com.zsy.hive.udf;

  2.  
  3. import java.io.IOException;

  4.  
  5. import org.apache.hadoop.hive.ql.exec.UDF;

  6.  
  7. import parquet.org.codehaus.jackson.map.ObjectMapper;

  8.  
  9. public class JsonParser extends UDF {

  10.  
  11. ObjectMapper objectMapper = new ObjectMapper();

  12.  
  13. public String evaluate(String jsonLine) {

  14.  
  15. MovieRateBean readValue;

  16. try {

  17. readValue = objectMapper.readValue(jsonLine, MovieRateBean.class);

  18. return readValue.toString();

  19. } catch (IOException e) {

  20. }

  21. return "";

  22. }

  23. }

3、创建bean对象:

 
  1. package com.zsy.hive.udf;

  2.  
  3. public class MovieRateBean {

  4. private String movie;

  5. private String rate;

  6. private String timeStamp;

  7. private String uid;

  8.  
  9. public String getMovie() {

  10. return movie;

  11. }

  12.  
  13. public void setMovie(String movie) {

  14. this.movie = movie;

  15. }

  16.  
  17. public String getRate() {

  18. return rate;

  19. }

  20.  
  21. public void setRate(String rate) {

  22. this.rate = rate;

  23. }

  24.  
  25. public String getTimeStamp() {

  26. return timeStamp;

  27. }

  28.  
  29. public void setTimeStamp(String timeStamp) {

  30. this.timeStamp = timeStamp;

  31. }

  32.  
  33. public String getUid() {

  34. return uid;

  35. }

  36.  
  37. public void setUid(String uid) {

  38. this.uid = uid;

  39. }

  40.  
  41. @Override

  42. public String toString() {

  43. return movie + "," + rate + "," + timeStamp + "," + uid;

  44. }

  45.  
  46. }

4、打包,放到hive所在的服务器。

5、执行命令,把jar包加入classpath里:

0: jdbc:hive2://localhost:10000> add JAR /home/hadoop/udf.jar;

6、创建函数:

0: jdbc:hive2://localhost:10000> create temporary function parseJson as 'com.zsy.hive.udf.JsonParser';

7、测试:

跟进逗号进行分割:select split(parseJson(line),',')[0] movieId,split(parseJson(line),',')[1] rate,split(parseJson(line),',')[2] time,split(parseJson(line),',')[3] id from t_json limit 10;

查询的结果保存到hive表里:

create table t_rate as  select split(parseJson(line),',')[0] movieId,split(parseJson(line),',')[1] rate,split(parseJson(line),',')[2] time,split(parseJson(line),',')[3] id from t_json ;

运行过程:

 

结果:

 

以上是通过自定义hive函数来解决的方式,其实hive有一个内置函数,接下来演示下内置函数:

 select get_json_object(line,'$.movie') as movie from t_json limit 10;

内置的函数只能实现简单的json格式,复杂的json格式还是要定义函数来实现

 

版权声明:本文为博主原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。

本文链接:https://blog.csdn.net/u011410254/article/details/83279620

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值