[若泽大数据实战 Day 19] Hive Functions: UDF Development and Permanently Registering a UDF

Preface:

A quick recap of the previous lesson. We covered aggregate functions: many rows in, one row out.

group by: every field in the select list other than aggregate functions must also appear in the group by clause. To filter on an aggregate you must use having; having filters the results after grouping, while where filters each incoming row before grouping.
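For example, a minimal sketch against a hypothetical emp(ename, deptno, sal) table:

select deptno, avg(sal) as avg_sal
from emp
where sal > 0              -- where: filters individual rows before grouping
group by deptno            -- deptno is the only non-aggregate field selected
having avg(sal) > 2000;    -- having: filters on the aggregate, after grouping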

case when ... then ... end comes up constantly in reporting: given a condition, it derives the value to display for a field.
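A sketch of the pattern, again on the hypothetical emp table (the cutoffs are made up):

select ename, sal,
       case when sal <= 1000 then 'LOWER'
            when sal <= 3000 then 'MIDDLE'
            else 'HIGH'
       end as sal_level
from emp;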

Partitioned tables: they exist to improve query performance by cutting down IO. They come in static and dynamic variants; a single-level partition maps to one directory, a multi-level partition to nested directories. Think of a partitioned table as one big table split into many small tables, one directory split into many directories. For static partitioning, single-level and multi-level are used the same way.

The partitioned by syntax is identical for both. The difference: a static partition requires you to specify the partition value explicitly, while a dynamic partition only needs the partition column written at the end of the select, and Hive fills in the values automatically. Dynamic partitioning is used very heavily in real work, since you don't have to spell out every partition value.
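Side by side, as a sketch (the order_partition and order_raw tables are hypothetical; the two set parameters are standard Hive settings for dynamic partitioning):

-- static partition: the partition value is hard-coded
insert overwrite table order_partition partition (event_month='2018-06')
select order_no, event_time from order_raw
where substr(event_time, 1, 7) = '2018-06';

-- dynamic partition: the partition column goes last in the select,
-- and Hive routes each row to the right partition automatically
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table order_partition partition (event_month)
select order_no, event_time, substr(event_time, 1, 7) as event_month
from order_raw;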

Functions: list them all with show functions; for the details of a specific one, use desc function xxx (or desc function extended xxx).
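For example:

hive> show functions;                    -- list every registered function
hive> desc function upper;               -- short description of one function
hive> desc function extended upper;      -- description plus a usage example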

This lesson covers the following:

explode:

Turns an array into multiple rows, one row per element.

-- Create a text file:

[hadoop@hadoop000 data]$ vi hive-wc.txt
hello,world,welcome
hello,welcome

-- Create the hive_wc table, load the data into it (which puts it on HDFS), and check that the load succeeded.

hive> create table hive_wc(sentence string);
OK
Time taken: 1.083 seconds

hive> load data local inpath '/home/hadoop/data/hive-wc.txt' into table hive_wc;
Loading data to table default.hive_wc
Table default.hive_wc stats: [numFiles=1, totalSize=35]
OK
Time taken: 1.539 seconds

hive> select * from hive_wc;
OK
hello,world,welcome
hello,welcome

Time taken: 0.536 seconds, Fetched: 3 row(s)
To implement wordcount (count how many times each word occurs), we need the following:

1. Extract each word

select split(sentence,",") from hive_wc;

Running it shows that we do get each word, as an array per line (the [""] row comes from a blank third line in the file):

hive> select split(sentence,",") from hive_wc;
OK
["hello","world","welcome"]
["hello","welcome"]
[""]
Time taken: 0.161 seconds, Fetched: 3 row(s)

What we need is to turn this into the following format, so that we can then do a group by:

"hello"
"world"
"welcome"	
"hello"
"welcome"

explode turns the array into multiple rows; the revised SQL:

hive> select explode(split(sentence,",")) from hive_wc;
OK
hello
world
welcome
hello
welcome

The complete wordcount in a single SQL statement:

hive> select word, count(1) as c 
    > from (select explode(split(sentence,",")) as word from hive_wc) t
    > group by word ;
Query ID = hadoop_20180613094545_920c2e72-5982-47eb-9a9c-5e5a30ebb1ae
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1528851144815_0001, Tracking URL = http://hadoop000:8088/proxy/application_1528851144815_0001/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1528851144815_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-06-13 10:18:53,155 Stage-1 map = 0%,  reduce = 0%
2018-06-13 10:18:59,605 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.42 sec
2018-06-13 10:19:07,113 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.31 sec
MapReduce Total cumulative CPU time: 4 seconds 310 msec
Ended Job = job_1528851144815_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.31 sec   HDFS Read: 7333 HDFS Write: 29 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 310 msec
OK
        1
hello   2
welcome 2
world   1
Time taken: 26.859 seconds, Fetched: 4 row(s)
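The unnamed first row is the empty string from that blank third line, counted once; hello and welcome each appear twice and world once, matching the file.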

JSON: something you deal with constantly in real work.

File used: /home/hadoop/data/rating.json

Create a table rating_json, load the data, and view the first ten rows:

hive> create table rating_json(json string);
OK

hive> load data local inpath '/home/hadoop/data/rating.json' into table rating_json;
Loading data to table default.rating_json
Table default.rating_json stats: [numFiles=1, totalSize=34967552]
OK


hive> select * from rating_json limit 10;
OK
{"movie":"1193","rate":"5","time":"978300760","userid":"1"}
{"movie":"661","rate":"3","time":"978302109","userid":"1"}
{"movie":"914","rate":"3","time":"978301968","userid":"1"}
{"movie":"3408","rate":"4","time":"978300275","userid":"1"}
{"movie":"2355","rate":"5","time":"978824291","userid":"1"}
{"movie":"1197","rate":"3","time":"978302268","userid":"1"}
{"movie":"1287","rate":"5","time":"978302039","userid":"1"}
{"movie":"2804","rate":"5","time":"978300719","userid":"1"}
{"movie":"594","rate":"4","time":"978302268","userid":"1"}
{"movie":"919","rate":"4