hive基础整理

逻辑表,不存数据,只有表的定义,数据存在hdfs上

1.建表将文章放入,完成wordcount

1.split函数

hive> select split("I am a student"," ");
OK
["I","am","a","student"]	

2.explode 把数组中每个元素单独变成一行【行转列】

hive> select explode(split("I am a teacher "," "));
OK
I
am
a
teacher

3.wordcount

select 
lower(regexp_extract(word,'[A-Za-z]+',0))as word, --大小写和正则位置变换一下
count(*) as cnt
from 
(
select 
explode(split(sentence,' ')) as word 
from article
)t
group by word
limit 10;
OK
I	2
a	2
am	2
student	1
teacher	1

2.外部表的概念

3.分区表

  • 对于查询效率上分区表更快:
  • 所有数据都放到一个表的文件夹中,查询数据需要遍历(扫一遍该文件中所有数据),如果这个文件夹中国的数据包含了一年的用户的行为数据,这样扫一遍就是一年所有数据中筛选出需要查询的那一天(或几天)的数据。
  • 如果做了分区,想要获取昨天数据,只需要先扫一遍所有文件夹,将昨天对应日期的文件夹找出来,再对该文件夹中数据进行读取分析。
  • 什么时候采用分区,结合业务,经常要用到的分析条件(口径)。where条件里面经常要被用到的,可以按照条件进行设计分区。

4.创建分区表,查看目录结构

5.分桶:

create table bucket_test(id int);
load data local inpath
'/home/badou/Documents/data/hive/bucket_test.txt'
into table bucket_test;
---------------------------------
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:54 /user/hive/warehouse/bucket_user/000000_0
-rwxr-xr-x   3 root supergroup          2 2019-11-12 05:54 /user/hive/warehouse/bucket_user/000001_0
-rwxr-xr-x   3 root supergroup          2 2019-11-12 06:06 /user/hive/warehouse/bucket_user/000002_2
-rwxr-xr-x   3 root supergroup          2 2019-11-12 05:55 /user/hive/warehouse/bucket_user/000003_0
-rwxr-xr-x   3 root supergroup          2 2019-11-12 06:08 /user/hive/warehouse/bucket_user/000004_2
-rwxr-xr-x   3 root supergroup          2 2019-11-12 05:56 /user/hive/warehouse/bucket_user/000005_0
-rwxr-xr-x   3 root supergroup          2 2019-11-12 06:07 /user/hive/warehouse/bucket_user/000006_1
-rwxr-xr-x   3 root supergroup          2 2019-11-12 05:54 /user/hive/warehouse/bucket_user/000007_0
-rwxr-xr-x   3 root supergroup          2 2019-11-12 05:54 /user/hive/warehouse/bucket_user/000008_0
-rwxr-xr-x   3 root supergroup          2 2019-11-12 05:54 /user/hive/warehouse/bucket_user/000009_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:05 /user/hive/warehouse/bucket_user/000010_1
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:54 /user/hive/warehouse/bucket_user/000011_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:55 /user/hive/warehouse/bucket_user/000012_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:54 /user/hive/warehouse/bucket_user/000013_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:59 /user/hive/warehouse/bucket_user/000014_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:59 /user/hive/warehouse/bucket_user/000015_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:07 /user/hive/warehouse/bucket_user/000016_1
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:01 /user/hive/warehouse/bucket_user/000017_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:59 /user/hive/warehouse/bucket_user/000018_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:01 /user/hive/warehouse/bucket_user/000019_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:08 /user/hive/warehouse/bucket_user/000020_2
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:59 /user/hive/warehouse/bucket_user/000021_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 05:59 /user/hive/warehouse/bucket_user/000022_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:00 /user/hive/warehouse/bucket_user/000023_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:07 /user/hive/warehouse/bucket_user/000024_1
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:01 /user/hive/warehouse/bucket_user/000025_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:01 /user/hive/warehouse/bucket_user/000026_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:01 /user/hive/warehouse/bucket_user/000027_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:06 /user/hive/warehouse/bucket_user/000028_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:04 /user/hive/warehouse/bucket_user/000029_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:05 /user/hive/warehouse/bucket_user/000030_0
-rwxr-xr-x   3 root supergroup          3 2019-11-12 06:06 /user/hive/warehouse/bucket_user/000031_0

按id分桶,如果id都不够,又怎么会有那么多桶?

6.分桶采样

select * from bucket_user tablesample(bucket 1 out of 32 on id );  32 (对32取余数为0)
select * from bucket_user tablesample(bucket 1 out of 16 on id ); 32 16(对16取余数为0)
select * from bucket_user tablesample(bucket 1 out of 9 on id ); 9 18 27(对9取余数为0)
select * from bucket_user tablesample(bucket 2 out of 9 on id ); 1 10 19 28 (对9取余数为1)

7.test

–orders:订单数据表
order_id user_id 历史 序号 周 小时
2539329,1,prior,1,2,08,
2398795,1,prior,2,3,07,15.0
473747,1,prior,3,3,12,21.0
2254736,1,prior,4,4,07,29.0
431534,1,prior,5,4,15,28.0
3367565,1,prior,6,2,07,19.0
550135,1,prior,7,1,09,20.0
3108588,1,prior,8,1,14,14.0
2295261,1,prior,9,1,16,0.0
2550362,1,prior,10,4,08,30.0

select order_dow,count(1) as cnt
from orders
group by order_dow;

select order_hour_of_day,count(1) as cnt
from orders
group by order_hour_of_day;
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值