分区、分桶练习

本文介绍了如何将数据分区存储在Hive表ip1中,根据IP首字进行分区,并将数据按照fadd字段进行分桶排序。随后,详细步骤演示了如何查询特定桶的数据,如第2和5桶。涉及的技术包括数据加载、分区表、分桶排序和SQL查询。
摘要由CSDN通过智能技术生成
1.将数据放入分区表ip1(fadd,sadd,tadd,fouadd)中并进行分区,(区号:firstadd),即:ip第一位
2.将数据放入分桶排序(正序)表ip2,分8个桶
3.按照fadd查出第2,5桶
4.按照fadd查找第2,4,6,8桶

1.将数据放入分区表ip1(fadd,sadd,tadd,fouadd)中并进行分区,(区号:firstadd),即:ip第一位
create external table if not exists ip1(
fadd int,
asdd int,
tadd int,
fouadd int
)
partitioned by (firstadd int)
row format delimited fields terminated by '.'
lines terminated by '\n'

load data local inpath '/root/ip/ip1.txt' into table ip.ip1
partition(firstadd=192)

load data local inpath '/root/ip/ip2.txt' into table ip.ip1
partition(firstadd=132)

load data local inpath '/root/ip/ip3.txt' into table ip.ip1
partition(firstadd=122)

load data local inpath '/root/ip/ip4.txt' into table ip.ip1
partition(firstadd=182)

2.将数据放入分桶排序(正序)表ip2,分8个桶
分桶
第一种:
create external table if not exists ip2(
fadd int,
asdd int,
tadd int,
fouadd int
)
clustered by (fadd)
sorted by (fadd) into 8 buckets
row format delimited fields terminated by '.'
lines terminated by '\n'

set mapreduce.job.reduces=8

load data local inpath '/root/ip/ip1.txt' into table ip.ip2
load data local inpath '/root/ip/ip2.txt' into table ip.ip2
load data local inpath '/root/ip/ip3.txt' into table ip.ip2
load data local inpath '/root/ip/ip4.txt' into table ip.ip2


第二种:
create external table if not exists t_ip2(
fadd int,
asdd int,
tadd int,
fouadd int
)
row format delimited fields terminated by '.'
lines terminated by '\n'

load data local inpath '/root/ip/ip1.txt' into table ip.t_ip2;
load data local inpath '/root/ip/ip2.txt' into table ip.t_ip2;
load data local inpath '/root/ip/ip3.txt' into table ip.t_ip2;
load data local inpath '/root/ip/ip4.txt' into table ip.t_ip2;

insert overwrite table ip2
select fadd,asdd,tadd,fouadd
from t_ip2


set hive.exec.mode.local.auto=true;

create external table if not exists ip3(
fadd int,
asdd int,
tadd int,
fouadd int
)
clustered by (asdd)
sorted by (asdd) into 8 buckets
row format delimited fields terminated by '.'
lines terminated by '\n'

insert overwrite table ip3
select fadd,asdd,tadd,fouadd
from t_ip2


3.按照fadd查出第2,5桶
select * from ip3 tablesample(bucket 2 out of 4 on asdd);


4.按照fadd查找第2,4,6,8桶
select * from ip3 tablesample(bucket 2 out of 2 on asdd);
ip1.txt
192.178.10.128
192.168.10.138
192.168.10.118
192.168.10.28
192.158.10.128	
192.158.10.138
192.138.0.118
ip2.txt
132.168.11.28
132.128.19.128
132.128.10.138
132.158.10.118
132.168.12.28
ip3.txt
122.168.10.38
122.128.14.138
122.178.10.118
122.168.12.29
ip4.txt
182.168.10.129
182.168.10.108
182.168.10.118
182.168.10.280
182.168.10.120
182.168.10.108
182.168.10.118
182.168.10.256

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值