Hive bucketing failed even after running set hive.enforce.bucketing=true; (unresolved)

Create a Hive table, bucketed by remoteIp and partitioned by requestmethod:

hive> create table partition_cluster_accsslog 
    > (     remoteIp  string, 
    >       loginRemoteName string,
    >       authrizedName string,
    >       responseCode int,
    >       contentBytes int,
    >       handleTime int,
    >       timestamps bigint,
    >       requesturl string,
    >       requestprotocol string,
    >       refer string,
    >       browsername string)
    > partitioned by (requestmethod string)
    > clustered by (remoteIp) sorted by (handleTime) into 2 buckets
    > row format delimited fields terminated by '\t'
    > LOCATION '/bike/log/partition_cluster_accesslog';
OK
Time taken: 0.332 seconds
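To confirm that the bucketing metadata was recorded on the table, DESCRIBE FORMATTED lists Num Buckets and Bucket Columns (a quick sanity check; output omitted here):

DESCRIBE FORMATTED partition_cluster_accsslog;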
hive> set hive.enforce.bucketing;
hive.enforce.bucketing=true
hive> insert into partition_cluster_accsslog partition(  requestmethod='GET') 
    > select 
    > remoteip,loginremotename,authrizedname,responseCode,contentBytes,handleTime,timestamps,requesturl,requestprotocol,refer,browsername
    > from accesslog where requestmethod='GET';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200714100700_0664bb7e-57ff-43a0-b6cf-743c55b965b7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1594564726792_0007, Tracking URL = http://node3:8088/proxy/application_1594564726792_0007/
Kill Command = /usr/local/hadoop271/bin/hadoop job  -kill job_1594564726792_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2020-07-14 10:07:24,020 Stage-1 map = 0%,  reduce = 0%
2020-07-14 10:07:41,157 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.86 sec
2020-07-14 10:08:13,327 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 14.17 sec
2020-07-14 10:08:16,119 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 22.43 sec
MapReduce Total cumulative CPU time: 23 seconds 690 msec
Ended Job = job_1594564726792_0007
Loading data to table ibike.partition_cluster_accsslog partition (requestmethod=GET)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 2   Cumulative CPU: 23.69 sec   HDFS Read: 35775 HDFS Write: 4445 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 690 msec
OK
Time taken: 83.983 seconds
hive> insert into partition_cluster_accsslog partition(  requestmethod='POST') 
    > select 
    > remoteip,loginremotename,authrizedname,responseCode,contentBytes,handleTime,timestamps,requesturl,requestprotocol,refer,browsername
    > from accesslog where requestmethod='POST';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200714100932_91c3cd32-000c-41b9-a6d4-988e10d3cdf7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1594564726792_0008, Tracking URL = http://node3:8088/proxy/application_1594564726792_0008/
Kill Command = /usr/local/hadoop271/bin/hadoop job  -kill job_1594564726792_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2020-07-14 10:09:56,198 Stage-1 map = 0%,  reduce = 0%
2020-07-14 10:10:35,815 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 7.04 sec
2020-07-14 10:10:36,919 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.83 sec
2020-07-14 10:10:55,996 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 12.53 sec
2020-07-14 10:11:01,347 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 18.89 sec
2020-07-14 10:11:03,601 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 21.65 sec
MapReduce Total cumulative CPU time: 21 seconds 650 msec
Ended Job = job_1594564726792_0008
Loading data to table ibike.partition_cluster_accsslog partition (requestmethod=POST)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 2   Cumulative CPU: 21.65 sec   HDFS Read: 35792 HDFS Write: 9709 SUCCESS
Total MapReduce CPU Time Spent: 21 seconds 650 msec
OK
Time taken: 97.754 seconds

set hive.enforce.bucketing; prints true. Next, check in HDFS whether the files were actually bucketed by remoteIp. Viewing the partition directory through the browser (the HDFS web UI), all of the data sits in one bucket file while the other is empty.
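The same check can be made from the shell; listing the GET partition directory shows the two bucket files and their sizes (a sketch of the command, output omitted):

hadoop dfs -ls /bike/log/partition_cluster_accesslog/requestmethod=GET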

Then inspect the file contents from the command line:

hadoop dfs -cat /bike/log/partition_cluster_accesslog/requestmethod=GET/000001_0
192.168.146.9	-	-	200	1974	44	1594643387000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1974	44	1594643387000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1978	50	1594556991000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.9	-	-	200	1986	60	1594470586000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1986	60	1594643386000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1980	64	1594643384000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.9	-	-	200	1974	64	1594643390000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.9	-	-	200	1980	64	1594643384000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1974	64	1594643390000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.9	-	-	200	1974	67	1594297788000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1974	67	1594556988000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
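The other bucket file can be inspected the same way; consistent with what the browser showed, it should come back empty:

hadoop dfs -cat /bike/log/partition_cluster_accesslog/requestmethod=GET/000000_0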

The data was not split by remoteIp: rows for both client IPs (192.168.146.3 and 192.168.146.9) all landed in bucket file 000001_0, while 000000_0 stayed empty.
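A likely explanation is not that bucketing was ignored, but that both keys hash to the same bucket. Hive assigns a row to bucket pmod(hash(bucketing columns), num_buckets), and for a string column the hash is a Java-style hashCode over its bytes. "192.168.146.3" and "192.168.146.9" differ only in their last character ('3' vs '9', a difference of 6), so their hash codes differ by an even number and both map to the same bucket modulo 2. With only these two distinct IPs in the data, one of the two bucket files is expected to stay empty.

This can be checked directly in Hive; the sketch below assumes the hash() UDF matches the hash used for bucket assignment, which holds for primitive types:

SELECT remoteip, pmod(hash(remoteip), 2) AS bucket_id
FROM accesslog
GROUP BY remoteip;

If both IPs come back with the same bucket_id, the empty 000000_0 file is the expected outcome rather than a configuration problem; re-testing with more distinct remoteIp values, or with a different bucket count, should spread the rows across files.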
