Hive bucketing failed even after running set hive.enforce.bucketing=true; (unresolved)

Create a Hive table, bucketed by remoteIp and partitioned by requestmethod:

hive> create table partition_cluster_accsslog 
    > (     remoteIp  string, 
    >       loginRemoteName string,
    >       authrizedName string,
    >       responseCode int,
    >       contentBytes int,
    >       handleTime int,
    >       timestamps bigint,
    >       requesturl string,
    >       requestprotocol string,
    >       refer string,
    >       browsername string)
    > partitioned by (requestmethod string)
    > clustered by (remoteIp) sorted by (handleTime) into 2 buckets
    > row format delimited fields terminated by '\t'
    > LOCATION '/bike/log/partition_cluster_accesslog';
OK
Time taken: 0.332 seconds
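To confirm that the bucketing metadata was recorded on the table, DESCRIBE FORMATTED lists Num Buckets and Bucket Columns (a quick sanity check; output omitted here):

DESCRIBE FORMATTED partition_cluster_accsslog;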
hive> set hive.enforce.bucketing;
hive.enforce.bucketing=true
hive> insert into partition_cluster_accsslog partition(  requestmethod='GET') 
    > select 
    > remoteip,loginremotename,authrizedname,responseCode,contentBytes,handleTime,timestamps,requesturl,requestprotocol,refer,browsername
    > from accesslog where requestmethod='GET';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200714100700_0664bb7e-57ff-43a0-b6cf-743c55b965b7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1594564726792_0007, Tracking URL = http://node3:8088/proxy/application_1594564726792_0007/
Kill Command = /usr/local/hadoop271/bin/hadoop job  -kill job_1594564726792_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2020-07-14 10:07:24,020 Stage-1 map = 0%,  reduce = 0%
2020-07-14 10:07:41,157 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.86 sec
2020-07-14 10:08:13,327 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 14.17 sec
2020-07-14 10:08:16,119 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 22.43 sec
MapReduce Total cumulative CPU time: 23 seconds 690 msec
Ended Job = job_1594564726792_0007
Loading data to table ibike.partition_cluster_accsslog partition (requestmethod=GET)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 2   Cumulative CPU: 23.69 sec   HDFS Read: 35775 HDFS Write: 4445 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 690 msec
OK
Time taken: 83.983 seconds
hive> insert into partition_cluster_accsslog partition(  requestmethod='POST') 
    > select 
    > remoteip,loginremotename,authrizedname,responseCode,contentBytes,handleTime,timestamps,requesturl,requestprotocol,refer,browsername
    > from accesslog where requestmethod='POST';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200714100932_91c3cd32-000c-41b9-a6d4-988e10d3cdf7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1594564726792_0008, Tracking URL = http://node3:8088/proxy/application_1594564726792_0008/
Kill Command = /usr/local/hadoop271/bin/hadoop job  -kill job_1594564726792_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2020-07-14 10:09:56,198 Stage-1 map = 0%,  reduce = 0%
2020-07-14 10:10:35,815 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 7.04 sec
2020-07-14 10:10:36,919 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.83 sec
2020-07-14 10:10:55,996 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 12.53 sec
2020-07-14 10:11:01,347 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 18.89 sec
2020-07-14 10:11:03,601 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 21.65 sec
MapReduce Total cumulative CPU time: 21 seconds 650 msec
Ended Job = job_1594564726792_0008
Loading data to table ibike.partition_cluster_accsslog partition (requestmethod=POST)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 2   Cumulative CPU: 21.65 sec   HDFS Read: 35792 HDFS Write: 9709 SUCCESS
Total MapReduce CPU Time Spent: 21 seconds 650 msec
OK
Time taken: 97.754 seconds

set hive.enforce.bucketing; prints true. Next, check in HDFS whether the files were actually bucketed by remoteIp. Viewing the partition directory through the browser (the HDFS web UI), all of the data sits in one bucket file while the other is empty.
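The same check can be made from the shell; listing the GET partition directory shows the two bucket files and their sizes (a sketch of the command, output omitted):

hadoop dfs -ls /bike/log/partition_cluster_accesslog/requestmethod=GET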

Then inspect the file contents from the command line:

hadoop dfs -cat /bike/log/partition_cluster_accesslog/requestmethod=GET/000001_0
192.168.146.9	-	-	200	1974	44	1594643387000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1974	44	1594643387000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1978	50	1594556991000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.9	-	-	200	1986	60	1594470586000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1986	60	1594643386000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1980	64	1594643384000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.9	-	-	200	1974	64	1594643390000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.9	-	-	200	1980	64	1594643384000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1974	64	1594643390000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.9	-	-	200	1974	67	1594297788000	/bike/onLogin	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
192.168.146.3	-	-	200	1974	67	1594556988000	/bike/findNearAll	HTTP/1.1	https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html	Mac OS X (iPhone)
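The other bucket file can be inspected the same way; consistent with what the browser showed, it should come back empty:

hadoop dfs -cat /bike/log/partition_cluster_accesslog/requestmethod=GET/000000_0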

The data was not split by remoteIp: rows for both client IPs (192.168.146.3 and 192.168.146.9) all landed in bucket file 000001_0, while 000000_0 stayed empty.
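A likely explanation is not that bucketing was ignored, but that both keys hash to the same bucket. Hive assigns a row to bucket pmod(hash(bucketing columns), num_buckets), and for a string column the hash is a Java-style hashCode over its bytes. "192.168.146.3" and "192.168.146.9" differ only in their last character ('3' vs '9', a difference of 6), so their hash codes differ by an even number and both map to the same bucket modulo 2. With only these two distinct IPs in the data, one of the two bucket files is expected to stay empty.

This can be checked directly in Hive; the sketch below assumes the hash() UDF matches the hash used for bucket assignment, which holds for primitive types:

SELECT remoteip, pmod(hash(remoteip), 2) AS bucket_id
FROM accesslog
GROUP BY remoteip;

If both IPs come back with the same bucket_id, the empty 000000_0 file is the expected outcome rather than a configuration problem; re-testing with more distinct remoteIp values, or with a different bucket count, should spread the rows across files.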
