hive建表,根据remoteIp进行分桶 根据requestmethod进行分区
hive> create table partition_cluster_accsslog
> ( remoteIp string,
> loginRemoteName string,
> authrizedName string,
> responseCode int,
> contentBytes int,
> handleTime int,
> timestamps bigint,
> requesturl string,
> requestprotocol string,
> refer string,
> browsername string)
> partitioned by (requestmethod string)
> clustered by (remoteIp) sorted by (handleTime) into 2 buckets
> row format delimited fields terminated by '\t'
> LOCATION '/bike/log/partition_cluster_accesslog';
OK
Time taken: 0.332 seconds
hive> set hive.enforce.bucketing;
hive.enforce.bucketing=true
hive> set hive.enforce.bucketing;
hive.enforce.bucketing=true
hive> insert into partition_cluster_accsslog partition( requestmethod='GET')
> select
> remoteip,loginremotename,authrizedname,responseCode,contentBytes,handleTime,timestamps,requesturl,requestprotocol,refer,browsername
> from accesslog where requestmethod='GET';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200714100700_0664bb7e-57ff-43a0-b6cf-743c55b965b7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1594564726792_0007, Tracking URL = http://node3:8088/proxy/application_1594564726792_0007/
Kill Command = /usr/local/hadoop271/bin/hadoop job -kill job_1594564726792_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2020-07-14 10:07:24,020 Stage-1 map = 0%, reduce = 0%
2020-07-14 10:07:41,157 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.86 sec
2020-07-14 10:08:13,327 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 14.17 sec
2020-07-14 10:08:16,119 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 22.43 sec
MapReduce Total cumulative CPU time: 23 seconds 690 msec
Ended Job = job_1594564726792_0007
Loading data to table ibike.partition_cluster_accsslog partition (requestmethod=GET)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 2 Cumulative CPU: 23.69 sec HDFS Read: 35775 HDFS Write: 4445 SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 690 msec
OK
Time taken: 83.983 seconds
hive> insert into partition_cluster_accsslog partition( requestmethod='POST')
> select
> remoteip,loginremotename,authrizedname,responseCode,contentBytes,handleTime,timestamps,requesturl,requestprotocol,refer,browsername
> from accesslog where requestmethod='POST';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20200714100932_91c3cd32-000c-41b9-a6d4-988e10d3cdf7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1594564726792_0008, Tracking URL = http://node3:8088/proxy/application_1594564726792_0008/
Kill Command = /usr/local/hadoop271/bin/hadoop job -kill job_1594564726792_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2020-07-14 10:09:56,198 Stage-1 map = 0%, reduce = 0%
2020-07-14 10:10:35,815 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 7.04 sec
2020-07-14 10:10:36,919 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.83 sec
2020-07-14 10:10:55,996 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 12.53 sec
2020-07-14 10:11:01,347 Stage-1 map = 100%, reduce = 83%, Cumulative CPU 18.89 sec
2020-07-14 10:11:03,601 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 21.65 sec
MapReduce Total cumulative CPU time: 21 seconds 650 msec
Ended Job = job_1594564726792_0008
Loading data to table ibike.partition_cluster_accsslog partition (requestmethod=POST)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 2 Cumulative CPU: 21.65 sec HDFS Read: 35792 HDFS Write: 9709 SUCCESS
Total MapReduce CPU Time Spent: 21 seconds 650 msec
OK
Time taken: 97.754 seconds
set hive.enforce.bucketing;
输出为true
到HDFS中查看文件是否根据remoteIp分桶
数据都集中一起 另一个为空
再通过命令查看数据内容
hadoop dfs -cat /bike/log/partition_cluster_accesslog/requestmethod=GET/000001_0
192.168.146.9 - - 200 1974 44 1594643387000 /bike/onLogin HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.3 - - 200 1974 44 1594643387000 /bike/findNearAll HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.3 - - 200 1978 50 1594556991000 /bike/findNearAll HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.9 - - 200 1986 60 1594470586000 /bike/onLogin HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.3 - - 200 1986 60 1594643386000 /bike/findNearAll HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.3 - - 200 1980 64 1594643384000 /bike/findNearAll HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.9 - - 200 1974 64 1594643390000 /bike/onLogin HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.9 - - 200 1980 64 1594643384000 /bike/onLogin HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.3 - - 200 1974 64 1594643390000 /bike/findNearAll HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.9 - - 200 1974 67 1594297788000 /bike/onLogin HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
192.168.146.3 - - 200 1974 67 1594556988000 /bike/findNearAll HTTP/1.1 https://servicewechat.com/wxeb7e49cf5ed6429a/devtools/page-frame.html Mac OS X (iPhone)
未能按照remoteIp分桶