HIVE - sorting, partitioning, cluster by/distribute by/sort by cluster queries, bucket sampling queries, concat, collect_list/concat_ws row-to-column queries


1. Start dfs, yarn, and hiveserver2




 
  
[cevent@hadoop207 ~]$ cd /opt/module/hadoop-2.7.2/
[cevent@hadoop207 hadoop-2.7.2]$ ll
total 60
drwxr-xr-x. 2 cevent cevent  4096 May 22 2017 bin
drwxrwxr-x. 3 cevent cevent  4096 Apr 30 14:16 data
drwxr-xr-x. 3 cevent cevent  4096 May 22 2017 etc
drwxr-xr-x. 2 cevent cevent  4096 May 22 2017 include
drwxr-xr-x. 3 cevent cevent  4096 May 22 2017 lib
drwxr-xr-x. 2 cevent cevent  4096 May 22 2017 libexec
-rw-r--r--. 1 cevent cevent 15429 May 22 2017 LICENSE.txt
drwxrwxr-x. 3 cevent cevent  4096 May 23 20:33 logs
-rw-r--r--. 1 cevent cevent   101 May 22 2017 NOTICE.txt
-rw-r--r--. 1 cevent cevent  1366 May 22 2017 README.txt
drwxr-xr-x. 2 cevent cevent  4096 May 22 2017 sbin
drwxr-xr-x. 4 cevent cevent  4096 May 22 2017 share
[cevent@hadoop207 hadoop-2.7.2]$ sbin/start-dfs.sh 
Starting namenodes on [hadoop207.cevent.com]
hadoop207.cevent.com: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-cevent-namenode-hadoop207.cevent.com.out
hadoop207.cevent.com: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-cevent-datanode-hadoop207.cevent.com.out
Starting secondary namenodes [hadoop207.cevent.com]
hadoop207.cevent.com: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-cevent-secondarynamenode-hadoop207.cevent.com.out
[cevent@hadoop207 hadoop-2.7.2]$ sbin/start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-cevent-resourcemanager-hadoop207.cevent.com.out
hadoop207.cevent.com: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-cevent-nodemanager-hadoop207.cevent.com.out
[cevent@hadoop207 hadoop-2.7.2]$ jps
3384 DataNode
3765 ResourceManager
3592 SecondaryNameNode
3879 NodeManager
3272 NameNode
3959 Jps
  
 





 
  
[cevent@hadoop207 ~]$ cd /opt/module/hive-1.2.1/
[cevent@hadoop207 hive-1.2.1]$ ll
total 524
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 bin
drwxrwxr-x. 2 cevent cevent   4096 May  9 18:40 conf
-rw-rw-r--. 1 cevent cevent  20403 May 19 13:19 derby.log
drwxrwxr-x. 4 cevent cevent   4096 Apr 30 15:59 examples
drwxrwxr-x. 7 cevent cevent   4096 Apr 30 15:59 hcatalog
-rw-rw-r--. 1 cevent cevent     23 May  9 13:37 hive01.sql
drwxrwxr-x. 4 cevent cevent   4096 May  7 13:51 lib
-rw-rw-r--. 1 cevent cevent  24754 Apr 30 2015 LICENSE
drwxrwxr-x. 2 cevent cevent   4096 May 23 20:33 logs
drwxrwxr-x. 5 cevent cevent   4096 Apr 30 16:44 metastore_db
-rw-rw-r--. 1 cevent cevent    397 Jun 19 2015 NOTICE
-rw-rw-r--. 1 cevent cevent   4366 Jun 19 2015 README.txt
-rw-rw-r--. 1 cevent cevent 421129 Jun 19 2015 RELEASE_NOTES.txt
-rw-rw-r--. 1 cevent cevent     11 May  9 13:27 result.txt
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 scripts
-rw-rw-r--. 1 cevent cevent    171 May  9 13:24 server.log
-rw-rw-r--. 1 cevent cevent      5 May  8 14:05 server.pid
-rw-rw-r--. 1 cevent cevent    326 May 22 13:34 student.txt
[cevent@hadoop207 hive-1.2.1]$ bin/hiveserver2 
OK
  
 


2. Sorting queries




 
  
[cevent@hadoop207 ~]$ cd /opt/module/hive-1.2.1/
[cevent@hadoop207 hive-1.2.1]$ ll
total 524
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 bin
drwxrwxr-x. 2 cevent cevent   4096 May  9 18:40 conf
-rw-rw-r--. 1 cevent cevent  20403 May 19 13:19 derby.log
drwxrwxr-x. 4 cevent cevent   4096 Apr 30 15:59 examples
drwxrwxr-x. 7 cevent cevent   4096 Apr 30 15:59 hcatalog
-rw-rw-r--. 1 cevent cevent     23 May  9 13:37 hive01.sql
drwxrwxr-x. 4 cevent cevent   4096 May  7 13:51 lib
-rw-rw-r--. 1 cevent cevent  24754 Apr 30 2015 LICENSE
drwxrwxr-x. 2 cevent cevent   4096 May 24 13:22 logs
drwxrwxr-x. 5 cevent cevent   4096 Apr 30 16:44 metastore_db
-rw-rw-r--. 1 cevent cevent    397 Jun 19 2015 NOTICE
-rw-rw-r--. 1 cevent cevent   4366 Jun 19 2015 README.txt
-rw-rw-r--. 1 cevent cevent 421129 Jun 19 2015 RELEASE_NOTES.txt
-rw-rw-r--. 1 cevent cevent     11 May  9 13:27 result.txt
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 scripts
-rw-rw-r--. 1 cevent cevent    171 May  9 13:24 server.log
-rw-rw-r--. 1 cevent cevent      5 May  8 14:05 server.pid
-rw-rw-r--. 1 cevent cevent    326 May 22 13:34 student.txt
[cevent@hadoop207 hive-1.2.1]$ bin/beeline 
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hadoop207.cevent.com:10000
Connecting to jdbc:hive2://hadoop207.cevent.com:10000
Enter username for jdbc:hive2://hadoop207.cevent.com:10000: cevent
Enter password for jdbc:hive2://hadoop207.cevent.com:10000: ******
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop207.cevent.com:10000> use cevent01;
No rows affected (2.165 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp order by empno desc;  sort by employee id
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590297732730_0001
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0001/
INFO  : Starting Job = job_1590297732730_0001, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0001/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590297732730_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-24 13:27:09,233 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-24 13:27:22,497 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.2 sec
INFO  : 2020-05-24 13:27:30,990 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.05 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 50 msec
INFO  : Ended Job = job_1590297732730_0001
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
| c_emp.empno  | c_emp.ename  | c_emp.jobs  | c_emp.manager  | c_emp.hiredata  | c_emp.salary  | c_emp.comment  | c_emp.depno  |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | 7934         | MILLER       | CLERK       | 7782           | 1982-1-23       | 1300.0        | NULL           | 10           |
  | 7902         | FORD         | ANALYST     | 7566           | 1981-12-3       | 3000.0        | NULL           | 20           |
  | 7900         | JAMES        | CLERK       | 7698           | 1981-12-3       | 950.0         | NULL           | 30           |
  | 7876         | ADAMS        | CLERK       | 7788           | 1987-5-23       | 1100.0        | NULL           | 20           |
| 7844         | TURNER       | SALESMAN    | 7698           | 1981-9-8        | 1500.0        | 0.0            | 30           |
  | 7839         | KING         | PRESIDENT   | NULL           | 1981-11-17      | 5000.0        | NULL           | 10           |
  | 7788         | SCOTT        | ANALYST     | 7566           | 1987-4-19       | 3000.0        | NULL           | 20           |
  | 7782         | CLARK        | MANAGER     | 7839           | 1981-6-9        | 2450.0        | NULL           | 10           |
  | 7698         | BLAKE        | MANAGER     | 7839           | 1981-5-1        | 2850.0        | NULL           | 30           |
  | 7654         | MARTIN       | SALESMAN    | 7698           | 1981-9-28       | 1250.0        | 1400.0         | 30           |
  | 7566         | JONES        | MANAGER     | 7839           | 1981-4-2        | 2975.0        | NULL           | 20           |
  | 7521         | WARD         | SALESMAN    | 7698           | 1981-2-22       | 1250.0        | 500.0          | 30           |
  | 7499         | ALLEN        | SALESMAN    | 7698           | 1981-2-20       | 1600.0        | 300.0          | 30           |
  | 7369         | SMITH        | CLERK       | 7902           | 1980-12-17      | 800.0         | NULL           | 20           |
  | NULL         | cevent       | NULL        | NULL           | NULL            | NULL          | NULL           | 619          |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  15 rows selected (42.862 seconds)
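Note from the log above that order by is compiled to exactly one reducer (a total order cannot be parallelized), regardless of mapreduce.job.reduces. On large tables it is therefore usually combined with limit; a minimal sketch:

select * from c_emp order by empno desc limit 5;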
0: jdbc:hive2://hadoop207.cevent.com:10000> select ename,salary*2 double_sal from emp order by double_sal;  order by the column alias
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:38 Table not found 'emp' (state=42S02,code=10001)
0: jdbc:hive2://hadoop207.cevent.com:10000> select ename,salary*2 double_sal from c_emp order by double_sal;
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590297732730_0002
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0002/
INFO  : Starting Job = job_1590297732730_0002, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0002/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590297732730_0002
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-24 13:30:49,697 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-24 13:30:58,621 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.91 sec
INFO  : 2020-05-24 13:31:08,017 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.46 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 460 msec
INFO  : Ended Job = job_1590297732730_0002
+---------+-------------+--+
|  ename  | double_sal  |
+---------+-------------+--+
| cevent  | NULL        |
| SMITH   | 1600.0      |
| JAMES   | 1900.0      |
| ADAMS   | 2200.0      |
| WARD    | 2500.0      |
| MARTIN  | 2500.0      |
| MILLER  | 2600.0      |
| TURNER  | 3000.0      |
| ALLEN   | 3200.0      |
| CLARK   | 4900.0      |
| BLAKE   | 5700.0      |
| JONES   | 5950.0      |
| SCOTT   | 6000.0      |
| FORD    | 6000.0      |
| KING    | 10000.0     |
+---------+-------------+--+
  15 rows selected (29.1 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select ename,depno,salary from c_emp order by depno,salary;  order by depno first, then by salary
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590297732730_0003
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0003/
INFO  : Starting Job = job_1590297732730_0003, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0003/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590297732730_0003
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-24 13:32:35,476 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-24 13:32:42,899 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
INFO  : 2020-05-24 13:32:50,116 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.75 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 750 msec
INFO  : Ended Job = job_1590297732730_0003
+---------+--------+---------+--+
|  ename  | depno  | salary  |
+---------+--------+---------+--+
| MILLER  | 10     | 1300.0  |
| CLARK   | 10     | 2450.0  |
| KING    | 10     | 5000.0  |
| SMITH   | 20     | 800.0   |
| ADAMS   | 20     | 1100.0  |
| JONES   | 20     | 2975.0  |
| SCOTT   | 20     | 3000.0  |
| FORD    | 20     | 3000.0  |
| JAMES   | 30     | 950.0   |
| MARTIN  | 30     | 1250.0  |
| WARD    | 30     | 1250.0  |
| TURNER  | 30     | 1500.0  |
| ALLEN   | 30     | 1600.0  |
| BLAKE   | 30     | 2850.0  |
| cevent  | 619    | NULL    |
+---------+--------+---------+--+
  15 rows selected (24.087 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set mapreduce.job.reduces=3;  set the number of reducers
No rows affected (0.327 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp sort by depno desc;  sort by: per-reducer (partial) sort
INFO  : Number of reduce tasks not specified. Defaulting to jobconf value of: 3
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590297732730_0004
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0004/
INFO  : Starting Job = job_1590297732730_0004, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0004/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590297732730_0004
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
INFO  : 2020-05-24 13:37:53,832 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-24 13:38:02,135 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.31 sec
INFO  : 2020-05-24 13:38:15,943 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 5.47 sec
INFO  : 2020-05-24 13:38:17,773 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.28 sec
INFO  : MapReduce Total cumulative CPU time: 7 seconds 280 msec
INFO  : Ended Job = job_1590297732730_0004
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
| c_emp.empno  | c_emp.ename  | c_emp.jobs  | c_emp.manager  | c_emp.hiredata  | c_emp.salary  | c_emp.comment  | c_emp.depno  |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | 7654         | MARTIN       | SALESMAN    | 7698           | 1981-9-28       | 1250.0        | 1400.0         | 30           |
  | 7698         | BLAKE        | MANAGER     | 7839           | 1981-5-1        | 2850.0        | NULL           | 30           |
  | 7788         | SCOTT        | ANALYST     | 7566           | 1987-4-19       | 3000.0        | NULL           | 20           |
  | 7566         | JONES        | MANAGER     | 7839           | 1981-4-2        | 2975.0        | NULL           | 20           |
  | 7934         | MILLER       | CLERK       | 7782           | 1982-1-23       | 1300.0        | NULL           | 10           |
  | 7839         | KING         | PRESIDENT   | NULL           | 1981-11-17      | 5000.0        | NULL           | 10           |
  | 7782         | CLARK        | MANAGER     | 7839           | 1981-6-9        | 2450.0        | NULL           | 10           |
  | 7521         | WARD         | SALESMAN    | 7698           | 1981-2-22       | 1250.0        | 500.0          | 30           |
  | 7499         | ALLEN        | SALESMAN    | 7698           | 1981-2-20       | 1600.0        | 300.0          | 30           |
  | 7844         | TURNER       | SALESMAN    | 7698           | 1981-9-8        | 1500.0        | 0.0            | 30           |
  | 7876         | ADAMS        | CLERK       | 7788           | 1987-5-23       | 1100.0        | NULL           | 20           |
  | 7902         | FORD         | ANALYST     | 7566           | 1981-12-3       | 3000.0        | NULL           | 20           |
  | 7369         | SMITH        | CLERK       | 7902           | 1980-12-17      | 800.0         | NULL           | 20           |
  | NULL         | cevent       | NULL        | NULL           | NULL            | NULL          | NULL           | 619          |
  | 7900         | JAMES        | CLERK       | 7698           | 1981-12-3       | 950.0         | NULL           | 30           |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  15 rows selected (38.167 seconds)
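Each of the three reducers sorted only its own slice, which is why this output is ordered in runs rather than globally. To inspect each reducer's output as a separate file, the result can be written out; a sketch, where the local output directory is only an example:

insert overwrite local directory '/opt/module/datas/sortby-result'
select * from c_emp sort by depno desc;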
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp distribute by depno sort by empno desc;  distribute (partition) first, then sort within each reducer
INFO  : Number of reduce tasks not specified. Defaulting to jobconf value of: 3
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590297732730_0005
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0005/
INFO  : Starting Job = job_1590297732730_0005, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0005/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590297732730_0005
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
INFO  : 2020-05-24 13:42:37,342 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-24 13:42:43,648 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.24 sec
INFO  : 2020-05-24 13:42:55,154 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 2.38 sec
INFO  : 2020-05-24 13:43:02,071 Stage-1 map = 100%,  reduce = 44%, Cumulative CPU 3.19 sec
INFO  : 2020-05-24 13:43:04,632 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 5.35 sec
INFO  : 2020-05-24 13:43:06,406 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.12 sec
INFO  : MapReduce Total cumulative CPU time: 7 seconds 120 msec
INFO  : Ended Job = job_1590297732730_0005
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
| c_emp.empno  | c_emp.ename  | c_emp.jobs  | c_emp.manager  | c_emp.hiredata  | c_emp.salary  | c_emp.comment  | c_emp.depno  |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | 7900         | JAMES        | CLERK       | 7698           | 1981-12-3       | 950.0         | NULL           | 30           |
  | 7844         | TURNER       | SALESMAN    | 7698           | 1981-9-8        | 1500.0        | 0.0            | 30           |
  | 7698         | BLAKE        | MANAGER     | 7839           | 1981-5-1        | 2850.0        | NULL           | 30           |
  | 7654         | MARTIN       | SALESMAN    | 7698           | 1981-9-28       | 1250.0        | 1400.0         | 30           |
  | 7521         | WARD         | SALESMAN    | 7698           | 1981-2-22       | 1250.0        | 500.0          | 30           |
  | 7499         | ALLEN        | SALESMAN    | 7698           | 1981-2-20       | 1600.0        | 300.0          | 30           |
  | 7934         | MILLER       | CLERK       | 7782           | 1982-1-23       | 1300.0        | NULL           | 10           |
  | 7839         | KING         | PRESIDENT   | NULL           | 1981-11-17      | 5000.0        | NULL           | 10           |
  | 7782         | CLARK        | MANAGER     | 7839           | 1981-6-9        | 2450.0        | NULL           | 10           |
  | NULL         | cevent       | NULL        | NULL           | NULL            | NULL          | NULL           | 619          |
  | 7902         | FORD         | ANALYST     | 7566           | 1981-12-3       | 3000.0        | NULL           | 20           |
  | 7876         | ADAMS        | CLERK       | 7788           | 1987-5-23       | 1100.0        | NULL           | 20           |
  | 7788         | SCOTT        | ANALYST     | 7566           | 1987-4-19       | 3000.0        | NULL           | 20           |
  | 7566         | JONES        | MANAGER     | 7839           | 1981-4-2        | 2975.0        | NULL           | 20           |
  | 7369         | SMITH        | CLERK       | 7902           | 1980-12-17      | 800.0         | NULL           | 20           |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  15 rows selected (51.113 seconds)
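The grouping above follows from the hash partitioner: with 3 reducers a row goes to reducer hash(depno) % 3, and for an int column the default hash is, as the grouping here suggests, simply the value itself (an assumption worth verifying on your Hive version), so depno 30 maps to reducer 0, depno 10 and 619 to reducer 1, and depno 20 to reducer 2. A quick sanity check:

select depno, depno % 3 as reducer from c_emp group by depno;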
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp cluster by empno;
When the distribute by and sort by columns are the same, cluster by can be used instead.
The next query is exactly equivalent SQL; note that cluster by only sorts in ascending order.
INFO  : Number of reduce tasks not specified. Defaulting to jobconf value of: 3
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590297732730_0006
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0006/
INFO  : Starting Job = job_1590297732730_0006, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0006/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590297732730_0006
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
INFO  : 2020-05-24 13:50:47,150 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-24 13:50:53,522 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.2 sec
INFO  : 2020-05-24 13:51:05,258 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 3.79 sec
INFO  : 2020-05-24 13:51:06,391 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 5.89 sec
INFO  : 2020-05-24 13:51:07,423 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.54 sec
INFO  : MapReduce Total cumulative CPU time: 7 seconds 540 msec
INFO  : Ended Job = job_1590297732730_0006
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
| c_emp.empno  | c_emp.ename  | c_emp.jobs  | c_emp.manager  | c_emp.hiredata  | c_emp.salary  | c_emp.comment  | c_emp.depno  |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | NULL         | cevent       | NULL        | NULL           | NULL            | NULL          | NULL           | 619          |
  | 7521         | WARD         | SALESMAN    | 7698           | 1981-2-22       | 1250.0        | 500.0          | 30           |
  | 7566         | JONES        | MANAGER     | 7839           | 1981-4-2        | 2975.0        | NULL           | 20           |
  | 7698         | BLAKE        | MANAGER     | 7839           | 1981-5-1        | 2850.0        | NULL           | 30           |
  | 7782         | CLARK        | MANAGER     | 7839           | 1981-6-9        | 2450.0        | NULL           | 10           |
  | 7788         | SCOTT        | ANALYST     | 7566           | 1987-4-19       | 3000.0        | NULL           | 20           |
  | 7839         | KING         | PRESIDENT   | NULL           | 1981-11-17      | 5000.0        | NULL           | 10           |
  | 7902         | FORD         | ANALYST     | 7566           | 1981-12-3       | 3000.0        | NULL           | 20           |
| 7369         | SMITH        | CLERK       | 7902           | 1980-12-17      | 800.0         | NULL           | 20           |
  | 7654         | MARTIN       | SALESMAN    | 7698           | 1981-9-28       | 1250.0        | 1400.0         | 30           |
  | 7876         | ADAMS        | CLERK       | 7788           | 1987-5-23       | 1100.0        | NULL           | 20           |
  | 7900         | JAMES        | CLERK       | 7698           | 1981-12-3       | 950.0         | NULL           | 30           |
  | 7499         | ALLEN        | SALESMAN    | 7698           | 1981-2-20       | 1600.0        | 300.0          | 30           |
  | 7844         | TURNER       | SALESMAN    | 7698           | 1981-9-8        | 1500.0        | 0.0            | 30           |
  | 7934         | MILLER       | CLERK       | 7782           | 1982-1-23       | 1300.0        | NULL           | 10           |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  15 rows selected (39.574 seconds)
  0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp distribute by empno sort by empno;
INFO  : Number of reduce tasks not specified. Defaulting to jobconf value of: 3
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590297732730_0007
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0007/
INFO  : Starting Job = job_1590297732730_0007, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590297732730_0007/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590297732730_0007
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
INFO  : 2020-05-24 13:51:52,774 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-24 13:51:57,999 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
INFO  : 2020-05-24 13:52:12,405 Stage-1 map = 100%,  reduce = 44%, Cumulative CPU 3.8 sec
INFO  : 2020-05-24 13:52:17,683 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.61 sec
INFO  : MapReduce Total cumulative CPU time: 7 seconds 610 msec
INFO  : Ended Job = job_1590297732730_0007
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
| c_emp.empno  | c_emp.ename  | c_emp.jobs  | c_emp.manager  | c_emp.hiredata  | c_emp.salary  | c_emp.comment  | c_emp.depno  |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  | NULL         | cevent       | NULL        | NULL           | NULL            | NULL          | NULL           | 619          |
  | 7521         | WARD         | SALESMAN    | 7698           | 1981-2-22       | 1250.0        | 500.0          | 30           |
  | 7566         | JONES        | MANAGER     | 7839           | 1981-4-2        | 2975.0        | NULL           | 20           |
  | 7698         | BLAKE        | MANAGER     | 7839           | 1981-5-1        | 2850.0        | NULL           | 30           |
  | 7782         | CLARK        | MANAGER     | 7839           | 1981-6-9        | 2450.0        | NULL           | 10           |
  | 7788         | SCOTT        | ANALYST     | 7566           | 1987-4-19       | 3000.0        | NULL           | 20           |
  | 7839         | KING         | PRESIDENT   | NULL           | 1981-11-17      | 5000.0        | NULL           | 10           |
  | 7902         | FORD         | ANALYST     | 7566           | 1981-12-3       | 3000.0        | NULL           | 20           |
  | 7369         | SMITH        | CLERK       | 7902           | 1980-12-17      | 800.0         | NULL           | 20           |
  | 7654         | MARTIN       | SALESMAN    | 7698           | 1981-9-28       | 1250.0        | 1400.0         | 30           |
  | 7876         | ADAMS        | CLERK       | 7788           | 1987-5-23       | 1100.0        | NULL           | 20           |
  | 7900         | JAMES        | CLERK       | 7698           | 1981-12-3       | 950.0         | NULL           | 30           |
  | 7499         | ALLEN        | SALESMAN    | 7698           | 1981-2-20       | 1600.0        | 300.0          | 30           |
  | 7844         | TURNER       | SALESMAN    | 7698           | 1981-9-8        | 1500.0        | 0.0            | 30           |
  | 7934         | MILLER       | CLERK       | 7782           | 1982-1-23       | 1300.0        | NULL           | 10           |
  +--------------+--------------+-------------+----------------+-----------------+---------------+----------------+--------------+--+
  15 rows selected (45.339 seconds)
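To recap this section, the four clauses behave as follows (a summary sketch over the same c_emp table):

select * from c_emp order by empno desc;                      -- order by: one global sort, always a single reducer
select * from c_emp sort by depno desc;                       -- sort by: sorts within each reducer only, no global order
select * from c_emp distribute by depno sort by empno desc;   -- partition by one column, sort by another within each reducer
select * from c_emp cluster by empno;                         -- shorthand for distribute by + sort by on the same column, ascending only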
  
 


3. Bucketed table queries (clustered by)



[cevent@hadoop207 ~]$ cd /opt/module/hive-1.2.1/
[cevent@hadoop207 hive-1.2.1]$ ll
total 524
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 bin
drwxrwxr-x. 2 cevent cevent   4096 May  9 18:40 conf
-rw-rw-r--. 1 cevent cevent  20403 May 19 13:19 derby.log
drwxrwxr-x. 4 cevent cevent   4096 Apr 30 15:59 examples
drwxrwxr-x. 7 cevent cevent   4096 Apr 30 15:59 hcatalog
-rw-rw-r--. 1 cevent cevent     23 May  9 13:37 hive01.sql
drwxrwxr-x. 4 cevent cevent   4096 May  7 13:51 lib
-rw-rw-r--. 1 cevent cevent  24754 Apr 30 2015 LICENSE
drwxrwxr-x. 2 cevent cevent   4096 May 25 13:21 logs
drwxrwxr-x. 5 cevent cevent   4096 Apr 30 16:44 metastore_db
-rw-rw-r--. 1 cevent cevent    397 Jun 19 2015 NOTICE
-rw-rw-r--. 1 cevent cevent   4366 Jun 19 2015 README.txt
-rw-rw-r--. 1 cevent cevent 421129 Jun 19 2015 RELEASE_NOTES.txt
-rw-rw-r--. 1 cevent cevent     11 May  9 13:27 result.txt
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 scripts
-rw-rw-r--. 1 cevent cevent    171 May  9 13:24 server.log
-rw-rw-r--. 1 cevent cevent      5 May  8 14:05 server.pid
-rw-rw-r--. 1 cevent cevent    326 May 22 13:34 student.txt
[cevent@hadoop207 hive-1.2.1]$ bin/beeline 

Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hadoop207.cevent.com:10000
Connecting to jdbc:hive2://hadoop207.cevent.com:10000
Enter username for jdbc:hive2://hadoop207.cevent.com:10000: cevent
Enter password for jdbc:hive2://hadoop207.cevent.com:10000: ******
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Create the bucketed table:
0: jdbc:hive2://hadoop207.cevent.com:10000> create table student_bucket(id int,name string)
0: jdbc:hive2://hadoop207.cevent.com:10000> clustered by(id)
0: jdbc:hive2://hadoop207.cevent.com:10000> into 4 buckets
0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by '\t';
No rows affected (2.565 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> drop table student_bucket;
No rows affected (1.049 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> use cevent01;
No rows affected (0.039 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| cevent01       |
| default        |
+----------------+--+
2 rows selected (0.425 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> create table student_bucket(id int,name string)
0: jdbc:hive2://hadoop207.cevent.com:10000> clustered by(id)  bucket by id
0: jdbc:hive2://hadoop207.cevent.com:10000> into 4 buckets
0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by '\t';
No rows affected (0.12 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> desc formatted student_bucket;

+-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
|           col_name            |                                    data_type                                     |        comment        |
+-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
| # col_name                    | data_type                                                                        | comment               |
|                               | NULL                                                                             | NULL                  |
| id                            | int                                                                              |                       |
| name                          | string                                                                           |                       |
|                               | NULL                                                                             | NULL                  |
| # Detailed Table Information  | NULL                                                                             | NULL                  |
| Database:                     | cevent01                                                                         | NULL                  |
| Owner:                        | cevent                                                                           | NULL                  |
| CreateTime:                   | Mon May 25 13:27:15 CST 2020                                                     | NULL                  |
| LastAccessTime:               | UNKNOWN                                                                          | NULL                  |
| Protect Mode:                 | None                                                                             | NULL                  |
| Retention:                    | 0                                                                                | NULL                  |
| Location:                     | hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket  | NULL                  |
| Table Type:                   | MANAGED_TABLE                                                                    | NULL                  |
| Table Parameters:             | NULL                                                                             | NULL                  |
|                               | transient_lastDdlTime                                                            | 1590384435            |
|                               | NULL                                                                             | NULL                  |
| # Storage Information         | NULL                                                                             | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                               | NULL                  |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                                         | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                       | NULL                  |
| Compressed:                   | No                                                                               | NULL                  |
| Num Buckets:                  | 4                                                                                | NULL                  |
| Bucket Columns:               | [id]                                                                             | NULL                  |
| Sort Columns:                 | []                                                                               | NULL                  |
| Storage Desc Params:          | NULL                                                                             | NULL                  |
|                               | field.delim                                                                      | \t                    |
|                               | serialization.format                                                             | \t                    |
+-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
28 rows selected (0.128 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/student.txt' into table student_bucket;  a plain load does not actually bucket the data
INFO  : Loading data to table cevent01.student_bucket from file:/opt/module/datas/student.txt
INFO  : Table cevent01.student_bucket stats: [numFiles=1, totalSize=165]
No rows affected (1.183 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> truncate table student_bucket;  clear the table
No rows affected (0.3 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> show tables;
+-----------------+--+
|    tab_name     |
+-----------------+--+
| c_dept          |
| c_emp           |
| locations       |
| student_bucket  |
+-----------------+--+
4 rows selected (0.05 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> select * from locations;
+----------------------+---------------------------+--+
| locations.locations  | locations.locations_name  |
+----------------------+---------------------------+--+
| 1700                 | Beijing                   |
| 1800                 | London                    |
| 1900                 | Tokyo                     |
+----------------------+---------------------------+--+
3 rows selected (1.282 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> create table student(id int,name string);
Non-conforming creation: missing row format delimited fields terminated by '\t'
No rows affected (0.086 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> show tables;
+-----------------+--+
|    tab_name     |
+-----------------+--+
| c_dept          |
| c_emp           |
| locations       |
| student         |
| student_bucket  |
+-----------------+--+
5 rows selected (0.042 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/student.txt' into table student;  the data cannot be read back correctly
INFO  : Loading data to table cevent01.student from file:/opt/module/datas/student.txt
INFO  : Table cevent01.student stats: [numFiles=1, totalSize=165]
No rows affected (0.286 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from student;
+-------------+---------------+--+
| student.id  | student.name  |
+-------------+---------------+--+
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
| NULL        | NULL          |
+-------------+---------------+--+
16 rows selected (0.101 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> drop table student;
No rows affected (0.557 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> create table student(id int,name string) row format delimited fields terminated by '\t';  create the table with the proper delimiter
No rows affected (0.102 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/student.txt' into table student;  load the data
INFO  : Loading data to table cevent01.student from file:/opt/module/datas/student.txt
INFO  : Table cevent01.student stats: [numFiles=1, totalSize=165]
No rows affected (0.274 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> select * from student;
+-------------+---------------+--+
| student.id  | student.name  |
+-------------+---------------+--+
| 1001        | ss1           |
| 1002        | ss2           |
| 1003        | ss3           |
| 1004        | ss4           |
| 1005        | ss5           |
| 1006        | ss6           |
| 1007        | ss7           |
| 1008        | ss8           |
| 1009        | ss9           |
| 1010        | ss10          |
| 1011        | ss11          |
| 1012        | ss12          |
| 1013        | ss13          |
| 1014        | ss14          |
| 1015        | ss15          |
| 1016        | ss16          |
+-------------+---------------+--+
16 rows selected (0.106 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> set mapreduce.job.reduces=-1;  reset to -1 so Hive picks the reducer count automatically
No rows affected (0.007 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> insert into student_bucket select * from student;  populate the bucketed table from a query

INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590384013935_0001
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590384013935_0001/
INFO  : Starting Job = job_1590384013935_0001, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590384013935_0001/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590384013935_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2020-05-25 13:41:32,932 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-25 13:41:44,335 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.52 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 520 msec
INFO  : Ended Job = job_1590384013935_0001
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Moving data to: hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket/.hive-staging_hive_2020-05-25_13-41-16_219_3490270901530759228-1/-ext-10000 from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket/.hive-staging_hive_2020-05-25_13-41-16_219_3490270901530759228-1/-ext-10002
INFO  : Loading data to table cevent01.student_bucket from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket/.hive-staging_hive_2020-05-25_13-41-16_219_3490270901530759228-1/-ext-10000
INFO  : Table cevent01.student_bucket stats: [numFiles=1, numRows=16, totalSize=151, rawDataSize=135]
No rows affected (30.855 seconds)

0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.enforce.bucketing=true;  enforce bucketing
No rows affected (0.064 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> truncate table student_bucket;  clear the table
No rows affected (0.173 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> insert into student_bucket select * from student;  insert into the bucketed table, now with 4 reducers

INFO  : Number of reduce tasks determined at compile time: 4
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590384013935_0002
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590384013935_0002/
INFO  : Starting Job = job_1590384013935_0002, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590384013935_0002/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590384013935_0002
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
INFO  : 2020-05-25 13:43:13,851 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-25 13:43:24,823 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.15 sec
INFO  : 2020-05-25 13:43:48,648 Stage-1 map = 100%,  reduce = 54%, Cumulative CPU 2.41 sec
INFO  : 2020-05-25 13:43:55,963 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.49 sec
INFO  : MapReduce Total cumulative CPU time: 8 seconds 490 msec
INFO  : Ended Job = job_1590384013935_0002
INFO  : Loading data to table cevent01.student_bucket from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket/.hive-staging_hive_2020-05-25_13-43-04_971_5674088293092128498-1/-ext-10000
INFO  : Table cevent01.student_bucket stats: [numFiles=4, numRows=16, totalSize=151, rawDataSize=135]
No rows affected (69.532 seconds)
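Comparing the two inserts shows what hive.enforce.bucketing changes: the first ran with no reduce phase and left a single file (numFiles=1), so the table was not actually bucketed; with the setting enabled, Hive forced 4 reducers, one per bucket, and wrote numFiles=4. In short, the working recipe is:

set hive.enforce.bucketing=true;
set mapreduce.job.reduces=-1;
insert into student_bucket select * from student;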

4. Verify the result

Open: http://hadoop207.cevent.com:50070/explorer.html#/user/hive/warehouse/cevent01.db/student_bucket

Bucketing result: the table directory now contains 4 bucket files.
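The same check can be done from the shell instead of the web UI; a sketch (in Hive 1.x on MapReduce the bucket files are named 000000_0 through 000003_0, one per reducer):

[cevent@hadoop207 hive-1.2.1]$ hadoop fs -ls /user/hive/warehouse/cevent01.db/student_bucket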

5. Bucket sampling by id
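Bucketed tables can be sampled with the tablesample clause: tablesample(bucket x out of y on id) reads bucket x and then every y-th bucket after it. A minimal sketch against the 4-bucket student_bucket table above (bucket 1 out of 4, roughly a quarter of the rows):

select * from student_bucket tablesample(bucket 1 out of 4 on id);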




 
  
  [cevent@hadoop207 ~]$ cd /opt/module/hive-1.2.1/
  [cevent@hadoop207 hive-1.2.1]$ ll
  总用量 524
  drwxrwxr-x. 3 cevent cevent   4096 4月  30 15:59 bin
  drwxrwxr-x. 2 cevent cevent   4096 5月   9 18:40 conf
  -rw-rw-r--. 1 cevent cevent  20403 5月  19 13:19 derby.log
  drwxrwxr-x. 4 cevent cevent   4096 4月  30 15:59 examples
  drwxrwxr-x. 7 cevent cevent   4096 4月  30 15:59 hcatalog
  -rw-rw-r--. 1 cevent cevent     23 5月   9 13:37 hive01.sql
  drwxrwxr-x. 4 cevent cevent   4096 5月   7 13:51 lib
  -rw-rw-r--. 1 cevent cevent  24754 4月  30 2015 LICENSE
  drwxrwxr-x. 2 cevent cevent   4096 5月  25 13:21 logs
  drwxrwxr-x. 5 cevent cevent   4096 4月  30 16:44 metastore_db
  -rw-rw-r--. 1 cevent cevent    397 6月  19 2015 NOTICE
  -rw-rw-r--. 1 cevent cevent   4366 6月  19 2015 README.txt
  -rw-rw-r--. 1 cevent cevent 421129 6月  19 2015 RELEASE_NOTES.txt
  -rw-rw-r--. 1 cevent cevent     11 5月   9 13:27 result.txt
  drwxrwxr-x. 3 cevent cevent   4096 4月  30 15:59 scripts
  -rw-rw-r--. 1 cevent cevent    171 5月   9 13:24 server.log
  -rw-rw-r--. 1 cevent cevent      5 5月   8 14:05 server.pid
  -rw-rw-r--. 1 cevent cevent    326 5月  22 13:34 student.txt
  [cevent@hadoop207 hive-1.2.1]$ bin/beeline 
  Beeline version 1.2.1 by Apache Hive
  beeline> !connect
  jdbc:hive2://hadoop207.cevent.com:10000
  Connecting to
  jdbc:hive2://hadoop207.cevent.com:10000
  Enter username for
  jdbc:hive2://hadoop207.cevent.com:10000: cevent
  Enter password for
  jdbc:hive2://hadoop207.cevent.com:10000: ******
  Connected to: Apache Hive (version 1.2.1)
  Driver: Hive JDBC (version 1.2.1)
  Transaction isolation:
  TRANSACTION_REPEATABLE_READ 创建分桶表
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> create
  table student_bucket(id int,name string)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> clustered
  by(id)
  0: jdbc:hive2://hadoop207.cevent.com:10000>
  into 4 buckets
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> row
  format delimited fields terminated by '\t';
  No rows affected (2.565 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> drop
  table student_bucket;
  No rows affected (1.049 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> use
  cevent01;
  No rows affected (0.039 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> show
  databases;
  +----------------+--+
  | database_name  |
  +----------------+--+
  | cevent01       |
  | default        |
  +----------------+--+
  2 rows selected (0.425 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> create
  table student_bucket(id int,name string)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000>
  clustered by(id)  根据id分桶
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> into 4
  buckets
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> row
  format delimited fields terminated by '\t';
  No rows affected (0.12 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> desc
  formatted student_bucket;
  +-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
  |           col_name            |                                   
  data_type                                     |       
  comment        |
  +-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
  | # col_name                   
  | data_type                                                                        |
  comment               |
  |                              
  | NULL                                                                            
  | NULL                  |
  | id                           
  | int                                                                              |                       |
  | name                         
  | string                                                                          
  |                       |
  |                               | NULL                                                                            
  | NULL                  |
  | # Detailed Table Information 
  | NULL                                                                            
  | NULL                  |
  | Database:                    
  | cevent01                                                                        
  | NULL                  |
  | Owner:                       
  | cevent                                                                          
  | NULL                  |
  | CreateTime:                  
  | Mon May 25 13:27:15 CST 2020                                                    
  | NULL                  |
  | LastAccessTime:              
  | UNKNOWN                                                 
                          |
  NULL                  |
  | Protect Mode:                
  | None                                                                            
  | NULL                  |
  | Retention:                   
  | 0                             
                                                    |
  NULL                  |
  | Location:                    
  |
  hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket  | NULL                  |
  | Table Type:                  
  | MANAGED_TABLE                                                                   
  | NULL                  |
  | Table Parameters:            
  | NULL                                                                            
  | NULL                  |
  |                               |
  transient_lastDdlTime                                                           
  | 1590384435            |
  |                              
  | NULL                                                                            
  | NULL                  |
  | # Storage Information        
  | NULL                                                                            
  | NULL                  |
  | SerDe Library:               
  | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                               | NULL                  |
  | InputFormat:                 
  | org.apache.hadoop.mapred.TextInputFormat                                         |
  NULL                  |
  | OutputFormat:                
  | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                       | NULL                  |
  | Compressed:                  
  | No                                                                              
  | NULL                  |
  | Num
  Buckets:                  | 4                                                                                |
  NULL                  |
  | Bucket
  Columns:               | [id]                                                                             |
  NULL                  |
  | Sort Columns:                 | []                                                                              
  | NULL                  |
  | Storage Desc Params:         
  | NULL                                                                            
  | NULL                  |
  |                              
  | field.delim                                                                     
  | \t                    |
  |                              
  | serialization.format                                                          
    | \t                    |
  +-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
  28 rows selected (0.128 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> 
  load data local
  inpath '/opt/module/datas/student.txt' into table student_bucket;  分桶表实现分桶,不能直接加载数据
  INFO 
  : Loading data to table cevent01.student_bucket from
  file:/opt/module/datas/student.txt
  INFO 
  : Table cevent01.student_bucket stats: [numFiles=1, totalSize=165]
  No rows affected (1.183 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> truncate
  table student_bucket; 清洗数据
  No rows affected (0.3 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> show
  tables;
  +-----------------+--+
  |   
  tab_name     |
  +-----------------+--+
  | c_dept          |
  | c_emp           |
  | locations       |
  | student_bucket  |
  +-----------------+--+
  4 rows selected (0.05 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> select * from locations;
  +----------------------+---------------------------+--+
  | locations.locations  | locations.locations_name  |
  +----------------------+---------------------------+--+
  | 1700                 | Beijing                   |
  | 1800                 | London                    |
  | 1900                 | Tokyo                     |
  +----------------------+---------------------------+--+
  3 rows selected (1.282 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> create
  table student(id int,name string); 
  非规格创建未实现row format delimited fields terminated by '\t'
  No rows affected (0.086 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> show tables;
  +-----------------+--+
  |   
  tab_name     |
  +-----------------+--+
  | c_dept          |
  | c_emp           |
  | locations       |
  | student         |
  | student_bucket  |
  +-----------------+--+
  5 rows selected (0.042 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000>
   load data local inpath
  '/opt/module/datas/student.txt' into table student;  不能正常显示数据
  INFO 
  : Loading data to table cevent01.student from
  file:/opt/module/datas/student.txt
  INFO 
  : Table cevent01.student stats: [numFiles=1, totalSize=165]
  No rows affected (0.286 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> select
  * from student;
  +-------------+---------------+--+
  | student.id  |
  student.name  |
  +-------------+---------------+--+
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  | NULL        | NULL          |
  +-------------+---------------+--+
  16 rows selected (0.101 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> drop
  table student;
  No rows affected (0.557 seconds)
  0:
  jdbc:hive2://hadoop207.cevent.com:10000> 
  create table
  student(id int,name string) row format delimited fields terminated by '\t';  创建规格表
  No rows affected (0.102 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/student.txt' into table student;  -- load the data again
INFO  : Loading data to table cevent01.student from file:/opt/module/datas/student.txt
INFO  : Table cevent01.student stats: [numFiles=1, totalSize=165]
No rows affected (0.274 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from student;
+-------------+---------------+--+
| student.id  | student.name  |
+-------------+---------------+--+
| 1001        | ss1           |
| 1002        | ss2           |
| 1003        | ss3           |
| 1004        | ss4           |
| 1005        | ss5           |
| 1006        | ss6           |
| 1007        | ss7           |
| 1008        | ss8           |
| 1009        | ss9           |
  | 1010        | ss10          |
  | 1011        | ss11          |
  | 1012        | ss12          |
  | 1013        | ss13          |
  | 1014        | ss14          |
  | 1015        | ss15          |
  | 1016        | ss16          |
  +-------------+---------------+--+
  16 rows selected (0.106 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set mapreduce.job.reduces=-1;  -- -1 means let Hive decide the number of reducers automatically
  No rows affected (0.007 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> insert into student_bucket select * from student;  -- populate the bucketed table from a query
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590384013935_0001
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590384013935_0001/
INFO  : Starting Job = job_1590384013935_0001, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590384013935_0001/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590384013935_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2020-05-25 13:41:32,932 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-25 13:41:44,335 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.52 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 520 msec
INFO  : Ended Job = job_1590384013935_0001
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Moving data to: hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket/.hive-staging_hive_2020-05-25_13-41-16_219_3490270901530759228-1/-ext-10000 from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket/.hive-staging_hive_2020-05-25_13-41-16_219_3490270901530759228-1/-ext-10002
INFO  : Loading data to table cevent01.student_bucket from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket/.hive-staging_hive_2020-05-25_13-41-16_219_3490270901530759228-1/-ext-10000
INFO  : Table cevent01.student_bucket stats: [numFiles=1, numRows=16, totalSize=151, rawDataSize=135]
No rows affected (30.855 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.enforce.bucketing=true;  -- enforce bucketing on insert
  No rows affected (0.064 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> truncate table student_bucket;  -- empty the table again
  No rows affected (0.173 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> insert into student_bucket select * from student;  -- insert into the bucketed table, this time with bucketing enforced
INFO  : Number of reduce tasks determined at compile time: 4
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590384013935_0002
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590384013935_0002/
INFO  : Starting Job = job_1590384013935_0002, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590384013935_0002/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590384013935_0002
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
INFO  : 2020-05-25 13:43:13,851 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-25 13:43:24,823 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.15 sec
INFO  : 2020-05-25 13:43:48,648 Stage-1 map = 100%,  reduce = 54%, Cumulative CPU 2.41 sec
INFO  : 2020-05-25 13:43:55,963 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.49 sec
INFO  : MapReduce Total cumulative CPU time: 8 seconds 490 msec
INFO  : Ended Job = job_1590384013935_0002
INFO  : Loading data to table cevent01.student_bucket from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket/.hive-staging_hive_2020-05-25_13-43-04_971_5674088293092128498-1/-ext-10000
INFO  : Table cevent01.student_bucket stats: [numFiles=4, numRows=16, totalSize=151, rawDataSize=135]
No rows affected (69.532 seconds)
   
Browse the result at: http://hadoop207.cevent.com:50070/explorer.html#/user/hive/warehouse/cevent01.db/student_bucket
0: jdbc:hive2://hadoop207.cevent.com:10000> desc formatted student_bucket;  -- inspect the table metadata
+-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
|           col_name            |                                    data_type                                     |        comment        |
+-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
| # col_name                    | data_type                                                                        | comment               |
|                               | NULL                                                                             | NULL                  |
| id                            | int                                                                              |                       |
| name                          | string                                                                           |                       |
|                               | NULL                                                                             | NULL                  |
| # Detailed Table Information  | NULL                                                                             | NULL                  |
| Database:                     | cevent01                                                                         | NULL                  |
| Owner:                        | cevent                                                                           | NULL                  |
| CreateTime:                   | Mon May 25 13:27:15 CST 2020                                                     | NULL                  |
| LastAccessTime:               | UNKNOWN                                                                          | NULL                  |
| Protect Mode:                 | None                                                                             | NULL                  |
| Retention:                    | 0                                                                                | NULL                  |
| Location:                     | hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/student_bucket  | NULL                  |
| Table Type:                   | MANAGED_TABLE                                                                    | NULL                  |
| Table Parameters:             | NULL                                                                             | NULL                  |
|                               | COLUMN_STATS_ACCURATE                                                            | true                  |
|                               | numFiles                                                                         | 4                     |
|                               | numRows                                                                          | 16                    |
|                               | rawDataSize                                                                      | 135                   |
|                               | totalSize                                                                        | 151                   |
|                               | transient_lastDdlTime                                                            | 1590385454            |
|                               | NULL                                                                             | NULL                  |
| # Storage Information         | NULL                                                                             | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                               | NULL                  |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                                         | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                       | NULL                  |
| Compressed:                   | No                                                                               | NULL                  |
| Num Buckets:                  | 4                                                                                | NULL                  |
| Bucket Columns:               | [id]                                                                             | NULL                  |
| Sort Columns:                 | []                                                                               | NULL                  |
| Storage Desc Params:          | NULL                                                                             | NULL                  |
|                               | field.delim                                                                      | \t                    |
|                               | serialization.format                                                             | \t                    |
+-------------------------------+----------------------------------------------------------------------------------+-----------------------+--+
33 rows selected (1.099 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from student tablesample(bucket 1 out of 4 on id);
+-------------+---------------+--+
| student.id  | student.name  |
+-------------+---------------+--+
| 1004        | ss4           |
| 1008        | ss8           |
| 1012        | ss12          |
| 1016        | ss16          |
+-------------+---------------+--+
4 rows selected (2.827 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from student_bucket tablesample(bucket 1 out of 4 on id);  -- split by id into 4 parts and take 1
+--------------------+----------------------+--+
| student_bucket.id  | student_bucket.name  |
+--------------------+----------------------+--+
| 1016               | ss16                 |
| 1012               | ss12                 |
| 1008               | ss8                  |
| 1004               | ss4                  |
+--------------------+----------------------+--+
4 rows selected (0.665 seconds)
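Why exactly these four ids come back: for an int bucket column Hive's hash of the value is the value itself, so bucket 1 out of 4 on id keeps the rows where id % 4 == 0. A hedged check against the same student table (this query was not in the original transcript):

-- rows whose id is divisible by 4 are the ones bucket 1 of 4 should hold
select id, name from student where id % 4 = 0;
-- expected: 1004 / 1008 / 1012 / 1016, matching both tablesample results above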
  
 


6. Bucketing and partitioning: the SQL explained




 
  
// Partitioning: reduces the amount of data scanned.
// order by: global ordering (all rows pass through a single reducer for a merge sort); it should normally be followed by limit, since a global sort without a limit is very expensive.
// sort by: local ordering, i.e. sorting within each reducer (distribute first, then sort); several reducers run the query and their outputs are concatenated. Rows are distributed by hash value, e.g. the remainder modulo 3 when the number of reducers is 3.
// cluster by: when the distribute by and sort by columns are the same, cluster by can be used instead.

// These two statements produce exactly the same result; note that cluster by only sorts ascending:
select * from c_emp cluster by empno;
select * from c_emp distribute by empno sort by empno;

// Bucketing: exists to serve sampling queries.
// 1. Create the bucketed table: clustered by(id) into 4 buckets splits the data into 4 buckets by id.
create table student_bucket(id int,name string)
clustered by(id)
into 4 buckets
row format delimited fields terminated by '\t';

// 2. Load the data (a plain load does not bucket it; see the sketch right below):
load data local inpath '/opt/module/datas/student.txt' into table student_bucket;
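-- (hedged addition, not part of the original recap) the route that actually
-- produces the 4 bucket files, as the transcript above demonstrated:
set hive.enforce.bucketing = true;  -- force reducer count = bucket count
set mapreduce.job.reduces = -1;     -- let Hive choose; with enforcement on, 4 reducers -> 4 files
insert into student_bucket select * from student;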
   
// 3. Browse the result:
http://hadoop207.cevent.com:50070/explorer.html#/user/hive/warehouse/cevent01.db/student_bucket

// 4. Sampling query (bucket 1 out of 4 on id): split the data into 4 parts by id and take 1.
// Sampling also works on a non-bucketed table (computed over the whole table),
// but it is more efficient on a bucketed table (computed per bucket).
select * from student_bucket tablesample(bucket 1 out of 4 on id);
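Beyond bucket sampling on a fixed column, Hive accepts a couple of other tablesample forms. A hedged sketch, assuming the same student_bucket table (percent and rand() sampling are standard Hive syntax, but they were not run in the transcript above):

-- block sampling: take roughly 10 percent of the input data
select * from student_bucket tablesample(10 percent);
-- random bucket sampling: hash on rand() instead of a data column
select * from student_bucket tablesample(bucket 1 out of 4 on rand());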
  
 


7. concat and collect_list/concat_ws: row-to-column queries




 
  
  [cevent@hadoop207 ~]$ cd /opt/module/hive-1.2.1/
  [cevent@hadoop207 hive-1.2.1]$ ll
total 524
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 bin
drwxrwxr-x. 2 cevent cevent   4096 May  9 18:40 conf
-rw-rw-r--. 1 cevent cevent  20403 May 19 13:19 derby.log
drwxrwxr-x. 4 cevent cevent   4096 Apr 30 15:59 examples
drwxrwxr-x. 7 cevent cevent   4096 Apr 30 15:59 hcatalog
-rw-rw-r--. 1 cevent cevent     23 May  9 13:37 hive01.sql
drwxrwxr-x. 4 cevent cevent   4096 May  7 13:51 lib
-rw-rw-r--. 1 cevent cevent  24754 Apr 30  2015 LICENSE
drwxrwxr-x. 2 cevent cevent   4096 May 26 13:22 logs
drwxrwxr-x. 5 cevent cevent   4096 Apr 30 16:44 metastore_db
-rw-rw-r--. 1 cevent cevent    397 Jun 19  2015 NOTICE
-rw-rw-r--. 1 cevent cevent   4366 Jun 19  2015 README.txt
-rw-rw-r--. 1 cevent cevent 421129 Jun 19  2015 RELEASE_NOTES.txt
-rw-rw-r--. 1 cevent cevent     11 May  9 13:27 result.txt
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 scripts
-rw-rw-r--. 1 cevent cevent    171 May  9 13:24 server.log
-rw-rw-r--. 1 cevent cevent      5 May  8 14:05 server.pid
-rw-rw-r--. 1 cevent cevent    326 May 22 13:34 student.txt
  [cevent@hadoop207 hive-1.2.1]$ bin/beeline 
  Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hadoop207.cevent.com:10000
Connecting to jdbc:hive2://hadoop207.cevent.com:10000
Enter username for jdbc:hive2://hadoop207.cevent.com:10000: cevent
Enter password for jdbc:hive2://hadoop207.cevent.com:10000: ******
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop207.cevent.com:10000> show databases;
  +----------------+--+
  | database_name  |
  +----------------+--+
  | cevent01       |
  | default        |
  +----------------+--+
  2 rows selected (0.584 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> use cevent01;
  No rows affected (0.08 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> show tables;
+-----------------+--+
|    tab_name     |
+-----------------+--+
  | c_dept          |
  | c_emp           |
  | locations       |
  | student         |
  | student_bucket  |
  +-----------------+--+
  5 rows selected (0.058 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select ename,comment from c_emp;
+---------+----------+--+
|  ename  | comment  |
+---------+----------+--+
| cevent  | NULL     |
| SMITH   | NULL     |
| ALLEN   | 300.0    |
| WARD    | 500.0    |
| JONES   | NULL     |
| MARTIN  | 1400.0   |
| BLAKE   | NULL     |
| CLARK   | NULL     |
| SCOTT   | NULL     |
| KING    | NULL     |
| TURNER  | 0.0      |
| ADAMS   | NULL     |
| JAMES   | NULL     |
| FORD    | NULL     |
| MILLER  | NULL     |
+---------+----------+--+
15 rows selected (2.033 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select ename,nvl(comment,-1) from c_emp;  -- null-value substitution: nvl(column, default)
+---------+---------+--+
|  ename  |   _c1   |
+---------+---------+--+
  | cevent  | -1.0    |
  | SMITH   | -1.0    |
  | ALLEN   | 300.0   |
  | WARD    | 500.0   |
  | JONES   | -1.0    |
  | MARTIN  | 1400.0  |
  | BLAKE   | -1.0    |
  | CLARK   | -1.0    |
  | SCOTT   | -1.0    |
  | KING    | -1.0    |
  | TURNER  | 0.0     |
  | ADAMS   | -1.0    |
  | JAMES   | -1.0    |
  | FORD    | -1.0    |
  | MILLER  | -1.0    |
  +---------+---------+--+
15 rows selected (0.17 seconds)
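A brief aside: nvl(column, default) is the two-argument null substitute used above; Hive's coalesce() does the same job and takes any number of fallbacks. A hedged sketch against the same c_emp table (not from the original transcript):

-- coalesce returns its first non-null argument; with two arguments it
-- behaves like nvl(comment, -1) above
select ename, coalesce(comment, -1) from c_emp;

Next, create the gender table: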
0: jdbc:hive2://hadoop207.cevent.com:10000> create table emp_gender(
0: jdbc:hive2://hadoop207.cevent.com:10000> name string,
0: jdbc:hive2://hadoop207.cevent.com:10000> dept_id string,
0: jdbc:hive2://hadoop207.cevent.com:10000> gender string
0: jdbc:hive2://hadoop207.cevent.com:10000> )
0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by "\t";
  No rows affected (0.206 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/emp_sex.txt';  -- load attempt, but the INTO TABLE clause is missing
Error: Error while compiling statement: FAILED: ParseException line 1:54 mismatched input '<EOF>' expecting INTO near ''/opt/module/datas/emp_sex.txt'' in load statement (state=42000,code=40000)
0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/emp_sex.txt' into table emp_gender;
INFO  : Loading data to table cevent01.emp_gender from file:/opt/module/datas/emp_sex.txt
INFO  : Table cevent01.emp_gender stats: [numFiles=1, totalSize=78]
No rows affected (1.004 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from emp_gender;
  +------------------+---------------------+--------------------+--+
  | emp_gender.name  | emp_gender.dept_id  | emp_gender.gender  |
  +------------------+---------------------+--------------------+--+
| 悟空              | A                   | 男                 |
| 大海              | A                   | 男                 |
| 宋宋              | B                   | 男                 |
| 凤姐              | A                   | 女                 |
| 婷姐              | B                   | 女                 |
| 婷婷              | B                   | 女                 |
  +------------------+---------------------+--------------------+--+
  6 rows selected (0.114 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select dept_id,count(1) dep_num from emp_gender group by dept_id;  -- head count per department id
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590470551333_0001
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0001/
INFO  : Starting Job = job_1590470551333_0001, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0001/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590470551333_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-26 13:39:09,144 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-26 13:39:24,611 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.17 sec
INFO  : 2020-05-26 13:39:35,333 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.74 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 740 msec
INFO  : Ended Job = job_1590470551333_0001
  +----------+----------+--+
  | dept_id  | dep_num  |
  +----------+----------+--+
  | A        | 3        |
  | B        | 3        |
  +----------+----------+--+
2 rows selected (46.274 seconds)  -- next: head count per department, split by gender
0: jdbc:hive2://hadoop207.cevent.com:10000> select dept_id,
0: jdbc:hive2://hadoop207.cevent.com:10000> sum(case gender when '男' then 1 else 0 end) male,
0: jdbc:hive2://hadoop207.cevent.com:10000> sum(case gender when '女' then 1 else 0 end) female
0: jdbc:hive2://hadoop207.cevent.com:10000> from emp_gender
0: jdbc:hive2://hadoop207.cevent.com:10000> group by dept_id;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590470551333_0002
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0002/
INFO  : Starting Job = job_1590470551333_0002, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0002/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590470551333_0002
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-26 13:43:57,628 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-26 13:44:07,761 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.28 sec
INFO  : 2020-05-26 13:44:17,520 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.12 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 120 msec
INFO  : Ended Job = job_1590470551333_0002
+----------+-------+---------+--+
| dept_id  | male  | female  |
+----------+-------+---------+--+
| A        | 2     | 1       |
| B        | 1     | 2       |
+----------+-------+---------+--+
2 rows selected (32.373 seconds)
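The same pivot is often written with Hive's if() in place of case ... when; a hedged, equivalent sketch (not run in the original transcript):

select dept_id,
       sum(if(gender = '男', 1, 0)) male,
       sum(if(gender = '女', 1, 0)) female
from emp_gender
group by dept_id;
-- expected, matching the table above: A -> 2, 1 and B -> 1, 2

Next, create the person info table: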
0: jdbc:hive2://hadoop207.cevent.com:10000> create table person_info(
0: jdbc:hive2://hadoop207.cevent.com:10000> name string,
0: jdbc:hive2://hadoop207.cevent.com:10000> constellation string,
0: jdbc:hive2://hadoop207.cevent.com:10000> blood_type string
0: jdbc:hive2://hadoop207.cevent.com:10000> )
0: jdbc:hive2://hadoop207.cevent.com:10000> row format delimited fields terminated by "\t";
  No rows affected (0.486 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/constellation.txt' into table person_info;  -- load the data
INFO  : Loading data to table cevent01.person_info from file:/opt/module/datas/constellation.txt
INFO  : Table cevent01.person_info stats: [numFiles=1, totalSize=129]
No rows affected (1.094 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from person_info;
+-------------------+----------------------------+-------------------------+--+
| person_info.name  | person_info.constellation  | person_info.blood_type  |
+-------------------+----------------------------+-------------------------+--+
| 孙悟空             | 白羊座                      | A                       |
| 大海               | 射手座                      | A                       |
| 宋宋               | 白羊座                      | B                       |
| 猪八戒             | 白羊座                      | A                       |
| 凤姐               | 射手座                      | A                       |
| 苍老师             | 白羊座                      | B                       |
+-------------------+----------------------------+-------------------------+--+
6 rows selected (0.272 seconds)  -- next: count people by constellation and blood type
0: jdbc:hive2://hadoop207.cevent.com:10000> select
0: jdbc:hive2://hadoop207.cevent.com:10000> constellation,
0: jdbc:hive2://hadoop207.cevent.com:10000> blood_type,
0: jdbc:hive2://hadoop207.cevent.com:10000> count(1) counts
0: jdbc:hive2://hadoop207.cevent.com:10000> from person_info
0: jdbc:hive2://hadoop207.cevent.com:10000> group by constellation,blood_type;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590470551333_0003
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0003/
INFO  : Starting Job = job_1590470551333_0003, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0003/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590470551333_0003
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-26 13:55:59,920 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-26 13:56:07,516 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.21 sec
INFO  : 2020-05-26 13:56:16,940 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.99 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 990 msec
INFO  : Ended Job = job_1590470551333_0003
+----------------+-------------+---------+--+
| constellation  | blood_type  | counts  |
+----------------+-------------+---------+--+
| 射手座          | A           | 2       |
| 白羊座          | A           | 2       |
| 白羊座          | B           | 2       |
+----------------+-------------+---------+--+
3 rows selected (26.977 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> -- merge columns: concat(a, separator, b) newCol
0: jdbc:hive2://hadoop207.cevent.com:10000> select
0: jdbc:hive2://hadoop207.cevent.com:10000> concat(constellation,",",blood_type) constellation_blood,
0: jdbc:hive2://hadoop207.cevent.com:10000> count(1) counts
0: jdbc:hive2://hadoop207.cevent.com:10000> from person_info
0: jdbc:hive2://hadoop207.cevent.com:10000> group by constellation,blood_type;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590470551333_0004
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0004/
INFO  : Starting Job = job_1590470551333_0004, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0004/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590470551333_0004
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-26 14:00:07,997 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-26 14:00:15,545 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.17 sec
INFO  : 2020-05-26 14:00:23,967 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.08 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 80 msec
INFO  : Ended Job = job_1590470551333_0004
+----------------------+---------+--+
| constellation_blood  | counts  |
+----------------------+---------+--+
| 射手座,A              | 2       |
| 白羊座,A              | 2       |
| 白羊座,B              | 2       |
+----------------------+---------+--+
3 rows selected (30.086 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> show functions like "collect*";  -- look up the collect functions
+---------------+--+
|   tab_name    |
+---------------+--+
| collect_list  |
| collect_set   |
+---------------+--+
  2 rows selected (0.566 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> desc function collect_list;  -- collect_list returns a list of objects, duplicates included
  +--------------------------------------------------------------+--+
  |                           tab_name                           |
  +--------------------------------------------------------------+--+
  | collect_list(x) - Returns a list of objects with duplicates  |
  +--------------------------------------------------------------+--+
  1 row selected (0.087 seconds)
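collect_set is the deduplicating counterpart of collect_list; a hedged sketch on person_info to show the difference (element order inside the arrays is not guaranteed, and this query was not part of the original transcript):

select collect_list(blood_type) all_types,      -- duplicates kept, e.g. ["A","A","B","A","A","B"]
       collect_set(blood_type)  distinct_types  -- duplicates dropped, e.g. ["A","B"]
from person_info;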
0: jdbc:hive2://hadoop207.cevent.com:10000> select
0: jdbc:hive2://hadoop207.cevent.com:10000> concat(constellation,",",blood_type) constellation_blood,
0: jdbc:hive2://hadoop207.cevent.com:10000> collect_list(name) names
0: jdbc:hive2://hadoop207.cevent.com:10000> from person_info
0: jdbc:hive2://hadoop207.cevent.com:10000> group by constellation,blood_type;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590470551333_0005
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0005/
INFO  : Starting Job = job_1590470551333_0005, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0005/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590470551333_0005
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-26 14:05:52,477 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-26 14:05:59,784 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.17 sec
INFO  : 2020-05-26 14:06:09,210 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.13 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 130 msec
INFO  : Ended Job = job_1590470551333_0005
+----------------------+--------------------+--+
| constellation_blood  |       names        |
+----------------------+--------------------+--+
| 射手座,A              | ["大海","凤姐"]     |
| 白羊座,A              | ["孙悟空","猪八戒"]  |
| 白羊座,B              | ["宋宋","苍老师"]   |
+----------------------+--------------------+--+
3 rows selected (26.667 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> desc function concat_ws;  -- concat_ws joins an array into a single string
+-------------------------------------------------------------------------------------------------------------------------+--+
|                                                        tab_name                                                          |
+-------------------------------------------------------------------------------------------------------------------------+--+
| concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.   |
+-------------------------------------------------------------------------------------------------------------------------+--+
1 row selected (0.128 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select
0: jdbc:hive2://hadoop207.cevent.com:10000> concat(constellation,",",blood_type) constellation_blood,
0: jdbc:hive2://hadoop207.cevent.com:10000> concat_ws(" | ",collect_list(name)) names  -- concat_ws("separator", collection)
0: jdbc:hive2://hadoop207.cevent.com:10000> from person_info
0: jdbc:hive2://hadoop207.cevent.com:10000> group by constellation,blood_type;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1590470551333_0006
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0006/
INFO  : Starting Job = job_1590470551333_0006, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1590470551333_0006/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1590470551333_0006
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2020-05-26 14:09:27,405 Stage-1 map = 0%,  reduce = 0%
INFO  : 2020-05-26 14:09:37,084 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.31 sec
INFO  : 2020-05-26 14:09:46,372 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.13 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 130 msec
INFO  : Ended Job = job_1590470551333_0006
+----------------------+------------------+--+
| constellation_blood  |      names       |
+----------------------+------------------+--+
| 射手座,A              | 大海 | 凤姐       |
| 白羊座,A              | 孙悟空 | 猪八戒    |
| 白羊座,B              | 宋宋 | 苍老师     |
+----------------------+------------------+--+
3 rows selected (29.077 seconds)
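The same row-to-column query is often written with the concat pushed into a subquery, so the group by does not have to repeat the expression; a hedged, equivalent sketch (not run in the transcript above):

select t.constellation_blood,
       concat_ws(" | ", collect_list(t.name)) names
from (select name,
             concat(constellation, ",", blood_type) constellation_blood
      from person_info) t
group by t.constellation_blood;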
   
  
 

