1. 常用函数
1hive (default)> select ename,sal,
2 > case
3 > when sal>1 and sal<=1000 then "lower"
4 > when sal>1000 and sal<=2000 then "just so so"
5 > when sal>2000 and sal<=4000 then "ok"
6 > else "high"
7 > end
8 > from emp03;
9OK
10ename sal _c2
11'SMITH' 800.0 lower
12'ALLEN' 1600.0 just so so
13'WARD' 1250.0 just so so
14'JONES' 2975.0 ok
15'MARTIN' 1250.0 just so so
16'BLAKE' 2850.0 ok
17'CLARK' 2450.0 ok
18'SCOTT' 3000.0 ok
19'KING' 5000.0 high
20'TURNER' 1500.0 just so so
21'ADAMS' 1100.0 just so so
22'JAMES' 950.0 lower
23'FORD' 3000.0 ok
24'MILLER' 1300.0 just so so
25Time taken: 0.04 seconds, Fetched: 14 row(s)
2. 4 个 by
1order by 全局排序,只会开启一个reduce,如果数据量过大,任务会很慢
2
3hive (default)> select * from emp03 order by empno;
4Query ID = hadoop_20190718155454_48a2d16a-3b26-416c-99f1-e2a7873d34ec
5Total jobs = 1
6Launching Job 1 out of 1
7Number of reduce tasks determined at compile time: 1
8In order to change the average load for a reducer (in bytes):
9 set hive.exec.reducers.bytes.per.reducer=<number>
10In order to limit the maximum number of reducers:
11 set hive.exec.reducers.max=<number>
12In order to set a constant number of reducers:
13 set mapreduce.job.reduces=<number>
14Starting Job = job_1563262646611_0003, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0003/
15Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1563262646611_0003
16Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 只开启一个reduce
172019-07-18 15:55:00,270 Stage-1 map = 0%, reduce = 0%
182019-07-18 15:55:05,599 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.38 sec
192019-07-18 15:55:12,982 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.21 sec
20MapReduce Total cumulative CPU time: 3 seconds 210 msec
21Ended Job = job_1563262646611_0003
22MapReduce Jobs Launched:
23Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.21 sec HDFS Read: 9779 HDFS Write: 761 SUCCESS
24Total MapReduce CPU Time Spent: 3 seconds 210 msec
25OK
26emp03.empno emp03.ename emp03.job emp03.mgr emp03.hiredate emp03.sal emp03.comm emp03.deptno
277369 'SMITH' 'CLERK' 7902 '1980-12-17' 800.0 NULL 20
287499 'ALLEN' 'SALESMAN' 7698 '1981-02-20' 1600.0 300.0 30
297521 'WARD' 'SALESMAN' 7698 '1981-02-22' 1250.0 500.0 30
307566 'JONES' 'MANAGER' 7839 '1981-04-02' 2975.0 NULL 20
317654 'MARTIN' 'SALESMAN' 7698 '1981-09-28' 1250.0 1400.0 30
327698 'BLAKE' 'MANAGER' 7839 '1981-05-01' 2850.0 NULL 30
337782 'CLARK' 'MANAGER' 7839 '1981-06-09' 2450.0 NULL 10
347788 'SCOTT' 'ANALYST' 7566 '1982-12-09' 3000.0 NULL 20
357839 'KING' 'PRESIDENT' NULL '1981-11-17' 5000.0 NULL 10
367844 'TURNER' 'SALESMAN' 7698 '1981-09-08' 1500.0 0.0 30
377876 'ADAMS' 'CLERK' 7788 '1983-01-12' 1100.0 NULL 20
387900 'JAMES' 'CLERK' 7698 '1981-12-03' 950.0 NULL 30
397902 'FORD' 'ANALYST' 7566 '1981-12-03' 3000.0 NULL 20
407934 'MILLER' 'CLERK' 7782 '1982-01-23' 1300.0 NULL 10
41Time taken: 20.565 seconds, Fetched: 14 row(s)
42
43sort by 局部排序,每个reduce内是有序的
44如果数据量太少,展示不出效果,可以修改这个变量 set mapred.reduce.tasks=3;
45
46hive (default)> select * from emp03 sort by empno desc;
47Query ID = hadoop_20190718155858_755df3d6-36ed-4a40-a606-8732740c369e
48Total jobs = 1
49Launching Job 1 out of 1
50Number of reduce tasks not specified. Defaulting to jobconf value of: 3
51In order to change the average load for a reducer (in bytes):
52 set hive.exec.reducers.bytes.per.reducer=<number>
53In order to limit the maximum number of reducers:
54 set hive.exec.reducers.max=<number>
55In order to set a constant number of reducers:
56 set mapreduce.job.reduces=<number>
57Starting Job = job_1563262646611_0005, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0005/
58Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1563262646611_0005
59Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3 开启了3个reduce
602019-07-18 15:58:39,028 Stage-1 map = 0%, reduce = 0%
612019-07-18 15:58:45,302 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.97 sec
622019-07-18 15:58:56,033 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4.17 sec
632019-07-18 15:58:58,140 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.99 sec
64MapReduce Total cumulative CPU time: 7 seconds 990 msec
65Ended Job = job_1563262646611_0005
66MapReduce Jobs Launched:
67Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 7.99 sec HDFS Read: 19199 HDFS Write: 761 SUCCESS
68Total MapReduce CPU Time Spent: 7 seconds 990 msec
69OK
70emp03.empno emp03.ename emp03.job emp03.mgr emp03.hiredate emp03.sal emp03.comm emp03.deptno
717844 'TURNER' 'SALESMAN' 7698 '1981-09-08' 1500.0 0.0 30
727839 'KING' 'PRESIDENT' NULL '1981-11-17' 5000.0 NULL 10
737788 'SCOTT' 'ANALYST' 7566 '1982-12-09' 3000.0 NULL 20
747782 'CLARK' 'MANAGER' 7839 '1981-06-09' 2450.0 NULL 10
757698 'BLAKE' 'MANAGER' 7839 '1981-05-01' 2850.0 NULL 30
767654 'MARTIN' 'SALESMAN' 7698 '1981-09-28' 1250.0 1400.0 30
77第1个reduce
787934 'MILLER' 'CLERK' 7782 '1982-01-23' 1300.0 NULL 10
797900 'JAMES' 'CLERK' 7698 '1981-12-03' 950.0 NULL 30
807876 'ADAMS' 'CLERK' 7788 '1983-01-12' 1100.0 NULL 20
817566 'JONES' 'MANAGER' 7839 '1981-04-02' 2975.0 NULL 20
827521 'WARD' 'SALESMAN' 7698 '1981-02-22' 1250.0 500.0 30
837499 'ALLEN' 'SALESMAN' 7698 '1981-02-20' 1600.0 300.0 30
84第2个reduce
857902 'FORD' 'ANALYST' 7566 '1981-12-03' 3000.0 NULL 20
867369 'SMITH' 'CLERK' 7902 '1980-12-17' 800.0 NULL 20
87第3个reduce
88Time taken: 27.88 seconds, Fetched: 14 row(s)
89
90distribute by 按照一定的规则把数据分散到某个reducer,不属于排序
91
92hive (default)> select * from emp03 distribute by length(ename);
93Query ID = hadoop_20190718162626_69db6f08-aafe-4195-9083-318899345aa4
94Total jobs = 1
95Launching Job 1 out of 1
96Number of reduce tasks not specified. Defaulting to jobconf value of: 3
97In order to change the average load for a reducer (in bytes):
98 set hive.exec.reducers.bytes.per.reducer=<number>
99In order to limit the maximum number of reducers:
100 set hive.exec.reducers.max=<number>
101In order to set a constant number of reducers:
102 set mapreduce.job.reduces=<number>
103Starting Job = job_1563262646611_0015, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0015/
104Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1563262646611_0015
105Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
1062019-07-18 16:26:32,526 Stage-1 map = 0%, reduce = 0%
1072019-07-18 16:26:37,798 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.81 sec
1082019-07-18 16:26:48,575 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 3.39 sec
1092019-07-18 16:26:49,650 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 5.2 sec
1102019-07-18 16:26:50,692 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.06 sec
111MapReduce Total cumulative CPU time: 7 seconds 60 msec
112Ended Job = job_1563262646611_0015
113MapReduce Jobs Launched:
114Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 7.06 sec HDFS Read: 19770 HDFS Write: 815 SUCCESS
115Total MapReduce CPU Time Spent: 7 seconds 60 msec
116OK
117emp03.empno emp03.ename emp03.job emp03.mgr emp03.hiredate emp03.sal emp03.comm emp03.deptno
1187902 'FORD' 'ANALYST' 7566 '1981-12-03' 3000.0 NULL 20
1197839 'KING' 'PRESIDENT' NULL '1981-11-17' 5000.0 NULL 10
120第1个reduce
1217521 'WARD' 'SALESMAN' 7698 '1981-02-22' 1250.0 500.0 30
1227788 'SCOTT' 'ANALYST' 7566 '1982-12-09' 3000.0 NULL 20
1237900 'JAMES' 'CLERK' 7698 '1981-12-03' 950.0 NULL 30
1247876 'ADAMS' 'CLERK' 7788 '1983-01-12' 1100.0 NULL 20
1257782 'CLARK' 'MANAGER' 7839 '1981-06-09' 2450.0 NULL 10
1267698 'BLAKE' 'MANAGER' 7839 '1981-05-01' 2850.0 NULL 30
1277566 'JONES' 'MANAGER' 7839 '1981-04-02' 2975.0 NULL 20
1287499 'ALLEN' 'SALESMAN' 7698 '1981-02-20' 1600.0 300.0 30
1297369 'SMITH' 'CLERK' 7902 '1980-12-17' 800.0 NULL 20
130第2个reduce
1317934 'MILLER' 'CLERK' 7782 '1982-01-23' 1300.0 NULL 10
1327844 'TURNER' 'SALESMAN' 7698 '1981-09-08' 1500.0 0.0 30
1337654 'MARTIN' 'SALESMAN' 7698 '1981-09-28' 1250.0 1400.0 30
134第3个reduce
135Time taken: 25.021 seconds, Fetched: 14 row(s)
136length()为内置函数
137内置函数查询方法:
138show functions;
139desc function length;
140desc function extended length;
141
142cluster by = distribute by xxx sort by xxx
143
144hive (default)> select * from emp03 cluster by ename;;
145Query ID = hadoop_20190718163333_d8eacf37-94ff-44a3-baeb-a06ae3733d3d
146Total jobs = 1
147Launching Job 1 out of 1
148Number of reduce tasks not specified. Defaulting to jobconf value of: 3
149In order to change the average load for a reducer (in bytes):
150 set hive.exec.reducers.bytes.per.reducer=<number>
151In order to limit the maximum number of reducers:
152 set hive.exec.reducers.max=<number>
153In order to set a constant number of reducers:
154 set mapreduce.job.reduces=<number>
155Starting Job = job_1563262646611_0016, Tracking URL = http://hadoop001:8098/proxy/application_1563262646611_0016/
156Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1563262646611_0016
157Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
1582019-07-18 16:33:23,244 Stage-1 map = 0%, reduce = 0%
1592019-07-18 16:33:28,522 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.44 sec
1602019-07-18 16:33:38,282 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 3.02 sec
1612019-07-18 16:33:39,357 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 4.88 sec
1622019-07-18 16:33:40,394 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.84 sec
163MapReduce Total cumulative CPU time: 6 seconds 840 msec
164Ended Job = job_1563262646611_0016
165MapReduce Jobs Launched:
166Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 6.84 sec HDFS Read: 19344 HDFS Write: 815 SUCCESS
167Total MapReduce CPU Time Spent: 6 seconds 840 msec
168OK
169emp03.empno emp03.ename emp03.job emp03.mgr emp03.hiredate emp03.sal emp03.comm emp03.deptno
1707876 'ADAMS' 'CLERK' 7788 '1983-01-12' 1100.0 NULL 20
1717499 'ALLEN' 'SALESMAN' 7698 '1981-02-20' 1600.0 300.0 30
1727839 'KING' 'PRESIDENT' NULL '1981-11-17' 5000.0 NULL 10
1737534 'MANAGER' 'CLERK' 7582 '1982-02-23' 1300.0 NULL 20
1747788 'SCOTT' 'ANALYST' 7566 '1982-12-09' 3000.0 NULL 20
175第1个reduce
1767782 'CLARK' 'MANAGER' 7839 '1981-06-09' 2450.0 NULL 10
1777900 'JAMES' 'CLERK' 7698 '1981-12-03' 950.0 NULL 30
1787566 'JONES' 'MANAGER' 7839 '1981-04-02' 2975.0 NULL 20
1797654 'MARTIN' 'SALESMAN' 7698 '1981-09-28' 1250.0 1400.0 30
1807934 'MILLER' 'CLERK' 7782 '1982-01-23' 1300.0 NULL 10
1817369 'SMITH' 'CLERK' 7902 '1980-12-17' 800.0 NULL 20
1827844 'TURNER' 'SALESMAN' 7698 '1981-09-08' 1500.0 0.0 30
183第2个reduce
1847698 'BLAKE' 'MANAGER' 7839 '1981-05-01' 2850.0 NULL 30
1857902 'FORD' 'ANALYST' 7566 '1981-12-03' 3000.0 NULL 20
1867521 'WARD' 'SALESMAN' 7698 '1981-02-22' 1250.0 500.0 30
187第3个reduce
188Time taken: 24.887 seconds, Fetched: 15 row(s)