声明:未经本人允许,不得转载哦!
哈喽,大家好。这两天就要高考了,我原本是一名物理老师,这一届初高中的学生带完,估计就要开始找大数据岗位的工作了。目前掌握的是技能有java+linux++mysql+hadoop+hive+hbase,正在学习的是shell,计划2016年接着要学习的是scala+spark。祝我好运吧。
今天我们一起来学习的是【Hive高级查询group、order语法】。话不多说,咱们开始吧。
1 Hive的高级查询操作有很多,主要有:
group by #按K来把数据进行分组
order by #全局排序
join #两个表进行连接
distribute by #把数据打散,按字段把数据分到不同的文件里面
sort by #会把每个reducer上的数据进行排序,局部排序
cluster by #cluster by 除了具有 distribute by 的功能外还兼具 sort by 的功能。
union all 把多个表进行组合起来形成一个新表
这些操作其底层实现的都是mapreduce.
2 几个简单得聚合操作
count计数
count(*) count(1) count(col)
sum求和
sum(可转成数字的值)返回bigint
sum(col)+cast(1 as bigint)
avg求平均值
avg(可转成数字的值)返回double
distinct不同值个数
count(distinct col)
3 order by
这个函数的功能是:按照某些字段排序
样例是:
select col1,other...
from table
where condition
order by col1,col2[asc|desc]
关于order by值得注意的是:
order by 后面可以有多列进行排序,默认按字典排序;
order by为全局排序;
order by需要reduce操作,且只有一个reduce,与配置有关。
4好的,接下来我们来实战一下:建立一个M表。
hive> create table M(
> col string,
> col2 string
> )
> row format delimited fields terminated by '\t'
> lines terminated by '\n'
> stored as textfile;
OK
Time taken: 0.283 seconds
hive>
加载本地的数据进入M表中:
hive> load data local inpath '/usr/host/M' into table M;
Copying data from file:/usr/host/M
Copying file: file:/usr/host/M
Loading data to table default.m
OK
Time taken: 0.721 seconds
hive>
接下来进行查询:
hive> select * from M;
OK
A 1
B 5
B 2
C 3
Time taken: 0.288 seconds
hive> select * from M order by col desc,col2 asc;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 1; number of reducers: 1
2016-06-06 01:28:20,284 null map = 0%, reduce = 0%
2016-06-06 01:28:40,233 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:43,409 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:44,480 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:45,560 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:46,621 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:47,676 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:48,753 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:49,831 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:50,918 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:51,987 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:53,041 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:54,137 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:55,198 null map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
2016-06-06 01:28:56,242 null map = 100%, reduce = 100%, Cumulative CPU 1.86 sec
2016-06-0