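In Hive, a query is normally compiled into MapReduce jobs for execution, but two kinds of query do not go through MapReduce:
1. Querying all the data in a table
The result is as follows: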
- hive> select * from employees;
- OK
- lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101}
- liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102}
- zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103}
- Time taken: 0.176 seconds
- hive>
The output contains no MapReduce job progress and the query returns in 0.176 seconds, so this query did not go through MapReduce.
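2. Querying with LIMIT
- select * from employees limit 2;
Note: as in MySQL, LIMIT in Hive caps the number of rows returned. It is not random sampling: without an ORDER BY, Hive simply returns the first rows it happens to read, so which rows come back is not guaranteed. For actual sampling, Hive provides TABLESAMPLE.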
The result is as follows:
- hive> select * from employees limit 2;
- OK
- lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101}
- liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102}
- Time taken: 0.079 seconds
- hive>
Again, the output contains no MapReduce job progress, so this query did not go through MapReduce either.
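In fact, Hive skips MapReduce for simple queries for the sake of efficiency, and this behaviour is configurable through a parameter in hive-site.xml:
hive.fetch.task.conversion
The parameter accepts none, minimal and more. With more, simple queries made up only of SELECT, FILTER and LIMIT are run as a fetch task without map/reduce. With minimal, only SELECT *, filters on partition columns, and LIMIT are converted. With none, the conversion is disabled and every query, including a simple select, goes through map/reduce.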
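The effect of the parameter can also be checked from the Hive CLI without editing hive-site.xml. The following is a minimal sketch, assuming the employees table from the examples above and the MapReduce execution engine; the set statements only affect the current session, and the exact plan text printed by explain depends on the Hive version.

```sql
-- Show the current value of the parameter for this session.
set hive.fetch.task.conversion;

-- Disable the conversion: even a plain select is now compiled into a job.
set hive.fetch.task.conversion=none;
select * from employees limit 2;

-- Re-enable the conversion for simple SELECT / FILTER / LIMIT queries.
set hive.fetch.task.conversion=more;
select * from employees limit 2;

-- explain prints the plan without running the query, so it shows
-- whether the query is served by a fetch task or by a map/reduce stage.
explain select * from employees limit 2;
```

Comparing the console output of the two select statements should make the difference visible: the first prints job launch and progress information, while the second returns directly as in the listings above.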