1,开启FetchTask
一个简单的查询语句,是指一个没有函数、排序等功能的语句,当开启一个Fetch Task功能,就执行一个简单的查询语句不会生成MapRreduce作业,而是直接使用FetchTask,从hdfs文件系统中进行查询输出数据,从而提高效率。
设置的方式:
Hive.fetch.task.conversion 默认为minimal
修改配置文件hive-site.xml
hive.fetch.task.conversion
more
Some select queries can be converted to single FETCH task
minimizing latency.Currently the query should be single
sourced not having any subquery and should not have
any aggregations or distincts (which incurrs RS),
lateral views and joins.
1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
2. more : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
或者当前session修改
hive> set hive.fetch.task.conversion=more;
执行SELECT id, money FROM m limit 10; 不走mr
2,合并中间表
一个日志文件中,