Hive, Map-Reduce and Local-Mode

weixin_34044273

Original link: http://blog.51cto.com/zorro/1073570

Reposted from https://cwiki.apache.org/Hive/gettingstarted.html

The Hive compiler generates map-reduce jobs for most queries. These jobs are then submitted to the Map-Reduce cluster indicated by the variable:

  mapred.job.tracker

While this usually points to a map-reduce cluster with multiple nodes, Hadoop also offers an option to run map-reduce jobs locally on the user's workstation. This can be very useful for running queries over small data sets: in such cases local mode execution is usually significantly faster than submitting jobs to a large cluster. Data is accessed transparently from HDFS. On the other hand, local mode runs with only one reducer and can be very slow when processing larger data sets.

Starting with release 0.7, Hive fully supports local mode execution. To enable it, the user can set the following option:

  hive> SET mapred.job.tracker=local;

In addition, mapred.local.dir should point to a path that is valid on the local machine (for example /tmp/<username>/mapred/local). Otherwise, the user will get an exception when local disk space is allocated.
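As a rough illustration of the two settings above, the following Python sketch checks that a scratch directory is usable on the local machine and then builds the matching SET statements. The function name `local_mode_settings` and its `base_dir` and `user` parameters are hypothetical and exist only for this example; the directory layout follows the /tmp/<username>/mapred/local example in the text.

```python
import getpass
import os
import tempfile


def local_mode_settings(base_dir=None, user=None):
    """Return Hive SET statements for local mode after checking the scratch dir.

    base_dir and user are hypothetical parameters for this sketch. The layout
    <base_dir>/<user>/mapred/local mirrors the example path in the text.
    """
    base_dir = base_dir or tempfile.gettempdir()
    user = user or getpass.getuser()
    local_dir = os.path.join(base_dir, user, "mapred", "local")

    # Create the directory if needed and confirm the current user can write to it.
    os.makedirs(local_dir, exist_ok=True)
    if not os.access(local_dir, os.W_OK):
        raise PermissionError(f"{local_dir} is not writable")

    return [
        "SET mapred.job.tracker=local;",
        f"SET mapred.local.dir={local_dir};",
    ]
```

Calling `local_mode_settings()` with no arguments uses the system temporary directory and the current user name; the returned statements can then be pasted into a Hive session.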

Starting with release 0.7, Hive can also run map-reduce jobs in local mode automatically. The relevant option is:

  hive> SET hive.exec.mode.local.auto=false;

Note that this feature is disabled by default, which is the value shown above; setting the option to true enables it. When enabled, Hive analyzes the size of each map-reduce job in a query and may run it locally if the following thresholds are satisfied:

The total input size of the job is lower than: hive.exec.mode.local.auto.inputbytes.max (128MB by default)
The total number of map-tasks is less than: hive.exec.mode.local.auto.tasks.max (4 by default)
The total number of reduce tasks required is 1 or 0.
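The three thresholds above can be sketched as a single check. This is only an illustration of the rule as worded in the text, not Hive's actual implementation: the `JobEstimate` class and the `may_run_locally` function are hypothetical names, and the strict comparisons follow the "lower than" and "less than" wording.

```python
from dataclasses import dataclass

# Defaults as stated in the text: 128MB of input and 4 map tasks.
DEFAULT_INPUT_BYTES_MAX = 128 * 1024 * 1024
DEFAULT_TASKS_MAX = 4


@dataclass
class JobEstimate:
    """Hypothetical summary of one map-reduce job; not a Hive class."""
    input_bytes: int
    map_tasks: int
    reduce_tasks: int


def may_run_locally(job,
                    input_bytes_max=DEFAULT_INPUT_BYTES_MAX,
                    tasks_max=DEFAULT_TASKS_MAX):
    """Return True when the job satisfies all three thresholds from the text."""
    return (
        job.input_bytes < input_bytes_max   # total input size below the limit
        and job.map_tasks < tasks_max       # number of map tasks below the limit
        and job.reduce_tasks <= 1           # at most one reduce task required
    )


# A query with two jobs: a large scan followed by a small aggregation.
first_job = JobEstimate(input_bytes=10 * 1024**3, map_tasks=40, reduce_tasks=8)
second_job = JobEstimate(input_bytes=5 * 1024**2, map_tasks=1, reduce_tasks=1)

print(may_run_locally(first_job))   # the large scan fails the thresholds
print(may_run_locally(second_job))  # the small follow-up job passes them
```

Each job in a query is evaluated on its own, so a query can mix cluster and local execution.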

So for queries over small data sets, or for queries with multiple map-reduce jobs where the input to subsequent jobs is substantially smaller (because of reduction/filtering in the prior job), jobs may be run locally.

Note that there may be differences between the runtime environment of the Hadoop server nodes and that of the machine running the Hive client (because of different JVM versions or different software libraries). This can cause unexpected behavior or errors while running in local mode. Also note that local mode execution is done in a separate child JVM of the Hive client. If the user so wishes, the maximum amount of memory for this child JVM can be controlled via the option hive.mapred.local.mem. By default it is set to zero, in which case Hive lets Hadoop determine the default memory limits of the child JVM.
