Hadoop | MapReduce's Engine: JobTracker and TaskTracker

Introduction

Above the file system sits the MapReduce engine, which consists of one JobTracker, to which client applications submit MapReduce jobs.

The JobTracker pushes work out to available TaskTracker nodes in the cluster, striving to keep the work as close to the data as possible.

Because the file system is rack-aware, the JobTracker knows which node holds the data and which other machines are nearby. If the work cannot be hosted on the node where the data resides, priority is given to nodes in the same rack.

This reduces network traffic on the main backbone network.
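The locality preference described above can be illustrated with a minimal sketch. It is a simplified model, not Hadoop's actual scheduling code: the `Node` class and the `locality_rank` and `pick_node` functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a TaskTracker node."""
    name: str
    rack: str
    free_slots: int


def locality_rank(node, replica_nodes):
    """Return 0 for node-local, 1 for rack-local, 2 for off-rack."""
    if any(node.name == r.name for r in replica_nodes):
        return 0
    if any(node.rack == r.rack for r in replica_nodes):
        return 1
    return 2


def pick_node(candidates, replica_nodes):
    """Pick the node with a free slot that is closest to the data."""
    free = [n for n in candidates if n.free_slots > 0]
    if not free:
        return None
    return min(free, key=lambda n: locality_rank(n, replica_nodes))


# Node "a" holds the data but has no free slot, so the task goes to
# "b" (same rack) rather than "c" (different rack).
a = Node("a", "rack1", 0)
b = Node("b", "rack1", 2)
c = Node("c", "rack2", 4)
chosen = pick_node([a, b, c], replica_nodes=[a])
print(chosen.name)  # b
```

Choosing the rack-local node keeps the data transfer inside one rack switch instead of crossing the backbone.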

If a TaskTracker fails or times out, that part of the job is rescheduled. The TaskTracker on each node spawns a separate Java Virtual Machine (JVM) process for the work it runs, so that the TaskTracker itself does not fail if the running job crashes its JVM.

Each TaskTracker sends a heartbeat to the JobTracker at short, regular intervals (every few seconds by default) to report its status. The status and information of the JobTracker and the TaskTrackers are exposed through Jetty, an embedded web server, and can be viewed from a web browser.
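As a rough illustration of how missed heartbeats can be turned into a timeout, here is a small sketch. The `HeartbeatMonitor` class is hypothetical and the 600-second expiry is only an illustrative value; this is not the JobTracker's implementation.

```python
class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat seen from each TaskTracker."""

    def __init__(self, expiry_seconds):
        self.expiry_seconds = expiry_seconds
        self.last_seen = {}

    def heartbeat(self, tracker, now):
        """Record that `tracker` reported in at time `now` (in seconds)."""
        self.last_seen[tracker] = now

    def expired(self, now):
        """Return the trackers whose last heartbeat is older than the expiry."""
        return sorted(
            t for t, seen in self.last_seen.items()
            if now - seen > self.expiry_seconds
        )


monitor = HeartbeatMonitor(expiry_seconds=600)  # illustrative timeout
monitor.heartbeat("tt1", now=0)
monitor.heartbeat("tt2", now=0)
monitor.heartbeat("tt1", now=500)
lost = monitor.expired(now=700)
print(lost)  # ['tt2']
```

Tasks that were assigned to a tracker reported as lost would then be rescheduled on other nodes.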

The limitations of this approach are as follows:

  1. The allocation of work to TaskTrackers is very simple. Every TaskTracker has a number of available slots (for example, 4 slots), and every active map or reduce task takes up one slot. The JobTracker allocates work to the tracker nearest to the data that has an available slot. The current system load of the allocated machine, and hence its actual availability, is not considered.

  2. If one TaskTracker is very slow, it can delay the entire MapReduce job, especially towards the end of a job, where everything can end up waiting for the slowest task. With speculative execution enabled, however, a single task can be executed on multiple slave nodes.
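The effect of a single slow task, and of speculative execution, can be shown with a toy model. The functions `job_finish_time` and `with_speculation` and all the numbers are assumptions made for this example; real speculative execution decides when to launch backups using progress estimates, which this sketch does not model.

```python
def job_finish_time(task_durations):
    """A job finishes only when its slowest task finishes."""
    return max(task_durations)


def with_speculation(task_durations, backup_start, backup_duration):
    """Toy model: every task still running at `backup_start` gets a backup
    attempt on another node, and the task completes as soon as either
    attempt completes."""
    backup_done = backup_start + backup_duration
    return max(min(d, backup_done) for d in task_durations)


durations = [10, 12, 11, 60]  # the last task runs on a very slow node
plain = job_finish_time(durations)
speculative = with_speculation(durations, backup_start=15, backup_duration=12)
print(plain, speculative)  # 60 27
```

In this model the straggler no longer determines the finish time, at the cost of running one task twice.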

Scheduling

By default, Hadoop uses First In First Out (FIFO) scheduling, with five optional scheduling priorities, to schedule jobs from a work queue. In version 0.19 the job scheduler was refactored out of the JobTracker, which added the ability to use an alternate scheduler such as the Fair Scheduler or the Capacity Scheduler.
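A FIFO queue with priorities can be sketched as a heap ordered by priority first and submission order second. The `FifoQueue` class is hypothetical; the five priority names follow Hadoop's `JobPriority` values, but the code is only an illustration of the ordering rule, not the scheduler itself.

```python
import heapq
from itertools import count

# Highest priority first; names mirror Hadoop's JobPriority values.
PRIORITIES = ["VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW"]


class FifoQueue:
    """Hypothetical work queue: priority first, then submission order."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves FIFO order

    def submit(self, job_name, priority="NORMAL"):
        rank = PRIORITIES.index(priority)
        heapq.heappush(self._heap, (rank, next(self._seq), job_name))

    def next_job(self):
        """Return the next job to run, or None if the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


queue = FifoQueue()
queue.submit("etl")
queue.submit("report")
queue.submit("hotfix", priority="HIGH")
order = [queue.next_job() for _ in range(3)]
print(order)  # ['hotfix', 'etl', 'report']
```

Jobs of equal priority leave the queue in the order they were submitted, which is the plain FIFO behavior.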

Let us look at the Fair Scheduler and the Capacity Scheduler.

1) Fair Scheduler

This scheduler was developed by Facebook. Its goal is to provide fast response times for small jobs and quality of service (QoS) for production jobs.

Its basic concepts are as follows:

  1. Jobs are grouped into pools.

  2. Each pool is assigned a guaranteed minimum share.

  3. Excess capacity is split between the jobs.

Each pool specifies a minimum number of map slots, a minimum number of reduce slots, and a limit on the number of running jobs.
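The idea of a guaranteed minimum share plus a split of the excess can be sketched as follows. This is a simplified model that works at the pool level with a single slot type; the `fair_allocate` function, the pool names, and the numbers are assumptions for illustration, not the Fair Scheduler's algorithm.

```python
def fair_allocate(total_slots, pools):
    """pools maps a pool name to (min_share, demand); returns slots per pool.

    Step 1: each pool gets its guaranteed minimum, capped by its demand.
    Step 2: spare slots are handed out one at a time, round-robin, to
    pools that still want more.
    """
    alloc = {name: min(min_share, demand)
             for name, (min_share, demand) in pools.items()}
    spare = total_slots - sum(alloc.values())
    if spare < 0:
        raise ValueError("minimum shares exceed the cluster capacity")
    while spare > 0:
        hungry = [n for n, (_, demand) in pools.items() if alloc[n] < demand]
        if not hungry:
            break
        for name in hungry:
            if spare == 0:
                break
            alloc[name] += 1
            spare -= 1
    return alloc


shares = fair_allocate(10, {
    "production": (4, 8),  # guaranteed 4 slots, wants 8
    "adhoc": (2, 3),       # guaranteed 2 slots, wants 3
    "idle": (2, 0),        # guaranteed 2 slots, but has no work
})
print(shares)  # {'production': 7, 'adhoc': 3, 'idle': 0}
```

The idle pool's unused guarantee is redistributed, so no slot sits empty while another pool has work waiting.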

2) Capacity Scheduler

This scheduler was developed by Yahoo. The Capacity Scheduler supports several features that are similar to those of the Fair Scheduler:

  1. Queues are allocated a fraction of the total resource capacity.

  2. Free resources are allocated to queues beyond their total capacity.

  3. There is no preemption once a job is running.

  4. Within a queue, the job with the highest priority has access to the queue's resources.
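The first two points can be illustrated with a small sketch in which each queue owns a percentage of the cluster and unused capacity is lent to queues that need more. The `capacity_allocate` function, the queue names, and the numbers are hypothetical; the real Capacity Scheduler is considerably more involved.

```python
def capacity_allocate(total_slots, queues):
    """queues maps a queue name to (capacity_percent, demand).

    Each queue first receives up to its configured share of the cluster.
    Slots left over are then lent, in queue order, to queues whose
    demand exceeds their own capacity.
    """
    alloc = {}
    for name, (percent, demand) in queues.items():
        alloc[name] = min(total_slots * percent // 100, demand)
    free = total_slots - sum(alloc.values())
    for name, (_, demand) in queues.items():
        extra = min(free, demand - alloc[name])
        alloc[name] += extra
        free -= extra
    return alloc


result = capacity_allocate(100, {
    "research": (30, 50),    # owns 30% of the cluster, wants 50 slots
    "production": (70, 40),  # owns 70% of the cluster, wants 40 slots
})
print(result)  # {'research': 50, 'production': 40}
```

Because there is no preemption once a job is running, slots lent this way return to their owning queue only as the borrowing tasks finish.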

Translated from: https://www.includehelp.com/big-data/mapreduces-engine-job-tracker-and-task-tracker-in-hadoop.aspx
