Hadoop Architecture - YARN, HDFS and MapReduce


cunchi4221


Original article: https://www.journaldev.com/8800/hadoop-architecture-yarn-hdfs-mapreduce


Before reading this post, please go through my previous post, "Hadoop 1.x: Architecture and How it Works", to get basic knowledge of Hadoop.

Hadoop Architecture

In this post, we are going to discuss the Apache Hadoop 2.x architecture and how its components work in detail.

Post's Brief Table of Contents

Hadoop 2.x Architecture
Hadoop 2.x Major Components
How Hadoop 2.x Major Components Work

Hadoop 2.x Architecture

Apache Hadoop 2.x and later versions use the following architecture. This is the high-level Hadoop 2.x architecture; we will discuss the detailed low-level architecture in the coming sections.

- Hadoop Common Module is the Hadoop base API (a JAR file) for all Hadoop components. All other components work on top of this module.
- HDFS stands for Hadoop Distributed File System. It is also known as HDFS V2, as it is part of Hadoop 2.x with some enhanced features. It is used as the distributed storage system in the Hadoop architecture.
- YARN stands for Yet Another Resource Negotiator. It is a new component in the Hadoop 2.x architecture. It is also known as "MR V2".
- MapReduce is a batch processing or distributed data processing module. It is also known as "MR V1", as it was part of Hadoop 1.x, and it comes with some updated features.
- All remaining Hadoop ecosystem components work on top of these three major components: HDFS, YARN and MapReduce. We will discuss all Hadoop ecosystem components in detail in my coming posts.
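
To make the MapReduce item above more concrete, here is a minimal in-process sketch of the map, shuffle and reduce steps, using word counting as the job. It is plain Python running in one process, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    # Run map over every line, then shuffle, then reduce each key.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count([
    "HDFS stores data",
    "YARN schedules work",
    "MapReduce processes data",
])
print(counts["data"])  # the word "data" appears in two lines
```

In a real cluster the map and reduce functions run as separate tasks on different nodes and the input lines come from HDFS; the sketch only shows the shape of the computation.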

Compared to Hadoop 1.x, the Hadoop 2.x architecture is designed completely differently. It adds one new component, YARN, and also updates the responsibilities of the HDFS and MapReduce components.

Hadoop 2.x Major Components

Hadoop 2.x has the following three major components:

- HDFS
- YARN
- MapReduce

These three are also known as the Three Pillars of Hadoop 2. The key change here is YARN, which is a real game-changing component in the Big Data Hadoop system.

How Hadoop 2.x Major Components Work

Hadoop 2.x components follow this architecture to interact with each other and to work in parallel in a reliable, highly available and fault-tolerant manner.

Hadoop 2.x Components High-Level Architecture

- All Master Nodes and Slave Nodes contain both MapReduce and HDFS components.
- One Master Node has two components:
1. Resource Manager (YARN or MapReduce v2)
2. HDFS

Its HDFS component is also known as the NameNode. The NameNode is used to store metadata.

- In Hadoop 2.x, some more nodes act as Master Nodes, as shown in the above diagram. Each of these second-level Master Nodes has three components:
1. Node Manager
2. Application Master
3. Data Node
- Each of these second-level Master Nodes in turn has one or more Slave Nodes, as shown in the above diagram.
- These Slave Nodes have two components:
1. Node Manager
2. HDFS

Its HDFS component is also known as the Data Node. The Data Node component stores the application's actual Big Data. These nodes do not contain an Application Master component.

Hadoop 2.x Components In-detail Architecture

Hadoop 2.x Architecture Description
Resource Manager:

- Resource Manager is a per-cluster level component.
- Resource Manager is further divided into two components:
1. Scheduler
2. Application Manager

The Resource Manager's Scheduler:
1. Is responsible for scheduling the required resources to applications (that is, to each per-application Application Master).
2. Does only scheduling.
3. Does not care about monitoring or tracking those applications.
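
The Scheduler's narrow role can be sketched as follows. This is a toy model, not YARN's scheduler: the `Node` and `ToyScheduler` names, the memory-only resource model and the first-fit placement rule are all assumptions chosen to keep the example short. The point it illustrates is that the scheduler hands out resources and keeps no record of what the application does with them.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_mb: int  # memory still available on this node (hypothetical unit)

class ToyScheduler:
    """Hands out memory on nodes; does no monitoring or tracking of applications."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        # First-fit (an assumption of this sketch): take the first node with room.
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return {"app": app_id, "node": node.name, "memory_mb": memory_mb}
        return None  # no capacity; the caller decides whether to retry

scheduler = ToyScheduler([Node("node-1", 2048), Node("node-2", 4096)])
first = scheduler.allocate("app-1", 1024)   # fits on node-1
second = scheduler.allocate("app-2", 3072)  # node-1 is too small now, so node-2
third = scheduler.allocate("app-3", 4096)   # nothing left that is big enough
```

Note that `ToyScheduler` has no method for asking whether `app-1` is still running or has failed; in this model, that responsibility belongs to the application's own master.
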
Application Master:

Application Master is a per-application level component. It is responsible for:
1. Managing the life cycle of its assigned application.
2. Interacting with both the Resource Manager's Scheduler and the Node Manager.
3. Interacting with the Scheduler to acquire the required resources.
4. Interacting with the Node Manager to execute assigned tasks and monitor the status of those tasks.
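
The interaction described above can be sketched as a small loop: ask the Scheduler for resources, then ask a Node Manager to run the task and read back its status. Every class here (`StubScheduler`, `StubNodeManager`, `ToyApplicationMaster`) is a hypothetical stand-in written for this example, and the round-robin placement and status strings are assumptions, not YARN behaviour.

```python
class StubScheduler:
    """Stand-in for the Resource Manager's Scheduler: grants every request."""

    def __init__(self, node_names):
        self.node_names = node_names
        self.granted = 0

    def request_container(self, app_id):
        # Round-robin over nodes (an assumption of this sketch).
        node = self.node_names[self.granted % len(self.node_names)]
        self.granted += 1
        return {"id": f"{app_id}-c{self.granted}", "node": node}

class StubNodeManager:
    """Stand-in for a Node Manager: runs a task and remembers its status."""

    def __init__(self):
        self.status = {}

    def launch(self, container_id, task):
        self.status[container_id] = "RUNNING"
        task()
        self.status[container_id] = "COMPLETE"

class ToyApplicationMaster:
    def __init__(self, app_id, scheduler, node_managers):
        self.app_id = app_id
        self.scheduler = scheduler
        self.node_managers = node_managers  # node name -> StubNodeManager

    def run(self, tasks):
        report = {}
        for task in tasks:
            # Acquire resources from the Scheduler.
            container = self.scheduler.request_container(self.app_id)
            # Execute the task through the Node Manager that owns the container.
            manager = self.node_managers[container["node"]]
            manager.launch(container["id"], task)
            # Monitor: read the task status back from that Node Manager.
            report[container["id"]] = manager.status[container["id"]]
        return report

results = []
managers = {"node-1": StubNodeManager(), "node-2": StubNodeManager()}
master = ToyApplicationMaster("app-1", StubScheduler(list(managers)), managers)
report = master.run([lambda: results.append("a"), lambda: results.append("b")])
```
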
Node Manager:

Node Manager is a per-node level component. It is responsible for:
1. Managing the life cycle of each Container.
2. Monitoring each Container's resource utilization.
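
The two Node Manager duties can be sketched together: it starts and stops containers, and it compares each container's usage with what the container was granted. The `ToyNodeManager` class, its state names and the rule of killing a container that exceeds its memory limit are assumptions of this example, not a description of the real Node Manager's internals.

```python
class ToyNodeManager:
    """Tracks the containers on one node and polices their memory use."""

    def __init__(self):
        # container id -> {"limit_mb", "used_mb", "state"}
        self.containers = {}

    def start_container(self, cid, limit_mb):
        self.containers[cid] = {"limit_mb": limit_mb, "used_mb": 0, "state": "RUNNING"}

    def report_usage(self, cid, used_mb):
        self.containers[cid]["used_mb"] = used_mb

    def monitor(self):
        # Mark any running container that uses more memory than it was granted.
        for container in self.containers.values():
            if container["state"] == "RUNNING" and container["used_mb"] > container["limit_mb"]:
                container["state"] = "KILLED"

    def stop_container(self, cid):
        # Only a container that is still running can complete normally.
        if self.containers[cid]["state"] == "RUNNING":
            self.containers[cid]["state"] = "COMPLETE"

node_manager = ToyNodeManager()
node_manager.start_container("c1", limit_mb=1024)
node_manager.start_container("c2", limit_mb=512)
node_manager.report_usage("c1", 900)
node_manager.report_usage("c2", 700)  # over its 512 MB limit
node_manager.monitor()
node_manager.stop_container("c1")
node_manager.stop_container("c2")     # already killed, so this changes nothing
```
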
Container:

- Each Master Node or Slave Node contains a set of Containers. In this diagram, the main node's Name Node is not shown with Containers; however, it also contains a set of Containers.
- A Container is a portion of the resources (such as memory and CPU) of the node it runs on, whether that node hosts a Name Node or a Data Node. It is not part of HDFS storage.
- In Hadoop 2.x, a Container is similar to a task slot in Hadoop 1.x. We will see the major differences between these two components, slots vs. containers, in my coming posts.
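
As a rough illustration of why containers and slots are only "similar", the sketch below compares a fixed-size slot model with a model where each task gets a container of the size it asks for. The numbers, the memory-only view and both function names are assumptions made for this example; Hadoop 1.x configured slots as counts per node rather than as memory sizes, so treat this purely as an analogy.

```python
def tasks_that_fit_in_slots(node_mb, slot_mb, task_sizes_mb):
    """Fixed slots: each task occupies a whole slot, whatever its real size."""
    slots = node_mb // slot_mb
    fitting = [size for size in task_sizes_mb if size <= slot_mb]
    return min(slots, len(fitting))

def tasks_that_fit_in_containers(node_mb, task_sizes_mb):
    """Containers: each task takes only the memory it asks for."""
    free, placed = node_mb, 0
    for size in task_sizes_mb:
        if size <= free:
            free -= size
            placed += 1
    return placed

tasks = [512, 512, 512, 512, 1024, 1024]  # requested memory per task, in MB
in_slots = tasks_that_fit_in_slots(4096, 2048, tasks)
in_containers = tasks_that_fit_in_containers(4096, tasks)
print(in_slots, in_containers)
```

With two 2048 MB slots only two tasks run at once, while the same 4096 MB node can hold all six tasks when each one gets a container sized to its request.
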
NOTE:

- Resource Manager is a per-cluster component, whereas Application Master is a per-application component.
- Both the Hadoop 1.x and Hadoop 2.x architectures follow the master-slave architecture model.
NOTE:
Both Hadoop 1.x and 2.x Architecture posts (my previous post and this post) are still in progress, but you can read them once to get some idea. I am going to investigate the Hadoop 2 architecture in detail and will update the images and description accordingly on Monday.

That's all about the Hadoop 2.x architecture and how its major components work. We now have a clear picture of both the Hadoop 1.x and Hadoop 2.x systems.

It's time to compare Hadoop 1.x and Hadoop 2.x to find out the major drawbacks of Hadoop 1.x, the major benefits of Hadoop 2.x, and why the complete architecture was redesigned. Please read my next post for this information.

Please drop me a comment if you like my post or have any issues or suggestions.

Translated from: https://www.journaldev.com/8800/hadoop-architecture-yarn-hdfs-mapreduce