Understanding Hadoop's Architecture and What Each Component Does

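Apache Hadoop is an open-source distributed computing framework made up of HDFS, YARN, and MapReduce. HDFS is a highly fault-tolerant, low-cost distributed file system that provides high-throughput data access. YARN separates resource management from job scheduling, which makes more efficient use of cluster resources. MapReduce provides a programming model for large-scale data processing. Together, these three components give Hadoop its big-data processing capability.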

Hadoop

Official introduction
http://hadoop.apache.org/

The Apache™ Hadoop® project develops open-source software for
reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using simple programming models. It is designed to scale up from
single servers to thousands of machines, each offering local
computation and storage. Rather than rely on hardware to deliver
high-availability, the library itself is designed to detect and handle
failures at the application layer, so delivering a highly-available
service on top of a cluster of computers, each of which may be prone
to failures.

Hadoop is an open-source software library that is:

  • Reliable
  • Scalable
  • Distributed

Main modules

  • HDFS
  • MapReduce
  • YARN

HDFS

Introduction
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is http://hadoop.apache.org/.

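Key points

  • High fault tolerance
  • Low hardware cost (runs on commodity hardware)
  • High throughput
  • Suited to applications with large data sets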

HDFS architecture
[Figure: HDFS architecture]

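  • NameNode
    • An HDFS cluster has a single NameNode.
    • Manages the file system namespace and regulates clients' access to files.
  • DataNode
    • An HDFS cluster has multiple DataNodes.
    • Typically one DataNode runs on each node in the cluster.
    • Serves clients' read and write requests, and creates, deletes, and replicates blocks on instruction from the NameNode.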
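To make the NameNode/DataNode split concrete, here is a minimal in-memory sketch of the metadata the NameNode keeps: which blocks make up each file, and which DataNodes hold each block's replicas. `ToyNameNode`, `create`, and `locate` are hypothetical names, not HDFS APIs. The 128 MB block size and replication factor of 3 match the defaults of `dfs.blocksize` and `dfs.replication` in recent Hadoop versions, but the round-robin placement is a simplification assumed for illustration; real HDFS uses a rack-aware placement policy, and the actual bytes live on the DataNodes, not the NameNode.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize (128 MB)
REPLICATION = 3                 # default dfs.replication


@dataclass
class ToyNameNode:
    """Hypothetical model of NameNode metadata only (no file data is stored)."""
    datanodes: list[str]
    namespace: dict[str, list[int]] = field(default_factory=dict)  # path -> block ids
    block_map: dict[int, list[str]] = field(default_factory=dict)  # block id -> replica DataNodes
    _next_block: int = 0

    def __post_init__(self) -> None:
        if not self.datanodes:
            raise ValueError("need at least one DataNode")

    def create(self, path: str, size: int) -> list[int]:
        """Split a file of `size` bytes into blocks and choose replica locations."""
        n_blocks = -(-size // BLOCK_SIZE)  # ceiling division; an empty file has no blocks
        k = min(REPLICATION, len(self.datanodes))  # cannot place more replicas than nodes
        ids = []
        for _ in range(n_blocks):
            bid = self._next_block
            self._next_block += 1
            # Simplified round-robin placement (assumption; HDFS is rack-aware)
            start = bid % len(self.datanodes)
            self.block_map[bid] = [
                self.datanodes[(start + i) % len(self.datanodes)] for i in range(k)
            ]
            ids.append(bid)
        self.namespace[path] = ids
        return ids

    def locate(self, path: str) -> list[tuple[int, list[str]]]:
        """Return (block id, replica DataNodes) pairs, as a client would ask the NameNode."""
        return [(bid, self.block_map[bid]) for bid in self.namespace[path]]
```

For a 300 MB file on four DataNodes this yields three blocks (two full 128 MB blocks plus a partial one), each replicated on three distinct nodes, so losing any single DataNode leaves every block readable.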

YARN

[Figure: YARN architecture]
Introduction
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.

  • YARN splits resource management and job scheduling/monitoring into separate daemons.

Components

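  • ResourceManager: its two main components are the Scheduler and the ApplicationsManager.
    • Scheduler: allocates resources to the various running applications. It only schedules; it does not monitor or track application status.
    • ApplicationsManager: accepts job submissions, negotiates the first container in which the application-specific ApplicationMaster runs, and restarts the ApplicationMaster container if it fails.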
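The Scheduler's role can be illustrated with a toy FIFO scheduler that hands out containers, modeled here as memory only, in submission order. `fifo_schedule` and the node names are hypothetical; YARN's real schedulers (FIFO, Capacity, Fair) are pluggable and also account for CPU, queues, and other constraints. Like YARN's Scheduler, this function only allocates resources; it does not monitor applications or restart them after failures.

```python
from collections import deque


def fifo_schedule(
    node_free_mb: dict[str, int],
    requests: list[tuple[str, int, int]],
) -> tuple[list[tuple[str, str]], list[tuple[str, int]]]:
    """Toy FIFO scheduler (hypothetical, memory-only).

    node_free_mb: free memory in MB per NodeManager, e.g. {"nm1": 4096}.
    requests: (app_id, container_mb, count) tuples in submission order.
    Returns (allocations as (app_id, node), still-pending (app_id, container_mb)).
    """
    free = dict(node_free_mb)  # work on a copy so the caller's dict is unchanged
    # Expand each request into one queue entry per container, keeping submission order
    queue = deque((app, mb) for app, mb, count in requests for _ in range(count))
    allocations: list[tuple[str, str]] = []
    while queue:
        app, mb = queue[0]
        # First node (in dict order) with enough free memory for this container
        node = next((n for n, f in free.items() if f >= mb), None)
        if node is None:
            break  # strict FIFO: a blocked head-of-queue request stalls everything behind it
        queue.popleft()
        free[node] -= mb
        allocations.append((app, node))
    return allocations, list(queue)


nodes = {"nm1": 4096, "nm2": 2048}
allocs, pending = fifo_schedule(nodes, [("app1", 2048, 2), ("app2", 4096, 1), ("app3", 1024, 1)])
print(allocs)   # [('app1', 'nm1'), ('app1', 'nm1')]
print(pending)  # [('app2', 4096), ('app3', 1024)]
```

In the usage above, `app3` stays pending even though `nm2` has room for it, because `app2` at the head of the queue cannot be placed. This head-of-line blocking is a well-known drawback of plain FIFO scheduling on shared clusters.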

MapReduce

Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

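  • A computing framework for processing data across a cluster.
  • In Hadoop 1.x, MapReduce also handled cluster resource management; Hadoop 2.x split that out into YARN to decouple the two.
  • Spark is increasingly used in place of MapReduce.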
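The programming model can be sketched in a single process with the classic word-count example: a map function emits (word, 1) pairs, a shuffle step groups values by key, and a reduce function sums each group. This is plain Python for illustration, not Hadoop's Java `Mapper`/`Reducer` API; the function names are hypothetical, and on a real cluster the framework runs many map and reduce tasks in parallel and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for each word in one input record
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all values by key, sorted by key, before reducing
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values that share one key
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(word_count(["hello hadoop", "hello yarn"]))
# {'hadoop': 1, 'hello': 2, 'yarn': 1}
```

Because each map call sees only one record and each reduce call sees only one key's values, both phases can be split across machines independently; that independence is what lets the framework scale out.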