Author: 翁松秀
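Introduction to Hadoop
Hadoop is a distributed systems infrastructure developed by the Apache Software Foundation. It lets users develop distributed programs without understanding the low-level details of distribution, and it makes full use of a cluster's storage and computing capacity to meet their needs.
The two core designs of the Hadoop architecture are HDFS (Hadoop Distributed File System) and MapReduce. HDFS is Hadoop's distributed file system; it is deployed on clusters of inexpensive hardware and provides very large storage capacity. MapReduce is Hadoop's core computing framework: a distributed computing framework built on two functions, Map and Reduce, and used for processing big data.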
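To make the Map and Reduce roles concrete, here is a minimal in-memory sketch of the programming model using word counting. It does not use Hadoop's API and runs in a single process; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster the framework would distribute the map and reduce calls across machines and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (the step between Map and Reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into one result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    intermediate = []
    for line in lines:
        intermediate.extend(map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on one key's values, both steps can be run in parallel, which is what makes the model suitable for a cluster.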
Hadoop Architecture
Main Modules of Hadoop
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.