1 Introduction
1) Concept
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
In short, Hadoop is a reliable, scalable framework for distributed computing.
2) Components
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
- HDFS -> distributed file system, used for data storage
- YARN -> job scheduling and cluster resource management
- MapReduce -> parallel processing of data
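To make "parallel processing" more concrete, here is a minimal sketch of the map / shuffle / reduce model using word count as the example. It is plain single-process Python, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration. In a real cluster the map and reduce calls would run as separate tasks on different machines, with the framework handling the shuffle between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical local driver)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

Because each `map_phase` call depends only on its own line and each `reduce_phase` call only on its own key, the work can be split across many machines, which is the property the MapReduce component relies on.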
3) Applications
- Search engines
- Log analysis
- Business intelligence
- Data mining
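4) Advantages
- High reliability
  - Data is stored as multiple replicas
  - Failed computations are rescheduled
- High scalability
- Other
  - Runs on inexpensive commodity machines, which lowers cost
  - Mature ecosystem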