Hadoop
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
Main components:
- Hadoop Distributed File System (HDFS™)
A distributed file system that provides high-throughput access to application data.
The Google File System
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/035fc972c796d33122033a0614bc94cff1527999.pdf
- Hadoop MapReduce
A YARN-based system for parallel processing of large data sets.
MapReduce: Simplified Data Processing on Large Clusters
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/16cb30b4b92fd4989b8619a61752a2387c6dd474.pdf
Other related papers:
https://ai.google/research/pubs/pub32721
https://ai.google/research/pubs/pub33004
https://ai.google/research/pubs/pub36249
https://ai.google/research/pubs/pub37796
https://dl.acm.org/citation.cfm?id=1327492
- Hadoop YARN
A framework for job scheduling and cluster resource management.
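The MapReduce model behind the Hadoop MapReduce entry above comes down to two user-supplied functions. A map function emits intermediate key/value pairs, and a reduce function merges all values that share a key. The following is a minimal single-process Python sketch of that model, using word count (the running example in the MapReduce paper). It is not Hadoop's Java API, and the names map_fn, reduce_fn and run_mapreduce are hypothetical. The "shuffle" here is just an in-memory dictionary, whereas a real cluster partitions, sorts and transfers intermediate data between machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical type aliases and helper names for illustration; not the Hadoop API.
Mapper = Callable[[str, str], Iterable[Tuple[str, int]]]
Reducer = Callable[[str, List[int]], int]


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated, lower-cased word.
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: List[int]) -> int:
    # Sum all partial counts emitted for this word.
    return sum(counts)


def run_mapreduce(inputs: Iterable[Tuple[str, str]],
                  mapper: Mapper, reducer: Reducer) -> Dict[str, int]:
    # Map phase plus an in-memory "shuffle": group intermediate values by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    # Reduce phase: one reducer call per key, visiting keys in sorted order.
    return {k: reducer(k, grouped[k]) for k in sorted(grouped)}


if __name__ == "__main__":
    docs = [("a.txt", "the quick brown fox"),
            ("b.txt", "the lazy dog the end")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Each reduce call sees only one key and its values, so different keys can be reduced on different machines in parallel. That independence is what a MapReduce framework exploits to scale out.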
HBase
Apache HBase is the Hadoop database, a distributed, scalable, big data store.
Google’s Bigtable: A Distributed Storage System for Structured Data
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/68a74a85e1662fe02ff3967497f31fda7f32225c.pdf
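The Bigtable paper describes its data model as a sparse, distributed, persistent, multidimensional sorted map indexed by row key, column key and timestamp. HBase adopts the same model. The Python sketch below is a toy, single-process illustration of that map only. The class name SortedVersionedTable is hypothetical, and this is not the HBase client API. It ignores column families as a storage unit, regions, persistence and distribution. The row and column names in the usage lines loosely follow the paper's Webtable example.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SortedVersionedTable:
    """Toy model of a Bigtable/HBase-style map:
    (row key, column key, timestamp) -> value bytes. Hypothetical, not a real API."""

    def __init__(self) -> None:
        # row key -> column key -> [(timestamp, value)], newest version first.
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}
        # Row keys kept in lexicographic order to support range scans.
        self._sorted_keys: List[str] = []

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        columns = self._rows[row]
        # In this sketch, writing an existing timestamp replaces that version.
        versions = [tv for tv in columns.get(column, []) if tv[0] != timestamp]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        columns[column] = versions

    def get(self, row: str, column: str,
            as_of: Optional[int] = None) -> Optional[bytes]:
        # Newest value whose timestamp is <= as_of (newest overall if as_of is None).
        for ts, value in self._rows.get(row, {}).get(column, []):
            if as_of is None or ts <= as_of:
                return value
        return None

    def scan(self, start_row: str, stop_row: str) -> List[str]:
        # Row keys in the half-open range [start_row, stop_row), in sorted order.
        lo = bisect.bisect_left(self._sorted_keys, start_row)
        hi = bisect.bisect_left(self._sorted_keys, stop_row)
        return self._sorted_keys[lo:hi]


if __name__ == "__main__":
    table = SortedVersionedTable()
    table.put("com.cnn.www", "contents:", 3, b"<html>v3")
    table.put("com.cnn.www", "contents:", 5, b"<html>v5")
    table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
    print(table.get("com.cnn.www", "contents:"))           # b'<html>v5'
    print(table.get("com.cnn.www", "contents:", as_of=4))  # b'<html>v3'
```

Keeping row keys sorted is what makes range scans cheap. In Bigtable and HBase, this ordering is also what lets a table be split into contiguous row ranges (tablets in Bigtable, regions in HBase).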
Appendix
https://ai.google/research/pubs/
Chinese version
See:
https://blog.csdn.net/w1104014017/article/details/47335187
Further reading
https://blog.csdn.net/qq947089960/article/details/81476968