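The MapReduce process: split the original data into blocks, apply the MAP function to each record, one at a time, to generate key-value pairs, group the key-value pairs by key into sets, and then sort those sets.
Developers define four stages: input => key-value pairs, MAP, REDUCE, key-value pairs => output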
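The flow above can be sketched in plain Python as a word count. This is only a single-process illustration of the map, group-by-key, sort, and reduce steps, not Hadoop's API; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are made up for this note.

```python
from collections import defaultdict


def map_fn(record):
    """MAP: turn one input record (a line of text) into (key, value) pairs."""
    for word in record.split():
        yield (word.lower(), 1)


def reduce_fn(key, values):
    """REDUCE: collapse all values collected for one key into one output pair."""
    return (key, sum(values))


def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: apply map_fn to every record, one at a time.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            # Group phase: collect the values that share a key.
            groups[key].append(value)
    # Sort phase: visit the groups in key order, then reduce each group.
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


lines = ["the quick brown fox", "the lazy dog", "The fox"]
result = run_mapreduce(lines, map_fn, reduce_fn)
print(result)
```

In a real cluster the records would come from data blocks spread over many machines and the grouping would happen across the network; here everything runs in one process to keep the stages visible.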
炼数成金
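- Hadoop is not a database, because it does not provide the basic functions of a database
- **Hadoop is not suited to real-time computing, because of the latency? What is real-time computing? E.g. analyzing stock quotes to trigger actions, or traffic-light scheduling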
Pro Apache Hadoop
- NameNode: metadata
- Configuration file
- default file and site file
- Secondary NameNode: not a backup; it does housekeeping
- merges edits and fsimage
- edits: accumulates the changes since the last checkpoint
- fsimage: the last checkpoint
- fstime: contains the timestamp of the last checkpoint
- Task Tracker:
- accepts requests for tasks such as map, reduce, and shuffle
- slots = cores on the machine
- ??? What is the difference between multiprocessor and multicore ???
- heartbeat: tells whether it is healthy and how many free slots are available
- Job Tracker:
- scheduling: places tasks close to the data block
- determines the number of tasks
- YARN
- the idea is to have a global resource manager and a per-application Application Master.
- components
- global resource manager
- primarily a scheduler
- ensures optimal cluster utilization
- node manager
- local resource manager
- slave service.
- takes requests from the resource manager and allocates containers to applications
- each node has its own node manager
- application-specific application master
- is the key differentiator between the older MapReduce v1 framework and YARN
- each type has an application master
- improved scalability
- a more generic framework
- scheduler
- container
- CPU and memory
- global resource manager
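To make the YARN roles in these notes concrete, here is a toy model of a global resource manager, per-node node managers, and a per-application application master requesting containers of CPU and memory. All class and method names are hypothetical and the first-fit placement is an assumption for illustration; this is not the YARN API or its actual scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A slice of one node's resources: CPU cores and memory."""
    node: str
    cpu: int
    memory_mb: int


@dataclass
class NodeManager:
    """Per-node service: tracks local resources and hands out containers."""
    name: str
    free_cpu: int
    free_memory_mb: int

    def allocate(self, cpu, memory_mb):
        # Grant the container only if this node still has enough of both.
        if cpu <= self.free_cpu and memory_mb <= self.free_memory_mb:
            self.free_cpu -= cpu
            self.free_memory_mb -= memory_mb
            return Container(self.name, cpu, memory_mb)
        return None


@dataclass
class ResourceManager:
    """Global scheduler: picks a node for each container request."""
    nodes: list = field(default_factory=list)

    def request_container(self, cpu, memory_mb):
        # First-fit placement (an assumption made for this sketch).
        for node in self.nodes:
            container = node.allocate(cpu, memory_mb)
            if container is not None:
                return container
        return None


class ApplicationMaster:
    """Per-application coordinator: asks the resource manager for containers."""

    def __init__(self, resource_manager):
        self.rm = resource_manager
        self.containers = []

    def acquire(self, count, cpu, memory_mb):
        # Keep requesting until we have `count` containers or the cluster is full.
        for _ in range(count):
            container = self.rm.request_container(cpu, memory_mb)
            if container is None:
                break
            self.containers.append(container)
        return len(self.containers)


rm = ResourceManager([NodeManager("node1", 4, 8192), NodeManager("node2", 2, 4096)])
am = ApplicationMaster(rm)
granted = am.acquire(count=4, cpu=2, memory_mb=3072)
print(granted, [c.node for c in am.containers])
```

Because the application master is a separate object per application, a different kind of application could supply its own coordinator while reusing the same resource manager and node managers, which is the sense in which the notes call the framework more generic.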