Hadoop ecosystem:
Hive: supports a simple SQL-like language whose queries are translated into MapReduce jobs. Used mostly for long-running batch jobs
Pig: yet another scripting language, simpler than writing MR jobs by hand
Impala: uses SQL and accesses HDFS directly. Optimized for low-latency queries
HBase: real-time DB built on top of HDFS
Sqoop: imports data from traditional (relational) DBs into HDFS
Flume: similar to Sqoop in that it loads data into HDFS, but meant for streaming data such as logs
-----------------------------
HDFS:
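Data is split into chunks of 64 MB by default (128 MB in Hadoop 2.x and later), which are stored on the datanodes. Each chunk is replicated 3 times by default. All chunk metadata (how the chunks make up the original data, and where each replica lives) is stored on the namenode.
The namenode is critical; an active/standby pair can be used to ensure availability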
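To make the chunk and replica bookkeeping concrete, here is a toy sketch in Python. It is not HDFS code: plan_chunks and the datanode names are hypothetical, and the round-robin placement is an assumption made for illustration (real HDFS placement is rack-aware).

```python
import math

CHUNK_SIZE = 64 * 1024 * 1024  # default chunk size used in these notes
REPLICATION = 3                # default number of copies per chunk


def plan_chunks(file_size, datanodes, chunk_size=CHUNK_SIZE, replication=REPLICATION):
    """Return a toy 'namenode table': chunk index -> datanodes holding a copy."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    num_chunks = math.ceil(file_size / chunk_size)
    table = {}
    for i in range(num_chunks):
        # Assumption: simple round-robin placement, purely illustrative.
        table[i] = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
    return table


# A 200 MB file does not divide evenly into 64 MB chunks, so the last chunk is partial.
table = plan_chunks(200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for chunk_id, nodes in table.items():
    print(chunk_id, nodes)
```

The table is the kind of information the namenode has to keep: lose it and the chunks on the datanodes can no longer be reassembled, which is why the namenode needs a standby.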
MapReduce:
When a MapReduce job runs, each datanode has a TaskTracker daemon (a background process) that is responsible for running the tasks on that node
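A rough way to picture this is a word count where each node processes only the chunks it already holds, and the partial results are merged afterwards. The sketch below is plain Python, not the Hadoop API; local_chunks, map_task and run_job are hypothetical names, and the loop over nodes only stands in for what the per-node daemons would do in parallel.

```python
from collections import Counter

# Hypothetical layout: the text chunks each datanode holds locally.
local_chunks = {
    "dn1": ["hive pig hive"],
    "dn2": ["impala hive"],
    "dn3": ["pig"],
}


def map_task(chunk):
    """Map step: count the words inside a single chunk."""
    return Counter(chunk.split())


def run_job(chunks_by_node):
    partials = []
    for node, chunks in chunks_by_node.items():
        # Stand-in for the node's task tracker running map tasks on local data.
        for chunk in chunks:
            partials.append(map_task(chunk))
    # Reduce step: merge the partial counts from all nodes.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


result = run_job(local_chunks)
print(result)
```

The point of the design is that the computation moves to where the chunks are stored, so only the small partial counts travel between nodes.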