Hadoop学习经验分享
书籍和paper
<Hadoop: The Definitive Guide>: 里面内容非常好,既有高屋建瓴,又有微观把握,比如mapreduce各个子阶段,经常问道join在里面也有代码实现,
google的三量马车,GFS, MapReduce, BigTable
入门:
知道MapReduce大致流程,map, shuffle, reduce
知道combiner, partition作用,设置compression
搭建hadoop集群,master/slave 都运行那些服务
HDFS,replica如何定位
版本0.20.2->0.20.203->0.20.205, 0.21, 0.23, 1.0
新旧API不同
进阶:
Hadoop 参数调优,cluster level: JVM, map/reduce slots, job level: reducer #,
memory, use combiner? use compression?
pig latin, Hive 简单语法
HBase, zookeeper 搭建
最新:
关注cloudera, hortonworks blog
next generation MR2框架
高可靠性, namenode: avoid single point of failure
数据流系统:streaming storm(twitter).
演练算法:
wordcount
terasort
字典同位词
翻译sql语句 select count(x) from a group by b;