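Original Implementing Top K Algorithms
1. Selection sort: pick the largest element each time, K times in total. Complexity: n + (n-1) + ... + (n-k+1) = (2n-k+1)*k/2 = O(kn). Bubble sort and merge sort work similarly. 2. Quicksort: partition with the quicksort method each time, then iteratively process the part that contains the Top K until it is found. Complexity: best case: n + n/2 + ... = 2n; worst case: O(kn). Bucket sort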
2012-06-10 21:41:03 724
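The two approaches in the Top K excerpt above can be sketched roughly as follows. This is a minimal illustration, not the post's own code: the function names `top_k_selection` and `top_k_quickselect` are hypothetical, and the random pivot choice is an assumption made here to avoid consistently bad splits.

```python
import random


def top_k_selection(values, k):
    """Selection-sort style: k passes, each pass moves the largest remaining item forward."""
    items = list(values)
    k = max(0, min(k, len(items)))
    for i in range(k):
        best = i
        for j in range(i + 1, len(items)):
            if items[j] > items[best]:
                best = j
        items[i], items[best] = items[best], items[i]
    return items[:k]


def _partition_desc(items, lo, hi):
    """Lomuto partition in descending order around a random pivot; returns the pivot's final index."""
    pivot_index = random.randint(lo, hi)
    items[pivot_index], items[hi] = items[hi], items[pivot_index]
    pivot = items[hi]
    store = lo
    for i in range(lo, hi):
        if items[i] > pivot:
            items[i], items[store] = items[store], items[i]
            store += 1
    items[store], items[hi] = items[hi], items[store]
    return store


def top_k_quickselect(values, k):
    """Quicksort-style partitioning, but only the side containing position k-1 is processed further."""
    items = list(values)
    k = max(0, min(k, len(items)))
    if k == 0:
        return []
    lo, hi, target = 0, len(items) - 1, k - 1
    while lo < hi:
        p = _partition_desc(items, lo, hi)
        if p == target:
            break
        if p < target:
            lo = p + 1
        else:
            hi = p - 1
    # The first k slots now hold the k largest values; sort just those for display.
    return sorted(items[:k], reverse=True)
```

Both functions return the k largest values in descending order, so they can be compared directly on the same input.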
Original Hadoop MapReduce
What happens if the value list for one key is too large for one reduce task? Before the reduce, the framework sorts it first. It should be small enough to be processed in memory. Otherwise, external sorting is needed.
2012-06-09 16:06:12 281
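As a rough illustration of the external sorting mentioned in the excerpt above (not Hadoop's actual implementation), the sketch below sorts fixed-size chunks in memory, spills each sorted run to a temporary file, and then merges the runs. The name `external_sort` and the `chunk_size` parameter are hypothetical, and the values are assumed to be integers so they can be stored one per line.

```python
import heapq
import tempfile


def _read_run(run_file):
    """Yield the integers stored one per line in a sorted run file."""
    for line in run_file:
        yield int(line)


def external_sort(values, chunk_size):
    """Yield values in ascending order while holding at most chunk_size of them in memory per run."""
    run_files = []

    def spill(chunk):
        # Sort one chunk in memory and write it out as a sorted run.
        run_file = tempfile.TemporaryFile(mode="w+")
        run_file.writelines(f"{v}\n" for v in sorted(chunk))
        run_file.seek(0)
        run_files.append(run_file)

    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) >= chunk_size:
            spill(chunk)
            chunk = []
    if chunk:
        spill(chunk)

    try:
        # heapq.merge lazily merges the already-sorted runs.
        yield from heapq.merge(*(_read_run(f) for f in run_files))
    finally:
        for run_file in run_files:
            run_file.close()
```

For example, `list(external_sort([5, 2, 9, 1, 7, 3, 8], 3))` creates three runs and merges them back into one sorted sequence.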
Original Hadoop in Action Notes
MapReduce vs. relational database: scale out vs. scale up; key/value pairs vs. relational tables; functional programming vs. declarative query; offline batch processing vs. online transactions (MapReduce i
2012-06-09 15:31:44 396
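Original Useful IT Sites
1. InfoQ: InfoQ (Information Queue) is an independent online community that tracks change and innovation in enterprise software development. Its readers are mainly technical architects, technical team leads (senior developers), and project managers. Through the latest news, technical articles, video interviews, video talks, and mini-books provided by experts in each technical field, InfoQ delivers first-rate content to seven communities: Java, .NET, Ruby, SOA, Agile, Architecture, and Operations. Hadoop Topic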
2012-05-26 10:09:44 325
Original Relationships Among Hadoop Releases
0.20->0.20.1->0.20.2-+security and user limits+->0.20.203.x->0.20.204.x->0.20.205.x | /
2012-03-15 00:05:03 460
Original Hadoop Configuration
Config Files: core-default.xml, hdfs-default.xml, and mapred-default.xml are located in the root dir of the corresponding jar file (in folder share/hadoop for 0.23.1), which is added to the classpath by the bin files. core-cit
2012-03-13 23:16:02 270
Original The Google File System Notes
Assumptions: built from many commodity machines and able to detect and recover from failures; stores a modest number of large files and is thus optimized for them; primarily two kinds of reads: large streaming
2012-02-26 17:13:08 637
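Original Exploiting Locality in Programs
Focus your attention on the inner loops, where most of the computation and memory accesses occur. Read data in the order in which it is stored in memory, so that spatial locality is maximized. Once a data item has been read, use it as much as possible, so that temporal locality is maximized. Cache hit rate is only one important factor affecting performance; the number of memory accesses also matters, and the two must be traded off. Excerpted from Chapter 6 of Computer Systems: A Programmer's Perspective.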
2012-02-25 23:40:59 613
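The access-pattern advice in the locality excerpt above can be made concrete with a small sketch. It assumes a matrix stored in row-major order in one flat list; the names `visit_order`, `sum_row_major`, and `sum_column_major` are hypothetical. Python lists do not expose cache behavior directly, so this only shows which offsets each loop order touches, not a measurement.

```python
def visit_order(rows, cols, row_first):
    """Return the flat offsets touched, in order, for a rows x cols row-major matrix."""
    if row_first:
        return [r * cols + c for r in range(rows) for c in range(cols)]
    return [r * cols + c for c in range(cols) for r in range(rows)]


def sum_row_major(flat, rows, cols):
    """Inner loop walks consecutive offsets (stride 1): the spatially local order."""
    total = 0
    for r in range(rows):
        base = r * cols  # computed once per row and reused in the inner loop
        for c in range(cols):
            total += flat[base + c]
    return total


def sum_column_major(flat, rows, cols):
    """Inner loop jumps by cols each step (stride cols): same result, poorer spatial locality."""
    total = 0
    for c in range(cols):
        for r in range(rows):
            total += flat[r * cols + c]
    return total
```

Both sums give the same result; the difference is only the order of offsets, which `visit_order` makes visible for a small matrix.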
Original Distributed Sort via MapReduce vs. K-Way Merge + Quicksort
Distributed Sort via MapReduce: the map function just outputs key + record. Partition the intermediate keys into R pieces; these R pieces are sorted partitions of the key value domain. This functions as a bucket sort
2012-02-23 09:54:09 2284
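The bucket-sort view of the distributed sort described above can be simulated on one machine. This is only a sketch under assumptions: the `boundaries` list of R-1 split keys is supplied by the caller, the per-partition `sort` stands in for the framework's sort before each reduce, and all function names are hypothetical rather than Hadoop APIs.

```python
import bisect


def map_phase(records, key_of):
    """Map step: emit (key, record) pairs without any other processing."""
    return [(key_of(record), record) for record in records]


def partition_by_range(pairs, boundaries):
    """Route each pair to one of len(boundaries) + 1 key ranges (the R buckets)."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for key, record in pairs:
        # bisect_right sends keys equal to a boundary into the higher bucket.
        parts[bisect.bisect_right(boundaries, key)].append((key, record))
    return parts


def distributed_sort(records, key_of, boundaries):
    """Sort each bucket independently, then concatenate the buckets in range order."""
    parts = partition_by_range(map_phase(records, key_of), boundaries)
    result = []
    for part in parts:
        part.sort(key=lambda pair: pair[0])  # stand-in for the per-reducer sort
        result.extend(record for _, record in part)
    return result
```

Because the buckets cover disjoint, ordered key ranges, concatenating the sorted buckets gives a globally sorted output with no final merge step.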
Original Google: MapReduce in a Week Notes
1. Failure is the number one concern in distributed system design: hardware failure; software failure. Heisenbug: a bug that seems to disappear or alter its characteristics when it is observe
2012-02-21 22:37:59 409
Original Hadoop Study Notes
1. Quick Start on MapReduce: Google: MapReduce in a Week; MapReduce Paper Notes; The Google File System Notes. 2. Hadoop: Relationships Among Hadoop Releases; Hadoop Configuration. 3. MapReduce Use Cases: MapReduce Patterns, Algorith
2012-02-21 22:16:37 304
Learning Spark
2019-03-06