February 2012 - TigerYang414


[Original] The Google File System Notes

Assumptions: The system is built from many commodity machines and must be able to detect and recover from failures. It stores a modest number of large files and is therefore optimized for them. Reads are primarily of two kinds: large streaming...

2012-02-26 17:13:08 637

[Original] Exploiting Locality in Programs

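Focus your attention on the inner loops, where most of the computation and memory accesses occur. Read data in the order in which it is stored in memory, to maximize spatial locality. Once a data object has been read in, use it as much as possible, to maximize temporal locality. The cache hit rate is only one important factor affecting performance; the number of memory accesses also matters, and the two have to be traded off against each other. Excerpted from Chapter 6 of Computer Systems: A Programmer's Perspective (《深入理解计算机系统》).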
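The advice about reading data in storage order can be illustrated with a minimal sketch. It assumes a two-dimensional array stored in row-major order as one flat list; the function names are hypothetical and are not taken from the book. Python lists do not behave like C arrays in the cache, so this only shows the access pattern (the stride between consecutive reads), not a timing result.

```python
def row_major_offsets(rows, cols):
    """Flat offsets visited when the inner loop walks along a row."""
    return [i * cols + j for i in range(rows) for j in range(cols)]


def column_major_offsets(rows, cols):
    """Flat offsets visited when the inner loop walks down a column."""
    return [i * cols + j for j in range(cols) for i in range(rows)]


def strides(offsets):
    """Distance between each pair of consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


def sum_row_major(flat, rows, cols):
    """Sum all elements with the inner loop following storage order."""
    total = 0
    for i in range(rows):
        base = i * cols  # computed once per row, reused in the inner loop
        for j in range(cols):
            total += flat[base + j]  # consecutive offsets: stride 1
    return total


if __name__ == "__main__":
    print(strides(row_major_offsets(2, 3)))
    print(strides(column_major_offsets(2, 3)))
    print(sum_row_major(list(range(6)), 2, 3))
```

With the row-wise inner loop every access is one element past the previous one, which is the stride-1 pattern that favors spatial locality. The column-wise order jumps by a whole row on most steps.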

2012-02-25 23:40:59 613

[Original] Distributed Sort via MapReduce vs. K-Way Merge + Quicksort

Distributed Sort via MapReduce: The Map function just outputs key + record. The intermediate keys are partitioned into R pieces, and these R pieces are ordered partitions of the key value domain. This functions as a bucket sort
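The bucket-sort idea can be sketched on a single machine. This is a minimal illustration, not the MapReduce implementation: the names `map_fn`, `partition`, `distributed_sort`, and the `boundaries` list of split points are all assumptions introduced here, and the per-bucket sort stands in for the work each of the R reducers would do.

```python
from bisect import bisect_right


def map_fn(record, key_of):
    """Map step: emit the (key, record) pair unchanged."""
    return (key_of(record), record)


def partition(pairs, boundaries):
    """Range-partition pairs into len(boundaries) + 1 ordered buckets.

    boundaries must be sorted; bucket i holds keys below boundaries[i]
    and at or above boundaries[i - 1].
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for key, record in pairs:
        buckets[bisect_right(boundaries, key)].append((key, record))
    return buckets


def distributed_sort(records, key_of, boundaries):
    """Sort each bucket independently, then concatenate in bucket order."""
    pairs = [map_fn(r, key_of) for r in records]
    output = []
    for bucket in partition(pairs, boundaries):
        bucket.sort(key=lambda kv: kv[0])  # stand-in for one reducer
        output.extend(record for _, record in bucket)
    return output


if __name__ == "__main__":
    data = [42, 7, 19, 3, 88, 51]
    print(distributed_sort(data, key_of=lambda r: r, boundaries=[20, 50]))
```

Because the buckets cover disjoint, ordered key ranges, concatenating the sorted buckets yields a globally sorted result without a final merge step.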

2012-02-23 09:54:09 2284

[Original] Google: MapReduce in a Week Note

1. Failure is the number one concern in distributed system design: hardware failure and software failure. Heisenbug: a bug that seems to disappear or alter its characteristics when it is observe...

2012-02-21 22:37:59 409

[Original] Hadoop Study Notes

1. Quick Start on MapReduce: Google: MapReduce in a Week; MapReduce paper notes; The Google File System notes
2. Hadoop: relationships among Hadoop releases; Hadoop configuration
3. MapReduce use cases: MapReduce Patterns, Algorith...

2012-02-21 22:16:37 304

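Learning Spark

Spark is an open-source cluster computing environment similar to Hadoop, but there are some differences between the two, and these differences make Spark perform better on certain workloads. Specifically, Spark enables in-memory distributed datasets, which, in addition to supporting interactive queries, can also optimize iterative workloads. Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, so Scala can manipulate distributed datasets as easily as it manipulates local collection objects.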

2019-03-06

The Google File System

Paper for The Google File System

2012-03-06
