The Google File System, Part 8: RELATED WORK, CONCLUSIONS, and ACKNOWLEDGMENTS

8. RELATED WORK
Like other large distributed file systems such as AFS, GFS provides a location-independent namespace which enables data to be moved transparently for load balance or fault tolerance. 
Unlike AFS, GFS spreads a file’s data across storage servers in a way more akin to xFS and Swift in order to deliver aggregate performance and increased fault tolerance.
As disks are relatively cheap and replication is simpler than more sophisticated RAID approaches, GFS currently uses only replication for redundancy and so consumes more raw storage than xFS or Swift.
In contrast to systems like AFS, xFS, Frangipani, and Intermezzo, GFS does not provide any caching below the file system interface. 
Our target workloads have little reuse within a single application run because they either stream through a large data set or randomly seek within it and read small amounts of data each time.
Some distributed file systems like Frangipani, xFS, Minnesota's GFS, and GPFS remove the centralized server and rely on distributed algorithms for consistency and management. 
We opt for the centralized approach in order to simplify the design, increase its reliability, and gain flexibility. 


In particular, a centralized master makes it much easier to implement sophisticated chunk placement and replication policies since the master already has most of the relevant information and controls how it changes. 
We address fault tolerance by keeping the master state small and fully replicated on other machines. 
Scalability and high availability (for reads) are currently provided by our shadow master mechanism. 
Updates to the master state are made persistent by appending to a write-ahead log. 
Therefore we could adapt a primary-copy scheme like the one in Harp to provide high availability with stronger consistency guarantees than our current scheme.
We are addressing a problem similar to Lustre in terms of delivering aggregate performance to a large number of clients. 
However, we have simplified the problem significantly by focusing on the needs of our applications rather than building a POSIX-compliant file system. 
Additionally, GFS assumes a large number of unreliable components and so fault tolerance is central to our design.
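To make the master-state persistence concrete, here is a minimal Python sketch of the write-ahead-log pattern described above: every metadata mutation is appended and flushed to the log before it is applied in memory, and a shadow (or restarted) master rebuilds its read-only state by replaying the log. The record format, operation names, and file paths are illustrative assumptions, not the actual GFS implementation.

import json

def apply_record(namespace, record):
    """Apply one logged metadata mutation to an in-memory namespace."""
    if record["op"] == "create":
        namespace.setdefault(record["name"], [])
    elif record["op"] == "add_chunk":
        namespace[record["name"]].append(record["chunk"])

class Master:
    """Mutations are appended (and flushed) to the log before they are
    applied, so the in-memory state is always recoverable from the log."""
    def __init__(self, log_path):
        self.namespace = {}                 # file name -> chunk handles
        self.log = open(log_path, "a")

    def mutate(self, record):
        self.log.write(json.dumps(record) + "\n")
        self.log.flush()                    # durable before applying
        apply_record(self.namespace, record)

def replay(log_path):
    """A shadow (or restarted) master rebuilds state by replaying the log."""
    namespace = {}
    with open(log_path) as log:
        for line in log:
            apply_record(namespace, json.loads(line))
    return namespace

if __name__ == "__main__":
    m = Master("/tmp/gfs-master.log")
    m.mutate({"op": "create", "name": "/logs/web-00"})
    m.mutate({"op": "add_chunk", "name": "/logs/web-00", "chunk": "chunk-0001"})
    print(replay("/tmp/gfs-master.log"))    # {'/logs/web-00': ['chunk-0001']}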


GFS most closely resembles the NASD architecture.
While the NASD architecture is based on network-attached disk drives, GFS uses commodity machines as chunkservers, as done in the NASD prototype. 
Unlike the NASD work, our chunkservers use lazily allocated fixed-size chunks rather than variable-length objects. 
Additionally, GFS implements features such as rebalancing, replication, and recovery that are required in a production environment.
Unlike Minnesota’s GFS and NASD, we do not seek to alter the model of the storage device. 
We focus on addressing day-to-day data processing needs for complicated distributed systems with existing commodity components.
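As a rough illustration of the lazily allocated, fixed-size chunks mentioned above, the following sketch backs each chunk with an ordinary file that is created only when the chunk is first written, and refuses writes past the fixed chunk size. The 64 MB size matches the paper; the class and method names are assumptions made for the example.

import os

CHUNK_SIZE = 64 * 1024 * 1024          # fixed 64 MB chunks, as in the paper

class ChunkStore:
    """Sketch of lazy allocation: a chunk's backing file is created only on
    its first write, so chunks that were never written consume no disk space."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, handle):
        return os.path.join(self.root, handle)

    def write(self, handle, offset, data):
        if offset + len(data) > CHUNK_SIZE:
            raise ValueError("write would exceed the fixed chunk size")
        path = self._path(handle)
        mode = "r+b" if os.path.exists(path) else "w+b"   # create lazily
        with open(path, mode) as f:
            f.seek(offset)
            f.write(data)

    def read(self, handle, offset, length):
        with open(self._path(handle), "rb") as f:
            f.seek(offset)
            return f.read(length)

if __name__ == "__main__":
    store = ChunkStore("/tmp/chunks")
    store.write("chunk-0001", 0, b"record one\n")
    print(store.read("chunk-0001", 0, 11))        # b'record one\n'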
The producer-consumer queues enabled by atomic record appends address a similar problem as the distributed queues in River. 
While River uses memory-based queues distributed across machines and careful data flow control, GFS uses a persistent file that can be appended to concurrently by many producers. 
The River model supports m-to-n distributed queues but lacks the fault tolerance that comes with persistent storage, while GFS only supports m-to-1 queues efficiently. 
Multiple consumers can read the same file, but they must coordinate to partition the incoming load.
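A minimal sketch of the m-to-1 queue pattern described here, with a local file and a lock standing in for a GFS file and its atomic record append: many producers append records concurrently to one persistent file, and consumers all read it but coordinate to partition the load (here by striping on the record index). The framing and partitioning scheme are assumptions for illustration, not a GFS client API.

import threading

class AppendOnlyFile:
    """Stand-in for a GFS file with atomic record append: each record is
    written as a whole, at an offset the 'file system' chooses."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()        # models the atomicity guarantee
        open(path, "w").close()

    def record_append(self, record):
        with self.lock, open(self.path, "a") as f:
            f.write(record + "\n")

def producer(queue, worker_id, count):
    for i in range(count):
        queue.record_append(f"producer={worker_id} item={i}")

def consume(path, consumer_id, num_consumers):
    """Consumers read the same file but partition the records among
    themselves, here by simple striping on the record index."""
    with open(path) as f:
        return [line.strip() for i, line in enumerate(f)
                if i % num_consumers == consumer_id]

if __name__ == "__main__":
    q = AppendOnlyFile("/tmp/records.log")
    workers = [threading.Thread(target=producer, args=(q, w, 3)) for w in range(4)]
    for t in workers: t.start()
    for t in workers: t.join()
    print(consume("/tmp/records.log", consumer_id=0, num_consumers=2))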


9. CONCLUSIONS
The Google File System demonstrates the qualities essential for supporting large-scale data processing workloads on commodity hardware. 
While some design decisions are specific to our unique setting, many may apply to data processing tasks of a similar magnitude and cost consciousness.
We started by reexamining traditional file system assumptions in light of our current and anticipated application workloads and technological environment. 
Our observations have led to radically different points in the design space.
We treat component failures as the norm rather than the exception, optimize for huge files that are mostly appended to (perhaps concurrently) and then read (usually sequentially), and both extend and relax the standard file system interface to improve the overall system.
Our system provides fault tolerance by constant monitoring, replicating crucial data, and fast and automatic recovery. 
Chunk replication allows us to tolerate chunkserver failures. 
The frequency of these failures motivated a novel online repair mechanism that regularly and transparently repairs the damage and compensates for lost replicas as soon as possible. 
Additionally, we use checksumming to detect data corruption at the disk or IDE subsystem level, which becomes all too common given the number of disks in the system.
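The checksumming mentioned here can be sketched as per-block checksums that are stored with each chunk and verified on every read; 64 KB matches the block granularity the paper uses, while the CRC32 choice and function names below are assumptions for the example.

import zlib

BLOCK_SIZE = 64 * 1024          # the paper checksums chunk data in 64 KB blocks

def checksum_blocks(data):
    """Compute a CRC32 for each 64 KB block of a chunk."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

def verify_read(data, stored_checksums):
    """Before returning data, the chunkserver recomputes each block's
    checksum; a mismatch means disk-level corruption, so the read fails
    and the client can fall back to another replica."""
    for idx, expected in enumerate(checksum_blocks(data)):
        if expected != stored_checksums[idx]:
            raise IOError(f"checksum mismatch in block {idx}")
    return data

if __name__ == "__main__":
    chunk = b"x" * (3 * BLOCK_SIZE)
    sums = checksum_blocks(chunk)
    verify_read(chunk, sums)                                  # passes
    corrupted = chunk[:BLOCK_SIZE] + b"y" + chunk[BLOCK_SIZE + 1:]
    try:
        verify_read(corrupted, sums)
    except IOError as e:
        print(e)                                              # mismatch in block 1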
Our design delivers high aggregate throughput to many concurrent readers and writers performing a variety of tasks.
We achieve this by separating file system control, which passes through the master, from data transfer, which passes directly between chunkservers and clients. 
Master involvement in common operations is minimized by a large chunk size and by chunk leases, which delegate authority to primary replicas in data mutations. 
This makes possible a simple, centralized master that does not become a bottleneck.
We believe that improvements in our networking stack will lift the current limitation on the write throughput seen by an individual client.
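The separation described in this paragraph can be sketched as follows: the client asks the master only for a chunk handle and replica locations (control flow), caches that answer, and then reads the bytes directly from a chunkserver (data flow), so the master stays off the data path and large chunks keep such lookups rare. The in-process classes and method names are illustrative assumptions, not GFS interfaces.

CHUNK_SIZE = 64 * 1024 * 1024   # large chunks keep master interactions rare

class Master:
    """Control plane only: maps (file, chunk index) to a handle and replicas."""
    def __init__(self, chunk_table):
        self.chunk_table = chunk_table

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]

class Chunkserver:
    """Data plane: serves chunk contents directly to clients."""
    def __init__(self, chunks):
        self.chunks = chunks

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class Client:
    def __init__(self, master, chunkservers):
        self.master, self.chunkservers = master, chunkservers
        self.location_cache = {}                 # avoids repeated master calls

    def read(self, filename, offset, length):
        index = offset // CHUNK_SIZE
        key = (filename, index)
        if key not in self.location_cache:
            self.location_cache[key] = self.master.lookup(filename, index)
        handle, replicas = self.location_cache[key]
        server = self.chunkservers[replicas[0]]  # data flows chunkserver -> client
        return server.read(handle, offset % CHUNK_SIZE, length)

if __name__ == "__main__":
    cs = {"cs1": Chunkserver({"chunk-0001": b"hello, gfs"})}
    master = Master({("/logs/web-00", 0): ("chunk-0001", ["cs1"])})
    client = Client(master, cs)
    print(client.read("/logs/web-00", 0, 10))    # b'hello, gfs'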
GFS has successfully met our storage needs and is widely used within Google as the storage platform for research and development as well as production data processing. 
It is an important tool that enables us to continue to innovate and attack problems on the scale of the entire web.


ACKNOWLEDGMENTS
We wish to thank the following people for their contributions to the system or the paper. 
Brian Bershad (our shepherd) and the anonymous reviewers gave us valuable comments and suggestions. 
Anurag Acharya, Jeff Dean, and David desJardins contributed to the early design. 
Fay Chang worked on comparison of replicas across chunkservers. 
Guy Edjlali worked on storage quota. 
Markus Gutschke worked on a testing framework and security enhancements. 
David Kramer worked on performance enhancements. 
Fay Chang, Urs Hoelzle, Max Ibel, Sharon Perl, Rob Pike, and Debby Wallach commented on earlier drafts of the paper. 
Many of our colleagues at Google bravely trusted their data to a new file system and gave us useful feedback. 
Yoshka helped with early testing.

