[Paper Reading] Haystack

Beaver, D., Kumar, S., Li, H. C., Sobel, J., & Vajgel, P. (2010, October). Finding a Needle in Haystack: Facebook’s Photo Storage. In OSDI (Vol. 10, pp. 1-8).


Introduction

Haystack is Facebook’s object storage system for photos. At the time of the paper it stored over 260 billion images, or more than 20 petabytes of data.

Environment:

  • Written once
  • Read often
  • Never modified
  • Rarely deleted

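Disadvantage of POSIX-based filesystems:
Every file carries metadata (such as permissions) that the photo workload never uses. Fetching that metadata costs extra disk operations on every read: the filename must be translated to an inode number, the inode read, and only then the file itself. This limits read throughput. CDNs absorb reads for popular photos, but they cannot serve the long tail of requests for less popular photos, which still reach the storage system.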

Goal:

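  • High throughput and low latency: at most one disk operation per read, achieved by keeping all metadata in main memory
  • Fault-tolerant: replicates each photo in geographically distinct locations
  • Cost-effective: lower cost per usable terabyte, and more reads per second per terabyte, than the previous NFS-based design built on NAS appliances
  • Simple: straightforward to implement and maintain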

Typical design:
[Figure: Typical Design]

NFS-based design:
[Figure: NFS-based Design]

Design & Implementation

Serving a photo:
[Figure: Serving a photo]

Core components:

  • Haystack Store
  • Haystack Directory
  • Haystack Cache

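The Directory builds each photo URL in the form:

http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo>

Each layer can look up the photo using only the last part of the URL (the logical volume and the photo id). On a miss, the CDN strips its own address and forwards the request to the Cache; on a Cache miss, the Cache strips its address and requests the photo from the named Store machine.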
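The URL format above can be sketched as a minimal illustration of how each hop peels off its own address. It assumes the `<Logical volume, Photo>` part is encoded as `<volume>,<photo>`; that encoding and the function names are hypothetical, not Haystack's actual code.

```python
from typing import Tuple


def build_url(cdn: str, cache: str, machine_id: str, volume: int, photo: int) -> str:
    # Hypothetical encoding of <Logical volume, Photo> as "<volume>,<photo>".
    return f"http://{cdn}/{cache}/{machine_id}/{volume},{photo}"


def strip_first_hop(url: str) -> str:
    """Remove the leftmost address, as the CDN (and then the Cache) does on a miss."""
    scheme, rest = url.split("://", 1)
    _hop, remainder = rest.split("/", 1)
    return f"{scheme}://{remainder}"


def lookup_key(url: str) -> Tuple[int, int]:
    """Every layer looks the photo up using only the last path segment."""
    volume, photo = url.rsplit("/", 1)[1].split(",")
    return int(volume), int(photo)


url = build_url("cdn.example.com", "cache7", "store42", 3, 1001)
print(strip_first_hop(url))                   # http://cache7/store42/3,1001
print(strip_first_hop(strip_first_hop(url)))  # http://store42/3,1001
print(lookup_key(url))                        # (3, 1001)
```

Because the lookup key never changes along the path, the CDN and the Cache can share one naming scheme, and the Store machine address is already present when the request finally reaches it.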

Uploading a photo:
[Figure: Uploading a photo]

Haystack Directory

  • Provides a mapping from logical volumes to physical volumes.
  • Load balances writes across logical volumes and reads across physical volumes.
  • Determines whether a photo request should be handled by the CDN or by the Cache.
  • Identifies those logical volumes that are read-only either because of operational reasons or because those volumes have reached their storage capacity.
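The Directory's responsibilities above can be illustrated with a toy sketch. The class and method names are hypothetical, and uniform random choice is an assumed load-balancing policy; the notes do not say how the real Directory picks volumes or decides between CDN and Cache, so that decision is left out.

```python
import random
from typing import Dict, List, Set


class Directory:
    """Toy Directory: logical -> physical mapping, load balancing, read-only marking."""

    def __init__(self, seed: int = 0) -> None:
        self.replicas: Dict[int, List[str]] = {}  # logical volume id -> Store machine ids
        self.read_only: Set[int] = set()
        self.rng = random.Random(seed)

    def add_logical_volume(self, volume: int, machines: List[str]) -> None:
        self.replicas[volume] = list(machines)

    def mark_read_only(self, volume: int) -> None:
        # Full volumes, or volumes taken out of service, stop accepting writes.
        self.read_only.add(volume)

    def choose_write_volume(self) -> int:
        # Spread writes across write-enabled logical volumes (uniform choice is an assumption).
        writable = [v for v in self.replicas if v not in self.read_only]
        if not writable:
            raise RuntimeError("no write-enabled logical volume")
        return self.rng.choice(writable)

    def write_targets(self, volume: int) -> List[str]:
        # A write goes to every physical volume in the logical volume.
        return list(self.replicas[volume])

    def choose_read_machine(self, volume: int) -> str:
        # Spread reads across the physical replicas of the logical volume.
        return self.rng.choice(self.replicas[volume])
```

For example, after `mark_read_only(1)`, every call to `choose_write_volume()` returns some other logical volume, while reads of volume 1 keep being spread over its replicas.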

Haystack Cache

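A distributed hash table that uses a photo’s id as the key to locate cached data. On a miss, the Cache fetches the photo from the Store machine named in the URL and replies to either the CDN or the user's browser.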

It caches a photo only if:

  • the request comes directly from a user and not from the CDN (a request that has already missed in the CDN is unlikely to hit in the internal Cache)
  • the photo is fetched from a write-enabled Store machine

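Why only photos from write-enabled machines:

  • photos are most heavily accessed soon after they are uploaded
  • the filesystems under this workload perform better when doing either reads or writes, but not both; caching recently written photos shields write-enabled Store machines from most reads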
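The admission rule can be sketched for a single Cache shard. The `Request` fields are hypothetical flags standing in for "did this request come through the CDN" and "is the source Store machine write-enabled", and modulo hashing of the photo id is an assumed way to pick a shard, not the paper's scheme.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Request:
    photo_id: int
    via_cdn: bool              # hypothetical: did the request arrive after a CDN miss?
    store_write_enabled: bool  # hypothetical: is the source Store machine write-enabled?


def pick_shard(photo_id: int, num_shards: int) -> int:
    # The photo id is the DHT key; modulo hashing is an assumption for illustration.
    return photo_id % num_shards


class CacheShard:
    def __init__(self, fetch_from_store: Callable[[int], bytes]) -> None:
        self.data: Dict[int, bytes] = {}
        self.fetch = fetch_from_store

    def get(self, req: Request) -> bytes:
        if req.photo_id in self.data:
            return self.data[req.photo_id]
        photo = self.fetch(req.photo_id)
        # Admit only direct user requests for photos on write-enabled Store machines.
        if not req.via_cdn and req.store_write_enabled:
            self.data[req.photo_id] = photo
        return photo
```

Requests that fail either condition are still served; they just do not populate the Cache.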

Haystack Store

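Physical volume: a very large file (100 GB) saved as /hay/haystack_<logical volume id>. Physical volumes on different machines are grouped into logical volumes; a photo stored in a logical volume is written to every physical volume in that group.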

Each Store machine keeps:

  • an open file descriptor for each physical volume it manages
  • an in-memory mapping from photo ids to the filesystem metadata (file, offset, and size in bytes) needed to retrieve that photo

Each physical volume file consists of a superblock followed by a sequence of needles; each needle holds one photo.

[Figure: Layout of Haystack Store file]

[Table: Explanation of fields in a needle]

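A photo is identified by its key (the photo id) together with an alternate key. The alternate key exists for historical reasons: each photo is stored in several sizes, which share the same key and are distinguished only by the alternate key.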
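A needle can be sketched as a byte layout. This sketch assumes the needle holds a header magic number, cookie, key, alternate key, flags, data size, the data, a footer magic number, a data checksum, and padding so the total size is a multiple of 8 bytes. The field widths, magic values, and the use of CRC32 as the checksum are assumptions for illustration.

```python
import struct
import zlib

HEADER_MAGIC = 0xFACEB00C  # hypothetical magic numbers (help locate needles during recovery)
FOOTER_MAGIC = 0x0BADF00D
FLAG_DELETED = 0x1
# header magic, cookie, key, alternate key, flags, data size (little-endian, no padding)
HEAD = struct.Struct("<IQQIBI")
TAIL = struct.Struct("<II")  # footer magic, data checksum


def padded_size(data_len: int) -> int:
    raw = HEAD.size + data_len + TAIL.size
    return raw + (-raw % 8)  # total needle size aligned to 8 bytes


def encode_needle(cookie: int, key: int, alt_key: int, data: bytes, flags: int = 0) -> bytes:
    raw = (HEAD.pack(HEADER_MAGIC, cookie, key, alt_key, flags, len(data))
           + data
           + TAIL.pack(FOOTER_MAGIC, zlib.crc32(data)))
    return raw + b"\0" * (padded_size(len(data)) - len(raw))


def decode_needle(buf: bytes):
    magic, cookie, key, alt_key, flags, size = HEAD.unpack_from(buf, 0)
    if magic != HEADER_MAGIC:
        raise ValueError("not a needle boundary")
    data = bytes(buf[HEAD.size:HEAD.size + size])
    footer, checksum = TAIL.unpack_from(buf, HEAD.size + size)
    if footer != FOOTER_MAGIC:
        raise ValueError("bad footer magic")
    if zlib.crc32(data) != checksum:
        raise ValueError("data checksum mismatch")
    return cookie, key, alt_key, flags, data
```

Since the size is fixed by the header plus the data length, a Store machine that knows a photo's size from memory can fetch the whole needle in one read.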

Photo Read

When a Cache machine requests a photo, it supplies to the Store machine:

  • logical volume id
  • key
  • alternate key
  • cookie (a random number assigned at upload and stored in the needle; it defeats attacks aimed at guessing valid photo URLs)
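The read path can be sketched in three steps: look up the photo's metadata in memory; if it is not deleted, seek to its offset and read the entire needle at once; then verify the cookie before returning the data. This is a minimal sketch assuming a simplified needle (no magic numbers or checksum), with an in-memory `io.BytesIO` standing in for the volume file; the class and field names are hypothetical.

```python
import io
import struct
from typing import Dict, Tuple

# Simplified needle header (no magic numbers or checksum): cookie, key, alt key, flags, size.
HDR = struct.Struct("<QQIBI")
FLAG_DELETED = 0x1


class Volume:
    def __init__(self) -> None:
        self.f = io.BytesIO()  # stands in for the open descriptor of a volume file
        # (key, alt key) -> (offset, needle size, flags); no cookie kept in memory
        self.index: Dict[Tuple[int, int], Tuple[int, int, int]] = {}

    def append(self, cookie: int, key: int, alt: int, data: bytes) -> None:
        offset = self.f.seek(0, io.SEEK_END)
        self.f.write(HDR.pack(cookie, key, alt, 0, len(data)) + data)
        self.index[(key, alt)] = (offset, HDR.size + len(data), 0)

    def read(self, key: int, alt: int, cookie: int) -> bytes:
        meta = self.index.get((key, alt))   # 1) in-memory lookup, no disk I/O
        if meta is None or meta[2] & FLAG_DELETED:
            raise KeyError((key, alt))
        offset, size, _ = meta
        self.f.seek(offset)
        needle = self.f.read(size)          # 2) one read of the whole needle
        stored_cookie, _, _, _, data_len = HDR.unpack_from(needle)
        if stored_cookie != cookie:         # 3) reject requests with a guessed cookie
            raise PermissionError("cookie mismatch")
        return needle[HDR.size:HDR.size + data_len]
```

A real Store machine would also verify the data checksum at step 3 before returning the photo to the Cache.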

Photo Write

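The web server gives the logical volume id, key, alternate key, cookie, and data to each Store machine assigned to that logical volume. Each Store machine synchronously appends the needle to its physical volume file and updates its in-memory mapping. Needles are never overwritten: modifying a photo (e.g., rotating it) appends a new needle with the same key and alternate key. When the new needle lands in the same logical volume, the latest version of a needle within a physical volume is the one at the highest offset.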
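The "highest offset wins" rule can be sketched with an append-only writer and a scan that walks needles in offset order. The simplified needle header and function names are assumptions; a real Store machine updates its in-memory mapping on every write rather than scanning the file.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Tuple

HDR = struct.Struct("<QQIBI")  # simplified needle header: cookie, key, alt key, flags, size


def append_needle(f: BinaryIO, cookie: int, key: int, alt: int, data: bytes) -> int:
    """Append-only write; returns the offset of the new needle."""
    offset = f.seek(0, io.SEEK_END)
    f.write(HDR.pack(cookie, key, alt, 0, len(data)) + data)
    return offset


def write_photo(replicas: List[BinaryIO], cookie: int, key: int, alt: int, data: bytes) -> None:
    # The same needle goes to every physical volume of the logical volume.
    # A real Store would fsync each file before acknowledging the upload.
    for f in replicas:
        append_needle(f, cookie, key, alt, data)


def scan_latest(f: BinaryIO) -> Dict[Tuple[int, int], Tuple[int, bytes]]:
    """Walk needles in offset order; a later copy of (key, alt) replaces an earlier one."""
    latest: Dict[Tuple[int, int], Tuple[int, bytes]] = {}
    f.seek(0)
    offset = 0
    while True:
        head = f.read(HDR.size)
        if len(head) < HDR.size:
            break
        _cookie, key, alt, _flags, size = HDR.unpack(head)
        latest[(key, alt)] = (offset, f.read(size))
        offset += HDR.size + size
    return latest
```

Writing a photo and then a rotated version with the same key and alternate key leaves both needles on disk, but `scan_latest` reports only the second one.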

Photo Delete

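A Store machine sets the delete flag both in its in-memory mapping and, synchronously, in the volume file. Requests for a deleted photo check the in-memory flag first and return an error if it is set. The space held by the deleted needle is not reclaimed until compaction.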
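Deletion can be sketched as two flag updates, one in memory and one in place inside the needle on disk; nothing is removed from the file. This assumes the same simplified needle header as a hypothetical format, with `io.BytesIO` standing in for the volume file.

```python
import io
import struct
from typing import Dict, List, Tuple

HDR = struct.Struct("<QQIBI")  # simplified needle header: cookie, key, alt key, flags, size
FLAGS_POS = 8 + 8 + 4          # byte offset of the flags field inside HDR
FLAG_DELETED = 0x1


class Volume:
    def __init__(self) -> None:
        self.f = io.BytesIO()  # stands in for the volume file
        self.index: Dict[Tuple[int, int], List[int]] = {}  # (key, alt) -> [offset, flags]

    def append(self, cookie: int, key: int, alt: int, data: bytes) -> None:
        offset = self.f.seek(0, io.SEEK_END)
        self.f.write(HDR.pack(cookie, key, alt, 0, len(data)) + data)
        self.index[(key, alt)] = [offset, 0]

    def delete(self, key: int, alt: int) -> None:
        entry = self.index[(key, alt)]
        entry[1] |= FLAG_DELETED          # 1) flag in the in-memory mapping
        self.f.seek(entry[0] + FLAGS_POS)
        self.f.write(bytes([entry[1]]))   # 2) same flag in the needle on disk
        # With a real file, flush and os.fsync here to make the update synchronous.

    def is_deleted(self, key: int, alt: int) -> bool:
        return bool(self.index[(key, alt)][1] & FLAG_DELETED)
```

The file length is unchanged after a delete, which is why compaction is needed to get the space back.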

The Index File

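Store machines keep an index file for each of their volumes. It is a checkpoint of the in-memory mappings, so after a reboot a Store machine can rebuild those mappings quickly instead of reading through every volume file.

The index is updated asynchronously, so it can be stale: recently written needles may have no index record (orphans), and deletes are not reflected in it. On restart, the Store therefore scans only the needles after the last indexed one and adds them to the index, and it checks a needle's delete flag when the needle is read.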

[Figure: Layout of Haystack Index file]

[Table: Explanation of fields in index file]
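Rebuilding the in-memory mapping at startup can be sketched as follows. It assumes each index record holds key, alternate key, an unused flags field, offset, and size, and it uses the same simplified needle header as a hypothetical volume format (no superblock). Records are replayed in order, then any orphaned needles after the last indexed one are read from the volume and appended to the index.

```python
import io
import struct
from typing import BinaryIO, Dict, Tuple

NEEDLE_HDR = struct.Struct("<QQIBI")  # simplified needle: cookie, key, alt key, flags, size
INDEX_REC = struct.Struct("<QIBQI")   # key, alt key, flags (unused), offset, needle size


def rebuild(volume: BinaryIO, index: BinaryIO) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Rebuild (key, alt) -> (offset, size) from the index, then pick up orphaned needles."""
    mapping: Dict[Tuple[int, int], Tuple[int, int]] = {}
    end = 0
    index.seek(0)
    while True:
        rec = index.read(INDEX_REC.size)
        if len(rec) < INDEX_REC.size:
            break
        key, alt, _unused, offset, size = INDEX_REC.unpack(rec)
        mapping[(key, alt)] = (offset, size)
        end = max(end, offset + size)
    # Needles past the last indexed one were appended but never indexed (orphans).
    offset = end
    while True:
        volume.seek(offset)
        head = volume.read(NEEDLE_HDR.size)
        if len(head) < NEEDLE_HDR.size:
            break
        _cookie, key, alt, _flags, data_len = NEEDLE_HDR.unpack(head)
        size = NEEDLE_HDR.size + data_len
        mapping[(key, alt)] = (offset, size)
        index.seek(0, io.SEEK_END)
        index.write(INDEX_REC.pack(key, alt, 0, offset, size))  # repair the stale index
        offset += size
    return mapping
```

Delete flags are deliberately not consulted here; as described above, a stale entry for a deleted photo is caught when its needle is read and the flag is found set.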

Filesystem

Store machines should use a filesystem that needs little memory to perform random seeks within a large file quickly; Haystack uses XFS.

XFS:

  • the blockmaps for several large contiguous files can be small enough to be kept in main memory
  • provides efficient file preallocation, which mitigates fragmentation and reins in how large block maps can grow
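The preallocation point can be illustrated by reserving a volume file's space up front. This sketch uses Python's `os.posix_fallocate`, which is Unix-only; the function name and the `ftruncate` fallback are illustrative assumptions, and the notes do not say how Haystack itself requests preallocation.

```python
import os


def preallocate_volume(path: str, size: int) -> None:
    """Reserve space for a volume file up front so the filesystem can lay it out contiguously."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if hasattr(os, "posix_fallocate"):
            os.posix_fallocate(fd, 0, size)  # real preallocation (Unix only)
        else:
            os.ftruncate(fd, size)           # fallback: sets the size but may leave the file sparse
    finally:
        os.close(fd)
```

For a 100 GB physical volume this would be called as `preallocate_volume("/hay/haystack_<logical volume id>", 100 * 2**30)`, with the placeholder replaced by a real id.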

Recovery from failures

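Detection:
A background task, dubbed pitchfork, periodically checks the health of each Store machine: it tests the connection, checks the availability of each volume file, and attempts to read data. When a check fails, pitchfork marks all logical volumes on that Store machine as read-only.

Repair:
Failures are diagnosed and fixed manually, offline. When a Store machine cannot be fixed (e.g., a disk must be replaced), a bulk sync resets its data using the volume files supplied by a replica.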
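One pitchfork-style detection pass can be sketched as: run every check against every Store machine and mark all of a failing machine's logical volumes read-only. The check callables are hypothetical stand-ins for the connection, volume-availability, and trial-read tests.

```python
from typing import Callable, Dict, List, Set

Check = Callable[[str], bool]  # hypothetical: returns True if the machine passes


def pitchfork_round(volumes_by_machine: Dict[str, List[int]],
                    checks: List[Check],
                    read_only: Set[int]) -> List[str]:
    """One pass of a pitchfork-style health check; returns the machines that failed."""
    failed = []
    for machine, volumes in volumes_by_machine.items():
        if not all(check(machine) for check in checks):
            read_only.update(volumes)  # stop routing writes to every volume on this machine
            failed.append(machine)
    return failed
```

Marking volumes read-only keeps reads flowing from healthy replicas while the failure is investigated offline.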

Optimizations

Compaction

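Compaction is an online operation that reclaims the space used by deleted and duplicate needles (needles with the same key and alternate key). The Store copies needles into a new file, skipping duplicate or deleted entries. This matters because deletes follow the same pattern as views: young photos are a lot more likely to be deleted.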
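The compaction rule can be sketched on an in-memory list of needles in offset order: keep a needle only if it is the highest-offset copy of its (key, alternate key) and is not deleted. The `Needle` type is hypothetical, and the real operation copies into a new file while the volume stays online, which this simplification omits.

```python
from typing import Dict, List, NamedTuple, Tuple


class Needle(NamedTuple):
    key: int
    alt_key: int
    deleted: bool
    data: bytes


def compact(needles: List[Needle]) -> List[Needle]:
    """Copy live needles in offset order, skipping deleted ones and superseded duplicates."""
    latest: Dict[Tuple[int, int], int] = {}
    for pos, n in enumerate(needles):
        latest[(n.key, n.alt_key)] = pos  # highest offset wins
    return [n for pos, n in enumerate(needles)
            if latest[(n.key, n.alt_key)] == pos and not n.deleted]
```

An older copy superseded by a rotated version is dropped even though its own delete flag was never set.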

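Saving more memory

The in-memory mapping drops the flags field: a deleted photo is marked by setting its offset to 0. Cookies are not kept in memory either; the Store checks the supplied cookie after reading the needle from disk. Together these two techniques reduce the Store's main-memory footprint by about 20%.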
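One memory-saving idea is to drop the flags field from each in-memory entry and mark a deleted photo by setting its offset to 0. This works because a volume file begins with a superblock, so no needle can start at offset 0. The sketch below assumes a hypothetical 4096-byte superblock; the names are illustrative.

```python
from typing import Dict, NamedTuple, Optional, Tuple

SUPERBLOCK_SIZE = 4096  # hypothetical; needles start after the superblock, so offset 0 is free


class Entry(NamedTuple):
    offset: int  # 0 doubles as the "deleted" marker
    size: int    # no flags and no cookie stored in memory


Mapping = Dict[Tuple[int, int], Entry]


def mark_deleted(mapping: Mapping, key: int, alt: int) -> None:
    mapping[(key, alt)] = mapping[(key, alt)]._replace(offset=0)


def lookup(mapping: Mapping, key: int, alt: int) -> Optional[Entry]:
    entry = mapping.get((key, alt))
    if entry is None or entry.offset == 0:
        return None  # unknown or deleted
    return entry
```

The cookie check moves to after the disk read, which costs nothing extra because the whole needle is read anyway.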

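Batch upload

Disks perform better with large sequential writes than with small random writes, so uploads are batched together when possible. Many users upload entire albums rather than single photos, which makes batching natural.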
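Batching can be sketched as serializing several needles (for example, one album) into a single buffer and appending it with one sequential write. The simplified needle header and the function name are assumptions for illustration.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

HDR = struct.Struct("<QQIBI")  # simplified needle header: cookie, key, alt key, flags, size


def append_batch(f: BinaryIO, photos: List[Tuple[int, int, int, bytes]]) -> List[int]:
    """Append several needles with one sequential write; returns their offsets."""
    start = f.seek(0, io.SEEK_END)
    chunks, offsets, pos = [], [], start
    for cookie, key, alt, data in photos:
        offsets.append(pos)
        chunk = HDR.pack(cookie, key, alt, 0, len(data)) + data
        chunks.append(chunk)
        pos += len(chunk)
    f.write(b"".join(chunks))  # one large write instead of one small write per photo
    return offsets
```

The returned offsets are what the Store would record in its in-memory mapping for each photo in the batch.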
