The Ceph client was merged into the Linux kernel in version 2.6.34, and Red Hat later built Red Hat Ceph Storage on top of Ceph.
OpenStack Swift is a stable, highly available open-source object store that many enterprises have deployed commercially; for example, Sina's App Engine offers an object storage service built on Swift, as does Korea Telecom's Ucloud Storage service.
HDFS, short for Hadoop Distributed File System, is a distributed file system written in Java. It scales well: it supports upwards of a billion files, hundreds of petabytes of data, and clusters of thousands of nodes. Its main limitations:
* It had a single point of failure (the NameNode) until recent versions of HDFS added high-availability support
* It isn’t POSIX compliant
* It stores three copies of data by default, which multiplies raw storage requirements
* It has a centralized name server resulting in scalability challenges
Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system’s data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
Streaming Data Access
Applications that run on HDFS need streaming access to their data sets. They are not general purpose applications that typically run on general purpose file systems. HDFS is designed more for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas have been traded to increase data throughput rates.
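As a concrete illustration, here is a minimal sketch of this streaming access pattern using the Hadoop `FileSystem` Java API; the path is hypothetical, and configuration is assumed to come from the usual `core-site.xml`/`hdfs-site.xml` on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical path; HDFS favors large sequential scans over random seeks.
        try (FSDataInputStream in = fs.open(new Path("/data/logs/2016-01-01.log"))) {
            byte[] buf = new byte[1 << 20];        // 1 MiB read buffer
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {     // sequential, throughput-oriented reads
                total += n;                        // a real job would process buf[0..n) here
            }
            System.out.println("streamed " + total + " bytes");
        }
    }
}
```

Note the large sequential reads and the absence of seeks: the client trades latency for throughput, exactly as described above.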
Large Data Sets
Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.
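This tuning for large files is visible in the client API: block size and replication factor are per-file parameters fixed at create time. A hedged sketch, with a hypothetical path and a 256 MiB block size chosen purely for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LargeFileCreate {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical file; block size and replication are fixed when the file is created.
        Path p = new Path("/data/warehouse/events.avro");
        long blockSize = 256L * 1024 * 1024;   // 256 MiB blocks suit multi-GB files
        try (FSDataOutputStream out = fs.create(p,
                true,        // overwrite if present
                4 * 1024,    // io buffer size
                (short) 3,   // replication factor (the HDFS default)
                blockSize)) {
            out.write(new byte[0]);            // a real writer streams its data here
        }
    }
}
```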
Simple Coherency Model
HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. A MapReduce application or a web crawler application fits perfectly with this model. Append-only writes to existing files were added in later Hadoop releases, but files still cannot be modified in place.
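A minimal sketch of the write-once-read-many model with the `FileSystem` API (the path is hypothetical): the file is created, written, and closed, after which it is effectively immutable; later Hadoop releases expose appends via `fs.append(path)`, but there is still no in-place update.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/data/crawl/part-00000");  // hypothetical output file

        // Write once: create, write, close. After close() the file's contents
        // cannot be rewritten (only appended to, on releases that support it).
        try (FSDataOutputStream out = fs.create(p, true)) {
            out.write("record-1\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read many: any number of readers can now open the closed file.
        System.out.println("exists: " + fs.exists(p)
                + ", len: " + fs.getFileStatus(p).getLen());
    }
}
```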
“Moving Computation is Cheaper than Moving Data”
A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.
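The interface HDFS exposes for this is block-location metadata: a scheduler can ask where each block of a file physically lives and place computation on those hosts. A small sketch with a hypothetical path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WhereIsMyData {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path("/data/crawl/part-00000")); // hypothetical
        // One BlockLocation per block; a scheduler can place tasks on these hosts
        // instead of pulling the block's data across the network.
        for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.println("offset " + b.getOffset() + " len " + b.getLength()
                    + " hosts " + String.join(",", b.getHosts()));
        }
    }
}
```

This is exactly the information a MapReduce scheduler uses when assigning map tasks to nodes holding the relevant blocks.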
Portability Across Heterogeneous Hardware and Software Platforms
HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.
| Open-source license | LGPL v2.1 | Apache v2.0 | Apache v2.0 | ? | ? | ? |
Ceph vs MinIO
- Supports adding nodes dynamically, with automatic rebalancing of data placement. (TODO: how long does rebalancing take, and can the cluster keep serving requests while a node is being added?)
Dynamic addition and removal of nodes is essential when all the storage nodes are managed by the storage system itself, but such a design is too complex and restrictive for cloud-native applications. The old design hands all resources to the storage system and lets it manage them across tenants. MinIO is different by design: it is built to serve a single tenant, and spinning up one MinIO instance per tenant is the job of an external orchestration layer.

Any addition or removal of nodes means the cluster has to be rebalanced. If MinIO did this internally it would behave like a black box and gain significant complexity. MinIO is instead designed to be deployed once and forgotten; users are not even expected to replace failed drives or nodes, because the erasure code has enough redundancy built in. By the time half the nodes or drives are gone, it is time to refresh all the hardware anyway.

If rebalancing is still required, one can start a new MinIO server on the same system on a different port and simply migrate the data over. This is essentially what MinIO would do internally; doing it externally means more control and visibility.
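To make that suggested migration concrete, here is a rough sketch using the MinIO Java SDK (`io.minio`); the endpoints, credentials, and bucket name are all hypothetical, and in practice the `mc mirror` command does the same job from the shell.

```java
import java.io.InputStream;
import io.minio.GetObjectArgs;
import io.minio.ListObjectsArgs;
import io.minio.MinioClient;
import io.minio.PutObjectArgs;
import io.minio.Result;
import io.minio.messages.Item;

public class MigrateBucket {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoints: old server on :9000, replacement server on :9001.
        MinioClient oldServer = MinioClient.builder()
                .endpoint("http://localhost:9000").credentials("ACCESS", "SECRET").build();
        MinioClient newServer = MinioClient.builder()
                .endpoint("http://localhost:9001").credentials("ACCESS", "SECRET").build();

        String bucket = "photos";  // hypothetical; assumed to already exist on both servers
        for (Result<Item> r : oldServer.listObjects(
                ListObjectsArgs.builder().bucket(bucket).recursive(true).build())) {
            Item item = r.get();
            // Stream each object from the old deployment into the new one.
            try (InputStream in = oldServer.getObject(
                    GetObjectArgs.builder().bucket(bucket).object(item.objectName()).build())) {
                newServer.putObject(PutObjectArgs.builder()
                        .bucket(bucket).object(item.objectName())
                        .stream(in, item.size(), -1)  // -1: let the SDK pick the part size
                        .build());
            }
        }
    }
}
```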
We are planning to integrate bucket-name-based routing inside the MinIO server itself. This means you could have 16 servers handle a rack full of drives (say, a few petabytes); MinIO would schedule buckets onto free drives and route all operations appropriately.