Distributed File Storage: MinIO Trial and Comparison Summary

 

I'll skip the introduction; the official documentation (with a Chinese version) is here:

https://docs.min.io/cn

 

Here are some personal impressions.

First, MinIO is compatible with Amazon S3. In other words, MinIO can pose as Amazon S3, so you can operate MinIO with the Amazon S3 SDK.

 

MinIO supports multi-tenancy, but it does not support dynamic expansion. So give each large tenant its own dedicated MinIO deployment; small tenants can share one.

 

On service discovery and dynamic expansion, the discussion with the MinIO authors goes as follows (important):

 

To get a really resilient storage cloud, service discovery needs to be implemented.

Expected Behavior

Minio should be able to use some service discovery system, such as etcd or similar, and fetch siblings from there instead of needing to hardcode them in the cli of settings.

Current Behavior

Minio needs a hardcoded list of siblings: minio server http://minio1:9000/export http://minio2:9000/export

Possible Solution

minio server --etcd=http://etcd

... And watch for changes there.

Context

I'm trying to use the full power of docker swarm, being able to scale up or down a replicated service, and exposing just one port to the network routing mesh.

Bucket and config discovery is a work in progress as part of federation work to discover many deployments.

Individual deployment units of Minio are always expected to be proper command line options - it is very much part of the design and this is not going to change.

May I ask why? It doesn't seem to be an extremely hard feature and it seems to align well with the project philosophy IMHO. Thanks.

It adds an unneeded dependency, the design philosophy of Minio is to keep the most common deployments simple and predictable. Minio is deployed in static deployment units and each such units are completely isolated from each other.

etcd is required only when all of these different deployment units need to federate which is why we will be introducing a way to manage multiple creds across many deployments, global bucket awareness etc.

But then, adding the node nº 101 to a 100-node cluster means changing CLI on the other 100, isn't it? While just letting nodes be autodiscovered is more a Plug&play approach, while still not having to remove the current feature set, of course.
In a big, dynamic, cloud deployment, this very little piece of the puzzle means a lot of manual work...
Or maybe I didn't understand clearly what you mean with "multiple creds across many deployments, global bucket awareness" 🤔

Minio doesn't allow adding 100 to 101 - there is no dynamic expansion. If you started with 100 it stays 100 forever, you can potentially start a new 100. This is one of the reason why we never require etcd.

etcd is only useful when there are dynamic floating entities that is the case for buckets, multiple credentials which will be provided in the federation.

Oh Ok, I get it now clearly 😉

However that's a very weird design decision... Scalability is not on the project's roadmap? How can somebody know all the disk space he will need forever? And why pay for that since the start, when you still don't need it?

Certainly Minio's features are great, but this one decision feels like narrowing its purpose for non-serious business, don't you think?... 🤔

Making a single large PB sized volume where the disks and nodes are managed like a blackbox by the filesystem is quite scary. Any crash means we blew up all the tenants at once. 1000s of individual minio tenants means, I know when I add the million'th minio instance, it is not any more complex than the first instance of Minio. Provisioning with k8 or external orchestration tools is better than Minio's own resource management system. When it comes to the applications, objects are just represented as URLs. Some data sitting on Amazon S3 and some on Minio makes no difference to the application. With this principle in mind Minio is designed for scalability, you scale in smaller scalable units.

 

    Simply put, what the MinIO author means by dynamic expansion is expanding at the level of whole clusters (adding another cluster), not adding nodes within a cluster. Frankly, this looks like a shortcut; the author also says that dynamically adding nodes would bring enormous complexity to MinIO, the implication being that MinIO only wants to be a small-but-beautiful storage system. But if the goal is just a simple storage system, a single node would suffice, so why bother with clusters at all? If a single node cannot keep up, just set up another one. That actually sounds reasonable on reflection. Still, cluster support is worthwhile: at minimum it lets multiple hosts work together, drives multiple disks, and guards against single points of failure.

    What does the inability to add nodes dynamically actually mean for users? I think that needs careful consideration. If business volume spikes, you need monitoring plus a chain of automation tools to complete the expansion automatically, and the process is somewhat involved: first replicate a cluster (automatically allocate machines, automatically mount disks, then start it up), then automatically configure load balancing and monitoring; once done, a new cluster is available, and the application must automatically switch newly uploaded files over to it. It still feels quite inconvenient. So I came up with what I think is a better plan:

    Prepare a standby MinIO cluster in advance: configure it, even start it, and leave it idle in normal operation. When business volume spikes and the original MinIO cluster raises a capacity alarm, the application automatically switches to the standby cluster. Once the immediate problem is solved, you can later consolidate the files of the two clusters into a single, larger cluster.

    Given this characteristic, I think the best way to use MinIO is:

  1. Usually deploy a minimum of 4 nodes. Disk capacity should be neither too large nor too small; size it moderately based on business estimates. Put each node's IP behind a load balancer.

  2. Systems with many files should get a dedicated deployment; other systems can share one.

  3. Keep a standby MinIO cluster of moderate capacity, and make the application's configuration able to switch to this standby cluster on command at any time.
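The switchover in item 3 can be sketched as plain application-side logic. This is a minimal sketch; the endpoint URLs and the capacity-alarm flag are illustrative assumptions, not MinIO features:

```python
# Minimal sketch of application-side failover between a primary MinIO
# cluster and a pre-provisioned standby. All endpoints are illustrative.
PRIMARY = "http://minio-primary.example.com:9000"
STANDBY = "http://minio-standby.example.com:9000"

def pick_upload_endpoint(capacity_alarm: bool) -> str:
    """Route new uploads to the standby cluster once the primary raises
    a capacity alarm. Previously stored objects keep their old URLs, so
    reads are unaffected by the switch."""
    return STANDBY if capacity_alarm else PRIMARY
```

In practice the `capacity_alarm` flag would come from your monitoring system, and the consolidation of the two clusters into one larger cluster would happen offline later, as described above.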

    With this plan in place, I think MinIO is still worth using; without it, the lack of dynamic expansion really is a serious drawback.

    From a quick look, the MinIO community seems fairly active, with a good number of active contributors. MinIO also supports caching; it does consume capacity, but there is an eviction mechanism, so that is acceptable. Of course, with heavy use there will probably still be plenty of small issues.

    I have had little exposure to other distributed storage systems, so broad comparisons are difficult. Take FastDFS, which I know well: FastDFS is feeble, its client is hard to use, and while the server side is reasonably stable at small scale, when problems do occur there is almost no way to solve them; nobody maintains it, and the author no longer updates it. In short, after more than a year of hands-on experience, I would not recommend FastDFS to any team in any environment. MinIO, by contrast, is the better choice.

    There is also a distributed store called SeaweedFS, likewise written in Go, which I am optimistic about; see my article "Distributed File Storage SeaweedFS Trial and Comparison Summary".

    For further MinIO comparisons and notes, see my article "Distributed File Storage: MinIO, SeaweedFS, and FastDFS Comparison Summary".


Distributed MinIO is a solution that combines multiple drives into a single object storage service. By distributing drives across different nodes, it avoids single points of failure. Distributed MinIO strictly follows the read-after-write consistency model, ensuring consistency for all read and write operations.

To start a distributed MinIO instance, you simply pass the drive locations as parameters to the minio server command and run the same command on all other nodes. All nodes running distributed MinIO should share common root credentials so that the nodes can connect to and trust each other. It is recommended to export the root user and root password as the environment variables MINIO_ROOT_USER and MINIO_ROOT_PASSWORD on all nodes before running the minio server command; if they are not exported, the default credentials minioadmin/minioadmin are used.

MinIO creates erasure-coding sets of 2 to 16 drives each. The total number of drives you provide must be a multiple of one of these numbers. Distributed MinIO selects the largest EC set size that divides evenly into the given total of drives or nodes, ensuring uniform distribution, i.e. each node contributes the same number of drives to each set. Every object is written to a single EC set, so it is spread across no more than 16 drives.

It is recommended that all nodes in a distributed MinIO setup be homogeneous: the same operating system, the same number of disks, and the same network interconnect. Distributed MinIO requires fresh directories, but drives can be shared with other applications if needed; you can do this by giving MinIO a dedicated subdirectory. For example, if you have a volume mounted at /export, you can pass /export/data to the minio server. Clock skew between servers running distributed MinIO instances should be under 15 minutes; enabling an NTP service is recommended to keep server clocks synchronized. Running distributed MinIO on Windows is considered experimental, so use it with caution.
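The erasure-coding rule above (set sizes of 2 to 16 drives, total drive count a multiple of the chosen size, largest fitting size preferred) can be illustrated with a small helper. This is a sketch of the rule as described in the text, not MinIO's actual internal selection code, which also accounts for node symmetry:

```python
def ec_set_size(total_drives: int) -> int:
    """Return the largest erasure-code set size in [2, 16] that divides
    the total drive count evenly, per the rule described above.
    Raises ValueError if no valid size exists (e.g. a prime > 16)."""
    for size in range(16, 1, -1):  # prefer the largest set size
        if total_drives % size == 0:
            return size
    raise ValueError(
        f"{total_drives} drives cannot be split into EC sets of 2-16 drives"
    )

# e.g. 16 drives -> one set of 16; 20 drives -> two sets of 10.
```

This also shows why the drive total matters when sizing a deployment: 20 drives yield sets of 10, while 17 drives cannot form any valid set at all.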