Ozone Series (1): Overall Architecture

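Note: this article is mainly a translation of the official Ozone documentation. Interested readers can consult the documentation directly at the link below: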

https://hadoop.apache.org/ozone/docs/1.0.0/concept/overview.html

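This article introduces the overall architecture of Ozone, covering the metadata layer, the data layer, the protocols, the data replication layer, and Recon. These concepts are a great help in understanding how Ozone works.

Ozone is a distributed, replicated object store optimized for big data workloads. It is designed primarily around scalability, with the goal of storing billions of objects and more.

Ozone separates namespace management from block space management, which greatly improves its scalability. The namespace is managed by Ozone Manager (OM), and the block space is managed by Storage Container Manager (SCM).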

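  • Ozone consists of volumes, buckets, and keys. A volume is similar to a user's home directory, and only administrators can create one.
  • Volumes hold buckets. A user can create any number of buckets in a volume. Buckets contain keys, and Ozone stores data under keys.
  • The Ozone namespace is made up of many storage volumes, which also serve as the unit of storage accounting.

The diagram below shows the core components of Ozone: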

2. Ozone Manager

Ozone Manager (OM) manages the Ozone namespace.

2.1 Ozone Manager metadata

OM maintains a list of volumes, buckets, and keys. For each user it maintains a list of volumes, for each volume a list of buckets, and for each bucket a list of keys.
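The nesting described here can be pictured as three lookup tables. The following is only a minimal sketch: the class `ToyNamespace` and its method names are hypothetical and are not part of the OM API.

```python
from collections import defaultdict


class ToyNamespace:
    """Toy model of the OM namespace: user -> volumes -> buckets -> keys."""

    def __init__(self):
        self.volumes_by_user = defaultdict(list)    # user -> [volume]
        self.buckets_by_volume = defaultdict(list)  # volume -> [bucket]
        self.keys_by_bucket = defaultdict(list)     # (volume, bucket) -> [key]

    def create_volume(self, user, volume):
        self.volumes_by_user[user].append(volume)

    def create_bucket(self, volume, bucket):
        self.buckets_by_volume[volume].append(bucket)

    def put_key(self, volume, bucket, key):
        self.keys_by_bucket[(volume, bucket)].append(key)


ns = ToyNamespace()
ns.create_volume("alice", "vol1")
ns.create_bucket("vol1", "logs")
ns.put_key("vol1", "logs", "2020/app.log")
```

The real OM keeps this state in RocksDB tables rather than in-memory dictionaries; the sketch only illustrates the shape of the hierarchy.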

Ozone Manager uses Apache Ratis (an implementation of the Raft protocol) to replicate its state. This provides high availability for Ozone.

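2.2 Ozone Manager and Storage Container Manager

The relationship between OM and SCM is easiest to understand by following how a key is written and how it is read.

  • Writing a key:
    ① To write data to a key in a bucket of a volume, the client first sends a write request to OM. OM checks whether the user is allowed to write the key; if so, OM needs to allocate a block for the client to write data into.
    ② To allocate the block, OM sends a request to SCM (SCM is the manager of the data nodes). SCM picks three data nodes, allocates a new block, and returns the block ID to OM.
    ③ OM records the block information in its own metadata, then returns the block and a block token (which carries the permission to write data to that block) to the client.
    ④ The client uses the block token to prove that it is allowed to write to the block, and writes the data to the corresponding data nodes.
    ⑤ Once the write completes, the client updates the block information in OM.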
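The five steps above can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the classes `ToySCM`, `ToyOM`, and `ToyDatanode`, the string-based token, and the choice to write the same bytes to all three data nodes are simplifications, not the real Ozone client protocol.

```python
import itertools


class ToySCM:
    """Stands in for SCM: manages datanodes and allocates blocks."""

    def __init__(self, datanode_names):
        self.datanode_names = list(datanode_names)
        self._block_ids = itertools.count(1)

    def allocate_block(self):
        # Step 2: pick three datanodes and return a new block ID.
        return {"block_id": next(self._block_ids),
                "datanodes": self.datanode_names[:3]}


class ToyOM:
    """Stands in for OM: checks permissions and records block metadata."""

    def __init__(self, scm, allowed_writers):
        self.scm = scm
        self.allowed_writers = set(allowed_writers)  # hypothetical ACL
        self.metadata = {}                           # key path -> block info

    def open_key(self, user, path):
        # Step 1: permission check.
        if user not in self.allowed_writers:
            raise PermissionError(f"{user} may not write {path}")
        block = self.scm.allocate_block()            # step 2
        self.metadata[path] = dict(block, length=0)  # step 3
        token = f"write:{block['block_id']}"         # toy token, not a signed one
        return block, token

    def commit_key(self, path, length):
        # Step 5: the client reports the final block information.
        self.metadata[path]["length"] = length


class ToyDatanode:
    def __init__(self):
        self.blocks = {}  # block ID -> bytes

    def write(self, block_id, token, data):
        # Step 4: the token proves the right to write this block.
        if token != f"write:{block_id}":
            raise PermissionError("invalid block token")
        self.blocks[block_id] = data


def write_key(om, datanodes, user, path, data):
    block, token = om.open_key(user, path)
    for name in block["datanodes"]:
        datanodes[name].write(block["block_id"], token, data)
    om.commit_key(path, len(data))


datanodes = {name: ToyDatanode() for name in ("dn1", "dn2", "dn3", "dn4")}
om = ToyOM(ToySCM(datanodes), allowed_writers=["alice"])
write_key(om, datanodes, "alice", "/vol1/bucket1/key1", b"hello")
```

Note that the data itself never passes through OM or SCM in this sketch; they only hand out metadata and permissions, which matches the division of labour described in the steps.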

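Flow diagram for writing a key:

  • Reading a key:
    ① Reading a key is comparatively simple. The client first asks OM for the list of blocks that make up the key.
    ② OM returns the block list together with the corresponding block tokens.
    ③ The client connects to the data nodes, presents the block tokens, and reads the key data.

Flow diagram for reading a key: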

2.3 Main components of the Ozone Manager

To take a closer look at OM, this section describes the network services it provides and the data it persists.

  • Network services provided by Ozone Manager:

Ozone provides a network service for clients and for administration commands. The main service calls are:

①Key, Bucket, Volume / CRUD
②Multipart upload (Initiate, Complete…)
----- Supports upload of huge files in multiple steps
③FS related calls (optimized for hierarchical queries instead of a flat ObjectStore namespace)
----- GetFileStatus, CreateDirectory, CreateFile, LookupFile
④ACL related
----- Managing ACLs if internal ACLs are used instead of Ranger
⑤Delegation token (Get / Renew / Cancel)
----- For security
⑥Admin APIs
----- Get S3 secret
----- ServiceList (used for service discovery)
----- DBUpdates (used by Recon)

  • Persisted state

The following data is persisted on the OM side in a specific RocksDB directory:

①Volume / Bucket / Key tables
----- This is the main responsibility of OM
----- Key metadata contains the block id (which includes container id) to find the data
②OpenKey table
----- For keys which are created, but not yet committed
③Delegation token table
----- For security
④PrefixInfo table
----- Specific index table to store directory level ACL and to provide better performance for hierarchical queries
⑤S3 secret table
----- For S3 secret management
⑥Multipart info table
----- Inflight uploads should be tracked
⑦Deleted table
----- To track the blocks which should be deleted from the datanodes

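2.4 Notable configuration

3. Storage Container Manager

SCM provides several important functions for the Ozone cluster, including cluster management, certificate management, block management, and replica management.

SCM is the leader node for block space management. Its main responsibility is to create and manage containers, which are the main replication unit of Ozone.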

3.1 Main responsibilities

Storage Container Manager provides multiple critical functions for the Ozone cluster. SCM acts as the cluster manager, certificate authority, block manager, and replica manager.

SCM is in charge of creating an Ozone cluster. When SCM is booted up via the init command, it creates the cluster identity and the root certificates needed for the SCM certificate authority. SCM also manages the life cycle of the data nodes in the cluster.

------① SCM is the block manager. SCM allocates blocks and assigns them to data nodes. Clients read and write these blocks directly.

------② SCM keeps track of all the block replicas. If a data node or a disk is lost, SCM detects it and instructs other data nodes to make copies of the missing blocks to ensure high availability.

------③ SCM's certificate authority is in charge of issuing identity certificates for each and every service in the cluster. This certificate infrastructure makes it easy to enable mTLS at the network layer, and the block token infrastructure depends on it as well.
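To make the replica-tracking responsibility in ② more concrete, here is a minimal sketch of how lost replicas could be detected and re-created. The function `plan_re_replication`, the replication factor constant, and the "first free node" target choice are all assumptions for illustration; the real SCM replication manager is considerably more involved.

```python
REPLICATION_FACTOR = 3  # matches the three data nodes chosen on the write path


def plan_re_replication(replicas, live_nodes):
    """Return {block: [target nodes]} for blocks that have lost replicas.

    replicas:   block name -> list of nodes that held a replica
    live_nodes: set of nodes that are still alive
    """
    plan = {}
    for block, holders in replicas.items():
        alive = [node for node in holders if node in live_nodes]
        missing = REPLICATION_FACTOR - len(alive)
        if missing > 0 and alive:
            # Pick live nodes that do not hold the block yet (sorted for determinism).
            candidates = sorted(live_nodes - set(alive))
            plan[block] = candidates[:missing]
    return plan


replicas = {"b1": ["dn1", "dn2", "dn3"], "b2": ["dn2", "dn3", "dn4"]}
live = {"dn2", "dn3", "dn4", "dn5"}  # dn1 has been lost
plan = plan_re_replication(replicas, live)
```

In this example only `b1` lost a replica, so the plan asks a single new node to copy it while `b2` is left untouched.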

3.2 Main components

As in the OM section, this section describes the network services SCM provides and the data it persists:

  • Network services provided by Storage Container Manager:

①Pipelines: List / Delete / Activate / Deactivate
----- Pipelines are sets of datanodes that form replication groups
----- Raft groups are planned by SCM
②Containers: Create / List / Delete containers
③Admin related requests
----- Safemode status / modification
----- Replication manager start / stop
④CA authority service
----- Required by other server components
⑤Datanode HeartBeat protocol
----- From Datanode to SCM (30 sec by default)
----- Datanodes report the status of containers, node…
----- SCM can add commands to the response
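The heartbeat exchange listed last can be sketched as a request/response pair in which the datanode pushes its container states and SCM piggybacks queued commands on the reply. The class `ToyHeartbeatHandler` and the command strings are hypothetical; they only illustrate the direction of the flow.

```python
HEARTBEAT_INTERVAL_SEC = 30  # default interval quoted in the list above


class ToyHeartbeatHandler:
    """SCM side of a toy heartbeat protocol."""

    def __init__(self):
        self.container_reports = {}  # datanode -> {container_id: state}
        self.pending_commands = {}   # datanode -> [command]

    def queue_command(self, datanode, command):
        self.pending_commands.setdefault(datanode, []).append(command)

    def heartbeat(self, datanode, container_states):
        # The datanode reports its containers; SCM replies with queued commands.
        self.container_reports[datanode] = dict(container_states)
        return self.pending_commands.pop(datanode, [])


scm = ToyHeartbeatHandler()
scm.queue_command("dn1", "close container 7")
first = scm.heartbeat("dn1", {7: "OPEN"})
second = scm.heartbeat("dn1", {7: "CLOSED"})
```

Because the datanode always initiates the call, SCM never needs to open a connection to a datanode; commands simply wait for the next heartbeat.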

  • Persisted state:

The following data is persisted on the Storage Container Manager side in a specific RocksDB directory:

①Pipelines
----- Replication group of servers. Maintained to find a group for new container/block allocations.
②Containers
----- Containers are the replication units. This data is required to act when containers are under- or over-replicated.
③Deleted blocks
----- Block data is deleted in the background. A list is needed to follow the progress.
④Valid certs, Revoked certs
----- Used by the internal Certificate Authority to authorize other Ozone services

3.3 Notable configuration


4. Containers

Containers are the fundamental replication unit of Ozone/HDDS and are managed by the Storage Container Manager (SCM) service.

Containers are big binary units (5 GB by default) which can contain multiple blocks.

Blocks are local information and are not managed by SCM. Therefore, even if billions of small files are created in the system (which means billions of blocks are created), the Datanodes report only the status of the containers, and only containers are replicated.

When Ozone Manager requests a new block allocation from SCM, SCM identifies a suitable container and generates a block ID that consists of ContainerId + LocalId. The client connects to the Datanode that stores the container, and the Datanode manages the individual block based on the LocalId.
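A block ID built from ContainerId + LocalId can be sketched as a two-field value. The names `BlockID` and `ToyBlockAllocator` are hypothetical, and "suitable container" is reduced here to "the first open container", which is an assumption rather than the actual SCM placement logic.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockID:
    container_id: int  # which container holds the block
    local_id: int      # identifies the block inside that container


class ToyBlockAllocator:
    """Toy SCM-side allocator that hands out block IDs."""

    def __init__(self, open_containers):
        self.open_containers = list(open_containers)
        self._local_ids = itertools.count(1)

    def allocate(self):
        # Assumption: the first open container is always "suitable".
        container_id = self.open_containers[0]
        return BlockID(container_id, next(self._local_ids))


allocator = ToyBlockAllocator(open_containers=[12])
first_block = allocator.allocate()
second_block = allocator.allocate()
```

Both blocks land in container 12 and differ only in their local ID, which is why SCM can track one container instead of many blocks.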

4.1 Open vs. Closed containers

When a container is created it starts in the OPEN state. When it is full (about 5 GB of data has been written), the container is closed and becomes a CLOSED container.
The fundamental differences between OPEN and CLOSED containers:

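5. Datanodes

Datanodes are the worker bees of Ozone. All data is stored on data nodes. Clients write data in terms of blocks, and a Datanode aggregates these blocks into storage containers. A storage container holds the data streams and the metadata about the blocks written by the clients.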

5.1 Storage Containers

A storage container is a self-contained super block. It has a list of the Ozone blocks that reside inside it, as well as on-disk files which contain the actual data streams. This is the default storage container format. From Ozone's perspective, a container is a protocol spec, and the actual storage layout does not matter. In other words, it is trivial to extend the format or to bring in new container layouts. Hence this should be treated as a reference implementation of containers under Ozone.

5.2 Understanding Ozone Blocks and Containers

When a client wants to read a key from Ozone, the client sends the name of the key to Ozone Manager. Ozone Manager returns the list of Ozone blocks that make up that key.

An Ozone block contains the container ID and a local ID. The figure below shows the logical layout of an Ozone block.

The container ID lets the client discover the location of the container. The authoritative information about where a container is located lives in the Storage Container Manager (SCM). In most cases, the container location is cached by Ozone Manager and is returned along with the Ozone blocks.

Once the client is able to locate the container, that is, once it knows which data nodes hold this container, the client connects to a datanode and reads the data stream specified by Container ID:Local ID. In other words, the local ID serves as an index into the container that describes which data stream to read from.
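The lookup described in this section can be sketched as two dictionary hops: container ID to data node, then local ID to data stream. The function `read_block` and the nested-dictionary layout are assumptions chosen for brevity, not the on-disk container format.

```python
def read_block(block_id, container_locations, datanodes):
    """Read one block given its (container_id, local_id) pair."""
    container_id, local_id = block_id
    # The container ID tells the client which data nodes hold the container.
    node = container_locations[container_id][0]
    # The local ID indexes into the container on that data node.
    return datanodes[node][container_id][local_id]


# Container locations as they might be cached by OM (hypothetical values).
container_locations = {12: ["dn1", "dn2", "dn3"]}
# datanode -> container_id -> local_id -> bytes
datanodes = {"dn1": {12: {1: b"hello ", 2: b"ozone"}}}

key_blocks = [(12, 1), (12, 2)]  # block list returned by OM for one key
data = b"".join(read_block(b, container_locations, datanodes) for b in key_blocks)
```

The sketch always reads from the first listed data node; a real client would also present block tokens and could fall back to another replica.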

5.3 Discovering the Container Locations

How does SCM know where the containers are located? This is very similar to what HDFS does: the data nodes regularly send container reports, much like block reports. Container reports are far more concise than block reports. For example, an Ozone deployment with a 196 TB data node will have around 40 thousand containers. Compare that with the roughly one and a half million blocks that HDFS would report. That is a 40x reduction in the reports.

This extra indirection helps tremendously with scaling Ozone. SCM has far less block data to process, and running the namespace service (Ozone Manager) as a separate service is critical to scaling Ozone.
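The figures quoted above can be checked with a few lines of arithmetic. The 5 GB container size comes from the Containers section; the 128 MB HDFS block size is an assumption (it is the usual HDFS default, but the text does not state it), and binary units are assumed throughout.

```python
TB = 1024 ** 4
GB = 1024 ** 3
MB = 1024 ** 2

node_capacity = 196 * TB
container_size = 5 * GB     # default container size stated earlier
hdfs_block_size = 128 * MB  # assumption: typical HDFS default block size

containers = node_capacity // container_size    # entries in a container report
hdfs_blocks = node_capacity // hdfs_block_size  # entries in an HDFS block report
reduction = hdfs_blocks / containers
```

Under these assumptions the node holds about 40 thousand containers versus about 1.6 million HDFS-sized blocks, which is where the 40x figure comes from (it is simply 5 GB divided by 128 MB).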
