E pluribus unum – OpenStack Swift Manifest Objects

cumei1658


Original article: https://www.pybloggers.com/2016/04/e-pluribus-unum-openstack-swift-manifest-objects/


By default, the content of an OpenStack Swift object cannot be greater than 5 GB. However, you can use a number of smaller objects to construct a large object via the concept of segmentation. From OpenStack Large Object Support, “Segments of the larger object are uploaded and a special manifest file is created that, when downloaded, sends all the segments concatenated as a single object.” This “user manifest” design exists in order to provide a transparent download of large objects to the client and still provide the uploading client with a clean API to support segmented uploads.¹


While working with large data sets, we ran into a challenge: how exactly do you represent a 14 GB file as a single entity within IBM® Object Storage for Bluemix®? This blog post shares what we learned about creating OpenStack Swift manifest objects.


Background: A third party uploaded 61 separate files (segment files) into our IBM Bluemix Object Storage container, but did not upload a corresponding manifest object. Instead, they shared a manifest file that listed the details of each HTTP PUT request, with no further context on what it was or how to use it. The contents of the file were similar to …


{'path': '/somecontainer/someprefix-NjT2OURYBq', 'etag': 'ebc7d0d4718d8513fd5cdcf76de66f2a', 'size_bytes': 234003629},
{'path': '/somecontainer/someprefix-zVliDpHox4', 'etag': '2814e177b9371770caf13902d6587373', 'size_bytes': 234521937},
{'path': '/somecontainer/someprefix-5lHhJcyjEX', 'etag': '843fbdfb493b484b035436e0bb782560', 'size_bytes': 241395892},
{'path': '/somecontainer/someprefix-Q7xSsBprGK', 'etag': '05d09e28c8994cf5f9833c9dee6494a7', 'size_bytes': 237095501},
{'path': '/somecontainer/someprefix-8pQIF4w1GR', 'etag': 'e0d912fc4b88961c33ecfe70e64a7855', 'size_bytes': 226289048},
...

Our Challenge: Referencing 61 individual files within our IBM Bluemix Apache Spark Service Jupyter Notebook seemed wrong. We wanted to pull in all of the data by referencing a single OpenStack Swift URL (e.g. swift://foo/man/… ) without having to re-upload the entire series of files. We suspected that the provided manifest file would prove useful, but had difficulty finding easy steps for using it with OpenStack Swift and the IBM Bluemix Object Storage service. We were largely ignorant of how OpenStack Large Object support worked and how to use OpenStack Swift manifest objects. Sooo … here is our journey, in the spirit of sharing.


Options: IBM Object Storage for Bluemix provides you with access to a fully provisioned OpenStack Object Storage (Swift) account to manage your data. IBM Object Storage for Bluemix uses OpenStack Identity (Keystone) for authentication and can be accessed directly by using Swift Object Storage API v1 calls². OpenStack Large Object Support is enabled and available for the IBM Object Storage for Bluemix service. But don’t take my word for it … issuing an HTTP GET request to the /info endpoint [https://dal.objectstorage.open.softlayer.com/info] confirms this via the presence of an slo section. To support as many use cases as possible, OpenStack Swift supports two (2) flavors:


Static Large Objects (SLO) – Relies on a user-provided manifest file. Advantageous when the developer wants to “mash up” objects from multiple containers and reference them in a self-generated manifest file. This gives you immediate access to the concatenated object after the manifest is accepted. Uploading segments into separate containers provides the opportunity for improved concurrent upload speeds. On the downside, the concatenated object’s definition is frozen until the manifest is replaced.
Dynamic Large Objects (DLO) – Relies on a zero-byte manifest file that points at a container listing. Advantageous when the developer might add or remove segments at any time. The disadvantages include reliance on eventually consistent container listings, which means there may be some delay before the full concatenated object is available. All segments must also be in a single container, which can limit concurrent upload speeds.
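Whether a given Swift cluster has SLO enabled can also be checked programmatically from the /info document mentioned above. The following is a minimal sketch, assuming the response is a JSON object keyed by capability name. The `slo` key comes from the text; the function names and the sample dictionary are hypothetical and only there for illustration.

```python
import json
from urllib.request import urlopen


def supports_slo(info: dict) -> bool:
    """Return True when the parsed /info document contains an 'slo' section."""
    return isinstance(info.get("slo"), dict)


def fetch_info(base_url: str) -> dict:
    """GET {base_url}/info and parse the JSON body (this performs a network call)."""
    with urlopen(base_url.rstrip("/") + "/info") as response:
        return json.load(response)


# Hypothetical, trimmed-down /info document used for illustration only.
sample_info = {"swift": {"version": "x.y.z"}, "slo": {"max_manifest_segments": 1000}}
print(supports_slo(sample_info))    # True
print(supports_slo({"swift": {}}))  # False
```

Against a live service, `supports_slo(fetch_info("https://dal.objectstorage.open.softlayer.com"))` would perform the same check on the real document.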


Reader Tip: Consider jumping to the Easy Button section if time is short and you’re looking to solve the happy path (e.g. you need to upload a local file larger than 5 GB into IBM Bluemix Object Storage, which is based on OpenStack Swift).


Game Plan:


1. Obtain/identify an IBM Object Storage instance and gather credentials.
2. Use the credentials to determine the Swift Object Storage API URL.
3. Depending on the desired flavor of large object, HTTP PUT the appropriate manifest file.
4. Reference the created manifest file to gain access to a concatenated representation of the file segments.


Mechanics to Solve Our Challenge:


Instantiate/inspect an IBM Object Storage for Bluemix service instance to confirm the allocated storage resources and the generated Keystone authentication credentials. Specifically, we care about 3 values within the credentials: {projectId}, {userId} and {password}. You can find these creds within the Bluemix Web UI under the Service Credentials section of the service …

or via the Cloud Foundry (cf) Command Line Interface (CLI) …

$ cf service-keys {name_of_your_object_storage_service}
Getting keys for service instance {name_of_your_object_storage_service} as {your_username}...

name
Credentials-1

$ cf service-key {name_of_your_object_storage_service} Credentials-1
Getting key Credentials-1 for service instance {name_of_your_object_storage_service} as {your_username}...

{
  "auth_url": "https://identity.open.softlayer.com",
  "domainId": "nice_long_hex_value",
  "domainName": "some_number",
  "password": "not_gonna_tell_you",
  "project": "object_storage_hex_value",
  "projectId": "project_hex_value",
  "region": "dallas",
  "userId": "another_fine_hex_value",
  "username": "some_text_with_hex_values"
}

Step 1 Complete!

Execute an HTTP POST request to {auth_url}/v3/auth/tokens, with the credentials from Step #1 entered in the appropriate fields of the JSON request body.

This can be accomplished with a variety of tools, ranging from Google Chrome Postman to curl. For example, …

$ curl -X POST -H "Content-Type: application/json" -H "Cache-Control: no-cache" -d '{
  "auth": {
    "identity": {
      "methods": [
        "password"
      ],
      "password": {
        "user": {
          "id": "another_fine_hex_value",
          "password": "not_gonna_tell_you"
        }
      }
    },
    "scope": {
      "project": {
        "id": "project_hex_value"
      }
    }
  }
}' "https://identity.open.softlayer.com/v3/auth/tokens"
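The same request can be assembled programmatically from the service credentials. This is a small sketch, assuming the credentials are available as a dictionary with the keys shown in Step 1; the function name `build_auth_request` is hypothetical, and the placeholder values are the ones used throughout this post.

```python
import json


def build_auth_request(creds: dict) -> tuple:
    """Return (url, json_body) for a Keystone v3 password-auth token request."""
    url = creds["auth_url"].rstrip("/") + "/v3/auth/tokens"
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {"id": creds["userId"], "password": creds["password"]}
                },
            },
            # Scope the token to the project that owns the object storage account.
            "scope": {"project": {"id": creds["projectId"]}},
        }
    }
    return url, json.dumps(body)


# Placeholder values taken from the service-key output in Step 1.
creds = {
    "auth_url": "https://identity.open.softlayer.com",
    "userId": "another_fine_hex_value",
    "password": "not_gonna_tell_you",
    "projectId": "project_hex_value",
}
url, payload = build_auth_request(creds)
print(url)
print(payload)
```

Only the three values called out in Step 1 ({projectId}, {userId} and {password}) plus the auth_url are needed; the other credential fields are not used here.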

This should result in a 500+ Line JSON Response BODY similar to …

Specifically, we want to identify the Swift Object Storage API url

https://dal.objectstorage.open.softlayer.com/v1/AUTH_some-hex-value


linked to your desired object storage region (dallas, london, …) and associated with a public interface. This will be found within the endpoints section which includes the name “swift”. This is illustrated in the highlighted lines of the JSON Response body above. Even more importantly, within the generated HTTP Response Header of this /v3/auth/tokens call is an authentication token that we also need to record to facilitate subsequent authenticated HTTP API calls.

Here is a sample of the HTTP Response Headers

The X-Subject-Token is the important response header. Its value will be reused within all subsequent HTTP Request Headers using the header X-Auth-Token. Obvious, right?
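Picking the endpoint out of a long response by eye is error prone, so here is a hedged sketch of doing it in code. It assumes the usual Keystone v3 layout, in which the body has a `token.catalog` list and each service carries `endpoints` entries with `interface`, `region` and `url` fields. The function names and the tiny sample response are hypothetical; a real response is far longer.

```python
def find_swift_url(token_body: dict, region: str, interface: str = "public"):
    """Return the Swift endpoint URL for the given region and interface, or None."""
    for service in token_body["token"]["catalog"]:
        if service.get("name") != "swift":
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("region") == region and endpoint.get("interface") == interface:
                return endpoint["url"]
    return None


def auth_headers(response_headers: dict) -> dict:
    """Reuse the X-Subject-Token response header as the X-Auth-Token request header."""
    # Assumes a plain dict with this exact key casing; HTTP libraries often
    # provide case-insensitive header mappings instead.
    return {"X-Auth-Token": response_headers["X-Subject-Token"]}


# Hypothetical, heavily trimmed token response for illustration only.
sample_body = {
    "token": {
        "catalog": [
            {
                "name": "swift",
                "type": "object-store",
                "endpoints": [
                    {"interface": "internal", "region": "dallas",
                     "url": "https://internal.example.invalid/v1/AUTH_some-hex-value"},
                    {"interface": "public", "region": "dallas",
                     "url": "https://dal.objectstorage.open.softlayer.com/v1/AUTH_some-hex-value"},
                ],
            }
        ]
    }
}
print(find_swift_url(sample_body, "dallas"))
print(auth_headers({"X-Subject-Token": "value-obtained-from-X-Subject-Token-Response-Header"}))
```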

Step 2 Complete!

Now for the payoff! As you’ll recall, our original problem pertained to 61 segmented files which had already been uploaded to a single container within our object storage service. We were also given a manifest file outlining the specific file paths, ETag values and file sizes. The availability of this file makes it very straight-forward to pursue creation of a Static Large Object (SLO). As a bonus, since the segments also honored a specific prefix naming convention and were co-located within a single container – we can also pursue creation of a Dynamic Large Object (DLO). Let’s walk through both approaches:

SLO

A carefully crafted HTTP PUT request needs to be made to the Swift Object Storage API URL. It includes a valid X-Auth-Token request header, a query string parameter named multipart-manifest with a value of "put", and a valid JSON body: an array of objects that together form a single manifest of all segment files:

PUT /v1/AUTH_some-hex-value/name_of_any_existing_container/name_of_file_with_any_extension?multipart-manifest=put HTTP/1.1
Host: dal.objectstorage.open.softlayer.com
Content-Type: text/csv
X-Auth-Token: value-obtained-from-X-Subject-Token-Response-Header
Cache-Control: no-cache

[{"path": "/somecontainer/someprefix-zVliDpHox4", "etag": "2814e177b9371770caf13902d6587373", "size_bytes": 234521937},
 {"path": "/somecontainer/someprefix-5lHhJcyjEX", "etag": "843fbdfb493b484b035436e0bb782560", "size_bytes": 241395892},
 {"path": "/somecontainer/someprefix-Q7xSsBprGK", "etag": "05d09e28c8994cf5f9833c9dee6494a7", "size_bytes": 237095501},
 {"path": "/somecontainer/someprefix-8pQIF4w1GR", "etag": "e0d912fc4b88961c33ecfe70e64a7855", "size_bytes": 226289048},
 ...]

or via curl …

If all goes well, an HTTP Response Code of 201 should be returned. To validate, you can open your IBM Bluemix Object Storage Service dashboard and observe creation of the “name_of_file_with_any_extension” manifest file within the name_of_any_existing_container. It should show an aggregated size which matches the sum of all segmented files. This new manifest file can now be singularly referenced and represents a collection of the 61 individual segment files. For example, within a Jupyter notebook we loaded the data using syntax similar to “swift://name_of_any_existing_container.spark/name_of_file_with_any_extension”. Sweet!
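One practical wrinkle: the manifest file we were handed used Python-style single quotes, while Swift parses the SLO manifest body as JSON, which requires double quotes. The sketch below shows one way to convert such a file and sanity-check it before the PUT. The helper name `to_slo_manifest` is hypothetical, and only two of the segments from the excerpt above are included to keep the example short.

```python
import ast
import json


def to_slo_manifest(raw_text: str) -> str:
    """Parse a Python-literal style segment list and return a JSON SLO manifest."""
    # literal_eval accepts single-quoted strings; list() also covers the case
    # where the entries are comma-separated without enclosing brackets (a tuple).
    segments = list(ast.literal_eval(raw_text))
    for segment in segments:
        missing = {"path", "etag", "size_bytes"} - set(segment)
        if missing:
            raise ValueError(f"segment {segment!r} is missing {sorted(missing)}")
    return json.dumps(segments)


raw = """[
 {'path': '/somecontainer/someprefix-zVliDpHox4', 'etag': '2814e177b9371770caf13902d6587373', 'size_bytes': 234521937},
 {'path': '/somecontainer/someprefix-5lHhJcyjEX', 'etag': '843fbdfb493b484b035436e0bb782560', 'size_bytes': 241395892},
]"""
manifest_json = to_slo_manifest(raw)
total_bytes = sum(s["size_bytes"] for s in json.loads(manifest_json))
print(manifest_json)
print(total_bytes)  # 475917829, the size the finished manifest should report
```

The computed total is the number to compare against the aggregated size shown in the Object Storage dashboard once the manifest has been accepted.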

DLO

A carefully crafted HTTP PUT request needs to be made to the Swift Object Storage API Url which includes a valid X-Auth-Token request header, a required request header named X-Object-Manifest and an optional Content-Length request header with a value of 0:

PUT /v1/AUTH_some-hex-value/name_of_any_existing_container/name_of_file_with_any_extension HTTP/1.1
Host: dal.objectstorage.open.softlayer.com
Content-Type: application/json
X-Auth-Token: value-obtained-from-X-Subject-Token-Response-Header
Content-Length: 0
X-Object-Manifest: name_of_container_which_holds_the_segmented_files/common_prefix_label_to_match_against_for_all_segmented_files
Cache-Control: no-cache

or via curl …

If all goes well, an HTTP Response Code of 201 should be returned. To validate, this new zero-byte sized manifest file can now be singularly referenced and represents a collection of the 61 individual segment files. For example, within a Jupyter notebook we loaded the data using syntax similar to “swift://name_of_any_existing_container.spark/name_of_file_with_any_extension”. What’s really cool about this approach is that in the future we could choose to upload a 62nd segment file into the same container area and if we follow the common prefix label provided earlier within the X-Object-Manifest header – then our manifest will magically auto-include the new data with no additional editing of the manifest itself. Dynamic indeed!
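To make the "dynamic" part concrete, the sketch below builds the DLO request headers and then simulates, purely locally, which objects a prefix would pick up. Both function names are hypothetical, and the simulation is only a model of the behavior described above: objects in the segment container whose names start with the prefix, taken in name order.

```python
from urllib.parse import quote


def dlo_headers(token: str, segment_container: str, prefix: str) -> dict:
    """Headers for the zero-byte DLO manifest PUT described above."""
    return {
        "X-Auth-Token": token,
        # The value is "<container>/<prefix>"; quote() keeps the slash intact.
        "X-Object-Manifest": quote(f"{segment_container}/{prefix}"),
        "Content-Length": "0",
    }


def dlo_segments(listing: list, prefix: str) -> list:
    """Model which objects a DLO concatenates: prefix matches, in name order."""
    return sorted(name for name in listing if name.startswith(prefix))


# Hypothetical container listing for illustration.
listing = ["someprefix-0002", "unrelated-object", "someprefix-0001"]
print(dlo_segments(listing, "someprefix-"))

# Uploading one more object with the same prefix changes the result
# without touching the manifest itself.
listing.append("someprefix-0003")
print(dlo_segments(listing, "someprefix-"))
print(dlo_headers("some-token", "somecontainer", "someprefix-"))
```

Because the concatenation order follows the object names, segment names should sort in the order the data is meant to be read; zero-padded numeric suffixes are one common way to get that.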

Mission accomplished!


Supporting Resources: Creating a special manifest to represent many segmented objects needn’t be hard within IBM Bluemix Object Storage. As we’ve seen, this provides the significant advantage of dealing with data that is larger than 5 GB in size, which is often the case for larger data workloads. However, keep in mind that manifest files can be created for segmented data files aggregating to any size. We’ve explored the pros and cons of creating Static or Dynamic Large Objects and shown the HTTP REST API mechanics to achieve either. Our team has created a Bash script to help with segmenting large files into specified chunk sizes while avoiding mid-line splits. We recommend reading the IBM Bluemix Object Storage documentation. We also encourage readers to learn about the features of the excellent Python OpenStack Swift Client, and more specifically the swift upload command.


Easy Button: At this point, you may be wondering if there is a way to obtain an SLO manifest containing all of the segment ETag and size values in JSON format, or if the process is easier when the large file is available to you locally rather than in our odd situation. The answer is an emphatic YES. The Python OpenStack Swift Client generally provides automatic manifest creation when uploading a single large file, as illustrated below.


Example: A locally stored large file needs to be uploaded to Object Storage.
Approach: Use the Python Swift Client upload feature with appropriate arguments.


SLO:

$ swift --os-auth-url=https://identity.open.softlayer.com/v3 --os-user-id=some_hex_value --os-password="weird_characters" --os-project-id=another_hex_value --os-region-name=dallas -V 3 upload my_object_storage_container_name -S int_seg_size_in_bytes my_local_large_file_with_some_extension --use-slo

my_local_large_file_with_some_extension segment 3
my_local_large_file_with_some_extension segment 1
my_local_large_file_with_some_extension segment 2
my_local_large_file_with_some_extension segment 0
my_local_large_file_with_some_extension/1443450560.000000/160872806/52428800/00000002
my_local_large_file_with_some_extension/1443450560.000000/160872806/52428800/00000003
my_local_large_file_with_some_extension/1443450560.000000/160872806/52428800/00000001
my_local_large_file_with_some_extension/1443450560.000000/160872806/52428800/00000000
my_local_large_file_with_some_extension

Two (2) things happen. A new container named my_object_storage_container_name_segments is created to hold the segmented files and a new manifest file named my_local_large_file_with_some_extension is generated. As discussed earlier, this manifest should show the aggregated size of all segments that it represents. If you’d like to grab a copy of this SLO manifest for additional hacking, version control or inspection … you’ll need to obtain a valid X-Auth-Token (described above) and issue a HTTP GET request with a modified query-string parameter of get:


$ curl -X GET -H "Content-Type: text/csv" -H "X-Auth-Token: value-obtained-from-X-Subject-Token-Response-Header" -H "Cache-Control: no-cache" "https://dal.objectstorage.open.softlayer.com/v1/AUTH_some-hex-value/name_of_any_existing_container/name_of_file_with_any_extension?multipart-manifest=get"
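The manifest that comes back from this GET is, as far as we understand the Swift API, expressed in container-listing style (keys such as `name`, `hash` and `bytes`) rather than in the `path`/`etag`/`size_bytes` form used for the PUT. If you want to re-submit a downloaded manifest, a small conversion along these lines may help. Treat the key names of the GET response as an assumption to verify against your own output; the function name and the sample entry are hypothetical.

```python
import json


def listing_to_put_format(listing: list) -> list:
    """Map GET-style manifest entries to the PUT-style keys used earlier."""
    return [
        {"path": entry["name"], "etag": entry["hash"], "size_bytes": entry["bytes"]}
        for entry in listing
    ]


# Hypothetical GET-style entry, reusing values from the manifest excerpt above.
downloaded = [
    {
        "name": "/somecontainer/someprefix-zVliDpHox4",
        "hash": "2814e177b9371770caf13902d6587373",
        "bytes": 234521937,
        "content_type": "application/octet-stream",
    }
]
put_manifest = listing_to_put_format(downloaded)
print(json.dumps(put_manifest))
```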

DLO:

$ swift --os-auth-url=https://identity.open.softlayer.com/v3 --os-user-id=some_hex_value --os-password="weird_characters" --os-project-id=another_hex_value --os-region-name=dallas -V 3 upload my_object_storage_container_name -S int_seg_size_in_bytes my_local_large_file_with_some_extension

my_local_large_file_with_some_extension segment 3
my_local_large_file_with_some_extension segment 1
my_local_large_file_with_some_extension segment 2
my_local_large_file_with_some_extension segment 0

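Two (2) things happen. A new container named my_object_storage_container_name_segments is created to hold the segmented files, and a new manifest file named my_local_large_file_with_some_extension is generated. As discussed earlier, this manifest is a zero-byte file that represents all of the files in a single container that follow the described naming prefix convention.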
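The segment names printed in the SLO upload output above appear to encode the file's modification time, its total size, the segment size and a zero-padded index. Reading them that way (an inference from the output, not a documented guarantee), the numbers can be cross-checked with a little arithmetic; the function name below is hypothetical.

```python
def segment_names(object_name: str, mtime: float, total_size: int, segment_size: int) -> list:
    """Rebuild the segment object names implied by the upload output above."""
    count = -(-total_size // segment_size)  # ceiling division
    return [
        f"{object_name}/{mtime:.6f}/{total_size}/{segment_size}/{index:08d}"
        for index in range(count)
    ]


names = segment_names("my_local_large_file_with_some_extension", 1443450560.0, 160872806, 52428800)
print(len(names))   # 4 segments, matching "segment 0" through "segment 3"
print(names[-1])    # .../1443450560.000000/160872806/52428800/00000003

# The first three segments are full; the last one holds the remainder.
last_segment_size = 160872806 - 3 * 52428800
print(last_segment_size)  # 3586406
```

A 160872806-byte file split into 52428800-byte (50 MiB) pieces therefore yields three full segments plus one of 3586406 bytes, which lines up with the four segments reported by the client.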

Food for Thought


OpenStack manifests allow you to solve a problem like the one we faced … handling pre-existing segmented uploads that are missing a manifest entry point within your Object Storage.
OpenStack manifests allow you to shape a variety of alternative entry points to represent varying sizes and compositions of data segments. For example, the information within this article provides you with the steps to create a variety of manifest entry points that can represent one-half, one-quarter or even 1/61 of your large file dataset.
The Python OpenStack Swift client is a great tool for basic uploading and segmentation of large files into IBM Bluemix Object Storage.
Static Large Object (SLO) and Dynamic Large Object (DLO) each possess unique characteristics that should be carefully considered against your use case. A manifest can be created for an aggregate of any size. There is NO requirement that the aggregate size of the segments be > 5 GB; that just happens to be the most common reason for needing a manifest.
While reading about Swift Storlets, I stumbled across an interesting blog post that recommended SLOs as a great way to support Storlet use cases that need to run over several objects. This was necessary because storlets currently run on only a single stream.
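The idea of alternative entry points can be sketched as plain list slicing: split the segment list into consecutive shares and submit each share as its own SLO manifest. The function name and the placeholder segment entries below are hypothetical; in practice the entries would be the real path, etag and size_bytes values of your segments.

```python
import json


def split_segments(segments: list, parts: int) -> list:
    """Split a segment list into `parts` consecutive, near-equal shares."""
    size, extra = divmod(len(segments), parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(segments[start:stop])
        start = stop
    return chunks


# 61 placeholder segments standing in for the real manifest entries.
segments = [
    {"path": f"/somecontainer/segment-{i:02d}", "etag": "0" * 32, "size_bytes": 1}
    for i in range(61)
]
halves = split_segments(segments, 2)
quarters = split_segments(segments, 4)
print([len(c) for c in halves])    # [31, 30]
print([len(c) for c in quarters])  # [16, 15, 15, 15]

# Each share can be serialized as the body of its own manifest PUT.
first_half_manifest = json.dumps(halves[0])
```

Since 61 does not divide evenly, the earlier shares each take one extra segment; a 1/61 entry point is simply a manifest containing a single segment.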


In conclusion, whether you need a 100% representation using the Python OpenStack Swift Client upload feature or a partial representation via the OpenStack Storage APIs to facilitate large data analysis and more efficient notebook designs with faster processing times, you’ll be able to access the right size of data for your task.


Early in my career, I specialized in melting plastic and debating with ISO auditors. Later, I tested software test tools: envision a person measuring rulers in a ruler factory. After a promotion, I managed a team great at breaking software. I was also the test organization’s performance expert, assessing application throughput/speed and recommending fixes to make applications go faster. Later on, I worked on gluing non-IBM and IBM software together and showing customers how easy it was to do. As a facilitator to support the CEO’s office, I organized studies for our executive leadership by gathering people and steering chats to look at disruptive technologies and see where new money could be made. I’m currently a member of the amazing IBM jStart team. We explore the “art of the possible”, have an aversion to saying “it can’t be done” and love learning through direct client engagement. My general focus has been on cloud-related emerging technologies facilitated by our Cloud Foundry based Platform as a Service (PaaS), IBM Bluemix™. Within that framework, my current technology adventure is with Apache Spark, lightning-fast cluster computing, for Big Data analytics. I’ve travelled the world and enjoy experiencing new ideas. Curiosity keeps me creating and consuming. “If it can be, I will try” – Me


Translated from: https://www.pybloggers.com/2016/04/e-pluribus-unum-openstack-swift-manifest-objects/
