The new stored format of Docker image on disk and Distribution

             
 
 

1 Introduction

自从Docker v1.10之后,Docker镜像在存储格式发生了很大的变化。主要是为了解决之前版本的镜像的一些问题:

(1)安全性。v1.10之前的image ID是随机生成的,与内容无关。这一方面很容易导致image ID被伪造,也容易引发冲突;

(2)image与layer没有严格的区别。每个layer都对应一个image ID和配置文件。这一方面导致image与layer的概念很模糊;另一方面,数据重复,考虑两个数据完全相同的layer,却是两个不同的layer。

为了解决上面的问题,新的格式下,image与layer是严格区分的。image ID与layer ID是两个不同的对象,生成的算法也不一样,但都是根据内容(content-addressable)生成的。正是这个区别导致了新的镜像存储在本地磁盘和registry端都发生了很大的变化。

2 Content hashes vs. distribution hashes

distribution 存储的layer是压缩文件,所以使用 hashes of compressed data ;而Docker在本地使用 content hashes ,即 hashes of uncompressed data 。为什么要做这种区分?

首先,对于registry,使用 hashes of compressed data ,维护简单,而且Docker在下载时,不需要解压就可以验证文件。但是,每个push生成的 distribution hashes 可能会不一样,因为同一个tar文件,每次压缩生成的文件可能会有区别。

另一方面,对于 docker builddocker savedocker loaduncompressed tars 更加适合,所以本地使用 hashes of uncompressed data

aaronlehmann这里 详细讨论了这些问题。

3 Distribution format

distribution 顶层有2个目录, repositories 存储image/tag相关的信息,blobs目录存储实际的layer数据:

# ls blobs  repositories

busybox:latest 为例:

# docker pull busybox:latest latest: Pulling from busybox c0a04912aa5a: Pull complete  a3ed95caeb02: Pull complete  Digest: sha256:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658 Status: Downloaded newer image for busybox:latest  # docker images REPOSITORY                  TAG                 IMAGE ID            CREATED             SIZE busybox                    latest              9e301a362a27        13 months ago       364.3 MB

可以看到, busybox:latest 有2个layer: c0a04912aa5aa3ed95caeb02 为layer的 distribution hashsha256:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658 为tag的digest; 9e301a362a27 为image ID。我们来看看这4个hash值在distribution端的存储情况。

  • tag digest
# ls repositories/busybox/_manifests/tags/latest/index/sha256/ ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658
  • repository’s layers

busybox 的所有layer:

# ls repositories/busybox/_layers/sha256/ 9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1  c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

其中, 9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1 对应image ID。

  • layer data
# ls blobs/ -R blobs/: sha256  blobs/sha256: 9e  a3  ad  c0  blobs/sha256/9e: 9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1  blobs/sha256/9e/9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1: data  blobs/sha256/a3: a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  blobs/sha256/a3/a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: data  blobs/sha256/ad: ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658  blobs/sha256/ad/ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658: data  blobs/sha256/c0: c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e  blobs/sha256/c0/c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e: data

可以看到,多了一个 ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658 的目录,保存了tag->(image ID, layers)的信息:

# cat blobs/sha256/ad/ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658/data   {    "schemaVersion": 2,    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",    "config": {       "mediaType": "application/vnd.docker.container.image.v1+json",       "size": 1374,       "digest": "sha256:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1"    },    "layers": [       {          "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",          "size": 224153958,          "digest": "sha256:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e"       },       {          "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",          "size": 32,          "digest": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"       }    ] }

从这个目录,可以得到tag对应的image ID,以及image所有的layer。

另外, 9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1 目录保存了image的信息:

# cat blobs/sha256/9e/9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1/data | python -mjson.tool {     "architecture": "amd64",      "config": {         "AttachStderr": false,          "AttachStdin": false,          "AttachStdout": false,          "Cmd": [             "sh"         ],          "Domainname": "",          "Entrypoint": null,          "Env": null,          "Hostname": "5f8e0e129ff1",          "Image": "cfa753dfea5e68a24366dfba16e6edf573daa447abf65bc11619c1a98a3aff54",          "Labels": null,          "OnBuild": null,          "OpenStdin": false,          "StdinOnce": false,          "Tty": false,          "User": "",          "Volumes": null,          "WorkingDir": ""     },      "container": "7f652467f9e6d1b3bf51172868b9b0c2fa1c711b112f4e987029b1624dd6295f",      "container_config": {         "AttachStderr": false,          "AttachStdin": false,          "AttachStdout": false,          "Cmd": [             "/bin/sh",              "-c",              "#(nop) CMD [/"sh/"]"         ],          "Domainname": "",          "Entrypoint": null,          "Env": null,          "Hostname": "5f8e0e129ff1",          "Image": "cfa753dfea5e68a24366dfba16e6edf573daa447abf65bc11619c1a98a3aff54",          "Labels": null,          "OnBuild": null,          "OpenStdin": false,          "StdinOnce": false,          "Tty": false,          "User": "",          "Volumes": null,          "WorkingDir": ""     },      "created": "2015-09-21T20:15:47.866196515Z",      "docker_version": "1.8.2",      "history": [         {             "created": "2015-09-21T20:15:47.433616227Z",              "created_by": "/bin/sh -c #(nop) ADD file:6cccb5f0a3b3947116a0c0f55d071980d94427ba0d6dad17bc68ead832cc0a8f in /"         },          {             "created": "2015-09-21T20:15:47.866196515Z",              "created_by": "/bin/sh -c #(nop) CMD [/"sh/"]"         }     ],      "os": "linux",      "rootfs": {         "diff_ids": [             "sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a",              "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"         ],          "type": "layers"     } }

另外两个目录为layer tar data。

4 Pulling An Image

4.1 Pulling an Image Manifest

当docker拉取image时,首先会通过 Registry API 获取manifest信息:

GET /v2/<name>/manifests/<reference>

The name and reference parameter identify the image and are required. The reference may include a tag or digest.

# curl -L "http://10.x.x.x:5000/v2/busybox/manifests/latest" {    "schemaVersion": 1,    "name": "busybox",    "tag": "latest",    "architecture": "amd64",    "fsLayers": [       {          "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"       },       {          "blobSum": "sha256:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e"       }    ],    "history": [       {          "v1Compatibility": "{/"architecture/":/"amd64/",/"config/":{/"Hostname/":/"5f8e0e129ff1/",/"Domainname/":/"/",/"User/":/"/",/"AttachStdin/":false,/"AttachStdout/":false,/"AttachStderr/":false,/"Tty/":false,/"OpenStdin/":false,/"StdinOnce/":false,/"Env/":null,/"Cmd/":[/"sh/"],/"Image/":/"cfa753dfea5e68a24366dfba16e6edf573daa447abf65bc11619c1a98a3aff54/",/"Volumes/":null,/"WorkingDir/":/"/",/"Entrypoint/":null,/"OnBuild/":null,/"Labels/":null},/"container/":/"7f652467f9e6d1b3bf51172868b9b0c2fa1c711b112f4e987029b1624dd6295f/",/"container_config/":{/"Hostname/":/"5f8e0e129ff1/",/"Domainname/":/"/",/"User/":/"/",/"AttachStdin/":false,/"AttachStdout/":false,/"AttachStderr/":false,/"Tty/":false,/"OpenStdin/":false,/"StdinOnce/":false,/"Env/":null,/"Cmd/":[/"/bin/sh/",/"-c/",/"#(nop) CMD [///"sh///"]/"],/"Image/":/"cfa753dfea5e68a24366dfba16e6edf573daa447abf65bc11619c1a98a3aff54/",/"Volumes/":null,/"WorkingDir/":/"/",/"Entrypoint/":null,/"OnBuild/":null,/"Labels/":null},/"created/":/"2015-09-21T20:15:47.866196515Z/",/"docker_version/":/"1.8.2/",/"id/":/"bc315f8368c625be080de23e37c18bd7e555787987f907ccaf0b9bd1b69d5fe8/",/"os/":/"linux/",/"parent/":/"77106241d10a8ed96dc42f690453f89ec95890b1494ccac18bac69065e4b67fa/"}"       },       {          "v1Compatibility": "{/"id/":/"77106241d10a8ed96dc42f690453f89ec95890b1494ccac18bac69065e4b67fa/",/"created/":/"2015-09-21T20:15:47.433616227Z/",/"container_config/":{/"Cmd/":[/"/bin/sh -c #(nop) ADD file:6cccb5f0a3b3947116a0c0f55d071980d94427ba0d6dad17bc68ead832cc0a8f in //"]}}"       }    ],    "signatures": [       {          "header": {             "jwk": {                "crv": "P-256",                "kid": "LOC3:5XPT:EA5R:3TC3:BJT6:I3DM:FREB:PMXJ:JOPP:PCFT:L2IF:5N3H",                "kty": "EC",                "x": "FEMMDfyxC4WwfbsFe9qeCuFeK5QwgAxRYazrp1HgAQc",                "y": "EnOzI1hTixYK8g2_gklO2Zkvb7bP5gzCkEG_v-aOqx0"             },             "alg": "ES256"          },          "signature": "bZtcwb6n0cW4Ch9L3mpmD6HJX29CE7f7rdQPIEHlxlhX_6qCUK0qS6dIaM8yWlQPr7cQrAieINTWMc5cj898Eg",          "protected": "eyJmb3JtYXRMZW5ndGgiOjE5MTgsImZvcm1hdFRhaWwiOiJDbjAiLCJ0aW1lIjoiMjAxNi0xMC0yNVQwMjo1ODo1N1oifQ"       }    ] }
# curl -L "http://10.x.x.x:5000/v2/busybox/manifests/sha256:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658" {    "schemaVersion": 2,    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",    "config": {       "mediaType": "application/vnd.docker.container.image.v1+json",       "size": 1374,       "digest": "sha256:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1"    },    "layers": [       {          "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",          "size": 224153958,          "digest": "sha256:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e"       },       {          "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",          "size": 32,          "digest": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"       }    ] }

4.2 Pulling a Layer

对应的接口为:

GET /v2/<name>/blobs/<digest>

Access to a layer will be gated by the name of the repository but is identified uniquely in the registry by digest.

示例:

# curl -L --progress "http://10.x.x.x:5000/v2/busybox/blobs/sha256:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e" -o layer.tar  ######################################################################## 100.0% # sha256sum layer.tar  c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e  layer.tar

5 On-disk format

5.1 ID definitions and calculations

在讨论镜像在本地的存储之前,需要先理解几个ID及其对应的计算方法: layer.DiffIDlayer.ChainIDimage.ID :

ID schemeMeaningCalculation
layer.DiffID ID for an individual layer DiffID = SHA256hex(uncompressed layer tar data)
layer.ChainID ID for a layer and its parents. This ID uniquely identifies a filesystem composed of a set of layers. For bottom layer: ChainID(layer0) = DiffID(layer0)
For other layers: ChainID(layerN) = SHA256hex(ChainID(layerN-1) + " " + DiffID(layerN))
image.ID ID for an image. Since the image configuration references the layers the image uses, this ID incorporates the filesystem data and the rest of the image configuration. SHA256hex(imageConfigJSON)
“V1 ID1”legacy image/layer ID; originally not content-addressableCalculated for schema1 manifest:
For top layer: V1ID(layerTOP) = SHA256hex(blobsum(TOP) + " " + V1ID(layerTOP-1) + " " + imageConfigJSON)
For other layers: V1ID(layerN) = SHA256hex(blobsum(layerN) + " " + V1ID(layerN-1))

详细参考 Docker Image Specification v1.2.0ID definitions and calculations

示例:

# docker inspect 9e301a362a27 [     {         "Id": "sha256:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1",  ### image ID         "RepoTags": [             "busybox:latest"         ],         "RepoDigests": [ ### tag digest             "busybox@sha256:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658"         ],         "Parent": "",         "Comment": "",         "Created": "2015-09-21T20:15:47.866196515Z",         "Size": 364348077,         "VirtualSize": 364348077,         "GraphDriver": {             "Name": "overlay",             "Data": {                 "RootDir": "/data/docker/overlay/390c9a37d1edca218961b4133c67992b6e62b1b1c1cdf820494ff4df5b9fce7c/root"             }         },         "RootFS": {             "Type": "layers",             "Layers": [  ###diff IDs                 "sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a",                  "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"             ]         }     } ]
  • Image ID

Image config -> Image ID

func (p *v2Puller) pullSchema1(ctx context.Context, ref reference.Named, unverifiedManifest *schema1.SignedManifest) (imageID image.ID, manifestDigest digest.Digest, err error) {      config, err := v1.MakeConfigFromV1Config([]byte(verifiedManifest.History[0].V1Compatibility), &resultRootFS, history)     if err != nil {         return "", "", err     }      imageID, err = p.config.ImageStore.Create(config) ///image ID     if err != nil {         return "", "", err     } }  ///image/store.go func (is *store) Create(config []byte) (ID, error) { ///image config -> image ID     dgst, err := is.fs.Set(config)      imageID := ID(dgst) } 
  • layer.DiffID
///layer/layer_store.go ///ts: tar gzip file; parent: 上一层layer.ChainID func (ls *layerStore) Register(ts io.Reader, parent ChainID) (Layer, error) {     ///解压layer,同时计算DiffID     if err = ls.applyTar(tx, ts, pid, layer); err != nil {         return nil, err     } }  /// 解压的同时,计算diffID func (ls *layerStore) applyTar(tx MetadataTransaction, ts io.Reader, parent string, layer *roLayer) error {     digester := digest.Canonical.New() ///计算DiffID     tr := io.TeeReader(ts, digester.Hash())      tsw, err := tx.TarSplitWriter(true)     if err != nil {         return err     }     metaPacker := storage.NewJSONPacker(tsw)     defer tsw.Close()      // we're passing nil here for the file putter, because the ApplyDiff will     // handle the extraction of the archive     rdr, err := asm.NewInputTarStream(tr, metaPacker, nil)     if err != nil {         return err     }      applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, archive.Reader(rdr))     if err != nil {         return err     }      // Discard trailing data but ensure metadata is picked up to reconstruct stream     io.Copy(ioutil.Discard, rdr) // ignore error as reader may be closed      layer.size = applySize     layer.diffID = DiffID(digester.Digest()) } 
  • layer.ChainID
///image/rootfs_unix.go // RootFS describes images root filesystem // This is currently a placeholder that only supports layers. In the future // this can be made into an interface that supports different implementations. type RootFS struct {     Type    string         `json:"type"`     DiffIDs []layer.DiffID `json:"diff_ids,omitempty"` }  // ChainID returns the ChainID for the top layer in RootFS. func (r *RootFS) ChainID() layer.ChainID {     return layer.CreateChainID(r.DiffIDs) }   ///layer/layer.go // CreateChainID returns ID for a layerDigest slice func CreateChainID(dgsts []DiffID) ChainID {     return createChainIDFromParent("", dgsts...) }  func createChainIDFromParent(parent ChainID, dgsts ...DiffID) ChainID {     if len(dgsts) == 0 {         return parent     }     if parent == "" {         return createChainIDFromParent(ChainID(dgsts[0]), dgsts[1:]...)     }     // H = "H(n-1) SHA256(n)"     dgst := digest.FromBytes([]byte(string(parent) + " " + string(dgsts[0])))     return createChainIDFromParent(ChainID(dgst), dgsts[1:]...) } 
  • layer.cacheID

此外,还有一个 layer.cacheID 用于生成 graph driver 的目录:

///layer/ro_layer.go type roLayer struct {     chainID    ChainID     diffID     DiffID     parent     *roLayer     cacheID    string  ///用于graph driver     size       int64     layerStore *layerStore     descriptor distribution.Descriptor      referenceCount int     references     map[Layer]struct{} } 
///layer/layer_store.go func (ls *layerStore) Register(ts io.Reader, parent ChainID) (Layer, error) { ///...   // Create new roLayer   layer := &roLayer{     parent:         p,     cacheID:        stringid.GenerateRandomID(),     referenceCount: 1,     layerStore:     ls,     references:     map[Layer]struct{}{},     descriptor:     descriptor,   }    if err = ls.driver.Create(layer.cacheID, pid, "", nil); err != nil { ///创建graph driver目录     return nil, err   } ///... } 

5.2 Stored format

在本地,有2个与镜像存储相关的目录,以overlayfs为例, ROOT/image/overlayROOT/overlay

前者保存image的信息,包括 imagedblayerdb

# ls image/overlay/ distribution  imagedb  layerdb  repositories.json

后者用于存储实际的数据。更多参考 这里

  • imagedb

保存image的信息。

### image ID为目录名称 # ls image/overlay/imagedb/content/sha256/ 9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1
  • layerdb

保存layer的信息。

### image/overlay/layerdb/sha256/保存每个layer的元数据信息,目录名称为layer的chainID # ls image/overlay/layerdb/sha256/ -R                                                             image/overlay/layerdb/sha256/: 75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6  ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a  image/overlay/layerdb/sha256/75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6: cache-id  diff  parent  size  tar-split.json.gz  image/overlay/layerdb/sha256/ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a: cache-id  diff  size  tar-split.json.gz

cache-id:

#### cache-id 为graph driver的目录名称 # cat image/overlay/layerdb/sha256/ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a/cache-id    5b0bd80f386aebead9ee65c7d18172e6d937303bcc63490ef1378f92c1fb9a36  # cat image/overlay/layerdb/sha256/75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6/cache-id  390c9a37d1edca218961b4133c67992b6e62b1b1c1cdf820494ff4df5b9fce7c  # ls overlay/ 390c9a37d1edca218961b4133c67992b6e62b1b1c1cdf820494ff4df5b9fce7c  5b0bd80f386aebead9ee65c7d18172e6d937303bcc63490ef1378f92c1fb9a36

diff:

### diff保存当前layer的diff ID # cat image/overlay/layerdb/sha256/ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a/diff      sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a ###chainID == diffID(bottom layer)  # cat image/overlay/layerdb/sha256/75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6/diff  sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef

parent:

### parent保存上一层layer的chainID # cat image/overlay/layerdb/sha256/75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6/parent  sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a

6 Summary

新的存储格式涉及的ID比较多,理解了这些ID,也就理解了存储格式。这里简单总结一下:

(1)Image ID与Layer ID的计算方法不同。Image ID是在本地由Docker根据镜像的描述文件计算的,并用于imagedb的目录名称。Layer ID一般指 Distribution 根据layer compressed data计算的,并用于 Distribution 的存储目录名称。

(2) layer.DiffID 是在本地由Docker根据layer uncompressed data计算的。 layer.ChainID 只用本地,根据 layer.DiffID 计算,并用于layerdb的目录名称。

(3)layer.cacheID随机生成,用于graph driver的目录名称。



查看原文:http://www.zoues.com/2016/10/26/the-new-stored-format-of-docker-image-on-disk-and-distribution/
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值