Reading "Docker Source Code Analysis" Against the v1.12.6 Code (1): 9.5.3 Performing the Image Download

sanyu.lh

Article link: https://blog.csdn.net/haolianglh/article/details/82684339

The image-download stage is where Docker actually downloads an image. The source is in /docker/distribution/pull_v1.go:

-------------------------------------------------------------docker/distribution/pull_v1.go
func (p *v1Puller) Pull(ctx context.Context, ref reference.Named) error {
	...
	if err := p.pullRepository(ctx, ref); err != nil {
		// TODO(dmcgowan): Check if should fallback
		return err
	}
	...
}

func (p *v1Puller) pullRepository(ctx context.Context, ref reference.Named) error {
	...
	repoData, err := p.session.GetRepositoryData(p.repoInfo)
	...
	tagsList, err = p.session.GetRemoteTags(repoData.Endpoints, p.repoInfo)
	...
	err := p.downloadImage(ctx, repoData, imgData, &layersDownloaded)
	...
}

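The pullRepository function above contains the core steps of the whole image-download flow, as shown in the figure below. The functions marked in red are the ones that the book (Docker Source Code Analysis) also lists:

The figure describes the v1 puller. The table below briefly summarizes what each step does.

Function	Description
GetRepositoryData()	Fetches the IDs of all images in the specified repository
GetRemoteTags()	Fetches all tags in the specified repository
pullImage()	Downloads a Docker image from the Docker Registry
GetRemoteHistory()	Fetches the IDs of all ancestor images of the specified image
GetRemoteImageJSON()	Fetches the JSON metadata of the specified image
GetRemoteImageLayer()	Fetches the layer data of the specified image

Before walking through the pullRepository flow, the book first introduces the caller type TagStore. In v1.12.6, however, the caller type has become v1Puller: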

---------------------------------------------------------/docker/distribution/pull_v1.go
type v1Puller struct {
	v1IDService *metadata.V1IDService
	endpoint    registry.APIEndpoint
	config      *ImagePullConfig
	repoInfo    *registry.RepositoryInfo
	session     *registry.Session
}

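This looks very different from the old TagStore. I do not yet understand how v1Puller works, so I will skip it for now.

Following the book's outline, the rest of this article focuses on the pullRepository flow.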

1. GetRepositoryData

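The user's pull command names an image (and optionally a tag). GetRepositoryData fetches the IDs of all images in the repository that the image name refers to.

In v1.12.6, GetRepositoryData still lives in docker/registry/session.go. It uses http.NewRequest to build an HTTP GET request to a URL such as https://index.docker.io/v1/repositories/ubuntu/images. The response carries the IDs of all images in the ubuntu repository, and these are stored in a RepositoryData structure.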

-------------------------------------------------------------docker/registry/session.go
func (r *Session) GetRepositoryData(name reference.Named) (*RepositoryData, error) {
	repositoryTarget := fmt.Sprintf("%srepositories/%s/images", r.indexEndpoint.String(), name.RemoteName())
	// Build the HTTP GET request; repositoryTarget is the URL
	req, err := http.NewRequest("GET", repositoryTarget, nil)
	if err != nil {
		return nil, err
	}
	// Send the GET request
	res, err := r.client.Do(req)
    ...
}
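The request flow in this snippet can be tried out locally. The following is a minimal sketch, not Docker's code: repositoryImagesURL, fetch, and newFakeIndex are hypothetical helpers, and the fake server (with its registry.example.com header value) only stands in for a v1 index so that the example runs without contacting a real registry.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// repositoryImagesURL builds the URL the same way as the Sprintf call above.
// indexEndpoint is assumed to end with a slash, e.g. "https://host/v1/".
func repositoryImagesURL(indexEndpoint, remoteName string) string {
	return fmt.Sprintf("%srepositories/%s/images", indexEndpoint, remoteName)
}

// fetch issues a GET via http.NewRequest and client.Do, then returns the body
// and the value of the X-Docker-Endpoints response header.
func fetch(client *http.Client, url string) (string, string, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return "", "", err
	}
	res, err := client.Do(req)
	if err != nil {
		return "", "", err
	}
	defer res.Body.Close()
	data, err := io.ReadAll(res.Body)
	if err != nil {
		return "", "", err
	}
	return string(data), res.Header.Get("X-Docker-Endpoints"), nil
}

// newFakeIndex starts a local stand-in for a v1 index (purely illustrative).
func newFakeIndex() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/repositories/ubuntu/images" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("X-Docker-Endpoints", "registry.example.com")
		fmt.Fprint(w, `[{"id":"abc"}]`)
	}))
}

func main() {
	srv := newFakeIndex()
	defer srv.Close()
	body, endpoints, err := fetch(srv.Client(), repositoryImagesURL(srv.URL+"/v1/", "ubuntu"))
	fmt.Println(body, endpoints, err)
}
```

The sketch keeps the two-step NewRequest/Do shape of the original rather than calling client.Get, since building the request separately leaves room to set headers before sending it.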

According to the comments in the source:

-----------------------------------------------------------docker/registry/types.go
// RepositoryData tracks the image list, list of endpoints, and list of tokens
// for a repository
// (Author's note) RepositoryData holds the list of all images in a repository, together with
// a list of all endpoints and a list of all tokens; the tokens field is currently unused
type RepositoryData struct {
	// ImgList is a list of images in the repository
	ImgList map[string]*ImgData
	// Endpoints is a list of endpoints returned in X-Docker-Endpoints
	Endpoints []string
	// Tokens is currently unused (remove it?)
	Tokens []string
}

// ImgData is used to transfer image checksums to and from the registry
// (Author's note) ImgData stores an image's checksum information locally. It holds more than
// the checksum, but besides the checksum GetRepositoryData only fills in the image ID
type ImgData struct {
	// ID is an opaque string that identifies the image
	ID              string `json:"id"`
	Checksum        string `json:"checksum,omitempty"`
	ChecksumPayload string `json:"-"`
	Tag             string `json:",omitempty"`
}

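ImgList is a map: the key is the ID field of each ImgData, and the value is the ImgData for the image with that ID, holding the image ID and Checksum. Judging from the code, Tag is not written at this point, so only the ID and Checksum fields of each ImgData are populated.

Endpoints is a string slice holding the URLs of all endpoints listed in the X-Docker-Endpoints header of the HTTP response.

// The X-Docker-Endpoints response header lists all endpoints; collect them into a slice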
var endpoints []string
if res.Header.Get("X-Docker-Endpoints") != "" {
	endpoints, err = buildEndpointsList(res.Header["X-Docker-Endpoints"], r.indexEndpoint.String())
	if err != nil {
		return nil, err
	}
} else {
	// Assume the endpoint is on the same host
	endpoints = append(endpoints, fmt.Sprintf("%s://%s/v1/", r.indexEndpoint.URL.Scheme, req.URL.Host))
}
// Decode the response body into remoteChecksums
remoteChecksums := []*ImgData{}
if err := json.NewDecoder(res.Body).Decode(&remoteChecksums); err != nil {
	return nil, err
}

// Forge a better object from the retrieved data
imgsData := make(map[string]*ImgData, len(remoteChecksums))
// Put every retrieved image into the imgsData map, keyed by image ID
for _, elem := range remoteChecksums {
	imgsData[elem.ID] = elem
}

return &RepositoryData{
	ImgList:   imgsData,
	Endpoints: endpoints,
}, nil

Overall, the GetRepositoryData source in v1.12.6 is essentially the same as in version 1.12.0.
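The last part of GetRepositoryData (decode the JSON array, then index it by image ID) can be sketched on its own. This is an illustration under assumptions: imgData is a local copy of the fields shown in registry.ImgData, indexByID is a hypothetical helper name, and the response body is made up (real v1 image IDs are 64 hex characters).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// imgData mirrors the JSON-visible fields of registry.ImgData shown above.
type imgData struct {
	ID       string `json:"id"`
	Checksum string `json:"checksum,omitempty"`
	Tag      string `json:",omitempty"`
}

// indexByID decodes a JSON array of image records and returns them in a map
// keyed by image ID, the same shape as RepositoryData.ImgList.
func indexByID(body string) (map[string]*imgData, error) {
	var list []*imgData
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&list); err != nil {
		return nil, err
	}
	byID := make(map[string]*imgData, len(list))
	for _, elem := range list {
		byID[elem.ID] = elem
	}
	return byID, nil
}

func main() {
	// Hypothetical response body with shortened IDs.
	body := `[{"id":"aaa","checksum":"sha256:111"},{"id":"bbb"}]`
	imgs, err := indexByID(body)
	if err != nil {
		panic(err)
	}
	// Tag stays empty here: this step only delivers IDs and checksums.
	fmt.Println(len(imgs), imgs["aaa"].Checksum, imgs["bbb"].Tag == "")
}
```

Keying the map by ID is what lets the tag step later look up and replace entries by image ID.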

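2. GetRemoteTags

When pulling an image, the user usually gives a tag along with the image name. In docker pull ubuntu:14.04, for example, 14.04 is the tag. If the user does not specify a tag explicitly, the tag defaults to latest. GetRemoteTags fetches all tags in the repository that the image name refers to.

In v1.12.6, GetRemoteTags() is still implemented in docker/registry/session.go. It simply sends a GET request to a URL such as https://index.docker.io/v1/repositories/library/ubuntu/tags.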

// GetRemoteTags retrieves all tags from the given repository. It queries each
// of the registries supplied in the registries argument, and returns data from
// the first one that answers the query successfully. It returns a map with
// tag names as the keys and image IDs as the values.
func (r *Session) GetRemoteTags(registries []string, repositoryRef reference.Named) (map[string]string, error) {
	repository := repositoryRef.RemoteName()
	// If the repository string contains no "/", add the "library/" prefix
	if strings.Count(repository, "/") == 0 {
		// This will be removed once the registry supports auto-resolution on
		// the "library" namespace
		repository = "library/" + repository
	}
	for _, host := range registries {
		// Build the tags URL and send the GET request with client.Get
		endpoint := fmt.Sprintf("%srepositories/%s/tags", host, repository)
		res, err := r.client.Get(endpoint)
		if err != nil {
			return nil, err
		}
		...
		result := make(map[string]string)
		if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
			return nil, err
		}
		return result, nil
	}
	...
}

The result is a map whose keys are tag names and whose values are image IDs.
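The URL construction in GetRemoteTags, including the "library/" prefix rule, is easy to check in isolation. A minimal sketch, assuming a hypothetical helper named tagsURL and an endpoint string that already ends with a slash:

```go
package main

import (
	"fmt"
	"strings"
)

// tagsURL applies the "library/" prefix rule shown above and builds the
// tags URL for a single registry endpoint.
func tagsURL(host, repository string) string {
	if strings.Count(repository, "/") == 0 {
		repository = "library/" + repository
	}
	return fmt.Sprintf("%srepositories/%s/tags", host, repository)
}

func main() {
	fmt.Println(tagsURL("https://index.docker.io/v1/", "ubuntu"))
	fmt.Println(tagsURL("https://index.docker.io/v1/", "someuser/app"))
}
```

The first call prints the library/ubuntu URL quoted earlier; the second (someuser/app is an invented name) is left unprefixed because it already contains a slash.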

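After calling GetRemoteTags, pullRepository updates the key/value pairs in repoData and fills in the Tag field of the ImgData for each tagged image ID. Some ImgData entries in repoData.ImgList may therefore have a Tag while others do not. The fill-in works by pointing ImgList[id] at a new ImgData object that carries the Tag.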

---------------------------------------docker/distribution/pull_v1.go:func pullRepository()	
for tag, id := range tagsList {
	repoData.ImgList[id] = &registry.ImgData{
		ID:       id,
		Tag:      tag,
		Checksum: "",
	}
}
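The effect of this loop on the map can be seen with a small stand-alone sketch. The imgData type and the applyTags helper are simplified stand-ins, and the IDs and checksums are invented:

```go
package main

import "fmt"

// imgData is a simplified stand-in for registry.ImgData.
type imgData struct {
	ID, Checksum, Tag string
}

// applyTags replaces (or creates) the entry for every tagged image ID,
// in the same way as the loop above.
func applyTags(imgList map[string]*imgData, tags map[string]string) {
	for tag, id := range tags {
		imgList[id] = &imgData{ID: id, Tag: tag, Checksum: ""}
	}
}

func main() {
	imgList := map[string]*imgData{
		"id1": {ID: "id1", Checksum: "c1"},
		"id2": {ID: "id2", Checksum: "c2"},
	}
	applyTags(imgList, map[string]string{"14.04": "id1", "latest": "id3"})
	for _, id := range []string{"id1", "id2", "id3"} {
		fmt.Printf("%s tag=%q checksum=%q\n", id, imgList[id].Tag, imgList[id].Checksum)
	}
}
```

Running it shows three cases: id1 gains a tag but loses its checksum because the whole entry is replaced, id2 is untouched and stays untagged, and id3 is created from the tag list alone. A side effect visible in this sketch is that if two tags pointed at the same ID, only one of them would survive in the map.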

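3. pullImage

At this point GetRepositoryData and GetRemoteTags have collected the IDs and tags of all images in the repository named by the image. The next step is the actual download, driven by image ID.

pullRepository uses a for loop to decide which images in the repository need to be downloaded. If the user specified a tag, only the image with that tag is downloaded. But if the user did not specify one, is everything downloaded? That would contradict the usual expectation that latest is pulled.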

for _, imgData := range repoData.ImgList {
	if isTagged && imgData.Tag != tagged.Tag() {
		continue
	}
	err := p.downloadImage(ctx, repoData, imgData, &layersDownloaded)
	if err != nil {
		return err
	}
}

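In fact, before pullRepository runs, other parts of Docker have already added latest as the default tag to any pull request that did not specify one, for example through the WithDefaultTag() function in reference.go: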

// WithDefaultTag adds a default tag to a reference if it only has a repo name.
func WithDefaultTag(ref Named) Named {
	if IsNameOnly(ref) {
		ref, _ = WithTag(ref, DefaultTag)
	}
	return ref
}

So pullRepository downloads either the image with the tag the user gave or the image tagged latest. Either way, only one image is downloaded.
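The combination of "default to latest" and "skip every entry whose tag differs" can be sketched as follows. This is a simplified illustration, not the reference package: splitRef is a hypothetical string-based helper that ignores digests and registry ports, and selectByTag only imitates the filter in the loop.

```go
package main

import (
	"fmt"
	"strings"
)

const defaultTag = "latest"

// splitRef splits "name[:tag]" and fills in the default tag when none is
// given. Hypothetical and simplified: digests and host:port are not handled.
func splitRef(ref string) (name, tag string) {
	if i := strings.LastIndex(ref, ":"); i >= 0 {
		return ref[:i], ref[i+1:]
	}
	return ref, defaultTag
}

// selectByTag mirrors the loop filter: entries with a different tag are skipped.
func selectByTag(tagByID map[string]string, wanted string) []string {
	var ids []string
	for id, tag := range tagByID {
		if tag != wanted {
			continue
		}
		ids = append(ids, id)
	}
	return ids
}

func main() {
	// Invented IDs; id3 models an image that has no tag at all.
	tagByID := map[string]string{"id1": "14.04", "id2": "latest", "id3": ""}

	_, tag := splitRef("ubuntu")
	fmt.Println(tag, selectByTag(tagByID, tag))

	_, tag = splitRef("ubuntu:14.04")
	fmt.Println(tag, selectByTag(tagByID, tag))
}
```

Because the reference always carries a tag by the time the loop runs, untagged entries such as id3 never match and are never downloaded directly.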

In v1.12.6, pullRepository does not call pullImage directly; it calls downloadImage. Before calling pullImage, downloadImage uses ValidateID() to check that the given image ID is valid. ValidateID() is defined in docker/image/v1/imagev1.go and simply checks whether the image ID matches the regular expression ^([a-f0-9]{64})$.

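Now for the main topic, pullImage, which is defined in docker/distribution/pull_v1.go.

The first step is GetRemoteHistory, which fetches the IDs of the specified image and all of its ancestors. GetRemoteHistory is implemented in docker/registry/session.go and is very simple: it sends a GET request to a URL of the form <registry endpoint>images/<imgID>/ancestry and returns a string slice holding all ancestor image IDs.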

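With the IDs in hand, pullImage walks the history []string from bottom to top and downloads each image's config. This is done by downloadLayerConfig, which is a thin wrapper around GetRemoteImageJSON.

downloadLayerConfig retries GetRemoteImageJSON up to five times. GetRemoteImageJSON itself also just sends a GET request to a URL; the HTTP response carries the image size in a header and the JSON in the body, and both are returned to the caller.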
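The "try up to five times" pattern can be sketched generically. This is an illustration only: withRetries and errTemporary are hypothetical names, the fetch function is a stub, and any delay between attempts is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errTemporary is a made-up error used to simulate a failing request.
var errTemporary = errors.New("temporary failure")

// withRetries calls fn up to attempts times and returns the first successful
// result, or an error wrapping the last failure.
func withRetries(attempts int, fn func() ([]byte, error)) ([]byte, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		data, err := fn()
		if err == nil {
			return data, nil
		}
		lastErr = err
	}
	return nil, fmt.Errorf("giving up after %d attempts: %v", attempts, lastErr)
}

func main() {
	calls := 0
	// Stub standing in for a GetRemoteImageJSON call: fails twice, then succeeds.
	data, err := withRetries(5, func() ([]byte, error) {
		calls++
		if calls < 3 {
			return nil, errTemporary
		}
		return []byte(`{"id":"abc"}`), nil
	})
	fmt.Println(calls, string(data), err)
}
```

With this stub the third attempt succeeds, so the remaining two attempts are never made.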

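After GetRemoteImageJSON returns, pullImage organizes this information and builds a v1LayerDescriptor for each image's JSON to record the details of that image. All v1LayerDescriptor objects are collected in a slice named descriptors.

Next, the descriptors are used for the real download, that is, fetching the actual contents of the image's layers. According to the book, this is the job of GetRemoteImageLayer. In v1.12.6 the call is wrapped by a function in docker/distribution/xfer/download.go.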

4. Download

Once pullImage has downloaded the metadata for every layer of the image and built the layer descriptors, it calls Download to fetch the layer contents. Download is a method of the DownloadManager structure.

	// Iterate over layers, in order from bottom-most to top-most. Download
	// config for all layers and create descriptors.
	for i := len(history) - 1; i >= 0; i-- {
		v1LayerID := history[i]
		imgJSON, imgSize, err = p.downloadLayerConfig(v1LayerID, endpoint)

		// Create a new-style config from the legacy configs
		h, err := v1.HistoryFromConfig(imgJSON, false)
		newHistory = append(newHistory, h)
		// Prepare a layerDescriptor object for every ancestor layer
		layerDescriptor := &v1LayerDescriptor{
			v1LayerID:        v1LayerID,
			indexName:        p.repoInfo.Index.Name,
			endpoint:         endpoint,
			v1IDService:      p.v1IDService,
			layersDownloaded: layersDownloaded,
			layerSize:        imgSize,
			session:          p.session,
		}
		// Collect the descriptors of all ancestor layers (and of the image itself) in the descriptors slice
		descriptors = append(descriptors, layerDescriptor)
	}

	rootFS := image.NewRootFS()
	// Call Download to perform the download
	resultRootFS, release, err := p.config.DownloadManager.Download(ctx, *rootFS, descriptors, p.config.ProgressOutput)
	if err != nil {
		return err
	}
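The loop above runs from the last index of history down to zero while its comment says "bottom-most to top-most", which suggests that history lists the top-most image first. Under that assumption, the ordering of descriptors can be sketched as follows; descriptor and bottomToTop are simplified, hypothetical stand-ins.

```go
package main

import "fmt"

// descriptor is a reduced stand-in for v1LayerDescriptor.
type descriptor struct {
	layerID string
}

// bottomToTop walks history in reverse, so the resulting slice starts with
// the base layer and ends with the image itself (assuming history is
// ordered top-most first).
func bottomToTop(history []string) []descriptor {
	descriptors := make([]descriptor, 0, len(history))
	for i := len(history) - 1; i >= 0; i-- {
		descriptors = append(descriptors, descriptor{layerID: history[i]})
	}
	return descriptors
}

func main() {
	// Invented IDs: "top" is the requested image, "base" its oldest ancestor.
	for _, d := range bottomToTop([]string{"top", "middle", "base"}) {
		fmt.Println(d.layerID)
	}
}
```

Passing the descriptors in this order matters because each layer can only be registered on top of its parent.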

As mentioned above, Download is defined in docker/distribution/xfer/download.go. Start with the comment in the code:

// Download is a blocking function which ensures the requested layers are
// present in the layer store. It uses the string returned by the Key method to
// deduplicate downloads. If a given layer is not already known to present in
// the layer store, and the key is not used by an in-progress download, the
// Download method is called to get the layer tar data. Layers are then
// registered in the appropriate order.  The caller must call the returned
// release function once it is is done with the returned RootFS object.

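In other words:

Download is a blocking function that makes sure the requested layers are present in the layer store. It uses the string returned by each descriptor's Key method to avoid duplicate downloads. Only if a layer is not already in the layer store, and no in-progress download uses the same key, does it call the descriptor's Download method to fetch the layer's tar data. The layers are then registered in the appropriate order.

So deduplication of downloads happens inside this Download function.

For each descriptor, Download first checks whether the layer store already contains the corresponding layer (looked up by chain ID in the code below). If it does and "ForcePull" is not set, the layer is skipped; otherwise processing continues.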

rootFS := initialRootFS
// Process the descriptor of each given layer, one at a time
for _, descriptor := range layers {
	key := descriptor.Key()
	transferKey += key

	if !missingLayer {
		missingLayer = true
		diffID, err := descriptor.DiffID()
		if err == nil {
			getRootFS := rootFS
			getRootFS.Append(diffID)
			// Check whether the layer already exists in layerStore
			l, err := ldm.layerStore.Get(getRootFS.ChainID())
			logrus.Debugf("pulling image check chain id %s, diffID: %s, exist %v",
				getRootFS.ChainID(), diffID, err == nil)
			// If ForcePull is set, the layer has to be downloaded anyway
			if forcePull := os.Getenv("ForcePull"); forcePull == "y" && err == nil {
				logrus.Debugf("pulling image force change exist to false %s", diffID)
				err = fmt.Errorf("force pull image")
			}
			if err == nil {
				// ForcePull is not set and the layer exists: skip it with continue
				// Layer already exists.
				logrus.Debugf("Layer already exists: %s", descriptor.ID())
				progress.Update(progressOutput, descriptor.ID(), "Already exists")
				if topLayer != nil {
					layer.ReleaseAndLog(ldm.layerStore, topLayer)
				}
				topLayer = l
				missingLayer = false
				rootFS.Append(diffID)
				continue
			}
		}
	}

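What follows is more involved.

Download first checks whether a download of the same layer is already in progress:

1. If there is, it uses the anonymous function returned by makeDownloadFuncFromDownload(). That function waits until this layer and its ancestor layers have finished downloading and then registers this layer. It is run through Transfer(), which first checks whether another goroutine is already doing the same work and calls the anonymous function only if none is.

2. If there is not, it uses the anonymous function returned by makeDownloadFunc(). That function performs the download and the registration; if an ancestor layer is still downloading, it has to wait for that download to finish before it can register.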
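The idea of "start the work only if no transfer with the same key is already running" can be sketched with a map guarded by a mutex. This is a deliberately reduced model and not the xfer package: manager, transfer, and start are hypothetical names, and progress reporting, watchers, and cleanup of finished transfers are all omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// transfer represents one running piece of work; done is closed when it ends.
type transfer struct {
	done chan struct{}
}

// manager tracks transfers by key so that identical work is started once.
type manager struct {
	mu        sync.Mutex
	transfers map[string]*transfer
}

func newManager() *manager {
	return &manager{transfers: make(map[string]*transfer)}
}

// start returns the existing transfer for key if there is one. Otherwise it
// launches fn in a new goroutine and records the transfer. The boolean
// reports whether this call launched fn.
func (m *manager) start(key string, fn func()) (*transfer, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if existing, ok := m.transfers[key]; ok {
		return existing, false
	}
	t := &transfer{done: make(chan struct{})}
	m.transfers[key] = t
	go func() {
		fn()
		close(t.done)
	}()
	return t, true
}

func main() {
	m := newManager()
	runs := 0
	work := func() { runs++ } // stands in for downloading one layer

	t1, started1 := m.start("layerA", work)
	t2, started2 := m.start("layerA", work)
	<-t1.done
	<-t2.done

	// Both callers share one transfer, and the work ran exactly once.
	fmt.Println(started1, started2, t1 == t2, runs)
}
```

The second caller gets back the same transfer and simply waits on it, which is the behaviour the two cases above rely on.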

// Does this layer have the same data as a previous layer in
// the stack? If so, avoid downloading it more than once.
var topDownloadUncasted Transfer
if existingDownload, ok := downloadsByKey[key]; ok && usingCache {
	// The returned anonymous function is stored in xferFunc; Transfer decides whether to run it in a goroutine
	xferFunc := ldm.makeDownloadFuncFromDownload(descriptor, existingDownload, topDownload)
	defer topDownload.Transfer.Release(watcher)
	// Transfer checks if a transfer matching the given key is in progress. If not,
	// it starts one by calling xferFunc. The caller supplies a channel which
	// receives progress output from the transfer.
	// See transfer.go: Transfer looks up transferKey to check whether a task with the same key is already running
	topDownloadUncasted, watcher = ldm.tm.Transfer(transferKey, xferFunc, progressOutput)
	topDownload = topDownloadUncasted.(*downloadTransfer)
	continue
}

// Layer is not known to exist - download and register it.
progress.Update(progressOutput, descriptor.ID(), "Pulling fs layer")

var xferFunc DoFunc
if topDownload != nil {
	xferFunc = ldm.makeDownloadFunc(descriptor, "", topDownload)
	defer topDownload.Transfer.Release(watcher)
} else {
	xferFunc = ldm.makeDownloadFunc(descriptor, rootFS.ChainID(), nil)
}
topDownloadUncasted, watcher = ldm.tm.Transfer(transferKey, xferFunc, progressOutput)
topDownload = topDownloadUncasted.(*downloadTransfer)
downloadsByKey[key] = topDownload

5. GetRemoteImageLayer

This is the function that actually performs the download. Its comment reads: GetRemoteImageLayer retrieves an image layer from the registry. It is defined in docker/registry/session.go.

It sends a GET request to a URL of the form <registry>images/<imgID>/layer, as built by the Sprintf call in the code.

// GetRemoteImageLayer retrieves an image layer from the registry
func (r *Session) GetRemoteImageLayer(imgID, registry string, imgSize int64) (io.ReadCloser, error) {
	var (
		statusCode = 0
		res        *http.Response
		err        error
		imageURL   = fmt.Sprintf("%simages/%s/layer", registry, imgID)
	)
	req, err := http.NewRequest("GET", imageURL, nil)
	...
	res, err = r.client.Do(req)
	...
	return res.Body, nil
}

To be continued...
