Speeding Up Your Local Docker Builds

Every day I build containers locally. Many of us do: Docker is a crucial tool in the developer arsenal these days. Building and maintaining 1–2 containers is fine, but what about an app built from 10 containers? 15? 20? Each local build, even with the cache enabled, takes a significant amount of time. Multiply that by the number of daily rebuilds and you'll see hours of your precious time leaking away!

Sure, there are CI/CD tools capable of parallel building, but what about local builds? I spent some time googling for a solution to this small problem and, to my surprise, wasn't able to find one.

So I thought I could make a useful tool for myself, plus a blog post for others :)

Step 0. Let’s define the problem and requirements.

I need an app, I’ll call it Krane, that is able to build multiple Docker images in parallel. There are a few special requirements for this app:

  • Build configuration must be persistent. I'd hate describing 10–20–30 images on the CLI, so a JSON or YAML build configuration would be good.
  • Krane should take care of internal dependencies: if image2 depends on image1, it must guarantee that image2 is built only after image1.
  • Krane should be aware of the build outcome: if one of the images fails to build, all remaining images should fail too.
  • Krane should be 100% compatible with Minikube, which I use for local development. However, this requirement is satisfied automatically as long as I just use the Docker executable.

Since this app needs concurrent execution, Python wouldn't be my first choice. Golang will fit way better, I believe.

Step 1. Build configuration in a file.

My new tool should be able to read the build configuration from a file passed as an argument to the app. Something like app -f configFile.yml.

Golang provides JSON support out of the box, without any external dependencies. However, since the Docker/Kubernetes environment is YAML-centric, it makes sense to use YAML for the build configuration as well.

Golang has no issues with YAML either; there are multiple libraries providing YAML support. I prefer this one.

type Image struct {
	ContainerName string `yaml:"containerName"`
	Dockerpath    string `yaml:"dockerpath"`
	ForbidCache   bool   `yaml:"noCache"`
}

type BuildConfiguration struct {
	Images  []Image `yaml:"build"`
	Threads int     `yaml:"threads"`
}

/*
	This function provides YAML deserialization of a given byte slice
*/
func ParseBytes(conf []byte) (bc BuildConfiguration, err error) {
	err = yaml.Unmarshal(conf, &bc)
	if err == nil {
		SortImages(&bc)
	}
	return
}
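
For reference, here's what a build configuration matching those structs could look like; the image names and paths below are made up purely for illustration:

# hypothetical config.yml
threads: 4
build:
  - containerName: base-image
    dockerpath: ./base
  - containerName: frontend
    dockerpath: ./frontend
    noCache: true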

Now, since I'm able to read arbitrary YAML files, it's time to make sure I can pass the configuration file as a CLI argument.

Golang has a built-in package for that as well: flag. Defining all the needed input arguments is really trivial:

func main() {
	var configFile string
	var dryRun bool

	// parse configuration flags from command line
	flag.StringVar(&configFile, "f", "", "Path to build configuration file")
	flag.BoolVar(&dryRun, "d", false, "Don't run docker, only build and print sorted map")
	flag.Parse()

	// Exit if something is off
	_ = ValidatePath(configFile, true)

	// get configuration
	buildConfiguration, err := ParseFile(configFile)
	if err != nil {
		fmt.Printf("%v\n", err.Error())
		os.Exit(1)
	}
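
ParseFile and ValidatePath aren't listed in this post, so here is a minimal sketch of what they might look like, assuming ParseFile simply reads the file and delegates to the ParseBytes function shown earlier, and ValidatePath only checks that the path exists:

// Assumed implementation: read the file and hand it to ParseBytes.
func ParseFile(path string) (BuildConfiguration, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return BuildConfiguration{}, err
	}
	return ParseBytes(data)
}

// Assumed implementation: verify the path exists and, optionally,
// terminate the program if it doesn't.
func ValidatePath(path string, exitOnError bool) error {
	if _, err := os.Stat(path); err != nil {
		if exitOnError {
			fmt.Printf("%v\n", err.Error())
			os.Exit(1)
		}
		return err
	}
	return nil
}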

Now, something like krane -f config.yml is definitely going to work :)

Step 2. Find dependencies within the build task.

When you're building a bunch of independent containers, dependency tracking is not a problem, by definition. But if some images depend on other images within the task, those dependencies must be built before the images that depend on them.

In other words: I have to search for internal dependencies before building anything. Thanks to the Docker developers, the Dockerfile format is pretty straightforward: there's a dedicated FROM keyword, so good old regular expressions will do the job.

/*
	This function scans a Dockerfile, given as a string of commands, and extracts the image names it depends on
*/
func findDockerDependencies(dockerfile string) (deps []string, err error) {
	re := regexp.MustCompile(`(?im)FROM (.*?[|$| |\n])`)
	substrings := re.FindAllStringSubmatch(dockerfile, -1)
	for _, v := range substrings {
		for i, dep := range v {
			// skip the first match, since it's the full match
			if i == 0 {
				continue
			}

			dep = strings.TrimSpace(dep)
			if !strings.Contains(dep, ":") {
				// if no tag is given, assume the latest tag
				dep += ":latest"
			}

			deps = append(deps, dep)
		}
	}

	if len(deps) == 0 {
		err = fmt.Errorf("no docker dependencies found. wrong Dockerfile was passed in?")
	}

	return
}

When applied to every Dockerfile in the job, this gives me a full map of dependencies, where the key is a container name and the value is a slice of the containers it depends on.

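A sketch of how those per-image results could be collected into such a map; this helper is illustrative rather than the tool's actual code, and it assumes each image's Dockerfile sits directly under its dockerpath (only os and path/filepath are needed on top of the earlier imports):

// Illustrative: gather the dependencies of every image in the build
// configuration into one map keyed by container name.
func buildDependencyMap(bc BuildConfiguration) (map[string][]string, error) {
	deps := make(map[string][]string)
	for _, img := range bc.Images {
		// assumption: the Dockerfile lives directly under dockerpath
		content, err := os.ReadFile(filepath.Join(img.Dockerpath, "Dockerfile"))
		if err != nil {
			return nil, err
		}
		found, err := findDockerDependencies(string(content))
		if err != nil {
			return nil, err
		}
		deps[img.ContainerName] = found
	}
	return deps, nil
}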

Step 3. Organizing the build process.

The map of dependencies is good, but how can I use it to organize the build process? One of the simplest ways is to represent the build as a sequence of sequences of independent build steps: basically a topological sort of the graph, where the outcome isn't a 1D sequence but a 2D one, to allow parallelism.

It might sound tough, but it’s really trivial. Imagine the following algorithm:

  • All independent containers are built first, in parallel. Let's call that a “Layer”.
  • Containers that depend on the previous layer are built next, in parallel.

The last step is repeated until all containers are built.

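Here is a minimal sketch of that layering step, written from the description above rather than copied from the tool, so the real ordering code may differ. It strips tags before comparing names (a simplifying assumption about how containerName relates to the FROM references) and only treats in-task dependencies as blocking:

// Illustrative: split images into layers so that an image lands in a
// layer only after all of its in-task dependencies are in earlier layers.
func buildLayers(images []Image, deps map[string][]string) ([][]Image, error) {
	stripTag := func(name string) string {
		return strings.SplitN(name, ":", 2)[0]
	}

	inTask := make(map[string]bool)
	for _, img := range images {
		inTask[stripTag(img.ContainerName)] = true
	}

	done := make(map[string]bool)
	var layers [][]Image
	remaining := images

	for len(remaining) > 0 {
		var layer, next []Image
		for _, img := range remaining {
			ready := true
			for _, dep := range deps[img.ContainerName] {
				name := stripTag(dep)
				// external base images never block the build
				if inTask[name] && !done[name] {
					ready = false
					break
				}
			}
			if ready {
				layer = append(layer, img)
			} else {
				next = append(next, img)
			}
		}
		if len(layer) == 0 {
			return nil, fmt.Errorf("circular dependency between the remaining images")
		}
		for _, img := range layer {
			done[stripTag(img.ContainerName)] = true
		}
		layers = append(layers, layer)
		remaining = next
	}
	return layers, nil
}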

With this approach, the executor dispatches the individual build jobs of each Layer to separate goroutines. Once all jobs of a layer are dispatched, the executor waits until they all finish before switching to the next layer.

Step 4. Handling the outcome.

The last requirement is build state handling and transfer: if one of the jobs fails, the failure shouldn't be silently swallowed. I want to know about problems as soon as they arise: there's no sense waiting for the full build to finish if one of the jobs has already failed. So, early stopping would be a "really nice to have" feature.

Luckily, Golang has channels for communication between goroutines, so each worker gets a channel for receiving build jobs and a channel for reporting. The reporting channel is used for tracking the outcome of each build.

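The dispatch loop below relies on a Report type, a slice of per-worker job channels, and a shared reporting channel. The exact shapes in the repository may differ; this is a sketch based on how they are used in the excerpt, with the worker function itself sketched right after the dispatch loop:

// Report is what each worker sends back after attempting a build
// (assumed fields, based on how the dispatch loop uses them).
type Report struct {
	ContainerName string
	Success       bool
	Err           error
}

// one buffered job channel per worker goroutine, plus a shared channel
// the workers report back on; buffering keeps the dispatcher from blocking
workers := make([]chan Image, config.Threads)
requeue := make(chan Report, len(config.Images))
for w := range workers {
	workers[w] = make(chan Image, len(config.Images))
	go worker(workers[w], requeue)
}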

// storage for the reports
var failed []Report
var succeed []Report

// dispatch all jobs one by one
jobsCounter := 0
for i := 0; i < len(executableMap); i++ {
	dispatched := 0

	// each layer is an array of images
	layer := executableMap[i]
	for _, image := range layer {
		workers[jobsCounter%config.Threads] <- image
		jobsCounter += 1
		dispatched++
	}

	// now that all jobs on this layer have been dispatched, wait for them to finish
	for i := 0; i < dispatched; i++ {
		report := <-requeue
		if !report.Success {
			failed = append(failed, report)
		} else {
			succeed = append(succeed, report)
		}
	}

	// do something better here?
	if len(failed) > 0 {
		return fmt.Errorf("At least %v out of %v jobs failed", len(failed), len(config.Images))
	}
}
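
And here is a sketch of what each worker goroutine might look like: it consumes images from its job channel, shells out to the docker executable, and reports the outcome. Only standard docker build flags (-t, --no-cache) and os/exec from the standard library are assumed; the real tool may assemble the command differently:

// Illustrative worker: run `docker build` for every image received on
// the jobs channel and push a Report onto the reporting channel.
func worker(jobs <-chan Image, reports chan<- Report) {
	for image := range jobs {
		args := []string{"build", "-t", image.ContainerName}
		if image.ForbidCache {
			args = append(args, "--no-cache")
		}
		// assumption: Dockerpath is the build context directory
		args = append(args, image.Dockerpath)

		cmd := exec.Command("docker", args...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		err := cmd.Run()

		reports <- Report{
			ContainerName: image.ContainerName,
			Success:       err == nil,
			Err:           err,
		}
	}
}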

Final step. Comparing the apples.

It's time to see some numbers. For the performance test, I've put together a fairly realistic sample deployment: 4 containers building React apps (the frontend part), 2 containers building Go apps (the backend part), and an ML-deployment container (an almost static one). I'll compare the build time twice: the first run with the no-cache option and the second run without it.

no-cache sequential build time:

real 7m15,544s
user 0m5,941s
sys 0m7,696s

no-cache parallel build time:

real 2m46,595s
user 0m5,800s
sys 0m8,451s

partially cached sequential build time:

real 2m57,410s
user 0m6,304s
sys 0m7,320s

partially cached parallel build time:

real 0m17,323s
user 0m6,111s
sys 0m7,808s

So, the relative speedup is somewhere between x2.5 and x10 (7m15s vs 2m47s with no cache, 2m57s vs 17s when partially cached), which is just great for me: my typical builds are partially cached. I'll save lots of time using this small tool.

I hope you’ll find it useful too.

Feel free to contact me if you have any questions :) As usual, the source code for this app is available on GitHub.

Originally published at: https://medium.com/the-innovation/speeding-up-your-local-docker-builds-9b670a114c4c
