How Caching Can Save Build Minutes in Bitbucket Pipelines

weixin_26748959

Original article: https://levelup.gitconnected.com/how-caching-can-save-build-minutes-in-bitbucket-pipelines-219d310ab277

In this post I will help you learn how to properly set up caching in Bitbucket Pipelines. This will help you to speed up the build time of your pipelines, so you can deliver faster and spend fewer build minutes. I will also share another thought about how to speed up your pipelines.

What is Caching in General?

Caching (pronounced “cashing”) is the process of storing data in a cache.

A cache is a temporary storage area. For example, the files you automatically request by looking at a Web page are stored on your hard disk in a cache subdirectory under the directory for your browser. When you return to a page you’ve recently looked at, the browser can get those files from the cache rather than the original server, saving you time and saving the network the burden of additional traffic.

Source: whatis.techtarget.com

What can be Cached in Bitbucket Pipelines?

Bitbucket Pipelines is able to cache external build dependencies and directories, such as 3rd-party libraries, between builds providing faster builds, and reducing the number of consumed build minutes.

Source: support.atlassian.com

Depending on what you are doing in a pipeline, different things can be cached in Bitbucket Pipelines. The biggest win will be storing downloaded third party libraries retrieved by a package manager, so you will not have to download external libraries over and over again. Caching Docker images/layers or the output of build steps can also save you a lot of time. Build output can also be shared between steps as artifacts.
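
Sharing build output between steps as artifacts could look roughly like the sketch below. The step names, the npm build script and the dist/ folder are hypothetical and only illustrate the idea.

```yaml
# Sketch only: step names, the build command and the dist/ folder are assumptions.
image: node:10.15.3

pipelines:
  default:
    - step:
        name: Build
        caches:
          - node            # pre-defined cache for node_modules
        script:
          - npm install
          - npm run build   # assumed to write its output to dist/
        artifacts:
          - dist/**         # files handed over to the following steps
    - step:
        name: Test
        script:
          - ls dist         # the build output of the previous step is available here
```

Artifacts pass files along within one pipeline run, while caches are reused between runs.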

You can use pre-defined caches, such as docker, pip, node and maven. Each of these already caches a specific folder. You can also set up custom caches, for which you have to set the path that needs to be cached yourself.
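
A minimal sketch of a custom cache next to a pre-defined one is shown below. The cache name my-downloads, the path build/downloads and the fetch-data.sh script are hypothetical; only the definitions/caches structure is what Bitbucket Pipelines expects.

```yaml
# Sketch only: cache name, path and fetch-data.sh are hypothetical.
image: python:3.7.3

definitions:
  caches:
    my-downloads: build/downloads   # custom cache: name mapped to the folder to cache

pipelines:
  default:
    - step:
        caches:
          - pip            # pre-defined cache
          - my-downloads   # custom cache defined above
        script:
          - pip install -r requirements.txt
          - ./fetch-data.sh   # assumed to download files into build/downloads
```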

Good to Know

Below is a short, to-the-point summary of things you should know.

A cache will be stored on a successful build (when there is no cache yet)
Only caches smaller than 1 GB (compressed) will be stored
Caches will expire after one week
You should not cache sensitive data
You can clear caches manually

Caching Third Party Packages

I will share several Bitbucket configuration files that show you how to cache downloaded third party packages, using npm (Node) and pip as package managers. I will also provide the pipeline run times with and without caching.

Node Caching

I started with a simple pipeline configuration, in which we install tensorflow. You will probably never need tensorflow within a pipeline, but I wanted an example with a somewhat bigger package. This can of course be any package, or even a full dependency list such as a package.json.

image: node:10.15.3

pipelines:
  default:
    - step:
        script: 
          - npm install tensorflow@0.7.0

Running this pipeline took 40 seconds the first time, and 15 and 14 seconds the second and third time, which makes an average of 23 seconds. Let us see if enabling caching really improves the pipeline run time.

image: node:10.15.3

pipelines:
  default:
    - step:
        caches:
          - node
        script: 
          - npm install tensorflow@0.7.0

The first run of the above pipeline took 20 seconds. On this first run there is no cache yet, so the run is also responsible for creating and storing the cache, which of course takes a bit longer. We should see some differences in the second and third run. The second run took 11 seconds and the third run took 12 seconds, an average of around 14 seconds over the three runs. So using the cache for node packages already saves at least a few seconds.

Pip Caching

We can do the same for Python and pip, and I want to demonstrate it with the tensorflow library again. Below is the pipeline configuration I used.

image: python:3.7.3

pipelines:
  default:
    - step:
        script: 
          - pip install tensorflow==2.3.0

The first run, without caching, was finished in 47 seconds, the second in 38 seconds and the third in 1 minute and 28 seconds. Average run time without caching: around 58 seconds. I changed the pipeline configuration and added caching for pip.

image: python:3.7.3

pipelines:
  default:
    - step:
        caches:
          - pip
        script: 
          - pip install tensorflow==2.3.0

Enabling the cache gave the following results: 1 minute and 14 seconds (without an existing cache), 42 seconds and 43 seconds. Average run time: 53 seconds. Looking at the averages, this is again an improvement over the pipeline without caching.

Evaluation and Downsides of Caching

The results have outliers in both directions, which makes me wonder whether they are really good examples. Runs of the same configuration can sometimes differ by tens of seconds. The run times of the cache-enabled pipelines look more stable, though, which may be because they depend less on the latency of downloading the third party packages.

When a cache with the same name is already in place, it will not be updated, not even when something has changed within the folder you cached. When you want to create a new cache, you have to delete the specific cache on the 'Pipelines' page in Bitbucket. Also, the 1 GB limit can be reached very quickly, in which case the cache is not stored and caching is of little or no use.
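
One way to avoid deleting caches by hand may be a file-based cache key, which Atlassian's documentation describes for custom caches. The sketch below assumes that feature is available in your workspace. The cache name pip-by-requirements is hypothetical, and the path is the directory pip normally uses for downloaded packages.

```yaml
# Sketch only: assumes file-based cache keys are available; the cache name is hypothetical.
image: python:3.7.3

definitions:
  caches:
    pip-by-requirements:
      key:
        files:
          - requirements.txt   # a change to this file yields a new cache key
      path: ~/.cache/pip       # directory pip uses for downloaded packages

pipelines:
  default:
    - step:
        caches:
          - pip-by-requirements
        script:
          - pip install -r requirements.txt
```

With a key like this, a changed requirements.txt should lead to a fresh cache instead of reusing the stale one.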

Other Approaches

Another approach, when you need external libraries within your steps, is to create your own Docker image for your pipelines. You can set the image for a whole pipeline, or set it per step. This can be handy if you have a lint step that uses a Python image with pylint already installed, and a test step with all the packages installed to test your Python application. This ensures you do not have to install these dependencies again and again. See the example below, which assumes an image that I built and pushed to DockerHub myself, containing Python 3.7.3, pylint 2.6.0 and pytest 6.0.2. These names and versions are part of the Docker image name and tag.

image:
  name: sschrijver/python-pylint-pytest:3.7.3-2.6.0-6.0.2

pipelines:
  default:
    - step:
        name: Lint
        script: 
          - pylint .
    - step:
        name: Test
        script: 
          - pytest .

The Dockerfile of this image can look like the following:

FROM python:3.7.3
RUN pip install pylint==2.6.0 pytest==6.0.2

Or you can set the container per step, using the same principle as mentioned above.

pipelines:
  default:
    - step:
        name: Lint
        image: sschrijver/python-pylint:3.7.3-2.6.0
        script: 
          - pylint .
    - step:
        name: Test
        image: sschrijver/python-pytest:3.7.3-6.0.2
        script: 
          - pytest .