

When I started my professional career I had no idea what a unit test looked like. I think I skipped that part on W3Schools. It didn’t matter, because my first employer was not using tests. Before pushing a commit to the SVN repository, I checked some random pages in my local development environment. And after every deployment I made sure the exceptions arriving in my Thunderbird email client were not caused by my code change.

When I applied at Mollie almost 7 years ago, they gave me a small coding assignment. I had to create a login screen and add some tests to prove that it worked. I had the login screen working in 30 minutes and spent the rest of the day learning how to prove that with automated tests.

In my first weeks at Mollie, I quickly learned why tests are so important. Never in my life had I felt so much confidence before releasing a new feature. And because Mollie is processing financial information, confidence in your code is a very nice thing to have.

Different types of tests

Over the years, we’ve added lots of new features. And with every new feature, the number of tests grew as well. In our biggest application, we have over 30 thousand tests of different types:

  • Unit tests (27,687x) — This is the biggest group of tests that we have. Each one tests a single unit of our code base and does not depend on external services like a database.

  • Integration tests (5,348x) — In our integration tests we test bigger parts of the system together. The tests use MySQL, Redis, RabbitMQ and Elasticsearch to make sure everything works and is configured correctly.

  • Headless tests (39x) — These tests confirm that the end-to-end flow works as expected. We spin up three servers: one that serves the Mollie application, one that mocks our suppliers and one that acts as one of our customers. We also use a headless Chromium browser to imitate the consumer completing a transaction. Then we verify that the correct calls are made to suppliers, that we call the webhooks to our clients correctly, and many more things.

  • Other tests — We also have some more specific tests for smaller subsystems with specific tooling. For example, we test that our frontend assets are built correctly and we check that our SVG images don’t contain any malicious code.

This time all tests completed in 3m28s.

For every commit we push to our Git repository, we run all the tests in our self-hosted GitLab instance, not only on master but also on our branches. This gives our software developers immediate feedback. To keep the feedback loop useful, we’ve always tried to run all tests in under 5 minutes. Over the years we had to keep improving our test pipelines to stay within this 5-minute target. We use the following techniques to speed things up:

Paratest

We use the brianium/paratest tool to run our PHPUnit tests in parallel. Paratest is built on top of PHPUnit and adds support for parallel testing. For our unit tests, we could just add Paratest and it worked without any configuration!

The output still looks the same when using Paratest.

Because our integration tests are using external services, we had to duplicate these services as well if we want to use Paratest. For MySQL and Redis we use separate databases on the same server. We decided to mock the RabbitMQ service completely, and with our use of Elasticsearch it was not a problem to run multiple tests on the same server at the same time.
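
One way to give every parallel process its own databases is to derive the names from a per-process token. Paratest exposes such a token in the TEST_TOKEN environment variable. The sketch below is written in Python purely for illustration (the real setup would live in PHP). The `app_test` prefix, the function name, and the assumption that the token is a small integer usable as a Redis database index are hypothetical, not the actual Mollie configuration.

```python
import os


def resources_for_worker(env=None, prefix="app_test"):
    """Map the current Paratest process to its own MySQL and Redis databases."""
    env = os.environ if env is None else env
    # Paratest sets TEST_TOKEN per process; fall back to "1" for a non-parallel run.
    token = env.get("TEST_TOKEN", "1")
    return {
        # Hypothetical naming scheme: one MySQL schema per process.
        "mysql_database": f"{prefix}_{token}",
        # Assumption: the token is a small integer, so it can double as a
        # Redis database index on the shared server.
        "redis_database": int(token),
    }
```

With this shape, two processes running at the same time never read or write each other's rows or keys, even though they share one MySQL server and one Redis server.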

Running the tests in parallel really speeds things up, but then setting up a clean database for every parallel process became the bottleneck. So instead of creating all the databases from scratch by running the database migrations each time, we run the migrations once and use some Bash scripting to mysqldump the result into the other databases, also in parallel.

We learned the hard way that we have to explicitly wait for the MySQL imports to finish. In the past we had some tests fail randomly, because not all the tables existed at the start of the test suite. This only happened when the pipeline runners were busy, causing the imports to take more time than usual.
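
The important part is that nothing starts until every import has exited. A minimal sketch of that pattern follows, in Python rather than the Bash we actually use. The helper names and the database names passed to `import_command` are hypothetical, and the command assumes the `mysqldump` and `mysql` clients can connect without extra flags.

```python
import shlex
import subprocess


def import_command(source_db, target_db):
    """Build a shell pipeline that copies one database into another."""
    pipeline = f"mysqldump {shlex.quote(source_db)} | mysql {shlex.quote(target_db)}"
    return ["sh", "-c", pipeline]


def run_all_and_wait(commands):
    """Start every command at once, then block until each one has exited."""
    processes = [subprocess.Popen(command) for command in commands]
    # wait() is the explicit synchronisation point: tests must not start before this.
    exit_codes = [process.wait() for process in processes]
    failed = [cmd for cmd, code in zip(commands, exit_codes) if code != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} import(s) failed: {failed}")
    return exit_codes
```

A pipeline step could then call `run_all_and_wait([import_command("app_test_1", f"app_test_{n}") for n in range(2, 9)])` before launching the test suite. Checking the exit codes as well as waiting turns a half-finished import into a clear failure instead of a random test error.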

Parallel pipelines

Another way to run tests in parallel is by using fully independent parallel pipeline steps. Every type of test runs in its own step, and our integration tests are even split over two separate steps. We created a custom Paratest runner that only runs 1/Nth of the tests. Using the parallel property in .gitlab-ci.yml we can specify how many steps we want to split the tests into.

Just adding parallel: 2 to our GitLab pipeline configuration is now enough to split tests over multiple steps. And we can increase this number even more at the cost of more resources.
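
When a job uses parallel, GitLab passes each copy a 1-based CI_NODE_INDEX and the total count in CI_NODE_TOTAL. A runner can use these to pick its own 1/Nth of the tests. The sketch below shows one possible selection rule in Python. Our custom Paratest runner is written in PHP and may split differently, and the function names here are hypothetical.

```python
import os


def select_shard(test_files, node_index, node_total):
    """Return the slice of test files for one job (node_index is 1-based)."""
    if not 1 <= node_index <= node_total:
        raise ValueError("node_index must be between 1 and node_total")
    # Sort first so every job sees the same order and the slices never overlap.
    ordered = sorted(test_files)
    return ordered[node_index - 1::node_total]


def shard_from_environment(test_files, env=None):
    """Read GitLab's CI_NODE_INDEX / CI_NODE_TOTAL, defaulting to a single shard."""
    env = os.environ if env is None else env
    index = int(env.get("CI_NODE_INDEX", "1"))
    total = int(env.get("CI_NODE_TOTAL", "1"))
    return select_shard(test_files, index, total)
```

Taking every Nth file, rather than cutting the list into N consecutive blocks, tends to spread slow directories over all jobs, although it does not balance by actual test duration.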

MySQL transactions

We run all our integration tests inside a database transaction. This allows us to roll back any changes made by the tests. We just run the SQL query BEGIN before we start a test and run ROLLBACK at the end. This is much faster than running TRUNCATE on all the tables that changed.
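
The pattern can be sketched in a few lines. To keep the example runnable without a server, it uses Python's built-in sqlite3 module as a stand-in for MySQL. The `payments` table and the helper names are hypothetical, but the BEGIN and ROLLBACK statements play the same role as in our suite.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(connection):
    """Run the body inside a transaction that is always rolled back."""
    connection.execute("BEGIN")
    try:
        yield connection
    finally:
        connection.execute("ROLLBACK")


def count_payments(connection):
    return connection.execute("SELECT COUNT(*) FROM payments").fetchone()[0]


# isolation_level=None stops sqlite3 from opening transactions implicitly,
# so the explicit BEGIN/ROLLBACK above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount TEXT)")

with rolled_back(conn):
    conn.execute("INSERT INTO payments (amount) VALUES ('10.00')")
    rows_inside_test = count_payments(conn)  # the test sees its own write

rows_after_test = count_payments(conn)  # the write is gone again
```

One caveat when applying this to MySQL: statements that cause an implicit commit, such as TRUNCATE or CREATE TABLE, end the transaction, so a test that runs them cannot be undone by the final ROLLBACK.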

Check for slow tests

We use johnkary/phpunit-speedtrap, an open source PHPUnit listener that reports which tests take longer than a specified threshold. It’s very easy to install and configure.

When you know which tests are slow, you can investigate how to make them faster. Maybe you are making requests to external services, or maybe there’s a sleep() call you need to mock away.
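
As an illustration of mocking away a sleep, here is a small Python sketch (in PHP the same idea would be applied with a test double for whatever wraps sleep()). The polling helper `wait_until_ready` is a hypothetical piece of code under test, not something from our code base.

```python
import time
from unittest import mock


def wait_until_ready(is_ready, attempts=3, delay_seconds=2.0):
    """Poll is_ready(), sleeping between attempts (hypothetical code under test)."""
    for _ in range(attempts):
        if is_ready():
            return True
        time.sleep(delay_seconds)
    return False


def run_without_sleeping():
    """Exercise wait_until_ready with time.sleep replaced by a mock."""
    with mock.patch("time.sleep") as fake_sleep:
        started = time.perf_counter()
        result = wait_until_ready(lambda: False)
        elapsed = time.perf_counter() - started
    return result, fake_sleep.call_count, elapsed
```

Unpatched, this scenario would spend six seconds sleeping. With the mock it returns almost immediately, and the test can still assert that a pause was requested three times.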

GitLab runner specs

Because we host our own GitLab in Google Cloud, we have the freedom to give the pipeline runners the specs that we want. We used to have instances of the type n2-standard-16 for our runners (16 vCPUs and 64 GB of RAM), but we realised that for tests we only needed the CPUs. We switched to instances of the n2-highcpu-16 type, which has 16 vCPUs and only 16 GB of RAM. This saved us quite a bit of money without slowing the pipelines down at all. With autoscaling we try to always have runners available to pick up new pipelines, without having a lot of unused resources.

Caching between pipelines

One easy way to speed up your pipelines is to use caching whenever possible. We cache our vendor/ and node_modules/ directories for our PHP and Node dependencies, respectively.

On the master branch we use cache policy "pull-push" so new dependencies are also pushed to the cache. For branches we use cache policy "pull" to avoid adding dependencies to the cache that are not merged to master yet. The downside is that new dependencies will always be downloaded when the pipeline runs in a branch, but we don’t add new dependencies that often. Using a dedicated cache per branch would slow down the first pipeline of every new branch, and caching, for example, the Composer cache directory would slow down all runs on master.

When using caches, don’t forget that these caches also have to be downloaded into the runner. Sometimes it’s faster to quickly build something than to use a cache.

Integrate running tests locally with your IDE

Although the full test suite runs faster in the CI pipelines, being able to run just a couple of tests locally can save a lot of time. Most of our developers use PhpStorm, and its integration with PHPUnit and Xdebug works out of the box for us. Running tests locally also leaves more runner resources for other pipelines.

Conclusion

One of the best practices defined by extreme programming (XP) is to keep your build and tests under 10 minutes. Martin Fowler writes:

“It’s worth putting in concentrated effort to make it happen, because every minute you reduce off the build time is a minute saved for each developer every time they commit.”


I hope this article gave you some good suggestions on how to speed up your build.


