Using the test suite

BuildStream uses tox as a frontend to run the tests, which are implemented using pytest. We use pytest for regression tests and for testing the behavior of newly added components.

Elaborate documentation for pytest is available on the pytest project website.

Don’t get lost in the docs if you don’t need to; follow existing examples instead.

Installing build dependencies

Some of BuildStream’s dependencies have non-python build dependencies. When running tests with tox, you will first need to install these dependencies. Exact steps to install these will depend on your operating system. Commands for installing them for some common distributions are listed below.

For Fedora-based systems:

dnf install gcc python3-devel

For Debian-based systems:

apt install gcc python3-dev

Installing runtime dependencies

To be able to run BuildStream as part of the test suite, BuildStream’s runtime dependencies must also be installed. Instructions on how to do so can be found in Installing Dependencies.

If you are not interested in running the integration tests, you can skip the installation of buildbox-run.

Running tests

To run the tests, navigate to the toplevel directory of your BuildStream checkout and run:

tox

By default, the test suite will be run against every supported python version found on your host. If you have multiple python versions installed, you may want to run tests against only one version and you can do that using the -e option when running tox:

tox -e py37
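
If you are unsure which environment name matches a given interpreter, the naming convention can be sketched as follows. The helper name tox_env_name is made up for illustration, and whether the resulting environment actually exists depends on the environments listed in tox.ini.

```python
import sys


def tox_env_name(version_info=sys.version_info):
    """Return the conventional tox environment name, e.g. 'py37' for Python 3.7.

    Hypothetical helper: tox.ini decides which environments really exist.
    """
    return "py{}{}".format(version_info[0], version_info[1])


if __name__ == "__main__":
    # Print a suggested invocation for the interpreter running this script.
    print("tox -e {}".format(tox_env_name()))
```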

If you would like to test and lint at the same time, or if you have multiple python versions installed and would like to test against several of them, we recommend using detox. Run it with the same arguments you would give tox:

detox -e lint,py36,py37

The output of all failing tests will always be printed in the summary, but if you want to observe the stdout and stderr generated by a passing test, you can pass the -s option to pytest:

tox -- -s


The -s option is a pytest option.

Any options specified before the -- separator are consumed by tox, and any options after the -- separator will be passed along to pytest.
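
The effect of the -- separator can be illustrated with a small sketch. This is not how tox parses its command line internally; split_tox_args is a hypothetical helper that only mirrors the rule described above.

```python
def split_tox_args(argv):
    """Split argv at the first '--' into (tox_args, pytest_args).

    Illustration only: a hypothetical helper, not tox's own parser.
    """
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    # Without a separator, everything is consumed by tox.
    return list(argv), []


if __name__ == "__main__":
    tox_args, pytest_args = split_tox_args(["-e", "py37", "--", "-s", "-x"])
    print("tox sees:", tox_args)
    print("pytest sees:", pytest_args)
```

For the arguments of "tox -e py37 -- -s -x", the first list holds the tox options and the second holds the options handed to pytest.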

You can always abort on the first failure by running:

tox -- -x

Similarly, you may also be interested in the --last-failed and --failed-first options as per the pytest cache documentation.

If you want to run a specific test or a group of tests, you can specify a prefix to match. For example, to run all of the frontend tests:

tox -- tests/frontend/

Specific tests can be chosen by using the :: delimiter after the test module. If you wanted to run the test_build_track test within frontend/ you could do:

tox -- tests/frontend/
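
To make the :: delimiter concrete, here is a minimal sketch of how such a test identifier breaks down into a module path and the names inside it. The helper split_node_id and the paths in the example are made up for illustration; pytest's real handling also covers parametrized identifiers, which this sketch ignores.

```python
def split_node_id(node_id):
    """Split a pytest-style identifier into (module_path, [names]).

    Hypothetical helper for illustration; ignores parametrized IDs.
    """
    module_path, *names = node_id.split("::")
    return module_path, names


if __name__ == "__main__":
    # Both identifiers below are invented examples, not real BuildStream tests.
    print(split_node_id("tests/example/test_sample.py::test_something"))
    print(split_node_id("tests/example/test_sample.py::TestGroup::test_method"))
```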

When running only a few tests, you may find the coverage and timing output excessive; there are options to trim them. Note that the coverage step will fail. Here is an example:

tox -- --no-cov --durations=1 tests/frontend/

We also have a set of slow integration tests that are disabled by default. You will notice most of them marked with SKIP in the pytest output. To run them, you can use:

tox -- --integration

In case BuildStream’s dependencies were updated since you last ran the tests, you might see some errors like pytest: error: unrecognized arguments: --codestyle. If this happens, you will need to force tox to recreate the test environment(s). To do so, you can run tox with the -r or --recreate option.


By default, we do not allow use of site packages in our tox configuration to enable running the tests in an isolated environment. If you need to enable use of site packages for whatever reason, you can do so by passing the --sitepackages option to tox. Also, you will not need to install any of the build dependencies mentioned above if you use this approach.


While using tox is practical for developers running tests in more predictable execution environments, it is still possible to execute the test suite against a specific installation environment using pytest directly:


If you want to run coverage and also want coverage on cython files, you will need to add BST_CYTHON_TRACE=1 to your environment. You could then get coverage by running:

BST_CYTHON_TRACE=1 coverage run -m pytest

Note that when running tests directly via pytest, all dependencies must already be installed. This includes the following:

  • Cython and Setuptools, as build dependencies

  • Runtime dependencies and test dependencies, as specified in the requirements files present in the requirements subdirectory. Refer to the .in files for loose dependencies and the .txt files for fixed versions of all dependencies that are known to work.

  • Additionally, if you are running tests that involve external plugins, you will need to have those installed as well.

You can also have a look at our tox configuration in the tox.ini file if you are unsure about dependencies.


We also have an environment called ‘venv’ which takes any arguments you give it and runs them inside the same virtualenv we use for our tests:

tox -e venv -- <your command(s) here>

Any commands after -- will be run in a virtualenv managed by tox.

Running linters

Linting is performed separately from testing. To run the linting step, which consists of running the pylint tool, run the following:

tox -e lint


The project-specific pylint configuration is stored in the toplevel buildstream directory in the .pylintrc file. This configuration can be useful with IDEs and other developer tooling.

Formatting code

Similar to linting, code formatting is also done via a tox environment. To format the code using the black tool, run the following:

tox -e format

Observing coverage

Once you have run the tests using tox (or detox), some coverage reports will have been left behind.

To view the coverage report of the last test run, simply run:

tox -e coverage

This will collate any reports from separate python environments that may be under test before displaying the combined coverage.

Adding tests

Tests are found in the tests subdirectory, inside of which there is a separate directory for each domain of tests. All tests are collected as:


If the new test is not appropriate for the existing test domains, then simply create a new directory for it under the tests subdirectory.

Various tests may include data files to test on; there are examples of this in the existing tests. When adding data for a test, create a subdirectory beside your test in which to store data.
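
As a rough sketch of the "subdirectory beside your test" convention, a test module can compute the location of its data directory from its own path. The helper name data_dir_beside and the directory name in the comment are assumptions made for illustration; check the existing tests for the idiom actually used.

```python
import os


def data_dir_beside(test_file, name):
    """Return the absolute path of the data subdirectory `name` next to test_file.

    Hypothetical helper; existing tests may spell this differently.
    """
    return os.path.join(os.path.dirname(os.path.realpath(test_file)), name)


# In a test module this would typically be called with __file__, for example:
#   DATA_DIR = data_dir_beside(__file__, "mytest")   # "mytest" is a made-up name
```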

When creating a test that needs data, use the datafiles extension to decorate your test case (again, examples exist in the existing tests for this). Documentation on the datafiles extension is published with the pytest-datafiles package.

Tests that run a sandbox should be decorated with:


and use the integration cli helper.

You must test your changes in an end-to-end fashion. Consider the first end to be the appropriate user interface, and the other end to be the change you have made.

The aim for our tests is to make assertions about how you impact and define the outward user experience. You should be able to exercise all code paths via the user interface, just as one can test the strength of rivets by sailing dozens of ocean liners. Keep in mind that your ocean liners could be sailing properly because of a malfunctioning rivet. End-to-end testing will warn you that fixing the rivet will sink the ships.

The primary user interface is the cli, so that should be the first target ‘end’ for testing. Most of the value of BuildStream comes from what you can achieve with the cli.
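
The idea of testing from the user interface inward can be sketched with a toy command line program. Everything below is hypothetical and stands in for a real frontend: main, run_cli and the --upper option are invented for this example and are not part of BuildStream. The point is that the test asserts only on what a user would observe, the exit code and the printed output.

```python
import argparse
import io
from contextlib import redirect_stdout


def main(argv):
    """Toy CLI standing in for a real frontend; not part of BuildStream."""
    parser = argparse.ArgumentParser(prog="toy")
    parser.add_argument("word")
    parser.add_argument("--upper", action="store_true")
    args = parser.parse_args(argv)
    print(args.word.upper() if args.upper else args.word)
    return 0


def run_cli(argv):
    """Invoke the CLI as a user would and return (exit_code, stdout)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        code = main(argv)
    return code, buffer.getvalue()


def test_upper_option():
    # Assert only on user-visible behavior: exit code and printed output.
    code, output = run_cli(["--upper", "hello"])
    assert code == 0
    assert output == "HELLO\n"
```

A test written this way keeps passing if the internals of main are reorganized, as long as the observable behavior stays the same.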

We also have what we call a “Public API Surface”, as previously mentioned in Documenting symbols. You should consider this a secondary target. This is mainly for advanced users to implement their plugins against.

Note that both of these targets for testing are guaranteed to continue working in the same way across versions. This means that tests written in terms of them will be robust to large changes to the code. This important property means that BuildStream developers can make large refactorings without needing to rewrite fragile tests.

Another user to consider is the BuildStream developer, so internal API surfaces, for example the YAML loading code and the CasCache, are also targets for testing. Remember that these surfaces are still just a means to the end of providing value through the cli and the “Public API Surface”.

It may be impractical to sufficiently examine some changes in an end-to-end fashion. The number of cases to test, and the running time of each test, may be too high. Such typically low-level things, e.g. parsers, may also be tested with unit tests alongside the mandatory end-to-end tests.

It is important to write unit tests that are not fragile, i.e. in such a way that they do not break due to changes unrelated to what they are meant to test. For example, if the test relies on a lot of BuildStream internals, a large refactoring will likely require the test to be rewritten. Pure functions that only rely on the Python Standard Library are excellent candidates for unit testing.
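
A minimal sketch of a non-fragile unit test, assuming a hypothetical pure function split_key_value that is not taken from BuildStream: it depends only on built-in string handling, so refactoring elsewhere in a code base cannot break its tests.

```python
def split_key_value(line):
    """Split 'key=value' into a (key, value) tuple, stripping whitespace.

    Hypothetical pure function used only to illustrate unit testing.
    """
    key, separator, value = line.partition("=")
    if not separator or not key.strip():
        raise ValueError("expected 'key=value', got {!r}".format(line))
    return key.strip(), value.strip()


def test_split_key_value():
    assert split_key_value("name = hello") == ("name", "hello")
    assert split_key_value("empty=") == ("empty", "")


def test_split_key_value_rejects_missing_separator():
    # Written without pytest helpers so the sketch has no imports at all.
    try:
        split_key_value("no separator")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```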

Unit tests only make it easier to implement things correctly; end-to-end tests make it easier to implement the right thing.