Cache keys
Cache keys for artifacts are generated from the inputs of the build process so that artifacts can be reused in a well-defined, predictable way.
Structure
Cache keys are SHA256 hash values generated from a pickled Python dict that includes:
- Environment (e.g., project configuration and variables)
- Element configuration (details depend on element kind, Element.get_unique_key())
- Sources (Source.get_unique_key())
- Dependencies (depending on cache key type, see below)
- Public data
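As a rough illustration of this structure, the Python sketch below hashes a dict with the five parts listed above. The field names and example values are hypothetical. The sketch also swaps the pickle serialization described above for json.dumps with sort_keys=True, because pickle output can depend on dict insertion order and on object identity, which could give equal inputs different keys. The SHA256 hashing step is the same.

```python
import hashlib
import json


def generate_key(value):
    """Return a SHA256 hex digest of a canonical serialization of value."""
    # sort_keys and compact separators make the byte string independent of
    # dict insertion order and whitespace, so equal dicts give equal keys.
    data = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# Hypothetical inputs mirroring the five parts of a cache key.
key_dict = {
    "environment": {"project": "example", "variables": {"prefix": "/usr"}},
    "element": {"kind": "autotools", "config": {"configure-commands": []}},
    "sources": [{"kind": "git", "ref": "0123abcd"}],
    "dependencies": [],
    "public": {},
}

print(generate_key(key_dict))
```

Any change to one of the inputs, such as a new source ref, yields a different key, while reordering the dict does not.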
Cache key types
There are two types of cache keys in BuildStream, strong and weak.
The purpose of a strong cache key is to capture the state of as many aspects
as possible that can influence the build output. The aim is that builds are
fully reproducible as long as the cache key doesn't change, provided the module
build systems are suitable (for example, they don't embed timestamps).
A strong cache key includes the strong cache key of each build dependency
(and their runtime dependencies) of the element, as changes in build dependencies
(or their runtime dependencies) can result in build differences in reverse
dependencies. This means that whenever the strong cache key of a dependency
changes, the strong cache keys of its reverse dependencies change as well.
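The recursive definition of the strong key can be sketched as follows. This is a minimal illustration, not BuildStream's implementation: the Element dataclass, the build_scope helper and the dict layout are hypothetical, and JSON with sorted keys stands in for the pickled dict as the serialization step.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _sha256(value):
    data = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass
class Element:
    # Hypothetical, minimal element model used only for this sketch.
    name: str
    config: dict
    build_deps: list = field(default_factory=list)
    runtime_deps: list = field(default_factory=list)


def build_scope(element):
    """Build dependencies plus, transitively, their runtime dependencies."""
    seen = {}

    def add_runtime(dep):
        for rdep in dep.runtime_deps:
            if rdep.name not in seen:
                seen[rdep.name] = rdep
                add_runtime(rdep)

    for dep in element.build_deps:
        seen[dep.name] = dep
        add_runtime(dep)
    # Sort by name so the key does not depend on declaration order.
    return [seen[name] for name in sorted(seen)]


def strong_key(element):
    # Each dependency contributes its own strong key, so a change anywhere
    # below propagates up to every reverse dependency.
    dep_keys = [strong_key(dep) for dep in build_scope(element)]
    return _sha256({"element": element.config, "dependencies": dep_keys})


base = Element("base", {"kind": "import", "ref": "v1"})
lib = Element("lib", {"kind": "autotools"}, build_deps=[base], runtime_deps=[base])
app = Element("app", {"kind": "autotools"}, build_deps=[lib])

before = strong_key(app)
base.config["ref"] = "v2"  # update a dependency two levels down
after = strong_key(app)
print(before != after)  # True
```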
A weak cache key has an almost identical structure; however, it includes
only the names of build dependencies, not their cache keys or their runtime
dependencies. A weak cache key will thus still change when the element itself
or the environment changes, but it will not change when a dependency is updated.
For elements without build dependencies, the strong cache key is identical
to the weak cache key.
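The difference between the two key types can be shown with a small, self-contained Python sketch. The Element model, helper names and dict layout are hypothetical, and JSON with sorted keys replaces pickle as the serialization step; only the rule of "dependency keys versus dependency names" is taken from the text above.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _sha256(value):
    data = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass
class Element:
    # Hypothetical, minimal element model used only for this sketch.
    name: str
    config: dict
    build_deps: list = field(default_factory=list)
    runtime_deps: list = field(default_factory=list)


def build_scope(element):
    # Build dependencies plus, transitively, their runtime dependencies.
    seen = {}

    def add_runtime(dep):
        for rdep in dep.runtime_deps:
            if rdep.name not in seen:
                seen[rdep.name] = rdep
                add_runtime(rdep)

    for dep in element.build_deps:
        seen[dep.name] = dep
        add_runtime(dep)
    return [seen[name] for name in sorted(seen)]


def strong_key(element):
    dep_keys = [strong_key(dep) for dep in build_scope(element)]
    return _sha256({"element": element.config, "dependencies": dep_keys})


def weak_key(element):
    # Only the names of the direct build dependencies: no dependency keys and
    # no runtime dependencies of those build dependencies.
    dep_names = sorted(dep.name for dep in element.build_deps)
    return _sha256({"element": element.config, "dependencies": dep_names})


base = Element("base", {"kind": "import", "ref": "v1"})
app = Element("app", {"kind": "autotools"}, build_deps=[base])

weak_before, strong_before = weak_key(app), strong_key(app)
base.config["ref"] = "v2"  # update the dependency only
print(weak_key(app) == weak_before)        # True: weak key unchanged
print(strong_key(app) == strong_before)    # False: strong key changed
print(strong_key(base) == weak_key(base))  # True: no build dependencies
```

Changing app's own configuration would still change its weak key, since the element configuration is part of both dicts.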
Strict build plan
This is the default build plan, which exclusively uses strong cache keys
for the core functionality. An element's cache key can be calculated once
the cache keys of the element's build dependencies (and their runtime
dependencies) have been calculated, and either tracking is not enabled or it
has already completed for this element, i.e., the ref is available.
This means that with tracking disabled, the cache keys of all elements can be
calculated right at the start of a build session.
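The readiness rule above can be sketched as a simple fixed-point loop. Assumptions, all hypothetical: an Element model whose ref is None while tracking for that element is still pending (with tracking disabled, every ref comes from the definitions), a build_scope helper for "build dependencies and their runtime dependencies", and a plain loop in place of BuildStream's actual scheduling.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def _sha256(value):
    data = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass
class Element:
    # Hypothetical model: ref is None while tracking is still pending.
    name: str
    ref: Optional[str]
    build_deps: list = field(default_factory=list)
    runtime_deps: list = field(default_factory=list)


def build_scope(element):
    # Build dependencies plus, transitively, their runtime dependencies.
    seen = {}

    def add_runtime(dep):
        for rdep in dep.runtime_deps:
            if rdep.name not in seen:
                seen[rdep.name] = rdep
                add_runtime(rdep)

    for dep in element.build_deps:
        seen[dep.name] = dep
        add_runtime(dep)
    return [seen[name] for name in sorted(seen)]


def resolve_strong_keys(elements):
    """Calculate every strong key whose inputs are currently available."""
    keys = {}
    progress = True
    while progress:
        progress = False
        for element in elements:
            if element.name in keys or element.ref is None:
                continue  # already done, or tracking not yet complete
            scope = build_scope(element)
            if all(dep.name in keys for dep in scope):
                keys[element.name] = _sha256({
                    "ref": element.ref,
                    "dependencies": [keys[dep.name] for dep in scope],
                })
                progress = True
    return keys


base = Element("base", ref="v1")
lib = Element("lib", ref="abc", build_deps=[base], runtime_deps=[base])
app = Element("app", ref="def", build_deps=[lib])
elements = [app, lib, base]

print(sorted(resolve_strong_keys(elements)))  # ['app', 'base', 'lib']
lib.ref = None  # tracking of lib has not completed yet
print(sorted(resolve_strong_keys(elements)))  # ['base']
```

With all refs known up front, every key resolves in one call; a pending ref blocks the element and all of its reverse dependencies.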
While BuildStream only uses strong cache keys with the strict build plan
for the actual staging and build process, it still calculates weak cache keys
for each element. This allows BuildStream to store the artifact in the cache
under both keys, reducing rebuilds when switching between strict and non-strict
build plans. If the artifact cache already contains an artifact with the same
weak cache key, that artifact is replaced. Thus, non-strict builds always use
the latest artifact available for a given weak cache key.
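Storing one artifact under both keys, with the weak-key entry being overwritten, can be sketched with an in-memory dict. The ArtifactCache class, its methods and the key strings are hypothetical and only model the replacement behavior described above.

```python
class ArtifactCache:
    """Hypothetical in-memory artifact cache keyed by (element, cache key)."""

    def __init__(self):
        self._refs = {}

    def commit(self, element_name, artifact, strong_key, weak_key):
        # Store under both keys. Assigning to the weak-key entry replaces any
        # older artifact that had the same weak key.
        self._refs[(element_name, strong_key)] = artifact
        self._refs[(element_name, weak_key)] = artifact

    def lookup(self, element_name, key):
        return self._refs.get((element_name, key))


cache = ArtifactCache()
# Two strict builds of "app" whose dependencies differed: different strong
# keys, same weak key (hypothetical key strings).
cache.commit("app", "artifact-1", strong_key="s1", weak_key="w")
cache.commit("app", "artifact-2", strong_key="s2", weak_key="w")

print(cache.lookup("app", "s1"))  # artifact-1
print(cache.lookup("app", "w"))   # artifact-2
```

A later non-strict lookup by the weak key therefore finds the most recently committed artifact.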
Non-strict build plan
The non-strict build plan disables the time-consuming automatic rebuild of
reverse dependencies at the cost of dropping the reproducibility benefits.
It uses the weak cache keys for the core staging and build process, i.e.,
if an artifact is available with the calculated weak cache key, it will be
reused for staging instead of being rebuilt. Weak cache keys can be calculated
early in the build session, after tracking, similar to when strong cache keys
can be calculated with a strict build plan.
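The practical effect on rebuilds can be sketched by choosing the lookup key according to the build plan. The plan_builds function, the tuple layout and the key strings are hypothetical; the sketch only checks which elements lack a cached artifact for the key the plan uses.

```python
def plan_builds(elements, cached_keys, strict):
    """Return the names of elements that would have to be built.

    elements: list of (name, strong_key, weak_key) tuples (hypothetical).
    cached_keys: set of keys for which an artifact is available.
    """
    to_build = []
    for name, strong, weak in elements:
        key = strong if strict else weak
        if key not in cached_keys:
            to_build.append(name)
    return to_build


# "base" itself changed, so both of its keys are new. "app" only changed
# indirectly: its strong key is new, its weak key is not.
cached = {"base-strong-old", "base-weak-old", "app-strong-old", "app-weak"}
elements = [
    ("base", "base-strong-new", "base-weak-new"),
    ("app", "app-strong-new", "app-weak"),
]
print(plan_builds(elements, cached, strict=True))   # ['base', 'app']
print(plan_builds(elements, cached, strict=False))  # ['base']
```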
Similar to how strict build plans also calculate weak cache keys, non-strict
build plans also calculate strong cache keys. However, this is slightly
more complex. To calculate the strong cache key of an element, BuildStream
requires the strong cache keys of the build dependencies (and their runtime
dependencies).
The build dependencies of an element may have been updated since the artifact
was built. With the non-strict build plan, the artifact will still be reused.
However, this means that we cannot use a strong cache key calculated purely
from the element definitions. We need a cache key that matches the
environment at the time the artifact was built, not the current definitions.
The only way to get the correct strong cache key is by retrieving it from
the metadata stored in the artifact. As artifacts may need to be pulled from a
remote artifact cache, the strong cache key is not readily available early
in the build session. However, it can always be retrieved when an element is
about to be built, as the dependencies are guaranteed to be in the local
artifact cache at that point.
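Why the strong key must come from the artifact rather than from the definitions can be shown with a short Python sketch. The strong_key helper, the metadata layout and the element configurations are hypothetical, and JSON with sorted keys stands in for the pickled dict.

```python
import hashlib
import json


def _sha256(value):
    data = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def strong_key(config, dep_strong_keys):
    return _sha256({"element": config, "dependencies": dep_strong_keys})


app_config = {"kind": "autotools"}
base_v1 = {"kind": "import", "ref": "v1"}
base_v2 = {"kind": "import", "ref": "v2"}

# The artifact of "app" was built against base v1; its strong key was
# recorded in the artifact metadata at build time (hypothetical layout).
app_artifact = {
    "metadata": {"strong_key": strong_key(app_config, [strong_key(base_v1, [])])}
}

# The definitions now say base v2. A non-strict build still reuses
# app_artifact, but a key computed from the current definitions no longer
# describes what that artifact was actually built from.
from_definitions = strong_key(app_config, [strong_key(base_v2, [])])
recorded = app_artifact["metadata"]["strong_key"]
print(from_definitions == recorded)  # False
```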
Element._get_cache_key_from_artifact() extracts the strong cache key
from an artifact in the local cache. Element._get_cache_key_for_build()
calculates the strong cache key that is used for a particular build job.
This key is used for the embedded metadata and also as the key under which
the artifact is stored in the cache.