The Many Dimensions of Reproducible Builds

One of the core concepts of Docker is an isolated runtime environment. We choose the dependencies, library versions and packages. It should be possible to reproduce the same environment on any machine and in theory the same build output on any machine.
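One rough way to test that claim is to fingerprint the build output on each machine and compare the results. The sketch below is only an illustration: `tree_digest` is a hypothetical helper, not part of Docker or any build tool, and it assumes the output is a plain directory of files. It also ignores file permissions and timestamps, which can matter for some artefacts.

```python
import hashlib
from pathlib import Path


def tree_digest(root: str) -> str:
    """Return a SHA-256 digest covering every file under root.

    Hypothetical helper for illustration: only relative paths and file
    contents are hashed, not permissions or modification times.
    """
    base = Path(root)
    # Sort by relative POSIX path so the digest does not depend on the
    # order in which the filesystem happens to list entries.
    files = sorted(
        (p.relative_to(base).as_posix(), p)
        for p in base.rglob("*")
        if p.is_file()
    )
    digest = hashlib.sha256()
    for rel, path in files:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        # Hash each file separately so entries have a fixed-length encoding.
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()
```

If two machines build the same commit and `tree_digest` returns different values for their output directories, something in the environment leaked into the build.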

In practice, it’s more complex.

Builds can be slow if they are not optimised. Ever had to download hundreds of dependencies from Maven each time you build a Java project, or repopulate node_modules? Developers will optimise for the quickest path, and this can come at the expense of reproducibility: building locally outside Docker with cached dependencies, mounting dirty local directories, or using cached (out-of-date) Docker images.

Developers are increasingly using ARM-based Macs while targeting x86_64. These builds are slower on Macs because the x86_64 containers have to run through an emulation layer such as QEMU. Should build teams also target the ARM architecture purely to improve local build and test performance? The overhead of customising and maintaining build containers for multiple architectures is significant, and it does little for reproducibility if we aren’t targeting ARM in production.

Tools like Bazel or Buck2 are designed to improve build performance and reproducibility by controlling the build environment and caching build inputs and outputs. We can run the same build and tests locally and in CI, and have greater confidence that our inputs and outputs are the same. These tools are complex and require significant investment to implement and maintain. Small changes to your code can require frequent changes to your build configuration. It’s worth noting that the Kubernetes project used Bazel (an approach other projects copied, e.g. cert-manager/cert-manager), but later removed it because of its complexity relative to its benefits.
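To make the caching idea concrete, here is a minimal sketch of keying a build step on its inputs. It is a deliberate simplification and not how Bazel or Buck2 are actually implemented; `action_key`, `run_cached` and the in-memory `cache` are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Callable


def action_key(command: list[str], inputs: dict[str, bytes], env: dict[str, str]) -> str:
    """Derive a cache key from everything that can influence a build step."""
    digest = hashlib.sha256()
    # Argument order is significant, so the command is hashed as given.
    for arg in command:
        digest.update(b"arg\0" + arg.encode("utf-8") + b"\0")
    # Input files and environment variables are sorted so that dictionary
    # ordering cannot change the key.
    for name in sorted(inputs):
        digest.update(b"in\0" + name.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(inputs[name]).digest())
    for name in sorted(env):
        digest.update(b"env\0" + name.encode("utf-8") + b"\0")
        digest.update(env[name].encode("utf-8") + b"\0")
    return digest.hexdigest()


# Hypothetical in-memory cache; a real tool would use disk or a remote store.
cache: dict[str, bytes] = {}


def run_cached(
    command: list[str],
    inputs: dict[str, bytes],
    env: dict[str, str],
    build: Callable[[dict[str, bytes]], bytes],
) -> bytes:
    """Run build only when no output is cached for these exact inputs."""
    key = action_key(command, inputs, env)
    if key not in cache:
        cache[key] = build(inputs)
    return cache[key]
```

The point of the sketch is that the step is skipped only when the command, every input file and the declared environment are all unchanged. Anything that influences the output but is missing from the key (a system library, the current date) is exactly where reproducibility breaks down.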

Dagger.io is a tool that leverages BuildKit to execute pipelines in containers, with the promise that you can run the same pipeline anywhere. Furthermore, pipelines are written in Go, TypeScript, or Python instead of YAML. This helps close the gap between unique CI/CD scripts and local development.

The different dimensions of reproducible builds are constantly changing and we need to make trade-offs between reproducibility, developer velocity and confidence. Teams that can track the reliability of their CI/CD pipelines will be able to consider these dimensions and choose the approach that’s good enough.
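As a starting point for tracking pipeline reliability, the sketch below computes two simple numbers from a list of CI runs. The `Run` record and the definition of "flaky" (a commit that both failed and passed without any code change) are assumptions made for this example; a real CI system will expose richer data through its own API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one pipeline run."""
    commit: str
    passed: bool


def pipeline_stats(runs: list[Run]) -> dict[str, float]:
    """Return the overall pass rate and the share of flaky commits."""
    if not runs:
        raise ValueError("at least one run is required")
    by_commit: dict[str, list[bool]] = defaultdict(list)
    for run in runs:
        by_commit[run.commit].append(run.passed)
    # A commit is counted as flaky when the same code both passed and failed.
    flaky = sum(
        1 for results in by_commit.values()
        if True in results and False in results
    )
    return {
        "pass_rate": sum(run.passed for run in runs) / len(runs),
        "flaky_commit_rate": flaky / len(by_commit),
    }
```

A rising flaky commit rate suggests the pipeline is producing different results from identical inputs, which is a reproducibility problem rather than a code problem.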