How to Accelerate Slow Rust CI/CD Builds on Google Cloud Build

Rust’s cargo provides powerful dependency management, allowing projects to easily leverage community-developed libraries to build new functionality. Cargo’s local cache speeds up local builds when the build configuration does not change: dependencies only need to be built once, so subsequent builds are faster. However, when running plain cargo in build containers on CI platforms, cargo cannot access caches from previous builds, resulting in long compile times.

This post shows an implementation that uses two caching tools to speed up Rust builds: sccache and kaniko. The solution caches the crates.io index, the sccache installation itself, and compilation outputs for both dependencies and the crate’s own code.

Sccache

sccache is a compiler caching tool that supports Rust, C++ and CUDA. When a cached compilation output is found for an input, sccache uses the cached output rather than re-running the compiler.

To use sccache, the user tells the build system to call sccache instead of the compiler. In Rust, this is done with the RUSTC_WRAPPER environment variable, or by setting build.rustc-wrapper in the cargo configuration; cargo then runs sccache instead of invoking rustc directly.
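
For example, either of the following enables the wrapper for local use (the /usr/local/bin/sccache path is an assumption; point it at wherever sccache is installed):

# Option 1: environment variable
export RUSTC_WRAPPER=/usr/local/bin/sccache
cargo build --release

# Option 2: in .cargo/config.toml
# [build]
# rustc-wrapper = "/usr/local/bin/sccache"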

To maintain state across runs, sccache spawns a server on the same machine. This is done automatically: users do not need to manually run or configure the server component.
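
Should you want to poke at the server locally (not needed in CI, where this happens automatically), it can be controlled explicitly; a quick sketch:

sccache --start-server   # start the background server
sccache --show-stats     # print cache hit/miss statistics
sccache --stop-server    # shut the server down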

There are multiple backends for storing the cache. We will focus on the Google Cloud Storage (GCS) backend here, but sccache also supports S3 (and Cloudflare R2), Azure Blob Storage, and GitHub Actions cache, as well as Redis and memcached.

Adding sccache to the docker image

The docker image where cargo compiles our crate needs to contain the sccache binary. At the time of writing, sccache did not appear to be available via apt-get, so the following script fetches the binary from the GitHub release for the requested architecture. The architecture is given as a parameter (x86_64 or aarch64); passing nocache disables caching instead:

#!/usr/bin/env bash
# called with the caching architecture, one of x86_64, aarch64, or nocache
set -euo pipefail
CACHE_ARCH="$1"
echo "Preparing sccache with CACHE_ARCH=${CACHE_ARCH}"
case "${CACHE_ARCH}" in
    nocache)
        echo nocache
        # caching disabled: install a pass-through wrapper instead of sccache.
        # cargo invokes RUSTC_WRAPPER as `wrapper rustc <args>`, so exec the arguments;
        # flag-only invocations such as `rustc_wrapper -s` become no-ops.
        cat > /usr/local/cargo/bin/rustc_wrapper <<'WRAPPER'
#!/bin/sh
case "$1" in
    -*) exit 0 ;;
    *) exec "$@" ;;
esac
WRAPPER
        chmod +x /usr/local/cargo/bin/rustc_wrapper
        ;;
    x86_64)
        echo "x86_64: curling the release"
        curl -L https://github.com/mozilla/sccache/releases/download/v0.4.0-pre.6/sccache-v0.4.0-pre.6-x86_64-unknown-linux-musl.tar.gz | tar xvz -C /
        ln -s /sccache-v0.4.0-pre.6-x86_64-unknown-linux-musl/sccache /usr/local/cargo/bin/rustc_wrapper
        ;;
    aarch64)
        echo "aarch64: curling the release"
        curl -L https://github.com/mozilla/sccache/releases/download/v0.4.0-pre.6/sccache-v0.4.0-pre.6-aarch64-unknown-linux-musl.tar.gz | tar xvz -C /
        ln -s /sccache-v0.4.0-pre.6-aarch64-unknown-linux-musl/sccache /usr/local/cargo/bin/rustc_wrapper
        ;;
    *)
        echo "Unsupported caching architecture \"${CACHE_ARCH}\"" >&2
        exit 1
        ;;
esac

Note that release v0.3.3 did not work with GCS instance metadata authentication, which is why this snippet uses v0.4.0-pre.6. (side note for those interested: it appears PR#1108 fixed the issue).

The Dockerfile can then call the script, using a CACHE_ARCH build argument so that the architecture can be specified at build time:

FROM rust:1.66.1-bullseye as build-env
# CACHE_ARCH is one of x86_64, aarch64, or nocache
ARG CACHE_ARCH=nocache
# add sccache; the variant is selected by ${CACHE_ARCH}
COPY docker_prepare_sccache.sh /
RUN /docker_prepare_sccache.sh "$CACHE_ARCH"

Configuring sccache for GCS

To get sccache to work with Google Cloud Storage, we need to specify the bucket for the cache via SCCACHE_GCS_BUCKET.

By default sccache treats the GCS cache as read-only; however, you will probably want read+write so that Cloud Build can populate the cache. This is controlled via SCCACHE_GCS_RW_MODE.

We also set the RUSTC_WRAPPER. In the Dockerfile, just before cargo build:

ENV RUSTC_WRAPPER=/usr/local/cargo/bin/rustc_wrapper
ARG SCCACHE_GCS_BUCKET
ENV SCCACHE_GCS_BUCKET=${SCCACHE_GCS_BUCKET}
ENV SCCACHE_GCS_RW_MODE=READ_WRITE
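
If the cache bucket does not exist yet, it needs to be created up front; a minimal sketch (the region here is a placeholder):

gsutil mb -l us-central1 gs://<CACHE BUCKET>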

Enabling sccache to use workload identity to authenticate to GCS

To authenticate to GCS using Workload Identity, the container needs access to the instance metadata endpoint, which provides an OAuth token. By default, however, Google Cloud Build “builders” do not expose instance metadata to code running inside the build. Fortunately, the Cloud Build team made it possible to forward credential access to the build by passing --network=cloudbuild to docker in the cloudbuild.yaml file:

- name: "gcr.io/cloud-builders/docker"
  args:
  - build
  - --network=cloudbuild
  - "--tag=<IMAGE TAG>"
  - "--file=./Dockerfile"
  - "--build-arg"
  - "SCCACHE_GCS_BUCKET=<CACHE BUCKET>"
  - .
images:
- "<IMAGE TAG>"

Setting up permissions

The Cloud Build default service account, which executes builds out of the box, already has permission to access all buckets in the project. It is possible to configure a custom service account instead, but then the build output cannot be published to the default log bucket: you must either grant the account Logs Writer permissions or configure a user-created log bucket. I tried the Logs Writer option and found that the log view does not refresh as smoothly as the default setup; I did not try a user-specified bucket. For this test project I ended up using the default service account for simplicity, though that may change depending on future security and compliance requirements.
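
Should you later tighten permissions, a bucket-level grant is sufficient for sccache. A sketch using gsutil, assuming the project’s default Cloud Build service account (substitute a custom account as appropriate):

gsutil iam ch \
    serviceAccount:<PROJECT_NUMBER>@cloudbuild.gserviceaccount.com:roles/storage.objectAdmin \
    gs://<CACHE BUCKET>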

Optional: cost savings with lifecycle rules

I also set a lifecycle rule to delete objects once their age reaches 30 days. This ensures cached artifacts are not kept indefinitely, but it might force some recompilation even for code that rarely changes (e.g., a dependency that stays pinned for 30 days or more will eventually be recompiled). Every use case will have its own preferred tradeoff; see the GCS object lifecycle documentation for further instructions.
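
As a sketch, the 30-day rule can be applied with gsutil (the lifecycle.json file name is arbitrary):

# lifecycle.json: delete cached objects older than 30 days
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 30}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://<CACHE BUCKET>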

kaniko

kaniko builds container images inside a container without requiring access to a docker daemon. This avoids the usual downside of having to grant builders system-wide access to the docker daemon.

One of kaniko’s useful features is its ability to cache the container layers produced while processing the Dockerfile. This is similar to the experience we’re used to with local docker builds: Dockerfile commands that do not change and whose inputs remain static can be skipped by reusing cached layers.

A first approach: cache cargo dependencies with kaniko (not recommended)

As a “simpler” alternative to sccache, multiple suggestions online propose using layers to avoid recompiling dependencies when they have not changed.

Issue cargo#2644 has an extensive discussion.

Posters asked whether cargo could add a command that only downloads and compiles a project’s dependencies. Fortunately, cargo already has enough functionality to do this, and it can be accomplished with a handful of Dockerfile commands.

The idea is to copy Cargo.toml and Cargo.lock into the container and run cargo build. For cargo to agree to attempt compilation, the directory also needs source files, so the Dockerfile creates an empty src/lib.rs. Only after the dependencies compile does the Dockerfile copy the full source code into the container, followed by a second cargo build that reuses the cached dependency layer. If the dependencies have not changed, only the crate’s own code needs to be recompiled. A sketch of this approach is shown below.
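
For reference, a minimal sketch of this pattern (illustrative only; the touch before the second build is an added workaround so that cargo does not treat the stub build output as up to date when docker’s COPY preserves older source timestamps):

FROM rust:1.66.1-bullseye as build-env
WORKDIR /build
# 1. copy only the manifests; this layer changes only when dependencies change
COPY Cargo.toml Cargo.lock ./
# 2. build against a stub lib.rs so only dependencies are compiled and cached
RUN mkdir src && touch src/lib.rs && cargo build --release
# 3. copy the real sources and rebuild; the dependency layer above is reused
COPY . .
RUN touch src/*.rs && cargo build --release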

However, I encountered two downsides with using kaniko this way.

  1. The most obvious downside is that even if only one dependency changes, the builder needs to recompile all of the dependencies. This increases the frequency and severity of long builds.
  2. The other downside was that on a Cloud Build machine with 4GB of RAM, kaniko failed while caching the dependency layer:
    Finished release [optimized] target(s) in 21m 49s
INFO[1343] Taking snapshot of full filesystem...        
ERROR
ERROR: build step 0 "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 137

Exit status 137 indicates the process was killed, most likely by the out-of-memory killer; the build artifacts are pretty large, although it is not clear why snapshotting them would need so much RAM. In any case, since sccache solves the first downside by caching at a finer granularity, we can simply skip using kaniko for this cache. Instead, kaniko can speed up other aspects of the build.

Kaniko use-case #1: produce a container with sccache binaries

One use for kaniko is to prepare a builder image that already contains sccache, so that build containers do not have to install it on every run. The script and Dockerfile shown above for installing sccache work for this purpose.

Kaniko use-case #2: cache the crates.io index

When using sccache, the builder still needs to read the crates.io index (the familiar “Updating crates.io index” message). The crates.io index is git-based, so with incremental git updates, caching it speeds up the update even if the cache is somewhat stale. There is a request for a cargo command to explicitly refresh the crates.io index, but it has not yet been implemented.

Until an explicit refresh command exists, we can use a workaround that forces a refresh: create a dummy crate and add a dependency to it. This is part of Dockerfile.builder below.

Putting it all together

cloudbuild.yaml

This uses kaniko to build a “builder image” containing sccache and a recent copy of the crates.io index, and then uses that image to run cargo build.

steps:
- name: 'gcr.io/kaniko-project/executor:latest'
  id: builder
  args:
  - --destination=<BUILDER IMAGE>
  - --cache=true
  - --cache-ttl=720h
  - --dockerfile=Dockerfile.builder
  - "--build-arg"
  - "CACHE_ARCH=x86_64"
- name: "gcr.io/cloud-builders/docker"
  id: artifact
  waitFor:
    - builder
  args:
  - build
  - --network=cloudbuild
  - "--tag=<TARGET IMAGE>"
  - "--file=./Dockerfile"
  - "--build-arg"
  - "BASE_IMAGE=<BUILDER IMAGE>"
  - "--build-arg"
  - "SCCACHE_GCS_BUCKET=<SCCACHE BUCKET>"
  - .
images:
- "<BUILDER IMAGE>"
- "<TARGET IMAGE>"

Replace <BUILDER IMAGE> and <TARGET IMAGE> as appropriate.

Limitation: the second (artifact) step pulls the generic <BUILDER IMAGE> tag rather than the specific image produced by the previous step, so multiple concurrent builds can race with each other.
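
One possible mitigation (a sketch, not part of the setup above): tag the builder image with Cloud Build’s $BUILD_ID substitution and pass the same tag as BASE_IMAGE, so each build consumes exactly the builder image it just produced:

# in the builder (kaniko) step:
  - --destination=<BUILDER IMAGE>:$BUILD_ID
# in the artifact step:
  - "--build-arg"
  - "BASE_IMAGE=<BUILDER IMAGE>:$BUILD_ID"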

Dockerfile.builder

FROM rust:1.66.1-bullseye as build-env
# CACHE_ARCH is one of x86_64, aarch64, or nocache
ARG CACHE_ARCH=nocache
# add sccache
COPY docker_prepare_sccache.sh /
RUN /docker_prepare_sccache.sh "$CACHE_ARCH"
# update the crates.io index
# based on https://stackoverflow.com/a/74708239
RUN mkdir /tmp/deleteme \
      && cd /tmp/deleteme \
      && cargo init \
      && cargo add serde \
      && rm -rf /tmp/deleteme

Dockerfile

ARG BASE_IMAGE=builder
FROM ${BASE_IMAGE} as build-env
# set sccache parameters for build-env; they have no effect in the nocache variant, whose wrapper simply calls rustc
ENV RUSTC_WRAPPER=/usr/local/cargo/bin/rustc_wrapper
ARG SCCACHE_GCS_BUCKET
ENV SCCACHE_GCS_BUCKET=${SCCACHE_GCS_BUCKET}
ENV SCCACHE_GCS_RW_MODE=READ_WRITE
WORKDIR /build
COPY . .
RUN cargo build --release && /usr/local/cargo/bin/rustc_wrapper -s
FROM debian:bullseye-slim as final
RUN apt-get update \
 && apt-get install -y \
        ca-certificates \
        libopenblas0-pthread \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*
COPY --from=build-env /build/target/release/<BINARY-NAME> /<BINARY-NAME>
ENTRYPOINT ["./<BINARY-NAME>"]

Replace <BINARY-NAME> as appropriate.

Note the rustc_wrapper -s call, which prints sccache’s cache statistics (hits and misses) into the build log.

docker_prepare_sccache.sh

Same as above; provided here for completeness, for those assembling the full solution:

#!/usr/bin/env bash
# called with the caching architecture, one of x86_64, aarch64, or nocache
set -euo pipefail
CACHE_ARCH="$1"
echo "Preparing sccache with CACHE_ARCH=${CACHE_ARCH}"
case "${CACHE_ARCH}" in
    nocache)
        echo nocache
        # caching disabled: install a pass-through wrapper instead of sccache.
        # cargo invokes RUSTC_WRAPPER as `wrapper rustc <args>`, so exec the arguments;
        # flag-only invocations such as `rustc_wrapper -s` become no-ops.
        cat > /usr/local/cargo/bin/rustc_wrapper <<'WRAPPER'
#!/bin/sh
case "$1" in
    -*) exit 0 ;;
    *) exec "$@" ;;
esac
WRAPPER
        chmod +x /usr/local/cargo/bin/rustc_wrapper
        ;;
    x86_64)
        echo "x86_64: curling the release"
        curl -L https://github.com/mozilla/sccache/releases/download/v0.4.0-pre.6/sccache-v0.4.0-pre.6-x86_64-unknown-linux-musl.tar.gz | tar xvz -C /
        ln -s /sccache-v0.4.0-pre.6-x86_64-unknown-linux-musl/sccache /usr/local/cargo/bin/rustc_wrapper
        ;;
    aarch64)
        echo "aarch64: curling the release"
        curl -L https://github.com/mozilla/sccache/releases/download/v0.4.0-pre.6/sccache-v0.4.0-pre.6-aarch64-unknown-linux-musl.tar.gz | tar xvz -C /
        ln -s /sccache-v0.4.0-pre.6-aarch64-unknown-linux-musl/sccache /usr/local/cargo/bin/rustc_wrapper
        ;;
    *)
        echo "Unsupported caching architecture \"${CACHE_ARCH}\"" >&2
        exit 1
        ;;
esac

Appendix: debugging GCS permissions

The following commands were useful for debugging GCS access; they can be added to the Dockerfile temporarily. The first RUN fetches a token to verify that the instance metadata endpoint is reachable. The second enables sccache debug logging and installs the tiny hello-world-rs crate for a speedier debug cycle, then prints the cache statistics. The third dumps the sccache debug log.

RUN curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
RUN SCCACHE_ERROR_LOG=/sccache_log.txt SCCACHE_LOG=debug RUSTC_WRAPPER=/usr/local/cargo/bin/rustc_wrapper cargo install hello-world-rs && /usr/local/cargo/bin/rustc_wrapper -s
RUN echo "Log:" && cat /sccache_log.txt