Docker Layer Caching Reference
Optimize build times and image size with efficient layer caching
Docker layer caching is a fundamental concept that makes Docker builds fast and efficient. When you build an image, Docker executes each instruction in your Dockerfile in order. Instructions that modify the filesystem (such as RUN, COPY, and ADD) produce a new layer, while the others record only metadata. If an instruction hasn't changed since the last build, Docker can reuse the existing result from its cache instead of executing the instruction again, significantly reducing build times.
Understanding and optimizing for layer caching is essential for creating an efficient Docker workflow, especially in development environments and CI/CD pipelines where builds happen frequently.
How Docker Layer Caching Works
Docker images are composed of a series of layers, with each layer representing the changes resulting from a single Dockerfile instruction. These layers are stacked on top of each other to form the final image. Here's how the caching process works:
When building an image, Docker checks each instruction to determine if it can use a cached layer:
- Docker looks for an existing image in its cache that has the same parent and was created with the same instruction.
- For most instructions, Docker simply compares the instruction string to determine if the cache can be used.
- For ADD and COPY instructions, Docker also checks if the content of the files being added or copied has changed by computing checksums.
Once a cache miss occurs for an instruction, all subsequent instructions will not use the cache:
- If Docker cannot find a cache match for an instruction, it executes the instruction and creates a new layer.
- All instructions that follow in the same build stage are also executed without using the cache, even if they would otherwise match cached layers.
- This is because each layer is built on top of its parent, so a change in one layer might affect how subsequent instructions behave.
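The lookup rules above can be sketched as a toy model. This is not Docker's actual implementation: `cache_key`, `build`, and the in-memory `cache` set are hypothetical names, used only to show how deriving each key from the parent key plus the instruction string makes a single miss cascade to every later step.

```python
import hashlib


def cache_key(parent_key: str, instruction: str) -> str:
    """Derive a key from the parent layer's key and the instruction string."""
    return hashlib.sha256(f"{parent_key}\n{instruction}".encode()).hexdigest()


def build(instructions: list[str], cache: set[str]) -> list[str]:
    """Return 'HIT' or 'MISS' per instruction, recording new keys in the cache."""
    results = []
    parent = ""  # the first instruction has no parent layer
    for instruction in instructions:
        key = cache_key(parent, instruction)
        results.append("HIT" if key in cache else "MISS")
        cache.add(key)
        parent = key  # the next key depends on this one
    return results


cache = set()
first = build(["FROM ubuntu:20.04", "RUN apt-get update", "RUN echo done"], cache)
second = build(["FROM ubuntu:20.04", "RUN apt-get update -q", "RUN echo done"], cache)
print(first)   # ['MISS', 'MISS', 'MISS']
print(second)  # ['HIT', 'MISS', 'MISS']
```

In the second run the final instruction is textually identical to the first run, yet it still misses because its parent key changed.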
Some instructions have special caching behaviors:
- RUN: For RUN instructions, Docker compares the exact command string, not the result of running the command. Even if a command like apt-get update would pull different packages, Docker will still use the cache if the command string hasn't changed.
- COPY and ADD: For these instructions, Docker checks the checksum of each file being copied to determine whether the cache can be used.
- ENV, LABEL, etc.: For metadata instructions, Docker simply checks if the instruction string has changed.
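To illustrate the COPY and ADD rule, here is a minimal sketch of a content checksum over a directory. It assumes a simplified scheme (relative path plus file bytes, hashed with SHA-256); the real builder's checksum format differs, and `context_checksum` is a hypothetical helper. The point is that the digest follows file contents, so a change in modification time alone leaves it untouched, while an edit changes it.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def context_checksum(root: Path) -> str:
    """Hash relative paths and contents of all files under root (timestamps ignored)."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("print('hello')\n")
    before = context_checksum(root)

    # Changing only the modification time does not change the checksum.
    os.utime(root / "app.py", (0, 0))
    touched = context_checksum(root)

    # Changing the content does.
    (root / "app.py").write_text("print('goodbye')\n")
    edited = context_checksum(root)

print(before == touched)  # True
print(before == edited)   # False
```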
Controlling Cache Behavior
You can disable cache usage entirely when building an image:
docker build --no-cache .
This forces Docker to execute all instructions without using any cached layers, which is useful when you need to ensure all packages and dependencies are updated.
You can selectively invalidate the cache for specific instructions by using build arguments:
# Dockerfile
FROM ubuntu:20.04
ARG CACHEBUST=1
RUN apt-get update && apt-get install -y python3
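By changing the value of CACHEBUST at build time, you can force Docker to re-execute the RUN instruction that follows the ARG declaration without using the cache. Instructions placed before the ARG line are unaffected, so declare it immediately before the first step you want to refresh: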
docker build --build-arg CACHEBUST=$(date +%s) .
Docker BuildKit (the default builder in Docker version 23.0 and later, or enabled with DOCKER_BUILDKIT=1 on older versions) provides more advanced caching options, including cache mounts:
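# Use cache mount for package management.
# Debian and Ubuntu images delete downloaded packages after install by default,
# so remove that setting first or the apt cache mount stays empty.
RUN rm -f /etc/apt/apt.conf.d/docker-clean
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y python3
# Use cache mount for dependencies
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt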
Cache mounts allow you to persist certain directories across builds, which is especially useful for package managers that maintain their own caches.
Examples of Layer Caching Optimization
Basic Layer Ordering
Order Dockerfile instructions from least likely to change to most likely to change:
# Bad example - frequent changes to source code invalidate dependency cache
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install
CMD ["npm", "start"]
# Good example - dependencies are cached even when source code changes
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]
By copying only package.json files first and installing dependencies before copying the rest of the code, you can reuse the dependency installation layer even when your source code changes.
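The effect of the two orderings can be estimated with a toy model (hypothetical helper names, not Docker's implementation). Each step's key combines the parent key, the instruction, and, for COPY, a stand-in for the checksum of the copied files. After a source-only change, the model reports which steps would run again.

```python
import hashlib


def rerun_steps(steps, files_before, files_after):
    """Return the instructions that miss the cache when files change.

    steps: list of (instruction, copied_file_names); copied_file_names is
    empty for instructions that do not read the build context.
    """
    def keys(files):
        parent, out = "", []
        for instruction, copied in steps:
            content = "".join(f"{name}={files[name]};" for name in copied)
            parent = hashlib.sha256(
                f"{parent}|{instruction}|{content}".encode()
            ).hexdigest()
            out.append(parent)
        return out

    old, new = keys(files_before), keys(files_after)
    return [steps[i][0] for i in range(len(steps)) if old[i] != new[i]]


before = {"package.json": "v1", "src/index.js": "v1"}
after = {"package.json": "v1", "src/index.js": "v2"}  # only source code changed

bad = [("COPY . .", ["package.json", "src/index.js"]), ("RUN npm install", [])]
good = [
    ("COPY package*.json ./", ["package.json"]),
    ("RUN npm install", []),
    ("COPY . .", ["package.json", "src/index.js"]),
]

print(rerun_steps(bad, before, after))   # ['COPY . .', 'RUN npm install']
print(rerun_steps(good, before, after))  # ['COPY . .']
```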
Combining Commands
Combine related commands into a single RUN instruction:
# Bad example - apt-get update is cached on its own, so later installs may use a stale package index, and cleanup in separate layers does not shrink the image
FROM ubuntu:20.04
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/*
# Good example - single layer for all package management operations
FROM ubuntu:20.04
RUN apt-get update && \
apt-get install -y python3 python3-pip && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
Combining related commands into a single RUN instruction reduces the number of layers and ensures that intermediate states (like after apt-get update but before installing packages) aren't cached separately.
Using BuildKit Cache Mounts
Leverage BuildKit cache mounts for package managers:
# Using cache mounts with BuildKit
# DOCKER_BUILDKIT=1 docker build .
FROM golang:1.18-alpine
WORKDIR /app
# Cache Go modules
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
go mod download
# Build the application
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
go build -o /app/myapp .
This example uses BuildKit cache mounts to persist the Go module cache and build cache between builds, which can significantly speed up builds even when using --no-cache.
Multi-stage Builds with Caching
Optimize caching in multi-stage builds:
# Optimized multi-stage build for caching
FROM node:18 AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci
FROM node:18 AS builder
WORKDIR /app
# Copy dependencies to reuse the dependency installation layer
COPY --from=dependencies /app/node_modules ./node_modules
COPY . .
RUN npm run build
FROM nginx:alpine
COPY --from=builder /app/build /usr/share/nginx/html
This multi-stage build separates dependency installation into its own stage, which can be cached and reused even when the source code changes.
Using .dockerignore
Use .dockerignore to exclude files that shouldn't affect the build cache:
# .dockerignore file
node_modules
npm-debug.log
.git
.gitignore
README.md
Dockerfile
.dockerignore
*.log
*.tmp
.DS_Store
Excluding files that don't affect the build prevents unnecessary cache invalidation when those files change.
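As a rough illustration of what ends up in the build context, the sketch below filters a list of paths against a few simple patterns. It is an approximation under stated assumptions: patterns are matched component by component from the context root, and the `**` wildcard and `!` exception rules that .dockerignore also supports are not handled. `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """True if a pattern matches the path or one of its parent directories."""
    parts = path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.strip("/").split("/")
        if len(pattern_parts) <= len(parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False


patterns = ["node_modules", ".git", "*.log", "README.md"]
context = [
    "package.json",
    "src/index.js",
    "node_modules/left-pad/index.js",
    "npm-debug.log",
    "README.md",
]
sent = [p for p in context if not is_ignored(p, patterns)]
print(sent)  # ['package.json', 'src/index.js']
```

Only the files left in `sent` can contribute to a COPY checksum, so edits to the ignored files cannot invalidate the cache.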
Best Practices for Layer Caching
- Order instructions by stability: Place instructions that change less frequently earlier in the Dockerfile to maximize cache usage.
- Separate dependencies from code: Copy and install dependencies before copying application code to avoid reinstalling dependencies when only code changes.
- Combine related commands: Use a single RUN instruction with multiple commands connected by && for related operations to reduce layer count and prevent partial state caching.
- Use .dockerignore: Exclude files that don't need to be in the build context to prevent unnecessary cache invalidation.
- Use specific base image tags: Use specific version tags for base images instead of 'latest' to ensure consistent parent layers.
- Consider multi-stage builds: Use multi-stage builds to separate build-time dependencies from runtime dependencies, reducing final image size.
- Use BuildKit cache mounts: For package managers and compilers with their own caching mechanisms, use BuildKit cache mounts to persist caches between builds.
- Cache downloaded files: If your build downloads files, consider caching them in a separate layer to avoid re-downloading them on each build.
- Be careful with wildcards: Using wildcards in COPY or ADD instructions can lead to unexpected cache invalidation if files are added or removed.
- Update package managers atomically: For apt, yum, or other package managers, update and install packages in a single RUN instruction to ensure consistency.
Advanced Caching Techniques
Docker BuildKit supports remote caching, allowing you to share cache between different build environments:
# Export cache to a registry (--push needs an image tag; the tag shown is a placeholder)
docker buildx build --push -t myregistry.com/myapp:latest --cache-to type=registry,ref=myregistry.com/myapp:cache .
# Import cache from a registry
docker buildx build --cache-from type=registry,ref=myregistry.com/myapp:cache .
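This is especially useful in CI/CD environments, where builds often run on clean machines. Depending on your setup, exporting cache to a registry may require a builder that uses the docker-container driver (created with docker buildx create) or the containerd image store.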
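In a CI script, the two commands above are often combined into a single invocation that both imports and exports the cache. A minimal Python sketch of assembling that command follows. `buildx_command` is a hypothetical helper, the image names are placeholders, and `mode=max` (which asks BuildKit to export layers from all stages) is an optional addition that the commands above do not use.

```python
import shlex


def buildx_command(image: str, cache_ref: str, context: str = ".") -> list[str]:
    """Build an argv list for a buildx build that reads and writes a registry cache."""
    return [
        "docker", "buildx", "build",
        "--push",
        "--tag", image,
        "--cache-from", f"type=registry,ref={cache_ref}",
        "--cache-to", f"type=registry,ref={cache_ref},mode=max",
        context,
    ]


argv = buildx_command("myregistry.com/myapp:latest", "myregistry.com/myapp:cache")
print(shlex.join(argv))
# To execute it: subprocess.run(argv, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when image names come from CI variables.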
BuildKit can also store cache metadata in the image itself, allowing you to use the cache when pulling images:
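# Enable inline cache (--push needs an image tag; the tag shown is a placeholder)
docker buildx build --push -t myregistry.com/myapp:latest --cache-to type=inline .
When you later use this image as a cache source (for example with --cache-from myregistry.com/myapp:latest), BuildKit will extract the cache metadata from the image.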
You can share build cache between different projects on the same machine:
# Export cache to a local directory
docker buildx build --cache-to type=local,dest=./docker-cache .
# Import cache from a local directory
docker buildx build --cache-from type=local,src=./docker-cache .
BuildKit allows you to stop at a specific stage in a multi-stage build. Only the named stage and the stages it depends on are built:
# Build up to and including the 'builder' stage
docker buildx build --target builder .
This can be useful for development workflows where you want to iterate quickly on a specific part of the build.
Troubleshooting Cache Issues
If Docker isn't using the cache when you expect it to, there are several common causes:
- Changed files: Even minor changes to copied files will invalidate the cache for that layer and all subsequent layers.
- ARG or ENV changes: Changes to ARG or ENV values can affect how subsequent commands execute.
- Parent image changes: If you're using a tag like 'latest' for your base image, it may have been updated.
- Pruned cache: Docker may have pruned the cache layers due to space constraints or cleanup operations.
You can use the following techniques to debug cache issues:
- Inspect image history: Use docker history --no-trunc <image> to see the complete history of an image, including the exact commands used to create each layer.
- Enable BuildKit debugging: Set BUILDKIT_PROGRESS=plain for more detailed build output.
- Use verbose build mode: Add --progress=plain to your buildx build command to see more details about cache usage.
- Check file content: Make sure the content of files you're copying hasn't changed, even if the filenames are the same.
Sometimes you may want to manually control which layers get cached and when:
- Force a rebuild: Use --no-cache to force Docker to rebuild the entire image without using any cached layers.
- Use ARG for cache busting: Set an ARG in your Dockerfile and pass a unique value at build time to invalidate specific layers.
- Clean up the cache: Use docker builder prune to clean up the build cache and start fresh.
Layer Caching and Image Size
The way you structure layers affects not only build speed but also the final image size. Each layer that changes the filesystem adds to the image size, even if it removes files added by previous layers. This is because Docker layers are additive: a later layer can hide files from earlier layers, but it cannot remove their data from the image.
To understand how layers affect image size, consider this example:
# Example showing layer impact on size
FROM ubuntu:20.04
# Layer 1: adds about 100MB (sizes here are illustrative)
RUN apt-get update && apt-get install -y python3
# Layer 2: adds about 200MB
RUN apt-get install -y build-essential
# Layer 3: removes files but doesn't reduce image size
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
Even though Layer 3 removes files, the image still includes all the data from Layers 1 and 2. To address this, you can use multi-stage builds or combine commands to minimize the number of layers and the final image size.
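The arithmetic behind this can be sketched with a toy model. It assumes that a layer stores every file it adds unless the same layer also removes it, and that removing a file from an earlier layer reclaims nothing. `image_size` is a hypothetical helper, and all sizes, including the 40MB assumed for the apt package lists, are illustrative figures in MB.

```python
def image_size(layers):
    """Total stored size in MB: each layer keeps the files it added and did not remove itself."""
    total = 0
    for ops in layers:
        added = {}
        for op, name, size in ops:
            if op == "add":
                added[name] = size
            elif op == "rm":
                # Removing a file from an earlier layer only hides it;
                # removing one added in this layer means it is never stored.
                added.pop(name, None)
        total += sum(added.values())
    return total


separate = [
    [("add", "python3", 100), ("add", "apt-lists", 40)],
    [("add", "build-essential", 200)],
    [("rm", "apt-lists", 0)],
]
combined = [
    [
        ("add", "python3", 100),
        ("add", "apt-lists", 40),
        ("add", "build-essential", 200),
        ("rm", "apt-lists", 0),
    ]
]
print(image_size(separate))  # 340
print(image_size(combined))  # 300
```

With the cleanup in the same RUN instruction as the install, the package lists never reach a stored layer, which is why the combined form is smaller.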
Notes and Limitations
- Docker caches layers based on instruction strings, not the actual result of executing the instruction. This means that commands like apt-get update might use cached results even when newer packages are available.
- Cache layers are stored on the host machine and can consume significant disk space over time. Use docker builder prune periodically to clean up unused cache layers.
- BuildKit caching features like cache mounts and remote caching require BuildKit to be enabled (DOCKER_BUILDKIT=1 or Docker version 23.0 or later).
- Layer caching doesn't help with runtime performance; it only speeds up builds. The number and size of layers don't significantly affect container startup or runtime performance.
- Docker cache is host-specific by default. Builds on different machines won't share cache unless you use BuildKit's remote caching features.
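- Some operations can lead to surprising caching behavior. COPY with wildcards is invalidated whenever the set of matched files changes, and RUN commands whose results depend on external state (such as the network or the current time) are cached by command string, not by result.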
- The cache for a layer and all its child layers is invalidated if any parent layer changes. This makes it crucial to order your Dockerfile instructions to maximize cache reuse.