Docker Full Tutorial — Part 3: Writing Dockerfiles

By Suraj Ahir 2025-11-09 11 min read


Most people can write a Dockerfile that works. Fewer people write Dockerfiles that are fast to build, small in size, and secure. The difference between a naive Dockerfile and a well-crafted one is often the difference between a 1.2GB image that takes 8 minutes to build and a 120MB image that builds in 45 seconds. In production, that difference matters enormously — in CI pipeline speed, in deployment time, and in cloud storage costs.

I once inherited a project where the Dockerfile was copying the entire repository — including a 400MB node_modules directory and a .git folder — into the image before installing dependencies. The build took 12 minutes. After rewriting the Dockerfile properly, the same build took 90 seconds. Same application, dramatically better Dockerfile.

The Anatomy of a Dockerfile

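Every Dockerfile begins with a FROM instruction specifying the base image (only comments, parser directives, and ARG may come before it). Everything else builds on top of that base. The most important instructions are FROM, which sets the base image; WORKDIR, which sets the working directory for the instructions that follow; COPY, which copies files from the build context into the image; RUN, which executes a command at build time and stores the result as a new layer; USER, which sets the user for later instructions and for the running container; EXPOSE, which documents the port the application listens on without publishing it; and CMD, which sets the default command run when a container starts.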

A Well-Optimised Node.js Dockerfile

Optimised Node.js Dockerfile
# Use specific version tag, not 'latest'
FROM node:20-alpine

# Set working directory
WORKDIR /app

# COPY package files FIRST (before source code)
# This lets Docker cache the npm install layer
COPY package.json package-lock.json ./

# Install production dependencies only
# (cached unless package.json or package-lock.json changes)
RUN npm ci --omit=dev

# Now copy source code (changes more often)
COPY . .

# Create non-root user for security
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

EXPOSE 3000
CMD ["node", "src/index.js"]

The critical optimisation here is the order of the COPY and RUN instructions. We copy package.json and package-lock.json and run npm ci before copying the rest of the source code. This way, the expensive install step is re-run only when one of those two files changes, not every time you edit a source file. On a typical development cycle where you push code changes dozens of times per day, a cache hit on that layer can mean a build that completes in around 10 seconds instead of 90.
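For contrast, here is a sketch of the naive ordering that defeats the cache. It assumes the same hypothetical project layout as the Dockerfile above (an entry point at src/index.js) and is shown only as an anti-pattern.

```dockerfile
# Anti-pattern, shown only for contrast
FROM node:20-alpine
WORKDIR /app

# Copying the whole project first means any source edit
# changes this layer and invalidates every layer after it
COPY . .

# ...so the dependency install is repeated on every code change
# (--omit=dev skips devDependencies)
RUN npm ci --omit=dev

CMD ["node", "src/index.js"]
```

Docker invalidates the cache at the first instruction whose inputs changed and rebuilds everything after it, which is why the slow step belongs above the frequently changing COPY.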

The .dockerignore File

Before Docker sends files to the daemon to build an image, it reads your .dockerignore file and excludes matching paths. Without this, you might accidentally include node_modules (hundreds of MB), .git directories, local configuration, test data, and build artifacts.

.dockerignore
node_modules
.git
.gitignore
.env
*.log
dist
build
coverage
.DS_Store
README.md
docker-compose*.yml

Choosing the Right Base Image

The single biggest impact on image size is your choice of base image. Compare these Python base images:

Base image size comparison
python:3.11          # Full Debian - ~900MB
python:3.11-slim     # Slim Debian - ~125MB
python:3.11-alpine   # Alpine Linux - ~50MB

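Alpine images use musl libc instead of glibc, which sometimes causes compatibility issues with compiled packages. For Python in particular, a package that ships prebuilt binaries only for glibc has to be compiled from source on Alpine, which slows the build and requires a compiler toolchain. The slim variant is usually the best balance: it removes documentation and extras from the full image but keeps glibc compatibility.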
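As a rough sketch of how the slim variant might be used, the Dockerfile below applies the same dependency-first ordering to a Python application. The file names requirements.txt and main.py are assumptions for illustration, not part of any project discussed here.

```dockerfile
# Slim Debian base: small, but still glibc-based
FROM python:3.11-slim

WORKDIR /app

# Copy only the dependency list first so the install layer is cached
# (requirements.txt is an assumed file name)
COPY requirements.txt ./

# --no-cache-dir keeps pip's download cache out of the layer
RUN pip install --no-cache-dir -r requirements.txt

# Source code changes more often, so it is copied last
COPY . .

# main.py is a hypothetical entry point
CMD ["python", "main.py"]
```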

Cleaning Up in RUN Instructions

Each RUN instruction creates a new layer. If you install packages in one layer and delete cache in another, the cache data is still in the first layer — it just becomes inaccessible. Always combine install and cleanup in the same RUN instruction.

Correct way to install and clean up
# WRONG — cache is in layer 1, even though deleted in layer 2
RUN apt-get update && apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# CORRECT — all in one layer, truly removed
RUN apt-get update && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*

Common Dockerfile Mistakes

Using latest as an image tag is risky because the image can change unexpectedly when you rebuild. Always pin a specific version: node:20.11-alpine rather than node:latest.

Running containers as root is a security risk. Create a dedicated non-root user and switch to it with the USER instruction before the CMD.

Copying unnecessary files bloats your image. Always use .dockerignore and copy only what the running application actually needs.

Using ADD when COPY is sufficient is another common mistake. COPY is explicit and predictable. ADD has extra features (downloading from URLs, automatically extracting local tar archives) that create unpredictable behaviour if you are not careful.
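One way to avoid ADD, sketched below, is to use COPY for local files and an explicit RUN step for anything downloaded. The URL, the archive name, and the config/ and /etc/myapp/ paths are placeholders for illustration only.

```dockerfile
FROM debian:bookworm-slim

# Local files: COPY does exactly one thing and never extracts archives
# (config/ and /etc/myapp/ are hypothetical paths)
COPY config/ /etc/myapp/

# Remote archive: download, extract, and clean up explicitly in one layer
# instead of relying on ADD (https://example.com/tool.tar.gz is a placeholder)
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && curl -fsSL https://example.com/tool.tar.gz -o /tmp/tool.tar.gz \
    && tar -xzf /tmp/tool.tar.gz -C /opt \
    && rm /tmp/tool.tar.gz \
    && rm -rf /var/lib/apt/lists/*
```

Spelling out the download and extraction makes each step visible in the Dockerfile, and the temporary archive is removed in the same layer that created it.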

In Part 4, we tackle Docker Volumes — how to persist data and share files between containers and the host machine. This is essential for running databases and any stateful applications in Docker.

Frequently Asked Questions

What is a Dockerfile?

A Dockerfile is a text file containing step-by-step instructions for building a Docker image. Instructions that change the filesystem (RUN, COPY, ADD) create a new layer in the image, while instructions such as CMD, ENV, and EXPOSE only record metadata. Docker reads the Dockerfile top to bottom and executes each instruction to produce the final image.

What is Docker layer caching and why does it matter?

Docker caches each layer of a build. If an instruction and its inputs have not changed since the last build, Docker reuses the cached layer instead of rebuilding it. Once one layer is invalidated, every layer after it is rebuilt as well. This makes subsequent builds dramatically faster when the cache is used well. The key is ordering your Dockerfile instructions from least frequently changed to most frequently changed, so copy dependency files before copying source code.

What is a .dockerignore file?

A .dockerignore file tells Docker which files and directories to exclude from the build context when building an image. Similar to .gitignore, it prevents node_modules, .git directories, local config files, and build artifacts from being sent to the daemon and copied into the image, keeping the image small and the build fast.

What is the difference between CMD and ENTRYPOINT?

CMD specifies the default command to run when a container starts, and can be overridden at runtime by passing arguments to docker run. ENTRYPOINT specifies the executable that always runs and cannot be overridden without the --entrypoint flag. A common pattern is to use ENTRYPOINT for the executable and CMD for default arguments.
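A minimal sketch of that pattern, using ping purely as an illustrative executable:

```dockerfile
FROM alpine:3.19

# The executable and its fixed flags always run
ENTRYPOINT ["ping", "-c", "3"]

# Default argument, replaced by anything passed after the image name
CMD ["localhost"]
```

With an image built from this file, docker run <image> would run ping -c 3 localhost, while docker run <image> example.com would replace only the CMD part and run ping -c 3 example.com.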

How do I reduce my Docker image size?

Use slim or alpine base images instead of full OS images. Run all apt-get commands in a single RUN instruction to minimize layers. Delete package manager caches in the same RUN instruction. Copy only necessary files. Use multi-stage builds to separate build tools from the final runtime image.
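Multi-stage builds can be sketched as follows for a Node.js project. This assumes a hypothetical build script in package.json that writes its output to dist/ with an entry point at dist/index.js; adjust the names to match the real project.

```dockerfile
# Stage 1: build with the full set of dependencies, including dev tools
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Assumes package.json defines a "build" script that outputs to dist/
RUN npm run build

# Stage 2: runtime image with production dependencies only
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# Copy only the compiled output from the build stage
COPY --from=build /app/dist ./dist
# The official node images ship with a non-root "node" user
USER node
# dist/index.js is an assumed entry point
CMD ["node", "dist/index.js"]
```

Only the final stage ends up in the image you ship, so compilers, dev dependencies, and source files from the build stage never reach production.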

Written by

Suraj Ahir

Cloud & DevOps engineer running four live production services on my own AWS infrastructure. I write everything on this site myself — no ghostwriters, no AI filler.
