Docker: A Developer's Guide
What is Docker, and why should you, as a software developer, understand how it works? Have you ever run a Docker command and wondered, “Why does this even work?” or “Why doesn’t it?” You don’t need to be an expert, but knowing the fundamentals will make your life easier: it will help you write efficient Docker images and track down bugs and performance issues.
What is Docker
One thing we need to make clear is that Docker is not a virtual machine (VM). A VM emulates an entire computer: its own kernel, its own operating system (OS), and its own hardware abstraction. That’s heavy to run and execute. Docker takes a different approach compared to VMs.
Docker uses features built into the Linux kernel — specifically namespaces and cgroups — to isolate processes. Your container is just a process running on your host machine, but it thinks it’s alone in the world. It has its own filesystem, its own network, its own process tree.
| | Virtual Machine | Docker Container |
|---|---|---|
| Boots | A full OS | A process |
| Startup time | Minutes | Milliseconds |
| Size | Gigabytes | Megabytes |
| Isolation | Hardware-level | Kernel-level |
This is why Docker is so fast and lightweight. You’re not booting a computer — you’re starting a process.
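You can see this for yourself: a running container shows up as an ordinary process in the host's process table. A sketch, assuming Docker on a Linux host and the public `nginx` image (the container name `demo` is made up):

```bash
# Start a container and ask Docker for its PID on the host
docker run -d --name demo nginx
docker inspect --format '{{.State.Pid}}' demo

# That PID is a regular entry in the host's process table
ps -fp "$(docker inspect --format '{{.State.Pid}}' demo)"
```

No hypervisor, no guest kernel: just a namespaced process.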
Images vs Containers
A lot of people confuse images and containers, so let’s make the difference clear:
- An image is a blueprint — a read-only snapshot of a filesystem and some metadata.
- A container is a running instance of that image — it’s the image brought to life.
You can think of it like this:
Image → like a class in OOP
Container → like an instance of that class
You can run many containers from the same image, and they don’t interfere with each other. When a container writes data, those changes live only in that container — the original image is never touched.
```bash
# Pull an image from Docker Hub
docker pull nginx

# Run a container from that image
docker run -d -p 8080:80 nginx

# Run a second, completely independent container from the same image
docker run -d -p 8081:80 nginx
```
Both containers share the same image underneath, but live completely separate lives.
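A quick way to check that the writable layers really are independent (a sketch; the container names `web1` and `web2` are made up):

```bash
docker run -d --name web1 nginx
docker run -d --name web2 nginx

# Write a file into web1's writable layer
docker exec web1 sh -c 'echo hello > /tmp/note.txt'

# web2 never sees it; each container gets its own copy-on-write layer
docker exec web2 cat /tmp/note.txt   # fails: No such file or directory
```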
🧅 Layers — Why Docker Is Efficient
Images aren’t monolithic blobs. They’re made of layers, stacked on top of each other. Each layer represents a set of changes to the image’s filesystem.
This is a critical concept because:
- Layers are cached. If a layer hasn’t changed, Docker reuses it.
- Layers are shared between images. If two images use the same base, they don’t duplicate that data.
Here’s what that looks like in practice:
```
Layer 4: Copy your app code          ← changes often
Layer 3: Install npm dependencies    ← changes sometimes
Layer 2: Install Node.js             ← rarely changes
Layer 1: Ubuntu base image           ← almost never changes
```
Docker executes a Dockerfile from top to bottom, and as soon as one instruction’s layer changes, every layer after it must be rebuilt. This is why instruction order in your Dockerfile matters enormously for build speed.
For example, here is a Dockerfile that busts the cache on every code change.

**🚨 Slow Dockerfile (wrong order):**

```dockerfile
FROM node:20
COPY . .           # Copies everything first
RUN npm install    # Runs install AFTER — cache busts every time code changes!
CMD ["node", "index.js"]
```
**✅ Fast Dockerfile (correct order):**

```dockerfile
FROM node:20
COPY package*.json ./   # Copy only what npm needs first
RUN npm install         # Cached as long as package.json doesn't change
COPY . .                # Copy your app code last
CMD ["node", "index.js"]
```
In the optimized version, npm install is only re-run when your dependencies actually change — not every time you edit a .js file.
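You can watch the cache do its job by building twice. A sketch, assuming a project using the fast Dockerfile above (the tag `myapp` is made up):

```bash
# First build: every step executes
docker build -t myapp .

# Touch a source file, then rebuild: the package*.json COPY and the
# npm install layers are reported as CACHED; only COPY . . re-runs
docker build -t myapp .
```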
📄 The Dockerfile — Thinking in Build Steps
A Dockerfile is a recipe for building an image. Every instruction creates a new layer. Here’s what the most important instructions actually do:
| Instruction | What it does |
|---|---|
| `FROM` | Sets the base image to build on top of |
| `RUN` | Executes a shell command during the build |
| `COPY` | Copies files from your machine into the image |
| `ENV` | Sets environment variables available at build and runtime |
| `EXPOSE` | Documents which port the container listens on (informational) |
| `CMD` | The default command to run when a container starts (overridable) |
| `ENTRYPOINT` | The fixed command that always runs (`CMD` becomes its arguments) |
Understanding Docker Layers
Each instruction in a Dockerfile creates a new layer in the image. Think of layers like transparent sheets stacked on top of each other — each one adds or modifies something from the previous layers. This layered architecture is what makes Docker images efficient:
- Layers are cached: If nothing changes in a layer, Docker reuses it from cache
- Layers are shared: Multiple images can share common base layers
- Order matters: Put frequently changing things (like your app code) as late in the Dockerfile as possible
Real-World Example: Node.js Application
Here’s a production-ready Dockerfile for a Node.js app with explanations:
```dockerfile
# 1. Start from official Node image
FROM node:20-alpine

# 2. Set working directory inside the container
WORKDIR /app

# 3. Set an environment variable
ENV NODE_ENV=production

# 4. Copy dependency files and install — cached layer
COPY package*.json ./
RUN npm ci --only=production

# 5. Copy the rest of your source code
COPY . .

# 6. Document the port
EXPOSE 3000

# 7. Start the app
CMD ["node", "server.js"]
```
Why this order?
- Dependencies change less often than source code
- By copying `package*.json` first and running `npm install`, we create a cached layer
- If only your source code changes, Docker skips the `npm install` step!
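Building and running this image looks like the following sketch (the tag `myapp` is made up). Note that `EXPOSE` alone doesn't publish anything; the `-p` flag does:

```bash
docker build -t myapp .
docker run -d -p 3000:3000 myapp
```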
Additional Important Instructions
| Instruction | What it does | When to use |
|---|---|---|
| `ARG` | Build-time variables (not available at runtime) | API keys for private registries, version numbers |
| `USER` | Sets the user/UID to run as | Security: avoid running as root |
| `VOLUME` | Creates a mount point for external volumes | Database files, uploaded content |
| `HEALTHCHECK` | Defines how Docker checks if container is healthy | Production monitoring |
| `LABEL` | Adds metadata to the image | Version info, maintainer, description |
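A sketch of how `USER`, `HEALTHCHECK`, and `LABEL` might combine in the Node image from earlier (the `app` user and the `/health` endpoint are assumptions, not part of the original example):

```dockerfile
FROM node:20-alpine
LABEL maintainer="you@example.com" description="Example API image"
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .

# Run as an unprivileged user instead of root
RUN addgroup -S app && adduser -S app -G app
USER app

# Report unhealthy if /health stops responding (wget ships with Alpine)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1

CMD ["node", "server.js"]
```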
CMD vs ENTRYPOINT — The Key Difference
Both define what runs when a container starts, but they behave differently:
```dockerfile
# CMD — easily overridden at runtime
CMD ["node", "server.js"]
# docker run myimage node other-script.js ← works fine, overrides CMD

# ENTRYPOINT — the container IS this command
ENTRYPOINT ["node"]
CMD ["server.js"]   # default argument to ENTRYPOINT
# docker run myimage other-script.js ← runs: node other-script.js
```
When to use which?
- Use `ENTRYPOINT` when your container represents a specific tool or process that should always run (e.g., a database server, a CLI tool)
- Use `CMD` when you want a sensible default that’s easy to override (e.g., development vs production scripts)
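One runtime detail worth knowing: even an `ENTRYPOINT` can be swapped out with the `--entrypoint` flag. A sketch using the `ENTRYPOINT ["node"]` image from above (the name `myimage` is made up):

```bash
# Arguments after the image name replace CMD: runs `node other-script.js`
docker run myimage other-script.js

# --entrypoint replaces the entrypoint itself: drops into a shell
docker run -it --entrypoint /bin/sh myimage
```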
Best Practices for Writing Dockerfiles
- **Use specific base image tags**

```dockerfile
# ❌ Bad - might change unexpectedly
FROM node:latest

# ✅ Good - predictable
FROM node:20.12.0-alpine
```

- **Combine RUN commands to reduce layers**

```dockerfile
# ❌ Creates 3 layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean

# ✅ Creates 1 layer
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```

- **Use `.dockerignore` to exclude files**

```
# .dockerignore
node_modules
.git
.env
*.log
```

- **Multi-stage builds for smaller images**

```dockerfile
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
```
Common Patterns by Language/Framework
Python/Django:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["gunicorn", "myapp.wsgi:application", "--bind", "0.0.0.0:8000"]
```
Go:

```dockerfile
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o main .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
CMD ["./main"]
```
Debugging Dockerfile Builds
When things go wrong, these techniques help:
```bash
# Build with detailed output
docker build --progress=plain --no-cache -t myapp .

# Debug a specific build stage
docker build --target builder -t myapp-debug .

# Inspect intermediate layers
docker history myapp

# Run commands in a failed build container
docker run -it <image-id-from-failed-step> /bin/sh
```
Remember: A well-written Dockerfile is the foundation of a reliable containerized application. Take time to understand each instruction and optimize for both build speed and final image size.
💾 Volumes — Solving the Persistence Problem
Containers are ephemeral. When a container dies, any data it wrote to its filesystem is gone. Forever.
This is a feature, not a bug — it keeps containers predictable and stateless. But obviously, for databases and file uploads, you need data to survive restarts. That’s what volumes are for.
A volume is a directory that lives outside the container on the host filesystem, but is mounted into the container so it can read and write there.
```bash
# Create a named volume
docker volume create mydata

# Mount it into a container at /app/data
docker run -v mydata:/app/data myimage

# Use a bind mount (maps a specific host folder)
docker run -v /home/user/myproject:/app myimage
```
Now even if the container is destroyed and recreated, the data in the volume persists.
```
Host Machine                          Container
───────────────────────────────────────────────
/var/lib/docker/volumes/mydata  ←→   /app/data
        ↑
  Data lives here, safe and sound
```
Rule of thumb: Anything stateful (databases, uploaded files, logs) should live in a volume. Your app code and dependencies should be baked into the image.
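A minimal experiment to convince yourself the data outlives the container (a sketch; uses the small public `alpine` image and a made-up volume name):

```bash
# Write a file into a named volume from a throwaway container
docker run --rm -v mydata:/data alpine sh -c 'echo hello > /data/greeting.txt'

# That container is gone, but a brand-new one still sees the file
docker run --rm -v mydata:/data alpine cat /data/greeting.txt   # prints: hello
```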
🌐 Networking — How Containers Talk to Each Other
By default, Docker creates a private internal network. Containers on the same network can talk to each other by name — Docker has a built-in DNS resolver that maps container names to their IP addresses.
```bash
# Create a custom network
docker network create myapp-network

# Start a database on that network
docker run -d \
  --name postgres-db \
  --network myapp-network \
  -e POSTGRES_PASSWORD=secret \
  postgres:16

# Start your app on the same network
docker run -d \
  --name my-api \
  --network myapp-network \
  -p 3000:3000 \
  myimage
```
Now inside my-api, you can connect to the database using the hostname postgres-db:
```js
// In your Node.js app — no IP addresses needed!
const connectionString = "postgresql://postgres:secret@postgres-db:5432/mydb"
```
Docker resolves postgres-db to the correct container IP automatically. This is incredibly powerful — your app doesn’t need to know or care about internal IPs.
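You can inspect that built-in DNS yourself (a sketch; the names match the example above, and `getent` is available in the common base images):

```bash
# Resolve the database container's name from inside the api container
docker exec my-api getent hosts postgres-db

# Or list every container on the network together with its IP
docker network inspect myapp-network
```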
🎼 Docker Compose — Orchestrating the Whole Thing
Running every container manually with docker run gets unwieldy fast. Docker Compose lets you define your entire multi-container application in a single docker-compose.yml file.
Docker Compose is more commonly used locally and for development purposes, but it is less common in production environments. For production-grade orchestration, teams typically migrate to Kubernetes or Docker Swarm, though Compose can work for simpler production deployments.
Complete Example: Node.js API with PostgreSQL and Redis
Here’s a production-ready example with health checks and proper configurations:
```yaml
# docker-compose.yml
services:
  api:
    build: .                        # Build from the Dockerfile in this directory
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://postgres:secret@db:5432/mydb
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy  # Wait for DB to be ready
      cache:
        condition: service_healthy
    volumes:
      - .:/app                      # Bind mount for live code reloading
      - /app/node_modules           # Keep container's node_modules
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: mydb
    volumes:
      - postgres-data:/var/lib/postgresql/data  # Persist database data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  cache:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3

volumes:
  postgres-data:  # Named volume, managed by Docker

networks:
  default:
    driver: bridge
```
Essential Docker Compose Commands
```bash
# Start everything in the background
docker compose up -d

# See what's running
docker compose ps

# Stream logs from all services
docker compose logs -f

# Stream logs from specific service
docker compose logs -f api

# Execute commands in running containers
docker compose exec api npm test

# Rebuild images before starting
docker compose up -d --build

# Scale a service to multiple instances
docker compose up -d --scale api=3

# Tear everything down (volumes are preserved)
docker compose down

# Tear down AND delete volumes (fresh start)
docker compose down -v
```
All services are automatically placed on the same network, so api can reach db and cache by name — exactly as we covered in the networking section.
Advanced Docker Compose Features
1. Health Checks and Dependencies
Health checks ensure services are actually ready, not just started:
```yaml
services:
  api:
    depends_on:
      db:
        condition: service_healthy  # Waits for health check to pass
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s             # Grace period for startup
```
2. Multiple Compose Files for Different Environments
Use override files to customize for different environments:
```yaml
# docker-compose.yml (base configuration)
services:
  api:
    image: myapp:latest
    environment:
      - LOG_LEVEL=info
```

```yaml
# docker-compose.override.yml (auto-loaded for development)
services:
  api:
    build: .
    volumes:
      - .:/app
    environment:
      - LOG_LEVEL=debug
```

```yaml
# docker-compose.prod.yml (production overrides)
services:
  api:
    restart: always
    environment:
      - LOG_LEVEL=warning
    deploy:
      replicas: 3
```

```bash
# Development (uses base + override automatically)
docker compose up

# Production (explicitly specify files)
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```
3. Profiles for Conditional Services
Run different service combinations based on profiles:
```yaml
services:
  api:
    image: myapp:latest

  db:
    image: postgres:16

  debug-tools:
    image: busybox
    profiles: ["debug"]  # Only starts when debug profile is active

  monitoring:
    image: prom/prometheus
    profiles: ["monitoring", "production"]
```

```bash
# Start only core services
docker compose up

# Include debug tools
docker compose --profile debug up

# Include monitoring stack
docker compose --profile monitoring up
```
4. Resource Limits and Reservations
Control resource usage for production deployments:
```yaml
services:
  api:
    image: myapp:latest
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 256M
```
Docker Compose vs Other Orchestrators
| Feature | Docker Compose | Kubernetes | Docker Swarm |
|---|---|---|---|
| Complexity | Simple YAML, easy to learn | Steep learning curve | Moderate complexity |
| Use Case | Development, small production | Enterprise production | Simple production clusters |
| Scaling | Single host only | Multi-host, auto-scaling | Multi-host, manual scaling |
| Self-healing | Basic restart policies | Advanced with pod management | Basic service recovery |
| Load Balancing | Manual with nginx/HAProxy | Built-in service mesh | Built-in simple LB |
| Setup Time | Minutes | Hours to days | 30 minutes |
Best Practices for Docker Compose
- **Omit the obsolete `version` key**

Legacy `docker-compose` v1 required a line like `version: "3.8"`. Modern `docker compose` (v2, which implements the Compose Specification) ignores the top-level `version:` field and warns that it is obsolete, so new files can simply start with `services:`.

- **Use environment files for sensitive data**

```yaml
services:
  api:
    env_file:
      - .env        # Git-ignored file with secrets
      - .env.local  # Local overrides
```

- **Leverage build arguments for flexible images**

```yaml
services:
  api:
    build:
      context: .
      args:
        NODE_VERSION: 20
        APP_ENV: ${APP_ENV:-development}
```

- **Use explicit container names for easier debugging**

```yaml
services:
  api:
    container_name: myapp_api_1
```

- **Define restart policies for production**

```yaml
services:
  api:
    restart: unless-stopped  # or "always" for critical services
```
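For the `env_file` practice above, the referenced file is just `KEY=value` lines. A sketch (all values are placeholders):

```
# .env (keep this file out of version control)
POSTGRES_PASSWORD=change-me
DATABASE_URL=postgresql://postgres:change-me@db:5432/mydb
```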
Common Patterns and Examples
Full-Stack Application with Frontend
```yaml
services:
  frontend:
    build: ./frontend
    ports:
      - "80:80"
    depends_on:
      - api

  api:
    build: ./backend
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/app
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:            # Required for the service_healthy condition above
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  db-data:
```
Microservices with Service Discovery
```yaml
services:
  gateway:
    image: nginx
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf

  auth-service:
    build: ./services/auth
    expose:
      - "3001"  # Internal port only

  user-service:
    build: ./services/users
    expose:
      - "3002"

  order-service:
    build: ./services/orders
    expose:
      - "3003"
```
Debugging Docker Compose Applications
```bash
# Validate compose file syntax
docker compose config

# See real-time events
docker compose events

# Run one-off commands
docker compose run --rm api npm test

# Start specific services only
docker compose up db cache

# Remove orphan containers
docker compose up --remove-orphans
```
Remember: Docker Compose excels at defining relationships between containers and managing them as a unit. While it’s primarily a development tool, it can handle simple production deployments. For complex production needs requiring high availability, auto-scaling, or multi-host deployments, consider graduating to Kubernetes or Docker Swarm.
🧠 Putting It All Together
Here’s the mental model to keep in your head:
```
Dockerfile
    ↓ docker build
Image ───────────────────────── (Layers, cached, shared)
    ↓ docker run / docker compose up
Container ───────────────────── (Isolated process)
    │
    ├── Network ─────────────── (Talk to other containers by name)
    └── Volume ──────────────── (Persist data outside the container)
```
- Images are built from Dockerfiles — layer by layer.
- Containers are running instances of images — ephemeral by default.
- Volumes give containers a memory — data that outlives them.
- Networks give containers a voice — they can find and talk to each other.
- Docker Compose is the conductor — it wires everything together.
Once these ideas click, Docker stops feeling like magic (or black magic) and becomes a genuinely elegant tool. The commands stop being something you copy from Stack Overflow and start being something you can reason about.