Optimum Web
DevOps · 8 min read

Docker Container Crash Loop: Production Back in 47 Minutes for $99


Optimum Web

Senior Node.js / Docker Engineer

A production application went down when a Docker container entered a crash loop — restarting every 3 minutes due to OOM (Out of Memory) kills. The client's team could not identify the cause for 4 hours. Optimum Web's senior engineer connected via SSH, identified the root cause (a Node.js worker process leaking memory through an unbounded event listener array), and fixed it in 47 minutes for a fixed price of $99. The application has been stable for the 5 months since, with zero OOM events.

This case study shows exactly what the crash looked like, how we diagnosed it, what the fix was, and how to recognize the same problem on your server.

The Problem: "Container Keeps Restarting Every 3 Minutes"

Friday, 5:17 PM. A logistics SaaS company in Germany noticed their tracking dashboard was down. Docker logs showed the pattern immediately:

```text
app_worker  | Killed
app_worker  | OOM: Kill process 1 (node) score 987
app_worker exited with code 137
app_worker  | Starting...
# ... repeats every 2-3 minutes
```

Exit code 137 means the process was killed with SIGKILL — here, by the kernel's OOM (Out of Memory) killer. The container was configured with a 512MB memory limit. Every 2–3 minutes, the Node.js process consumed all 512MB, was killed by the kernel, restarted, and the cycle repeated.
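Container exit codes above 128 follow the shell convention: exit code = 128 + signal number, so 137 = 128 + 9 (SIGKILL). A minimal decoder (the signal table here is trimmed to the two signals you meet most often):

```javascript
// Decode a container exit code into the signal that terminated the process.
// Codes <= 128 are normal exits; codes above follow 128 + signal number.
const SIGNALS = { 9: 'SIGKILL', 15: 'SIGTERM' };

function decodeExitCode(code) {
  if (code <= 128) return { signal: null, code }; // normal exit, no signal
  const signum = code - 128;
  return { signal: SIGNALS[signum] || `signal ${signum}`, code };
}

console.log(decodeExitCode(137)); // { signal: 'SIGKILL', code: 137 }
```

Seeing `SIGKILL` alone doesn't prove an OOM kill — but combined with memory usage pinned at the container limit, it's the classic signature.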

What the client's team tried (and why it didn't work):

| Attempt | Result |
| --- | --- |
| Increase memory limit to 1GB | Crash shifted to every 5 minutes — same problem, more RAM to fill |
| Restart the entire Docker stack | Same crash loop after 3 minutes |
| Roll back to previous Docker image | Same crash — the bug was in a data dependency, not code version |

After 4 hours of downtime, they contacted us.

Why This Happens

Memory leaks in Node.js are notoriously difficult to diagnose because:

JavaScript uses garbage collection — developers assume memory is managed automatically. It is, but only when there are no references to an object. If your code keeps a reference, garbage collection is blocked.

Leaks only appear under production load. Local development handles 5–10 concurrent connections. Production has 1,000+. The leak grows slowly at low traffic, then triggers frequently under load.

OOM kill is silent. The kernel kills the process without warning, and the only record lands in the kernel log (`dmesg` or `journalctl -k`), not in application logs — the next application log line is simply the container restarting. You never see the root cause in the app's own logs.

Common Node.js leak sources: unbounded arrays growing with every request, event listeners not removed on disconnect, global caches without eviction policy, circular references preventing garbage collection.

Our Diagnosis (First 12 Minutes)

```bash
# 1. Check which container is crashing
$ docker ps -a --format "{{.Names}}: {{.Status}}"
app_worker: Restarting (137) 45 seconds ago

# 2. Memory at the moment of crash
$ docker stats --no-stream app_worker
# MEM USAGE: 498MB / 512MB — at limit

# 3. Heap growth pattern
$ docker exec app_worker node -e "console.log(process.memoryUsage())"
# heapUsed: 487MB — almost all memory is heap

# 4. Generate heap snapshot via Chrome DevTools
# Found: Array "eventListeners" — 340,000 entries (growing ~1,000/min)
```

Root cause: A WebSocket handler was adding an event listener on every new connection but never removing it on disconnect. After 340,000 connections over 5 hours of production traffic, the listener array consumed all available memory.

The Fix (35 Minutes)

```javascript
// BEFORE (bug):
socket.on('connection', (ws) => {
  process.on('SIGTERM', () => ws.close()); // Added every connection, never removed
});

// AFTER (fix):
socket.on('connection', (ws) => {
  const cleanup = () => ws.close();
  process.once('SIGTERM', cleanup); // Auto-removes after firing

  ws.on('close', () => {
    process.removeListener('SIGTERM', cleanup); // Explicit cleanup on disconnect
  });
});
```

Additionally applied:

  • Set `--max-old-space-size=384` to trigger GC pressure earlier, before reaching the container limit
  • Added a memory monitoring alert (Telegram notification when heap > 80%)
  • Configured the Docker restart policy with an exponential backoff delay

[Fix My Docker — $99](/fixed-price/fix-docker-issues#checkout) · Root cause diagnosis included · 14-day warranty

The Result

| Metric | Before | After |
| --- | --- | --- |
| Container restarts per hour | 20–30 | 0 |
| Memory usage (steady state) | 498MB → OOM every 3 min | 180–220MB stable |
| Downtime | 4 hours (and counting) | 0 (5 months stable) |
| Time to resolution | 4+ hours (no result) | 47 minutes |

Cost & Timeline

| Item | Detail |
| --- | --- |
| Service | OW-BUG-01: Fix Docker & Docker Compose Issues |
| Price | $99 fixed |
| Actual time | 47 minutes (SSH to resolution) |
| Engineer | Senior Node.js / Docker specialist |
| Included | Root cause diagnosis, code fix, memory monitoring, Docker config update |
| Warranty | 14 days |

🐛 Docker Container Crashing? We Fix It Same Day — $99.

OOM kills, network errors, permission denied, crash loops — root cause diagnosed and fixed. Not just restarted.

  • SSH diagnostic within 24 hours of order
  • Root cause identification (heap snapshot, logs, traces)
  • Code fix included (not just Docker restart)
  • Memory monitoring setup
  • Docker restart policy configuration
  • 14-day warranty

$99 · Service ID: OW-BUG-01 · 14-day warranty

Fix My Docker — $99 →

Production Completely Down?

If your production is down right now and every minute costs you revenue, use [Emergency Server Recovery — $199](/fixed-price/quickfix-server-recovery#checkout) instead. Response within 1–4 hours. We SSH in, stabilize production first, diagnose root cause second.

Could This Be Your Problem? (5 Warning Signs)

  • Docker container restarts periodically (check `docker ps -a` for exit code 137)
  • Memory usage climbs steadily then drops to zero (the restart cycle)
  • The problem only appears in production, never in local development
  • Increasing container memory limit only delays the crash by minutes
  • Application logs show no errors before the crash — OOM kill is silent

If 2 or more apply, you likely have a memory leak.

[Fix Docker Issues — $99](/fixed-price/fix-docker-issues#checkout) · Root cause + code fix · 14-day warranty

[Emergency Recovery — $199](/fixed-price/quickfix-server-recovery#checkout) · 1–4 hour response · Production down now

Docker · Node.js · Memory Leak · OOM · Crash Loop · Case Study · Bugfix

Frequently Asked Questions

How fast can you respond if my production is currently down?
For Docker bug fixes (OW-BUG-01, $99): we start diagnosis within 24 hours. For production emergencies (OW-EMR-01, $199): we SSH in within 1–4 hours of order confirmation.
What if the fix requires code changes in my application?
Code fixes are included in the $99 price, as long as the fix is related to the Docker/container issue. If the root cause requires days of refactoring — we tell you honestly and quote a separate engagement.
Do you support Docker Swarm and Kubernetes?
Docker and Docker Compose are covered by OW-BUG-01 ($99). Kubernetes-specific issues are more complex and require a custom quote. Contact us with your setup details.
Can you help prevent future Docker crashes?
Yes. As part of every fix, we configure basic memory monitoring (alerts via Telegram or email when heap usage exceeds threshold) and document prevention recommendations. For ongoing monitoring, see OW-AI-14 Website Uptime Monitor ($190).
What programming languages can you debug inside Docker containers?
Node.js, Java, PHP, Python, .NET, Go, and Ruby. Our team covers all major stacks. The diagnostic approach (heap snapshots, memory profiling, log analysis) is adapted per language but the $99 price is the same.