Why Scale?
A single Node.js process on one server has hard limits:
- CPU: one core (JavaScript runs on a single-threaded event loop)
- Memory: limited by server RAM (e.g., 512MB on a small VPS) and by V8's heap limit
- Connections: limited by open file descriptors (the default soft limit is often ~1024 on Linux)
- Throughput: limited by event loop capacity (roughly 5,000 to 10,000 simple requests/s per core, depending on the workload)
When your app exceeds any of these, you need to scale.
Single Node Process          Scaled App
┌────────────────────┐       ┌────────────────────┐
│ 1 CPU Core         │       │   Load Balancer    │
│ 512MB RAM          │       │         │          │
│ ~1024 sockets      │       │    ┌────┼────┐     │
│ ~5000 req/s        │       │    │    │    │     │
└────────────────────┘       │   ┌┴─┐ ┌┴─┐ ┌┴─┐   │
                             │   │W1│ │W2│ │W3│   │
                             │   └──┘ └──┘ └──┘   │
                             │ 3 CPU Cores        │
                             │ 1.5GB RAM total    │
                             │ ~15,000 req/s      │
                             └────────────────────┘
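The figures in the diagram are illustrative. A minimal sketch for reading the actual limits of the machine and process you run on, using only Node.js built-ins. `processLimits` is a hypothetical helper, and the file-descriptor check shells out to `ulimit -n`, which assumes a POSIX shell at `/bin/sh` and yields `null` elsewhere:

```javascript
const os = require('os');
const v8 = require('v8');
const { execSync } = require('child_process');

// Read the per-process and per-machine limits listed above.
function processLimits() {
  // os.availableParallelism() exists in newer Node.js versions; fall back to os.cpus()
  const cpuCores = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;

  let openFilesSoftLimit = null;
  try {
    // ulimit is a shell builtin; the child shell inherits this process's limit
    openFilesSoftLimit = execSync('ulimit -n', { shell: '/bin/sh', encoding: 'utf8' }).trim();
  } catch {
    // No POSIX shell available (e.g. Windows): leave as null
  }

  return {
    cpuCores,
    totalRamMB: Math.round(os.totalmem() / 1024 / 1024),
    // Maximum V8 heap for this process (raised with --max-old-space-size)
    heapLimitMB: Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024),
    openFilesSoftLimit,
  };
}

console.log(processLimits());
```

Running it on the target server, rather than a laptop, shows which of the four limits you will hit first.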
Vertical vs Horizontal Scaling
| Strategy | What It Means | Node.js Approach |
|---|---|---|
| Vertical | Bigger server (more CPU, RAM) | Upgrade VPS from 2GB to 16GB, then run one worker per core (cluster / PM2) |
| Horizontal | More servers (add instances) | Load balancer + multiple app servers |
Vertical Scaling:      Horizontal Scaling:
┌────────────────┐     ┌────────────────┐
│  1 Big Server  │     │ Load Balancer  │
│  16 cores      │     └──┬────┬────┬───┘
│  64GB RAM      │       ┌┴─┐ ┌┴─┐ ┌┴─┐
└────────────────┘       │S1│ │S2│ │S3│
                         └──┘ └──┘ └──┘
1. Vertical Scaling: The Node.js Cluster Module
Before adding more machines, use all the cores on one machine:
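const cluster = require('cluster');
const os = require('os');
// cluster.isPrimary replaced the deprecated cluster.isMaster in Node.js 16
if (cluster.isPrimary) {
  const cpuCount = os.cpus().length;
  console.log(`Primary ${process.pid} forking ${cpuCount} workers`);
  for (let i = 0; i < cpuCount; i++) {
    cluster.fork();
  }
  // Replace any worker that dies so capacity stays constant
  cluster.on('exit', (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died (${signal || code}), restarting`);
    cluster.fork();
  });
} else {
  // Each worker runs its own copy of the app; connections on the shared port
  // are spread across workers (round-robin by the primary on most platforms)
  require('./app.js');
}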
PM2 Cluster Mode (Simpler)
# PM2 handles clustering with no code changes
pm2 start app.js -i max   # One worker per CPU core
pm2 start app.js -i 4     # Exactly 4 workers
pm2 start app.js -i -1    # All CPU cores minus one (-i 0 is the same as max)
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'my-api',
    script: 'app.js',
    instances: 'max',
    exec_mode: 'cluster',
  }],
};
Cluster Limitations
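- In-memory state is NOT shared: each worker has its own heap, so session data, caches, and counters are duplicated and can disagree between workers. Use Redis or a database for shared state.
- Sticky sessions: if a client's consecutive requests must reach the same worker (for example WebSockets with an HTTP long-polling fallback, as in Socket.IO, or sessions kept in memory), configure the load balancer for sticky sessions (same client → same worker).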
2. Horizontal Scaling: Load Balancing
When one machine isn't enough, distribute traffic across multiple machines:
┌─────────────────────────────┐
│        Load Balancer        │
│ (Nginx / HAProxy / AWS ALB) │
└────┬─────────┬─────────┬────┘
     │         │         │
  ┌──┴───┐  ┌──┴───┐  ┌──┴───┐
  │ App1 │  │ App2 │  │ App3 │
  │ node │  │ node │  │ node │
  └──────┘  └──────┘  └──────┘
Nginx as a Load Balancer
# /etc/nginx/nginx.conf
http {
    # Send "Connection: upgrade" only when the client asked for a protocol upgrade (WebSockets)
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }
    upstream my_api {
        # Round-robin is the default; uncomment ONE directive to change strategy
        # least_conn;
        # ip_hash;    # sticky sessions by client IP
        server app1.internal:3000;
        server app2.internal:3000;
        server app3.internal:3000;
    }
    server {
        listen 80;
        server_name api.example.com;
        location / {
            proxy_pass http://my_api;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
Load Balancing Strategies
| Strategy | Behaviour | Use Case |
|---|---|---|
| Round-robin | Requests distributed in order | Most common, works for stateless APIs |
| Least connections | Send to server with fewest active connections | Uneven request durations |
| IP hash | Same client IP always goes to same server | Session persistence (sticky sessions) |
| Random | Pick a server at random | Simple and stateless; useful when several load balancers share the same backends |
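To make the table concrete, here is a minimal in-process sketch of the four strategies. It is not how Nginx or HAProxy implement them (Nginx's `ip_hash`, for instance, keys on the first three octets of an IPv4 address, while this sketch hashes the whole string), and `createBalancer`, `onStart`, and `onEnd` are hypothetical names:

```javascript
const crypto = require('crypto');

// `servers` is a list of backend addresses; `active` counts in-flight requests per server.
function createBalancer(servers) {
  let next = 0;
  const active = new Map(servers.map((s) => [s, 0]));

  return {
    // Round-robin: cycle through servers in order
    roundRobin() {
      const server = servers[next];
      next = (next + 1) % servers.length;
      return server;
    },
    // Least connections: the server with the fewest in-flight requests (ties go to the earlier one)
    leastConnections() {
      return servers.reduce((best, s) => (active.get(s) < active.get(best) ? s : best));
    },
    // IP hash: the same client IP maps to the same server while the list is unchanged
    ipHash(clientIp) {
      const digest = crypto.createHash('sha1').update(clientIp).digest();
      return servers[digest.readUInt32BE(0) % servers.length];
    },
    // Random: uniform choice, no state needed
    random() {
      return servers[Math.floor(Math.random() * servers.length)];
    },
    // Call around each proxied request so leastConnections sees accurate counts
    onStart(server) { active.set(server, active.get(server) + 1); },
    onEnd(server) { active.set(server, active.get(server) - 1); },
  };
}

const lb = createBalancer(['app1:3000', 'app2:3000', 'app3:3000']);
console.log(lb.roundRobin(), lb.roundRobin(), lb.roundRobin(), lb.roundRobin());
// app1:3000 app2:3000 app3:3000 app1:3000
```

A real balancer also skips servers that fail health checks. Note that a plain modulo hash remaps many clients whenever a server is added or removed, which is why Nginx also offers consistent hashing (`hash ... consistent`).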
3. Statelessness: The Key to Scaling
For horizontal scaling to work, your app must be stateless:
// ❌ BAD: state stored in process memory
const sessions = new Map();
app.post('/login', (req, res) => {
  const token = generateToken(); // e.g. crypto.randomBytes(32).toString('hex')
  sessions.set(token, { user: req.body.username });
  // If the next request is routed to server 2, this session doesn't exist there!
  res.json({ token });
});
// ✅ GOOD: state stored externally (assumes an ioredis client: const redis = new Redis(process.env.REDIS_URL))
app.post('/login', async (req, res) => {
  const token = generateToken();
  // 'EX', 3600 expires the session after one hour
  await redis.set(`session:${token}`, JSON.stringify({ user: req.body.username }), 'EX', 3600);
  // Any server can look up the session
  res.json({ token });
});
app.get('/profile', async (req, res) => {
  const raw = await redis.get(`session:${req.headers.authorization}`);
  if (!raw) return res.status(401).json({ error: 'Invalid or expired session' });
  // Works regardless of which server handles the request
  res.json(JSON.parse(raw));
});
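The failure in the BAD version can be reproduced without Redis. This sketch simulates two app instances receiving alternate requests, as a round-robin balancer would route them; `makeInstance` and `simulate` are hypothetical, and a single shared `Map` stands in for Redis:

```javascript
const crypto = require('crypto');

// One hypothetical app instance. `store` is either the instance's own Map
// (state in process memory) or a Map shared by every instance (standing in for Redis).
function makeInstance(store) {
  return {
    login(user) {
      const token = crypto.randomUUID();
      store.set(`session:${token}`, { user });
      return token;
    },
    profile(token) {
      // null plays the role of a 401 response
      return store.get(`session:${token}`) ?? null;
    },
  };
}

// Request 1 (login) lands on server 1, request 2 (profile) on server 2.
function simulate(sharedStore) {
  const redisStandIn = new Map();
  const server1 = makeInstance(sharedStore ? redisStandIn : new Map());
  const server2 = makeInstance(sharedStore ? redisStandIn : new Map());
  const token = server1.login('alice');
  return server2.profile(token);
}

console.log(simulate(false)); // null: server 2 never saw the session
console.log(simulate(true));  // { user: 'alice' }
```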
Stateless Checklist
| Resource | Local (Bad for Scale) | External (Good for Scale) |
|---|---|---|
| Sessions | In-memory Map | Redis, Memcached |
| File uploads | Local disk | S3, Cloud Storage |
| Cache | In-memory object | Redis, CDN |
| Queues | In-process array | RabbitMQ, Redis, SQS |
| Logs | Local file | stdout → log aggregator |
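For the Logs row, writing one structured line per event to stdout is usually enough: the container runtime or a log agent collects stdout and ships it to the aggregator, so no instance keeps log files on its own disk. A minimal sketch with illustrative field names (`logLine` is a hypothetical helper, not a library API):

```javascript
// Emit one JSON object per line on stdout and return the line for inspection.
function logLine(level, message, fields = {}) {
  const line = JSON.stringify({
    time: new Date().toISOString(),
    level,
    message,
    pid: process.pid, // tells you which instance or worker wrote the line
    ...fields,
  });
  process.stdout.write(line + '\n');
  return line;
}

logLine('info', 'order created', { orderId: 42 });
```

In practice a logging library such as pino produces the same shape; the point is that the destination is stdout, not a local file.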
4. Reverse Proxy
A reverse proxy sits in front of your Node.js app and handles concerns that Node.js shouldn't have to:
# limit_req below needs a zone defined in the http block, e.g. (example rate):
# limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
    listen 443 ssl;
    server_name api.example.com;
    # SSL termination: Nginx decrypts HTTPS, Node.js receives plain HTTP
    ssl_certificate /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;
    # Static files: served from /var/www/public/static/ without hitting Node.js
    location /static/ {
        root /var/www/public;
        expires 365d;
        add_header Cache-Control "public, immutable";
    }
    # API requests: proxy to Node.js
    location /api/ {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Rate limiting at proxy level (uses the "api" zone)
        limit_req zone=api burst=20 nodelay;
    }
    # Compression at proxy level
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;
}
What the reverse proxy handles:
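- SSL/TLS termination: Nginx manages certificates and TLS handshakes, so Node.js can serve plain HTTP behind it
- Static file serving: Nginx serves static assets straight from disk, typically much faster than Node.js and without occupying the event loop
- Compression: gzip at the proxy level offloads CPU work from Node.js
- Rate limiting: block abusive traffic before it reaches your app
- Request buffering: Nginx buffers requests and responses for slow clients, so Node.js doesn't hold connections open waiting on them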
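Behind the proxy, `req.socket.remoteAddress` is Nginx's address and the connection is plain HTTP, so the app has to read the original client IP and scheme from the `X-Forwarded-For` and `X-Forwarded-Proto` headers that the proxy configs above set. A minimal sketch for a Node.js request object, assuming Nginx runs on the same host (`TRUSTED_PROXIES` and `clientInfo` are hypothetical names to adapt):

```javascript
// Peers allowed to set X-Forwarded-* headers. Assumption: Nginx is on the same
// host, so only loopback addresses are trusted. Headers from any other peer are
// ignored, because clients can send forged X-Forwarded-For values.
const TRUSTED_PROXIES = new Set(['127.0.0.1', '::1', '::ffff:127.0.0.1']);

// Works with the request object from http.createServer (or Express's req).
function clientInfo(req) {
  const peer = req.socket.remoteAddress;
  if (!TRUSTED_PROXIES.has(peer)) {
    // Direct connection: trust only what the socket itself reports
    return { ip: peer, protocol: req.socket.encrypted ? 'https' : 'http' };
  }
  // $proxy_add_x_forwarded_for appends the address Nginx saw to any value the
  // client sent, so with a single proxy the LAST entry is the trustworthy one
  const hops = String(req.headers['x-forwarded-for'] || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
  return {
    ip: hops.length > 0 ? hops[hops.length - 1] : peer,
    protocol: req.headers['x-forwarded-proto'] === 'https' ? 'https' : 'http',
  };
}

// A request-shaped object, as Nginx on localhost would forward it
const fromProxy = {
  socket: { remoteAddress: '127.0.0.1' },
  headers: { 'x-forwarded-for': '10.9.9.9, 198.51.100.4', 'x-forwarded-proto': 'https' },
};
console.log(clientInfo(fromProxy)); // { ip: '198.51.100.4', protocol: 'https' }
```

In Express, the `trust proxy` setting (for example `app.set('trust proxy', 'loopback')`) applies the same rule and fills in `req.ip` and `req.protocol`.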
5. Microservices Architecture
Instead of one big app, split into small, independent services:
Monolith:                    Microservices:
┌──────────────────────┐     ┌────────┐ ┌────────┐
│ User Management      │     │ Auth   │ │ Users  │
│ Product Catalog      │     │ Service│ │ Service│
│ Order Processing     │     └────────┘ └────────┘
│ Payment              │     ┌────────┐ ┌────────┐
│ Notifications        │     │ Orders │ │ Payment│
│ Analytics            │     │ Service│ │ Service│
└──────────────────────┘     └────────┘ └────────┘
                                       │
                                  ┌────┴────┐
                                  │ Message │
                                  │  Queue  │
                                  └─────────┘
Communication Between Services
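// Synchronous: HTTP API call (fetch is built into Node.js 18+; run inside an async function)
const response = await fetch('http://user-service:3001/api/users/42');
const user = await response.json();
// Asynchronous: message queue (RabbitMQ via amqplib)
const amqp = require('amqplib');
const connection = await amqp.connect('amqp://rabbitmq');
const channel = await connection.createChannel();
// Declare the 'orders' topic exchange and a queue that receives 'order.created' events
await channel.assertExchange('orders', 'topic', { durable: true });
await channel.assertQueue('order.created', { durable: true });
await channel.bindQueue('order.created', 'orders', 'order.created');
// Publish event (in the order service)
channel.publish('orders', 'order.created', Buffer.from(JSON.stringify(order)));
// Consume event (in notification service); the callback must be async to use await
channel.consume('order.created', async (msg) => {
  const order = JSON.parse(msg.content.toString());
  await sendEmail(order.userEmail, 'Order confirmed!');
  channel.ack(msg);
});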
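Whether a consumer receives an event depends on the exchange type and the queue bindings. Assuming a topic exchange, this plain-JavaScript sketch models RabbitMQ's topic matching rule to show how one published `order.created` event can fan out to several services' queues. It is an illustration of the routing semantics, not amqplib code, and the `audit` and `emails` queues are hypothetical:

```javascript
// Routing keys and binding patterns are dot-separated words:
// '*' matches exactly one word, '#' matches zero or more words.
function topicMatches(pattern, routingKey) {
  const p = pattern.split('.');
  const k = routingKey.split('.');
  function match(i, j) {
    if (i === p.length) return j === k.length;
    if (p[i] === '#') {
      // '#' either matches nothing more, or absorbs one more word and tries again
      return match(i + 1, j) || (j < k.length && match(i, j + 1));
    }
    if (j === k.length) return false;
    return (p[i] === '*' || p[i] === k[j]) && match(i + 1, j + 1);
  }
  return match(0, 0);
}

// Hypothetical queues bound to the 'orders' exchange: queue name -> binding pattern
const bindings = { 'order.created': 'order.created', audit: 'order.#', emails: '*.created' };

// Which queues receive a message published with this routing key
function route(routingKey) {
  return Object.keys(bindings).filter((queue) => topicMatches(bindings[queue], routingKey));
}

console.log(route('order.created')); // [ 'order.created', 'audit', 'emails' ]
console.log(route('order.shipped')); // [ 'audit' ]
```

The publisher never names the consumers; adding a new service means adding a binding, not changing the order service.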
Microservices Pros & Cons
| Pros | Cons |
|---|---|
| Independent scaling (scale only busy services) | Distributed systems complexity |
| Independent deployment | Network latency between services |
| Language-agnostic (polyglot) | Data consistency challenges |
| Smaller codebases per team | Testing requires integration tests |
| Fault isolation (one crashing service doesn't take down the others) | Operational overhead (monitoring, tracing) |
6. Auto-Scaling
Automatically add or remove servers based on load:
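# AWS Auto Scaling config (conceptual, not literal CloudFormation syntax)
AutoScalingGroup:
  MinSize: 2
  MaxSize: 10
  ScalingPolicies:
    - PolicyName: ScaleOut
      MetricType: CPUUtilization
      Threshold: 70      # add an instance when average CPU exceeds 70%
      Cooldown: 60       # seconds before another scale-out
    - PolicyName: ScaleIn
      MetricType: CPUUtilization
      Threshold: 30      # remove an instance when average CPU drops below 30%
      Cooldown: 120      # scale in more cautiously than out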
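The two policies amount to a simple control loop. A sketch of that loop as a pure function, assuming the thresholds and cooldowns of the conceptual config (scale out above 70% CPU with a 60 s cooldown, scale in below 30% with a 120 s cooldown); `decide` and `policy` are hypothetical, and this illustrates the idea rather than how AWS evaluates policies internally:

```javascript
// Thresholds and cooldowns from the conceptual config above.
const policy = {
  minSize: 2,
  maxSize: 10,
  scaleOutAboveCpu: 70,
  scaleOutCooldownSec: 60,
  scaleInBelowCpu: 30,
  scaleInCooldownSec: 120,
};

// Given the group state, the current average CPU (%) and the time (s),
// return the new state. At most one instance is added or removed per step.
function decide(state, avgCpu, nowSec) {
  const { instances, lastScaleOutSec = -Infinity, lastScaleInSec = -Infinity } = state;
  const outReady = nowSec - lastScaleOutSec >= policy.scaleOutCooldownSec;
  const inReady = nowSec - lastScaleInSec >= policy.scaleInCooldownSec;

  if (avgCpu > policy.scaleOutAboveCpu && instances < policy.maxSize && outReady) {
    return { ...state, instances: instances + 1, lastScaleOutSec: nowSec };
  }
  if (avgCpu < policy.scaleInBelowCpu && instances > policy.minSize && inReady) {
    return { ...state, instances: instances - 1, lastScaleInSec: nowSec };
  }
  return state; // inside the 30-70% band, at a size limit, or still cooling down
}

let group = { instances: 2 };
for (const [t, cpu] of [[0, 85], [30, 90], [60, 90], [300, 20]]) {
  group = decide(group, cpu, t);
  console.log(`t=${t}s cpu=${cpu}% -> ${group.instances} instances`);
}
// Prints 3, 3 (still cooling down), 4, then 3 instances
```

The gap between the two thresholds keeps the group from flapping between sizes when CPU hovers near a single target.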
For simpler setups, use a platform that auto-scales:
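- Railway: auto-scales based on CPU/memory
- Fly.io: can stop idle machines (scale to zero) and start them again on incoming requests
- Heroku: dyno autoscaling (built in for Performance-tier dynos, otherwise via paid add-ons)
- AWS Elastic Beanstalk: auto-scaling driven by CloudWatch metrics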
7. Database Scaling
When your database becomes the bottleneck:
Read Replicas:                     Sharding:
┌────────┐                         ┌─────────┐
│ Master ├──► Replica 1            │ Shard 1 │ Users A–M
│ (write)├──► Replica 2            ├─────────┤
└────────┴──► Replica 3            │ Shard 2 │ Users N–Z
(reads distributed)                └─────────┘
// Read/write splitting: writes go to the primary, reads to a replica endpoint
const { Pool } = require('pg');
const writerPool = new Pool({ connectionString: process.env.DB_WRITER_URL });
const readerPool = new Pool({ connectionString: process.env.DB_READER_URL });
async function getUser(id) {
  // Asynchronous replicas can lag the primary, so a read right after a write may be stale
  const { rows } = await readerPool.query('SELECT * FROM users WHERE id = $1', [id]);
  return rows[0];
}
async function updateUser(id, data) {
  const { rows } = await writerPool.query(
    'UPDATE users SET name = $1 WHERE id = $2 RETURNING *',
    [data.name, id]
  );
  return rows[0];
}
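Sharding moves routing into the application (or a proxy in front of the database): every query must first pick the right shard. A minimal sketch of the range scheme in the diagram; the shard list and connection strings are hypothetical, and in a real app each entry would own its own `pg` `Pool`:

```javascript
// Range-based shards matching the diagram: usernames A-M on shard 1, N-Z on shard 2.
const SHARDS = [
  { name: 'shard1', from: 'a', to: 'm', url: 'postgres://shard1.internal/app' },
  { name: 'shard2', from: 'n', to: 'z', url: 'postgres://shard2.internal/app' },
];

// Pick the shard that owns a username, by its first letter.
function shardFor(username) {
  const first = username.trim().charAt(0).toLowerCase();
  const shard = SHARDS.find((s) => first >= s.from && first <= s.to);
  if (!shard) throw new Error(`No shard covers usernames starting with "${first}"`);
  return shard;
}

console.log(shardFor('alice').name); // shard1
console.log(shardFor('Zoe').name);   // shard2
```

Ranges on the first letter are easy to reason about but rarely balanced, since names are not spread evenly across the alphabet; hashing the key evens out the load at the cost of range queries having to touch every shard.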
Scaling Decision Tree
Is your app slow?
│
└── Profile the bottleneck
    │
    ├── Is it CPU? → Cluster mode (PM2 -i max)
    │
    ├── Is it memory? → Bigger server or Redis cache
    │
    ├── Is it database? → Indexes → Read replicas → Sharding
    │
    ├── Is it network? → CDN, compression, keep-alive
    │
    └── Is it one server's limit? → Horizontal scaling with load balancer
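A cheap first pass at "Profile the bottleneck" is to compare CPU time with wall-clock time for a slow operation. In this sketch (`cpuVsWall` is a hypothetical helper), a ratio near 1 suggests the work is CPU-bound, which points at cluster mode, while a ratio near 0 means the process was mostly waiting on I/O such as the database or the network. `process.cpuUsage()` covers the whole process, so run it on an otherwise idle instance and confirm with a real profiler such as `node --cpu-prof`:

```javascript
// Compare CPU time with wall-clock time while `operation` runs.
async function cpuVsWall(operation) {
  const startCpu = process.cpuUsage();
  const startWall = process.hrtime.bigint();
  await operation();
  const cpu = process.cpuUsage(startCpu); // user/system microseconds since startCpu
  const wallMs = Number(process.hrtime.bigint() - startWall) / 1e6;
  const cpuMs = (cpu.user + cpu.system) / 1000;
  return { wallMs, cpuMs, ratio: cpuMs / wallMs };
}

// Two stand-in operations: a busy loop (CPU-bound) and a timer (waiting, like I/O)
async function busyFor100ms() {
  const end = Date.now() + 100;
  while (Date.now() < end) {
    // spin: keeps the event loop busy computing
  }
}
function waitFor100ms() {
  return new Promise((resolve) => setTimeout(resolve, 100));
}

cpuVsWall(busyFor100ms)
  .then((r) => {
    console.log(`busy loop: cpu/wall = ${r.ratio.toFixed(2)}`);
    return cpuVsWall(waitFor100ms);
  })
  .then((r) => console.log(`timer wait: cpu/wall = ${r.ratio.toFixed(2)}`));
```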
Key Takeaways
- Vertical scaling first: use PM2 cluster mode to utilise all CPU cores on one machine
- Cluster mode does NOT share memory: use Redis for sessions, caches, and shared state
- Horizontal scaling requires a load balancer (Nginx, HAProxy, AWS ALB)
- Statelessness is essential for horizontal scaling: sessions go in Redis, files go in S3
- A reverse proxy (Nginx) handles SSL, static files, compression, and rate limiting, offloading that work from Node.js
- Microservices enable independent scaling but add operational complexity
- Auto-scaling adjusts capacity based on metrics (CPU, request rate); use it for variable traffic
- Database scaling (read replicas, sharding) is often the next bottleneck after app servers
- Always measure before scaling: profile to find the real bottleneck, because scaling the wrong layer wastes resources