Scaling Node.js Applications · Astro Tech Blog

Why Scale?

A single Node.js process on one server has hard limits:

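  • CPU: one core for your JavaScript (the event loop is single-threaded)
  • Memory: limited by server RAM (e.g., 512MB on a small VPS) and by V8's heap limit (adjustable with --max-old-space-size)
  • Connections: limited by open file descriptors (the default soft limit is often 1024 on Linux; check with ulimit -n)
  • Throughput: limited by event loop capacity (typically a few thousand to ~10K simple req/s per core, depending heavily on the work done per request)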

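When your app hits any of these limits, you need to scale. The figures in the diagram below are illustrative and assume near-linear scaling across workers.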

Single Node Process              Scaled App
┌─────────────────────┐           ┌─────────────────────┐
│    1 CPU Core       │           │   Load Balancer     │
│    512MB RAM        │           │         │           │
│    ~1024 sockets    │           │    ┌────┼────┐      │
│    ~5000 req/s      │           │    │    │    │      │
└─────────────────────┘           │  ┌─┴┐ ┌─┴┐ ┌─┴┐     │
                                  │  │W1│ │W2│ │W3│     │
                                  │  └──┘ └──┘ └──┘     │
                                  │  3 CPU Cores        │
                                  │  1.5GB RAM total    │
                                  │  ~15,000 req/s      │
                                  └─────────────────────┘

Vertical vs Horizontal Scaling

Strategy   | What It Means                 | Node.js Approach
Vertical   | Bigger server (more CPU, RAM) | Upgrade VPS from 2GB to 16GB; run one worker per core (cluster mode)
Horizontal | More servers (add instances)  | Load balancer + multiple app servers

Vertical Scaling:               Horizontal Scaling:
┌────────────────┐              ┌────────────────┐
│ 1 Big Server   │              │ Load Balancer  │
│ 16 cores       │              └─┬────┬────┬────┘
│ 64GB RAM       │              ┌─┴┐ ┌─┴┐ ┌─┴┐
└────────────────┘              │S1│ │S2│ │S3│
                                └──┘ └──┘ └──┘

1. Vertical Scaling: The Node.js Cluster Module

Before adding more machines, use all the cores on one machine:

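const cluster = require('node:cluster');
const os = require('node:os');

// cluster.isPrimary replaces cluster.isMaster (deprecated since Node.js 16)
if (cluster.isPrimary) {
  // os.availableParallelism() needs Node.js 18.14+; fall back to os.cpus().length
  const cpuCount = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
  console.log(`Primary ${process.pid} forking ${cpuCount} workers`);

  for (let i = 0; i < cpuCount; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    // Skip workers stopped on purpose (e.g. via worker.disconnect())
    if (worker.exitedAfterDisconnect) return;
    console.log(`Worker ${worker.process.pid} died (${signal || code}), restarting`);
    cluster.fork();
  });
} else {
  // Each worker runs its own copy of the app; the primary distributes
  // incoming connections on the shared port across the workers
  require('./app.js');
}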

PM2 Cluster Mode (Simpler)

# PM2 handles clustering with no code changes (as long as the app is stateless)
pm2 start app.js -i max        # One worker per CPU core
pm2 start app.js -i 4          # Exactly 4 workers
pm2 start app.js -i -1         # One worker per CPU core, minus one

// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'my-api',
    script: 'app.js',
    instances: 'max',
    exec_mode: 'cluster',
  }],
};

Cluster Limitations

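  • In-memory state is NOT shared: each worker has its own heap, so session data, caches, and counters are duplicated per worker. Use Redis or a database for shared state.
  • Sticky sessions: if you use WebSockets or long-lived sessions, configure the load balancer for sticky sessions (same client → same worker). The cluster module's own connection distribution is not sticky, so multi-request protocols such as Socket.IO's HTTP long-polling fallback need sticky routing in front of the workers.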

2. Horizontal Scaling: Load Balancing

When one machine isn’t enough, distribute traffic across multiple machines:

                      ┌─────────────────────────────┐
                      │        Load Balancer        │
                      │ (Nginx / HAProxy / AWS ALB) │
                      └────┬─────────┬─────────┬────┘
                           │         │         │
                        ┌──┴───┐  ┌──┴───┐  ┌──┴───┐
                        │ App1 │  │ App2 │  │ App3 │
                        │ node │  │ node │  │ node │
                        └──────┘  └──────┘  └──────┘

Nginx as a Load Balancer

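# /etc/nginx/nginx.conf
http {
    # Send "Connection: upgrade" only for WebSocket upgrade requests;
    # ordinary HTTP requests get "Connection: close"
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    upstream my_api {
        # Round-robin is the default. Alternatives (enable at most one):
        # least_conn;   # fewest active connections
        # ip_hash;      # sticky sessions by client IP

        server app1.internal:3000;
        server app2.internal:3000;
        server app3.internal:3000;
    }

    server {
        listen 80;
        server_name api.example.com;

        location / {
            proxy_pass http://my_api;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}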
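Behind the load balancer, the TCP peer your app sees (`req.socket.remoteAddress`) is the proxy, not the user. The original client address arrives in the `X-Forwarded-For` header set above. The sketch below reads it back. It assumes you know how many proxies you control, passed as `trustedHops`, a hypothetical parameter. Express exposes a similar hop count through `app.set('trust proxy', 1)`. The function counts from the right because entries to the left of your trusted proxies are client-supplied and can be forged.

```javascript
// Resolve the real client IP behind `trustedHops` proxies that you control.
// Each trusted proxy appends the address it received the request from
// (Nginx's $proxy_add_x_forwarded_for does this), so we count from the right.
function clientIp(xForwardedFor, remoteAddress, trustedHops = 1) {
  if (!xForwardedFor || trustedHops < 1) return remoteAddress;

  const chain = xForwardedFor
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean);

  // Full hop list as seen by the app: [client-supplied..., proxies..., direct peer]
  const hops = [...chain, remoteAddress];
  const index = hops.length - 1 - trustedHops;
  return hops[Math.max(0, index)];
}

// One Nginx in front: the header holds the client, the socket peer is Nginx
console.log(clientIp('203.0.113.7', '10.0.0.2')); // 203.0.113.7

// A forged entry sent by the client is ignored
console.log(clientIp('1.2.3.4, 203.0.113.7', '10.0.0.2')); // 203.0.113.7
```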

Load Balancing Strategies

Strategy          | Behaviour                                  | Use Case
Round-robin       | Requests go to each server in turn         | Most common; works well for stateless APIs
Least connections | Server with fewest active connections      | Uneven request durations
IP hash           | Same client IP always hits the same server | Session persistence (sticky sessions); uneven if many clients share one IP (e.g. behind NAT)
Random            | Server picked at random                    | Simple; suits several load balancers sharing the same backends without coordinating
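The sketch below shows how each strategy in the table picks a backend. The server names reuse the hypothetical upstream hosts from the Nginx example. The IP hash here is a simple SHA-1-modulo scheme. It is not Nginx's exact `ip_hash` algorithm, which for IPv4 hashes only the first three octets of the address.

```javascript
const crypto = require('node:crypto');

// Hypothetical backends, mirroring the Nginx upstream block
const servers = ['app1.internal:3000', 'app2.internal:3000', 'app3.internal:3000'];

// Round-robin: hand out servers in a fixed rotation
function createRoundRobin(list) {
  let position = 0;
  return () => {
    const server = list[position];
    position = (position + 1) % list.length;
    return server;
  };
}

// Least connections: pick the server with the fewest in-flight requests.
// `active` maps server -> open connection count (tracked by the balancer).
function pickLeastConnections(active) {
  let best = null;
  for (const [server, count] of active) {
    if (best === null || count < active.get(best)) best = server;
  }
  return best;
}

// IP hash: the same client IP maps to the same server
// (as long as the server list doesn't change)
function pickByIpHash(list, clientIp) {
  const digest = crypto.createHash('sha1').update(clientIp).digest();
  return list[digest.readUInt32BE(0) % list.length];
}

// Random: needs no state shared between balancers
function pickRandom(list) {
  return list[Math.floor(Math.random() * list.length)];
}

const nextServer = createRoundRobin(servers);
console.log(nextServer(), nextServer(), nextServer(), nextServer());
// app1.internal:3000 app2.internal:3000 app3.internal:3000 app1.internal:3000
```

With plain modulo hashing, adding or removing a server remaps most clients to a different backend. Consistent hashing is the usual way to limit that reshuffling.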

3. Statelessness: The Key to Scaling

For horizontal scaling to work, your app must be stateless:

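// Assumes an Express `app` and an ioredis `redis` client are already configured
const crypto = require('node:crypto');
const generateToken = () => crypto.randomBytes(32).toString('hex');

// ❌ BAD: state stored in process memory
const sessions = new Map();

app.post('/login', (req, res) => {
  const token = generateToken();
  sessions.set(token, { user: req.body.username });
  // If the next request is routed to another server (or worker), this session isn't there
  res.json({ token });
});

// ✅ GOOD: state stored externally
app.post('/login', async (req, res) => {
  const token = generateToken();
  // ioredis argument style; node-redis v4 uses redis.set(key, value, { EX: 3600 })
  await redis.set(`session:${token}`, JSON.stringify({ user: req.body.username }), 'EX', 3600);
  // Any server can look up the session
  res.json({ token });
});

app.get('/profile', async (req, res) => {
  // Expects an "Authorization: Bearer <token>" header
  const token = (req.headers.authorization || '').replace(/^Bearer\s+/i, '');
  const session = await redis.get(`session:${token}`);
  if (!session) return res.status(401).json({ error: 'Invalid or expired session' });
  // Works regardless of which server handles the request
  res.json(JSON.parse(session));
});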

Stateless Checklist

Resource     | Local (Bad for Scale) | External (Good for Scale)
Sessions     | In-memory Map         | Redis, Memcached
File uploads | Local disk            | S3, Cloud Storage
Cache        | In-memory object      | Redis, CDN
Queues       | In-process array      | RabbitMQ, Redis, SQS
Logs         | Local file            | stdout → log aggregator
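The logs row is the easiest item on the checklist to adopt. This sketch is a structured logger that writes one JSON object per line to stdout. Collection and forwarding to the log aggregator are left to whatever runs the process, since Docker, systemd, and PM2 all capture stdout. `createLogger` and its field names are hypothetical. Libraries such as pino follow the same one-JSON-line-per-entry approach.

```javascript
// Structured logging to stdout: one JSON object per line (NDJSON).
// Nothing is written to local disk, so instances stay disposable.
function createLogger(stream = process.stdout, baseFields = {}) {
  const write = (level, msg, fields = {}) => {
    const entry = { time: new Date().toISOString(), level, msg, ...baseFields, ...fields };
    stream.write(JSON.stringify(entry) + '\n');
  };
  return {
    info: (msg, fields) => write('info', msg, fields),
    warn: (msg, fields) => write('warn', msg, fields),
    error: (msg, fields) => write('error', msg, fields),
  };
}

// Tag every line with the instance so the aggregator can tell servers apart
const log = createLogger(process.stdout, { service: 'my-api', pid: process.pid });
log.info('request handled', { path: '/api/users/42', status: 200, durationMs: 12 });
```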

4. Reverse Proxy

A reverse proxy sits in front of your Node.js app and handles concerns that Node.js shouldn’t:

server {
    listen 443 ssl;
    server_name api.example.com;

    # SSL termination
    ssl_certificate /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;

    # Static files: serve directly, don't hit Node.js
    # (with `root`, /static/app.css maps to /var/www/public/static/app.css)
    location /static/ {
        root /var/www/public;
        # 365 days; "immutable" is only safe for fingerprinted filenames (e.g. app.3f9a1c.css)
        add_header Cache-Control "public, max-age=31536000, immutable";
    }

    # API requests: proxy to Node.js
    location /api/ {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Rate limiting at proxy level; the "api" zone must be declared in the http block, e.g.
        # limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
        limit_req zone=api burst=20 nodelay;
    }

    # Compression at proxy level
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;
}

What the reverse proxy handles:

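  • SSL/TLS termination: Node.js can serve HTTPS itself (the https module), but terminating TLS at the proxy keeps certificate handling and encryption work out of your app
  • Static file serving: Nginx serves static assets far more efficiently than a Node.js handler, leaving the event loop free for API requests
  • Compression: gzip at the proxy level offloads CPU work from Node.js
  • Rate limiting: throttle abusive or malicious traffic before it reaches your app
  • Request buffering: Nginx buffers requests and responses for slow clients so Node.js isn't tied up waiting on them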
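To see roughly what `limit_req zone=api burst=20 nodelay` does, here is a simplified leaky-bucket model. Each client's bucket drains at `ratePerSec`, and every accepted request adds one. Requests that would overflow the burst allowance are rejected, and Nginx responds with 503 by default in that case. This only approximates Nginx's behaviour and does not reproduce its implementation. The function names and the 10 r/s rate are assumptions, and timestamps are passed in explicitly so the logic is deterministic.

```javascript
// Simplified leaky-bucket limiter, roughly modelling `limit_req ... burst=N nodelay`.
// In this model, up to 1 + burst back-to-back requests pass, then about ratePerSec per second.
function createLeakyBucketLimiter({ ratePerSec, burst }) {
  const buckets = new Map(); // client key (e.g. IP) -> { level, lastMs }

  return function allow(key, nowMs) {
    const bucket = buckets.get(key);
    if (!bucket) {
      buckets.set(key, { level: 0, lastMs: nowMs });
      return true;
    }
    // Drain the bucket for the time since the last accepted request
    const drained = Math.max(0, bucket.level - (ratePerSec * (nowMs - bucket.lastMs)) / 1000);
    const level = drained + 1;
    if (level > burst) return false; // overflow: reject without changing state
    bucket.level = level;
    bucket.lastMs = nowMs;
    return true;
  };
}

const allow = createLeakyBucketLimiter({ ratePerSec: 10, burst: 20 });

// 25 simultaneous requests from one IP: 21 accepted, 4 rejected
const burstResults = Array.from({ length: 25 }, () => allow('203.0.113.7', 0));
console.log(burstResults.filter(Boolean).length); // 21

// One second later the bucket has drained by 10, so 10 more fit
const laterResults = Array.from({ length: 15 }, () => allow('203.0.113.7', 1000));
console.log(laterResults.filter(Boolean).length); // 10
```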

5. Microservices Architecture

Instead of one big app, split into small, independent services:

Monolith:                        Microservices:
┌──────────────────────┐         ┌────────┐ ┌────────┐
│ User Management      │         │ Auth   │ │ Users  │
│ Product Catalog      │         │ Service│ │ Service│
│ Order Processing     │         └────────┘ └────────┘
│ Payment              │         ┌────────┐ ┌────────┐
│ Notifications        │         │ Orders │ │ Payment│
│ Analytics            │         │ Service│ │ Service│
└──────────────────────┘         └───┬────┘ └───┬────┘
                                     │          │
                                 ┌───┴──────────┴────┐
                                 │   Message Queue   │
                                 └───────────────────┘

Communication Between Services

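// Synchronous: HTTP API call (global fetch, Node.js 18+)
async function getUser(id) {
  // Fail fast instead of hanging if user-service is slow
  const response = await fetch(`http://user-service:3001/api/users/${id}`, { signal: AbortSignal.timeout(2000) });
  if (!response.ok) throw new Error(`user-service responded with ${response.status}`);
  return response.json();
}

// Asynchronous: message queue (RabbitMQ via amqplib)
const amqp = require('amqplib');

async function main() {
  const connection = await amqp.connect('amqp://rabbitmq');
  const channel = await connection.createChannel();

  // publish() targets an exchange; consume() reads from a queue bound to it
  await channel.assertExchange('orders', 'topic', { durable: true });
  await channel.assertQueue('order.created', { durable: true });
  await channel.bindQueue('order.created', 'orders', 'order.created');

  // Publish event (in order service); example payload
  const order = { id: 42, userEmail: 'customer@example.com' };
  channel.publish('orders', 'order.created', Buffer.from(JSON.stringify(order)), { persistent: true });

  // Consume event (in notification service)
  await channel.consume('order.created', async (msg) => {
    if (msg === null) return; // consumer was cancelled by the broker
    const created = JSON.parse(msg.content.toString());
    await sendEmail(created.userEmail, 'Order confirmed!'); // sendEmail: your own helper
    channel.ack(msg);
  });
}

main().catch(console.error);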

Microservices Pros & Cons

Pros                                            | Cons
Independent scaling (scale only busy services)  | Distributed systems complexity
Independent deployment                          | Network latency between services
Language-agnostic (polyglot)                    | Data consistency challenges
Smaller codebases per team                      | Testing requires integration tests
Fault isolation (one service crash ≠ all crash) | Operational overhead (monitoring, tracing)

6. Auto-Scaling

Automatically add or remove servers based on load:

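# AWS Auto Scaling config (conceptual, not literal AWS syntax)
AutoScalingGroup:
  MinSize: 2
  MaxSize: 10
  ScalingPolicies:
    - PolicyName: ScaleOut
      MetricType: CPUUtilization
      Threshold: 70          # average CPU above 70% → add an instance
      ScaleOutCooldown: 60   # seconds to wait before scaling out again
    - PolicyName: ScaleIn
      MetricType: CPUUtilization
      Threshold: 30          # average CPU below 30% → remove an instance
      ScaleInCooldown: 120   # scale in more cautiously than out
# Note: an AWS target tracking policy instead takes a single TargetValue
# and scales out and in automatically to keep the metric near it.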
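The policy above reduces to a small decision rule evaluated periodically against an average metric. The sketch below implements that rule with hypothetical names. It uses thresholds of 70% and 30% CPU, cooldowns of 60 s after scaling out and 120 s after scaling in, and the 2–10 size range. The gap between the two thresholds keeps the group from flapping. The cooldowns stop it from reacting again before new instances have started taking load. In AWS this evaluation is done for you from CloudWatch metrics; the code only illustrates the logic.

```javascript
// Hypothetical policy mirroring the conceptual config above
const policy = {
  minSize: 2,
  maxSize: 10,
  scaleOutAbovePct: 70,
  scaleInBelowPct: 30,
  scaleOutCooldownMs: 60_000,
  scaleInCooldownMs: 120_000,
};

// Decide the next capacity from the current state and the average CPU across instances.
// state = { capacity, lastScaledAtMs }
function decideCapacity(state, avgCpuPct, nowMs, p = policy) {
  const sinceLastMs = nowMs - state.lastScaledAtMs;

  if (avgCpuPct > p.scaleOutAbovePct && state.capacity < p.maxSize && sinceLastMs >= p.scaleOutCooldownMs) {
    return { capacity: state.capacity + 1, lastScaledAtMs: nowMs };
  }
  if (avgCpuPct < p.scaleInBelowPct && state.capacity > p.minSize && sinceLastMs >= p.scaleInCooldownMs) {
    return { capacity: state.capacity - 1, lastScaledAtMs: nowMs };
  }
  return state; // inside the 30–70% band, at a size limit, or still cooling down
}

let state = { capacity: 2, lastScaledAtMs: -Infinity };
state = decideCapacity(state, 85, 0);      // scale out to 3
state = decideCapacity(state, 90, 30_000); // still cooling down: stays at 3
state = decideCapacity(state, 90, 60_000); // scale out to 4
console.log(state.capacity); // 4
```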

For simpler setups, use a platform that auto-scales:

  • Railway: auto-scales resources based on CPU/memory usage
  • Fly.io: auto-stops idle machines (scale to zero) and wakes them on the next request
  • Heroku: autoscaling for web dynos on Performance-tier and higher plans
  • AWS Elastic Beanstalk: auto-scaling driven by CloudWatch alarms

7. Database Scaling

When your database becomes the bottleneck:

Read Replicas:                    Sharding:
┌─────────┐                       ┌─────────┐
│ Primary │──► Replica 1          │ Shard 1 │ Users A–M
│ (write) │──► Replica 2          ├─────────┤
└─────────┘──► Replica 3          │ Shard 2 │ Users N–Z
           (reads distributed)    └─────────┘

// Read/write splitting
const { Pool } = require('pg');

// Writer: the primary (all INSERT/UPDATE/DELETE). Reader: a replica or reader endpoint.
const writerPool = new Pool({ connectionString: process.env.DB_WRITER_URL });
const readerPool = new Pool({ connectionString: process.env.DB_READER_URL });

async function getUser(id) {
  // Replicas can lag behind the primary; use writerPool when a request
  // must see its own just-committed write (read-after-write consistency)
  const { rows } = await readerPool.query('SELECT * FROM users WHERE id = $1', [id]);
  return rows[0];
}

async function updateUser(id, data) {
  const { rows } = await writerPool.query('UPDATE users SET name = $1 WHERE id = $2 RETURNING *', [data.name, id]);
  return rows[0];
}
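Read/write splitting covers the replica half of the diagram. For the sharding half, the application (or a proxy in front of the database) has to route each query to the right shard. The sketch below does range-based routing that matches the A–M / N–Z split. The shard names and connection URLs are hypothetical, and in practice each entry would hold its own `pg` Pool.

```javascript
// Hypothetical shard map matching the diagram: usernames A–M on shard 1, N–Z on shard 2
const shards = [
  { name: 'shard1', from: 'A', to: 'M', url: 'postgres://shard1.internal:5432/app' },
  { name: 'shard2', from: 'N', to: 'Z', url: 'postgres://shard2.internal:5432/app' },
];

// Range-based routing on the first letter of the username
function shardFor(username) {
  const first = username.trim().charAt(0).toUpperCase();
  const shard = shards.find((s) => first >= s.from && first <= s.to);
  if (!shard) throw new Error(`No shard covers username "${username}"`);
  return shard;
}

console.log(shardFor('alice').name); // shard1
console.log(shardFor('Zoe').name);   // shard2
```

Range sharding is easy to reason about, but it can create hot shards if keys cluster (many names starting with the same letter). Hashing the key spreads load more evenly at the cost of efficient range queries.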

Scaling Decision Tree

Is your app slow?
    │
    └── Profile first to find the bottleneck
        │
        ├── CPU? → Cluster mode (pm2 start app.js -i max)
        │
        ├── Memory? → Bigger server, or move in-process caches to Redis
        │
        ├── Database? → Indexes → Read replicas → Sharding
        │
        ├── Network? → CDN, compression, keep-alive
        │
        └── One server's limit? → Horizontal scaling with a load balancer

Key Takeaways

  • Vertical scaling first: use PM2 cluster mode to utilise all CPU cores on one machine
  • Cluster mode does NOT share memory: use Redis for sessions, caches, and shared state
  • Horizontal scaling requires a load balancer (Nginx, HAProxy, AWS ALB)
  • Statelessness is essential for horizontal scaling: sessions go in Redis, files go in S3
  • A reverse proxy (Nginx) handles SSL, static files, compression, and rate limiting, offloading that work from Node.js
  • Microservices enable independent scaling but add operational complexity
  • Auto-scaling adjusts capacity based on metrics (CPU, request rate); use it for variable traffic
  • Database scaling (read replicas, sharding) is often the next bottleneck after app servers
  • Always measure before scaling: profile to find the real bottleneck, because scaling the wrong layer wastes resources