What Are Streams & Why They Matter · Astro Tech Blog

The Problem Streams Solve

Imagine you need to read a 5GB file and send it over HTTP. The naive approach:

const http = require('http');
const fs = require('fs');

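http.createServer((req, res) => {
  // ❌ BAD: buffers the entire 5GB file in memory before sending anything
  fs.readFile('./huge-file.zip', (err, data) => {
    if (err) {
      res.statusCode = 500;
      res.end('Failed to read file');
      return;
    }
    res.end(data);
  });
}).listen(3000);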

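This fails for a file that size: readFile tries to load the entire file into RAM before sending a single byte, so the process either runs out of memory or rejects the read because the file exceeds the maximum size readFile can buffer. With streams, only a small window of the file (chunks of 64KB by default for fs.createReadStream) is held in memory at a time:

// ✅ GOOD: memory use stays near one ~64KB chunk regardless of file size
http.createServer((req, res) => {
  const readStream = fs.createReadStream('./huge-file.zip');
  // pipe() does not forward errors, so handle read failures explicitly
  readStream.on('error', () => res.destroy());
  readStream.pipe(res);
}).listen(3000);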

That’s why streams matter. They let you process data that would otherwise not fit in memory.

What Is a Stream?

A stream is a sequence of data chunks that arrives over time. Instead of waiting for all the data to be available, you process each chunk as it arrives.
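
As a minimal sketch of chunk-at-a-time processing, the snippet below folds each chunk into a SHA-256 digest and then lets it go, so the full payload is never held at once. The helper name hashStream is hypothetical and the example assumes the stream yields Buffer chunks; any readable stream, such as one from fs.createReadStream, could be passed in.

```javascript
const crypto = require('node:crypto');
const { Readable } = require('node:stream');

// Hypothetical helper: hash a stream incrementally.
// Each chunk is folded into the digest and can then be garbage collected.
async function hashStream(stream) {
  const hash = crypto.createHash('sha256');
  let bytes = 0;
  for await (const chunk of stream) {
    hash.update(chunk);      // process this chunk as it arrives
    bytes += chunk.length;   // assumes Buffer chunks, so length is in bytes
  }
  return { bytes, digest: hash.digest('hex') };
}
```

For example, hashStream(Readable.from([Buffer.from('hello '), Buffer.from('world')])) resolves to the same digest as hashing 'hello world' in one call, which is the point: the result does not depend on how the data was split into chunks.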

Without Streams (buffer everything):
┌────────────────────────────────────────┐
│  Wait for entire file...               │
│                                        │
│  [██████████████████████████████] 5GB  │
│                                        │
│  Then process                          │
└────────────────────────────────────────┘

With Streams (process chunks):
┌────────────────────────────────────────┐
│  Chunk 1 ──► Process                   │
│  Chunk 2 ──► Process                   │
│  Chunk 3 ──► Process                   │
│  ...                                   │
│  Chunk N ──► Process                   │
└────────────────────────────────────────┘

Real-World Analogy

Think of streams like a water pipe:

  • Without a pipe (buffer): You wait until all the water arrives in a giant tank, then you use it. If the tank is infinite, this works. But RAM isn’t infinite.
  • With a pipe (stream): Water flows through the pipe continuously. You attach a faucet at the end and use water as it arrives. The pipe never holds more than a small volume at once.

What Is a Chunk?

A chunk is the unit of data a stream emits. Chunks are Buffer objects (binary) by default, or you can set an encoding to get strings:

const fs = require('fs');

const stream = fs.createReadStream('file.txt', { highWaterMark: 16384 });
// highWaterMark caps the chunk size in bytes
// (fs.createReadStream defaults to 65536, i.e. 64KB; here it is lowered to 16KB)

stream.on('data', (chunk) => {
  console.log('Received chunk of', chunk.length, 'bytes');
  // chunk is a Buffer: call chunk.toString() for a string
});
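
To make the Buffer-versus-string distinction concrete, here is a small sketch that reads the same temporary file twice, once without an encoding and once with encoding: 'utf8'. The helper names describeChunks and demo, the temp-file layout, and the 40,000-byte payload are all assumptions chosen for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: read a file and record the type and length of each chunk.
async function describeChunks(filePath, options) {
  const seen = [];
  for await (const chunk of fs.createReadStream(filePath, options)) {
    seen.push({
      type: Buffer.isBuffer(chunk) ? 'Buffer' : typeof chunk,
      length: chunk.length,
    });
  }
  return seen;
}

// Hypothetical demo: write a throwaway file, read it both ways, clean up.
async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'chunks-'));
  const file = path.join(dir, 'sample.txt');
  fs.writeFileSync(file, 'a'.repeat(40000));
  try {
    const binary = await describeChunks(file, { highWaterMark: 16384 });
    const text = await describeChunks(file, { highWaterMark: 16384, encoding: 'utf8' });
    return { binary, text };
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

Running demo().then(console.log) should list Buffer chunks no larger than highWaterMark in the first array and string chunks in the second, with both adding up to the full file.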

Backpressure: The Killer Feature

When a readable stream produces data faster than a writable stream can consume it, backpressure kicks in. The readable stream pauses until the writable stream catches up.

Too fast:                         With backpressure:
Source ──► Sink                   Source ──► Sink
  │         │                        │         │
  ▼         ▼                        ▼         ▼
[████]    [░░░░]                    [█████]  [█]
[████]    [░░░░]  ← overflow!       [──pause──→]
[████]    [░░░░]                    [█████]  [██]
                                    [──resume──→]

pipe() handles backpressure automatically. Without pipe(), you must manage it manually:

// Manual backpressure handling
const fs = require('fs');

const readable = fs.createReadStream('source.txt');
const writable = fs.createWriteStream('dest.txt');

readable.on('data', (chunk) => {
  const canContinue = writable.write(chunk);
  if (!canContinue) {
    console.log('Backpressure: pausing read');
    readable.pause();  // Stop reading until drain
  }
});

writable.on('drain', () => {
  console.log('Drained: resuming read');
  readable.resume();  // Safe to read more
});

// Close the destination once the source is exhausted
readable.on('end', () => writable.end());

pipe() does all of this for you. Prefer pipe() (or stream.pipeline(), which also forwards errors and cleans up) unless you have a specific reason to manage backpressure manually.
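
The signal behind all of this is the boolean returned by write(). The sketch below makes it observable without touching the file system: a deliberately slow in-memory Writable with a tiny highWaterMark, and a writer that waits for 'drain' whenever write() returns false. The names makeSlowSink and writeAll, the 4-byte highWaterMark, and the 10 ms delay are assumptions for illustration only.

```javascript
const { Writable } = require('node:stream');
const { once } = require('node:events');

// Hypothetical slow destination: acknowledges each chunk after a short delay.
function makeSlowSink(highWaterMark) {
  const received = [];
  const sink = new Writable({
    highWaterMark,
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setTimeout(callback, 10); // simulate a slow consumer
    },
  });
  return { sink, received };
}

// Hypothetical writer: respects backpressure by waiting for 'drain'.
async function writeAll(sink, chunks) {
  let pauses = 0;
  for (const chunk of chunks) {
    if (!sink.write(chunk)) {
      pauses++;                   // internal buffer is at or over highWaterMark
      await once(sink, 'drain');  // wait until the sink has caught up
    }
  }
  sink.end();
  await once(sink, 'finish');
  return pauses;
}
```

Calling writeAll(sink, ['aaaa', 'bbbb', 'cccc']) on a sink created with makeSlowSink(4) should report at least one pause, since each 4-byte chunk fills the 4-byte buffer. The chunks still arrive in order; the writer simply never gets ahead of the consumer.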

Stream Types Overview

Node.js has four fundamental stream types:

Type      | Purpose                                        | Example
Readable  | Source of data                                 | fs.createReadStream, http.IncomingMessage
Writable  | Destination for data                           | fs.createWriteStream, http.ServerResponse
Duplex    | Both readable and writable (independent sides) | net.Socket, tls.TLSSocket
Transform | Duplex whose output is derived from its input  | zlib.createGzip, zlib.createGunzip, crypto.createCipheriv

Readable:       ──────►
Writable:               ◄──────
Duplex:         ──────►
                ◄──────
Transform:      ──────► modify ──────►
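
A rough sketch of three of these types working together: a Readable source, a Transform that upper-cases each chunk, and a Writable that collects the result, connected with pipeline() from node:stream/promises. The function name shout and the in-memory source and sink are assumptions for illustration.

```javascript
const { Readable, Writable, Transform } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Hypothetical example: Readable -> Transform -> Writable
async function shout(words) {
  const out = [];

  const source = Readable.from(words); // Readable: emits each word

  const upper = new Transform({        // Transform: output derived from input
    transform(chunk, encoding, callback) {
      callback(null, chunk.toString().toUpperCase());
    },
  });

  const sink = new Writable({          // Writable: collects what it receives
    write(chunk, encoding, callback) {
      out.push(chunk.toString());
      callback();
    },
  });

  await pipeline(source, upper, sink); // resolves once the sink has finished
  return out;
}
```

For instance, shout(['hello', 'world']).then(console.log) would log the upper-cased words. A Duplex such as net.Socket is not shown here because its two sides are independent and it needs a real peer to be meaningful.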

Stream Modes

Readable streams operate in two modes:

1. Flowing Mode

Data is read automatically and emitted as soon as it arrives:

// Attaching a 'data' listener switches to flowing mode
readStream.on('data', (chunk) => {
  console.log('Received', chunk.length, 'bytes');
});

// Or use pipe()
readStream.pipe(writable);

2. Paused Mode

You explicitly call read() to pull data:

// No 'data' listener, so the stream stays in paused mode
readStream.on('readable', () => {
  let chunk;
  while ((chunk = readStream.read()) !== null) {
    console.log('Read', chunk.length, 'bytes');
  }
});

pipe() switches to flowing mode internally and handles backpressure. Manual read() in paused mode gives you finer control but is more error-prone.
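
One case where paused mode earns its keep is pulling fixed-size records, because read(size) lets the consumer decide how many bytes to take. The following is a sketch under assumptions: the helper name readRecords is hypothetical, and the source is an in-memory byte stream built with Readable.from(..., { objectMode: false }).

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: pull fixed-size records from a stream in paused mode.
function readRecords(stream, recordSize) {
  return new Promise((resolve, reject) => {
    const records = [];
    stream.on('readable', () => {
      let record;
      // read(n) returns null until n bytes are buffered;
      // once the stream has ended it returns whatever is left.
      while ((record = stream.read(recordSize)) !== null) {
        records.push(record.toString());
      }
    });
    stream.on('end', () => resolve(records));
    stream.on('error', reject);
  });
}
```

Given a source whose chunks are 'abcd', 'efgh' and 'ij', readRecords(source, 3) regroups the bytes into 3-byte records regardless of where the original chunk boundaries fell, with a shorter final record if the total is not a multiple of 3.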

Stream States

A readable stream exposes its current state through properties such as readableFlowing and destroyed:

const stream = fs.createReadStream('file.txt');

console.log(stream.readableFlowing);  // null (not flowing yet)
stream.on('data', () => {});
console.log(stream.readableFlowing);  // true (flowing)

stream.pause();
console.log(stream.readableFlowing);  // false (paused)

stream.resume();
console.log(stream.readableFlowing);  // true (flowing again)

stream.destroy();
console.log(stream.destroyed);        // true
State                     | Description
readableFlowing === null  | Not yet flowing: waiting for pipe() or a 'data' listener
readableFlowing === true  | Flowing: data is being read automatically
readableFlowing === false | Paused: call read() or resume()
destroyed === true        | Stream has been destroyed

Why Streams Matter in Production

Without Streams                                       | With Streams
5GB file needs 5GB RAM                                | 5GB file needs ~64KB RAM
Client waits for full buffer                          | Client starts receiving immediately
No backpressure: a fast producer can cause OOM crash  | Backpressure auto-managed
One slow consumer blocks everything                   | Multiple pipes, transform chaining

Common Scenarios Where Streams Are Essential

  1. File upload/download: handling large files without RAM explosion
  2. Log processing: reading gigabyte-sized log files line by line
  3. Video streaming: delivering media without buffering entire files
  4. Compression: gzipping data on the fly as it’s sent
  5. Database migrations: processing millions of rows in batches
  6. CSV/JSON parsing: parsing large datasets incrementally
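
Taking the compression scenario as an example, the sketch below gzips data as it flows through a stream and then gunzips it again to confirm the round trip. The helper names collect and gzipRoundTrip are hypothetical, and the in-memory source stands in for a file or HTTP body.

```javascript
const zlib = require('node:zlib');
const { Readable } = require('node:stream');

// Hypothetical helper: gather every chunk of a stream into one Buffer.
// Fine for a demo; a real pipeline would forward chunks instead of collecting them.
async function collect(stream) {
  const chunks = [];
  for await (const chunk of stream) chunks.push(chunk);
  return Buffer.concat(chunks);
}

// Hypothetical example: compress on the fly, then decompress to verify.
async function gzipRoundTrip(text) {
  const compressed = await collect(Readable.from([text]).pipe(zlib.createGzip()));
  const restored = await collect(Readable.from([compressed]).pipe(zlib.createGunzip()));
  return { compressedBytes: compressed.length, text: restored.toString() };
}
```

In an HTTP handler the same Transform would typically sit between the file stream and the response, so compressed bytes start leaving the server before the whole file has been read.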

Practical: Processing a Large CSV

const fs = require('fs');
const readline = require('readline');

async function processLargeCSV(filePath) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity,
  });

  let lineCount = 0;   // every line read, including the header
  let recordCount = 0; // data rows only
  let totalAmount = 0;

  for await (const line of rl) {
    lineCount++;
    if (lineCount === 1) continue; // Skip header

    // Naive split: assumes no quoted fields that contain commas
    const columns = line.split(',');
    const amount = parseFloat(columns[3]);

    if (!Number.isNaN(amount)) {
      totalAmount += amount;
    }

    recordCount++;

    // Log progress every 100K records
    if (recordCount % 100000 === 0) {
      console.log(`Processed ${recordCount} records, current total: $${totalAmount.toFixed(2)}`);
    }
  }

  console.log(`Done! Total: $${totalAmount.toFixed(2)} from ${recordCount} records`);
}

// Memory use stays roughly constant regardless of file size
processLargeCSV('./sales_2025.csv').catch(console.error);

Key Takeaways

  • Streams process data chunk by chunk, not all at once. This makes large data handling possible.
  • Memory efficiency: file streams buffer ~64KB per chunk by default, regardless of total data size
  • Backpressure is automatic with pipe(): the source slows down if the sink can’t keep up
  • Without streams, a 5GB file requires 5GB of RAM; with streams, it’s ~64KB
  • There are 4 stream types: Readable, Writable, Duplex, Transform
  • Streams can be in flowing (auto-read) or paused (manual read()) mode
  • pipe() is the simplest way to connect streams and handles backpressure, but it does not forward errors or clean up on failure; attach 'error' handlers or use stream.pipeline() for that
  • Use streams for: file I/O, HTTP, compression, CSV/JSON parsing, and any large data processing