The Problem Streams Solve
Imagine you need to read a 5GB file and send it over HTTP. The naive approach:
const http = require('http');
const fs = require('fs');
http.createServer((req, res) => {
// BAD: loads the entire 5GB file into memory
fs.readFile('./huge-file.zip', (err, data) => {
res.end(data);
});
}).listen(3000);
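This fails because readFile loads the entire file into RAM before sending any of it. For a file this large, Node.js either refuses outright (readFile cannot return a Buffer larger than about 2GB) or the process runs out of memory. With streams, you hold only about one 64KB chunk (the default chunk size for file streams) in memory at a time: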
// GOOD: streams use ~64KB of memory regardless of file size
http.createServer((req, res) => {
const readStream = fs.createReadStream('./huge-file.zip');
readStream.pipe(res);
}).listen(3000);
That's why streams matter. They make it possible to process data that would otherwise not fit in memory.
What Is a Stream?
A stream is a sequence of data chunks that arrives over time. Instead of waiting for all the data to be available, you process each chunk as it arrives.
Without streams (buffer everything):
  [ wait for the entire file... ]
  [################################] 5GB
  [ then process ]
With streams (process chunks):
  Chunk 1 --> process
  Chunk 2 --> process
  Chunk 3 --> process
  ...
  Chunk N --> process
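The chunk-by-chunk model can be sketched in a few lines. This is a minimal example, assuming an in-memory array as a stand-in for a real file or socket: Readable.from() turns an iterable into a readable stream, and for await...of consumes one chunk at a time. The names makeSource and processChunks are illustrative.

```javascript
const { Readable } = require('stream');

// Stand-in source: Readable.from() over an array emits one chunk per element.
// In real code this would be a file, socket, or HTTP request body.
function makeSource() {
  return Readable.from(['chunk-1 ', 'chunk-2 ', 'chunk-3']);
}

// Consume the stream chunk by chunk. Each chunk is handled as soon as it is
// available; nothing waits for the whole input to arrive.
async function processChunks(stream) {
  const seen = [];
  for await (const chunk of stream) {
    seen.push(chunk); // "process" the chunk (here we just record it)
  }
  return seen;
}

processChunks(makeSource()).then((seen) => {
  console.log(seen.length, 'chunks:', seen);
});
```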
Real-World Analogy
Think of streams like a water pipe:
- Without a pipe (buffer): You wait until all the water arrives in a giant tank, then you use it. If the tank is infinite, this works. But RAM isn't infinite.
- With a pipe (stream): Water flows through the pipe continuously. You attach a faucet at the end and use water as it arrives. The pipe never holds more than a small volume at once.
What Is a Chunk?
A chunk is the unit of data a stream emits. Chunks are Buffer objects (binary) by default, or you can set an encoding to get strings:
const fs = require('fs');
const stream = fs.createReadStream('file.txt', { highWaterMark: 16384 });
// highWaterMark = maximum chunk size in bytes (default for file streams: 65536, i.e. 64KB)
stream.on('data', (chunk) => {
console.log('Received chunk of', chunk.length, 'bytes');
// chunk is a Buffer; call chunk.toString() for a string
});
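Setting an encoding switches chunks from Buffers to strings, and a small highWaterMark makes the chunking visible. A sketch, assuming a throwaway file in the OS temp directory (the file name chunk-demo.txt and the helper readChunks are hypothetical):

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical demo file: 10 bytes of ASCII text.
const filePath = path.join(os.tmpdir(), 'chunk-demo.txt');
fs.writeFileSync(filePath, 'abcdefghij');

// encoding: 'utf8' -> chunks arrive as strings instead of Buffers.
// highWaterMark: 4  -> each read is capped at 4 bytes, so the file arrives in several chunks.
function readChunks(file) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(file, { encoding: 'utf8', highWaterMark: 4 })
      .on('data', (chunk) => chunks.push(chunk))
      .on('end', () => resolve(chunks))
      .on('error', reject);
  });
}

readChunks(filePath).then((chunks) => {
  // Each chunk is a string of at most 4 characters
  console.log(chunks.map((c) => `${typeof c}:${c}`));
});
```

With the default highWaterMark, the same 10-byte file would arrive as a single chunk.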
Backpressure β The Killer Feature
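When a readable stream produces data faster than a writable stream can consume it, the writable side buffers the excess. Once that buffer reaches its highWaterMark, write() returns false. That return value is the backpressure signal: the readable stream should pause until the writable stream emits 'drain' to show it has caught up.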
Too fast (no backpressure):
  Source --> Sink
  chunks keep arriving, the sink's buffer keeps growing  <- overflow!
With backpressure:
  Source --> Sink
  the sink's buffer fills  -> the source pauses
  the buffer drains        -> the source resumes
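The signal behind backpressure is the return value of write(). The sketch below is illustrative: makeSlowSink is a hypothetical Writable with a 4-byte highWaterMark whose writes complete on the next event-loop turn (setImmediate stands in for slow I/O). Writing 5 bytes fills its buffer, so write() returns false, and 'drain' fires once the buffer empties.

```javascript
const { Writable } = require('stream');

// A deliberately slow sink: 4-byte highWaterMark, and each write
// "completes" asynchronously on the next turn of the event loop.
function makeSlowSink(received) {
  return new Writable({
    highWaterMark: 4,
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback); // simulate slow I/O
    },
  });
}

// write() returns false once buffered bytes reach highWaterMark.
// The producer should then stop writing and wait for 'drain'.
function demoBackpressure() {
  return new Promise((resolve) => {
    const received = [];
    const sink = makeSlowSink(received);
    const ok = sink.write('hello'); // 5 bytes >= highWaterMark of 4
    console.log('write() returned', ok);
    sink.once('drain', () => {
      console.log('drain fired: safe to write again');
      resolve({ ok, received });
    });
  });
}

demoBackpressure();
```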
pipe() handles backpressure automatically. Without pipe(), you must manage it manually:
// Manual backpressure handling
const fs = require('fs');
const readable = fs.createReadStream('source.txt');
const writable = fs.createWriteStream('dest.txt');
readable.on('data', (chunk) => {
const canContinue = writable.write(chunk);
if (!canContinue) {
console.log('Backpressure: pausing read');
readable.pause(); // Stop reading until 'drain'
}
});
writable.on('drain', () => {
console.log('Drained: resuming read');
readable.resume(); // Safe to read more
});
// Close the destination once the source is exhausted
readable.on('end', () => writable.end());
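pipe() does all of this for you. Use pipe() unless you have a specific reason to manage backpressure manually; in production code, prefer stream.pipeline(), which handles backpressure the same way but also forwards errors and destroys every stream in the chain on failure.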
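Node.js also ships stream.pipeline(), with a promise form in stream/promises. A minimal file-copy sketch, assuming hypothetical file names in the OS temp directory; streamCopy is an illustrative helper, not a built-in:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline } = require('stream/promises');

// Copy a file chunk by chunk. pipeline() handles backpressure like pipe(),
// and additionally: if ANY stage errors, the promise rejects and every
// stream in the chain is destroyed (pipe() does neither).
async function streamCopy(src, dest) {
  await pipeline(fs.createReadStream(src), fs.createWriteStream(dest));
}

// Hypothetical demo files in the OS temp directory.
const src = path.join(os.tmpdir(), 'pipeline-src.txt');
const dest = path.join(os.tmpdir(), 'pipeline-dest.txt');
fs.writeFileSync(src, 'x'.repeat(200000)); // ~200KB: several 64KB chunks

streamCopy(src, dest)
  .then(() => console.log('copied', fs.statSync(dest).size, 'bytes'))
  .catch((err) => console.error('copy failed:', err.message));
```

A missing source file makes the promise reject instead of leaving a half-open destination stream behind.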
Stream Types Overview
Node.js has four fundamental stream types:
| Type | Purpose | Example |
|---|---|---|
| Readable | Source of data | fs.createReadStream, http.IncomingMessage |
| Writable | Destination for data | fs.createWriteStream, http.ServerResponse |
| Duplex | Both readable and writable (the two sides are independent) | net.Socket |
| Transform | A Duplex whose output is computed from its input | zlib.createGzip, zlib.createGunzip, crypto.createCipheriv |
Readable:   ------>   (data flows out)
Writable:   <------   (data flows in)
Duplex:     ------>   (readable side)
            <------   (writable side, independent of the readable side)
Transform:  ------> modify ------>
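A Transform is the stream type you are most likely to write yourself. A minimal sketch of an upper-casing Transform (createUpperCase and run are illustrative names), fed from Readable.from() with an in-memory array standing in for real input:

```javascript
const { Readable, Transform } = require('stream');

// Each chunk written in is upper-cased and pushed out the readable side.
function createUpperCase() {
  return new Transform({
    transform(chunk, encoding, callback) {
      // chunk is a Buffer here (decodeStrings defaults to true);
      // passing data to callback() pushes it downstream
      callback(null, chunk.toString().toUpperCase());
    },
  });
}

async function run(input) {
  const source = Readable.from(input);
  let output = '';
  for await (const chunk of source.pipe(createUpperCase())) {
    output += chunk; // Buffer is converted to a UTF-8 string on concatenation
  }
  return output;
}

run(['hello ', 'streams']).then((out) => console.log(out));
```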
Stream Modes
Readable streams operate in two modes:
1. Flowing Mode
Data is read automatically and emitted as soon as it arrives:
// Attaching a 'data' listener switches to flowing mode
readStream.on('data', (chunk) => {
console.log('Received', chunk.length, 'bytes');
});
// Or use pipe()
readStream.pipe(writable);
2. Paused Mode
You explicitly call read() to pull data:
// No 'data' listener: the stream is in paused mode, and read() pulls buffered data
readStream.on('readable', () => {
let chunk;
while ((chunk = readStream.read()) !== null) {
console.log('Read', chunk.length, 'bytes');
}
});
pipe() switches to flowing mode internally and handles backpressure. Manual read() in paused mode gives you finer control but is more error-prone.
Stream States
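Every readable stream exposes its current mode through readableFlowing, and every stream reports whether it has been destroyed:
const fs = require('fs');
const stream = fs.createReadStream('file.txt');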
console.log(stream.readableFlowing); // null (not flowing yet)
stream.on('data', () => {});
console.log(stream.readableFlowing); // true (flowing)
stream.pause();
console.log(stream.readableFlowing); // false (paused)
stream.resume();
console.log(stream.readableFlowing); // true (flowing again)
stream.destroy();
console.log(stream.destroyed); // true
| State | Description |
|---|---|
| readableFlowing === null | Not yet flowing: waiting for pipe() or a 'data' listener |
| readableFlowing === true | Flowing: data is being read automatically |
| readableFlowing === false | Paused: call read() or resume() |
| destroyed === true | Stream has been destroyed |
Why Streams Matter in Production
| Without Streams | With Streams |
|---|---|
| 5GB file needs 5GB RAM | 5GB file needs ~64KB RAM |
| Client waits for full buffer | Client starts receiving immediately |
| No flow control: a fast producer can cause an OOM crash | Backpressure managed automatically |
| Processing starts only after all data is loaded | Stages overlap: multiple pipes, transform chaining |
Common Scenarios Where Streams Are Essential
- File upload/download β handling large files without RAM explosion
- Log processing β reading gigabyte-sized log files line by line
- Video streaming β delivering media without buffering entire files
- Compression: gzipping data on the fly as it's sent
- Database migrations β processing millions of rows in batches
- CSV/JSON parsing β parsing large datasets incrementally
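Compression on the fly takes only a few lines with zlib.createGzip(), which is a Transform stream. A sketch, assuming an in-memory source of fake log lines and a collecting Writable in place of a real destination such as an HTTP response; gzipStream is a hypothetical helper name.

```javascript
const zlib = require('zlib');
const { Readable, Writable } = require('stream');
const { pipeline } = require('stream/promises');

// Chunks flow source -> gzip -> sink, so the pipeline never holds the
// full uncompressed payload as one buffer. In a server, the sink would be
// the HTTP response (sent with a Content-Encoding: gzip header).
async function gzipStream(source) {
  const compressed = [];
  const sink = new Writable({
    write(chunk, encoding, callback) {
      compressed.push(chunk); // collect compressed output for the demo
      callback();
    },
  });
  await pipeline(source, zlib.createGzip(), sink);
  return Buffer.concat(compressed);
}

const lines = Array.from({ length: 1000 }, (_, i) => `log line ${i}\n`);

gzipStream(Readable.from(lines)).then((gz) => {
  const original = lines.join('');
  console.log(`${Buffer.byteLength(original)} bytes -> ${gz.length} bytes gzipped`);
  // Round-trip check: decompressing gives back the original text
  console.log(zlib.gunzipSync(gz).toString() === original);
});
```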
Practical: Processing a Large CSV
const fs = require('fs');
const readline = require('readline');
async function processLargeCSV(filePath) {
const rl = readline.createInterface({
input: fs.createReadStream(filePath),
crlfDelay: Infinity,
});
let lineCount = 0; // data rows processed (header excluded)
let totalAmount = 0;
let isHeader = true;
for await (const line of rl) {
if (isHeader) { isHeader = false; continue; } // Skip header row
// Naive split: assumes no quoted fields that contain commas
const columns = line.split(',');
const amount = parseFloat(columns[3]); // 4th column holds the amount
if (!isNaN(amount)) {
totalAmount += amount;
}
lineCount++;
// Log progress every 100K lines
if (lineCount % 100000 === 0) {
console.log(`Processed ${lineCount} lines, current total: $${totalAmount.toFixed(2)}`);
}
}
console.log(`Done! Total: $${totalAmount.toFixed(2)} from ${lineCount} records`);
}
// Memory stays roughly flat regardless of file size (e.g. a 2GB CSV in ~50MB of RAM)
processLargeCSV('./sales_2025.csv').catch(console.error);
Key Takeaways
- Streams process data chunk by chunk, not all at once. This makes large data handling possible.
- Memory efficiency: streams use ~64KB per chunk regardless of total data size
- Backpressure is automatic with pipe(): the source slows down if the sink can't keep up
- Without streams, a 5GB file requires 5GB of RAM; with streams, it's ~64KB
- There are 4 stream types: Readable, Writable, Duplex, Transform
- Readable streams can be in flowing (auto-read) or paused (manual read()) mode
- pipe() is the simplest way to connect streams and handles backpressure; stream.pipeline() additionally forwards errors and cleans up every stream on failure
- Use streams for: file I/O, HTTP, compression, CSV/JSON parsing, and any large data processing