Performance Guide

How to choose the right output mode, minimize file size, and avoid memory pressure for PDFs of any scale.

Decision tree

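Document size → ┬─ < 1 MB ──────► toUint8Array() (simplest)
                ├─ 1–50 MB ─────► toUint8Array() or BufferSink
                ├─ 50–200 MB ───► NodeStreamSink / WebStreamSink
                └─ > 200 MB ────► Stream + object streams

Linearized output cannot be combined with object streams and is currently limited to single-page documents, so treat it as a separate option for small web-served PDFs rather than part of the large-file path.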
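
The size thresholds in the tree can be encoded as a small helper when the output mode has to be picked at runtime. This is only a sketch: `chooseOutputMode` and the `OutputMode` labels are hypothetical names, not part of the library, and the size estimate must come from your own knowledge of the document.

```typescript
// Hypothetical helper: maps an estimated output size to the recommendation
// from the decision tree. The labels are illustrative, not library values.
type OutputMode =
  | "toUint8Array"
  | "toUint8Array-or-BufferSink"
  | "stream"
  | "stream+objectStreams";

const MB = 1024 * 1024;

function chooseOutputMode(estimatedBytes: number): OutputMode {
  if (estimatedBytes < 1 * MB) return "toUint8Array";
  if (estimatedBytes <= 50 * MB) return "toUint8Array-or-BufferSink";
  if (estimatedBytes <= 200 * MB) return "stream";
  return "stream+objectStreams";
}
```

The boundaries are soft guidance; a memory-constrained host may want to switch to streaming well below 50 MB.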

Streaming vs in-memory

toUint8Array()

Returns the entire PDF as one allocation. Suitable for nearly all generated documents. The serializer writes sequentially to a growing buffer.

ts
const bytes = doc.toUint8Array();
// bytes occupies ~file_size bytes of memory

Use when: document fits in memory comfortably, you need a Uint8Array for upload, or you're running in a serverless function with low concurrency.

NodeStreamSink

Writes directly to a Node.js writable stream. Only one chunk is buffered at a time.

ts
import { NodeStreamSink } from "@criston/zeropdf";
import { createWriteStream } from "node:fs";

await doc.writeTo(new NodeStreamSink(createWriteStream("large.pdf")));

Use when: generating files larger than available memory, streaming HTTP responses (a Node.js response object is itself a writable stream), or writing to cloud storage.
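
To illustrate what keeping only one chunk in flight involves on the Node.js side, here is a minimal sketch that honors backpressure using only the standard stream API. `writeWithBackpressure` is a hypothetical helper, not part of @criston/zeropdf, and NodeStreamSink may well be implemented differently.

```typescript
import { once } from "node:events";
import { Writable } from "node:stream";
import { finished } from "node:stream/promises";

// Hypothetical helper: writes chunks one at a time and waits for "drain"
// whenever the stream's internal buffer is full.
async function writeWithBackpressure(
  out: Writable,
  chunks: Iterable<Uint8Array>,
): Promise<void> {
  for (const chunk of chunks) {
    // write() returns false once the buffered data reaches highWaterMark.
    if (!out.write(chunk)) {
      await once(out, "drain");
    }
  }
  out.end();
  await finished(out); // resolves once all data has been flushed
}
```

Waiting for "drain" is what keeps memory flat: the producer never queues more data than the destination is ready to accept.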

WebStreamSink

Streams to a browser WritableStream (pair with a TransformStream to feed a Response).

ts
import { WebStreamSink } from "@criston/zeropdf";

const { readable, writable } = new TransformStream<Uint8Array>();
const sink = new WebStreamSink(writable);
void doc.writeTo(sink).then(() => sink.close());

const response = new Response(readable, {
  headers: { "Content-Type": "application/pdf" }
});

Use when: browser-side generation for large documents, progressive download.

BufferSink

Collects chunks into a buffer array, then assembles at the end. Avoids a single large allocation during serialization, but the final collect() step still creates one contiguous Uint8Array.

ts
const sink = new BufferSink();
await doc.writeTo(sink);
const bytes = sink.collect();

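Use when: you need the bytes at the end but want to avoid one large, growing allocation during serialization. Total memory still scales with file size, because every chunk is retained until collect() runs.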
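
The collect-then-assemble pattern described above can be sketched in a few lines. `ChunkCollector` is a hypothetical stand-in written for illustration; it is not the library's BufferSink and only mirrors the behavior the text describes.

```typescript
// Hypothetical stand-in for a chunk-collecting sink.
class ChunkCollector {
  private chunks: Uint8Array[] = [];
  private total = 0;

  // Keeps a reference to each chunk; nothing is copied at this point.
  write(chunk: Uint8Array): void {
    this.chunks.push(chunk);
    this.total += chunk.byteLength;
  }

  // Allocates one contiguous array and copies every chunk into it.
  collect(): Uint8Array {
    const out = new Uint8Array(this.total);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    // Drop the chunk references so they can be garbage collected.
    this.chunks = [];
    this.total = 0;
    return out;
  }
}
```

During `collect()` both the chunk list and the contiguous copy are alive, so peak memory in a design like this is roughly twice the file size for a moment.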

Object streams

Object streams compress multiple indirect objects into a single compressed stream. Enable with:

ts
const doc = createDocument({ objectStreams: true });

Size impact

Object streams reduce file size by roughly 20–30% for typical documents by consolidating small dictionaries and arrays into a single deflate-compressed block. The margin is larger for documents with many small objects (form fields, annotations, structure elements).

Trade-offs

| Aspect | Classic xref (default) | Object streams |
| --- | --- | --- |
| PDF version | 1.4 | 1.5 |
| File size | Larger | Smaller (≈20–30%) |
| Encryption compatibility | All modes | rc4-128, aes-128, aes-256 only |
| Detached signatures | Supported | Supported |
| Linearized output | Supported | Not supported |
| Parse/edit compatibility | Broad | Requires xref-stream capable reader |
| Memory during generation | Same | Same (streamed inline) |

When to enable

Enable object streams for:

  • Documents with many small indirect objects (>500 objects)
  • Internal archival formats where file size matters
  • PDFs targeting PDF 1.5+ viewers

Skip object streams for:

  • Documents that must be PDF 1.4 compatible
  • Linearized output (linearize: true)
  • Environments where classic xref compatibility is required
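
The two lists above can be folded into a single predicate. This is a sketch under stated assumptions: `OutputRequirements` and `shouldEnableObjectStreams` are hypothetical names, and the 500-object threshold is taken directly from the guidance above rather than from any library constant.

```typescript
// Hypothetical description of what the output must satisfy.
interface OutputRequirements {
  indirectObjectCount: number;
  mustBePdf14Compatible: boolean;
  linearize: boolean;
}

function shouldEnableObjectStreams(req: OutputRequirements): boolean {
  // Object streams require PDF 1.5, so strict 1.4 output rules them out.
  if (req.mustBePdf14Compatible) return false;
  // Linearized output and object streams cannot be combined.
  if (req.linearize) return false;
  // The size benefit is largest when there are many small objects.
  return req.indirectObjectCount > 500;
}
```

The result would then be passed as the `objectStreams` option to `createDocument`.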

Font subsetting costs

Every embedded TrueType font is subsetted automatically: only glyphs actually used in the document are included, and unused OpenType tables (GPOS, GSUB, apple, meta) are stripped.

Storage savings

| Scenario | Full TTF | Subsetted | Savings |
| --- | --- | --- | --- |
| Latin-only body (Source Sans 3) | 88 KB | 8 KB | 91% |
| Latin + Greek + Cyrillic body | 88 KB | 18 KB | 80% |
| CJK (Noto Sans CJK SC) | 16 MB | 120 KB (used chars) | 99%+ |
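
The Savings column is the relative size reduction, 1 − subsetted / full. A small sketch of that arithmetic, useful for checking your own fonts (`savingsPercent` is a hypothetical helper name):

```typescript
// Relative size reduction in percent; both sizes must use the same unit.
function savingsPercent(fullSize: number, subsettedSize: number): number {
  return (1 - subsettedSize / fullSize) * 100;
}

// Rows from the table above, in KB.
const latinOnly = Math.round(savingsPercent(88, 8)); // 91
const latinGreekCyrillic = Math.round(savingsPercent(88, 18)); // 80
const cjk = savingsPercent(16 * 1024, 120); // just above 99
```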

Performance cost

Subsetting runs once per embedded font during serialization (not during page authoring). For a typical Latin font with <100 glyphs used, subsetting takes <2ms. For CJK fonts with thousands of glyphs, it can take 20–50ms.

Disabling subsetting

Not currently supported. The subsetter always runs for embedded TrueType fonts. If you need the full font embedded (e.g., editable PDFs), this is a limitation to be aware of.

Encryption overhead

| Algorithm | Encryption cost | File size overhead |
| --- | --- | --- |
| rc4-40 | Negligible | ~50 bytes (enc dict) |
| rc4-128 | Negligible | ~50 bytes |
| aes-128 | <1ms per page of content | ~50 bytes |
| aes-256 | <2ms per page of content | ~50 bytes |
| aes-256-r6 (PDF 2.0) | <2ms per page | ~100 bytes |

AES encryption also adds a small per-stream overhead (a 16-byte initialization vector plus up to 16 bytes of block padding), but this is trivial compared to stream content.
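
For a sense of scale, the per-stream cost can be computed directly. The sketch below assumes the standard PDF AES scheme (CBC mode, a 16-byte initialization vector stored in front of the data, and padding that always adds between 1 and 16 bytes); whether the library follows it byte for byte is an assumption, and `aesEncryptedLength` is a hypothetical name.

```typescript
const AES_BLOCK = 16;

// Length of an AES-CBC encrypted stream for a given plaintext length,
// assuming a prepended 16-byte IV and padding of 1 to 16 bytes.
function aesEncryptedLength(plainLength: number): number {
  const padding = AES_BLOCK - (plainLength % AES_BLOCK);
  return AES_BLOCK + plainLength + padding;
}

// Overhead is always between 17 and 32 bytes per stream.
const overheadFor4KB = aesEncryptedLength(4096) - 4096; // 32
```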

Linearized output

Linearized output emits Fast Web View metadata at the start of the file, enabling progressive rendering in compatible viewers:

ts
const doc = createDocument({ linearize: true });

Trade-offs

| Aspect | Standard output | Linearized |
| --- | --- | --- |
| Byte-for-byte changes | Rearranged object order | Document catalog + hint table at front |
| Compatible with object streams | Yes | No |
| Compatible with encryption | Yes | No |
| Compatible with detached signatures | Yes | No |
| Page count | Any | Single-page (current limitation) |

Use when: single-page generated PDFs served over HTTP where progressive rendering is desired.
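
Because linearization excludes several other features, it can help to check a configuration before generating anything. The following is a sketch only: `LinearizeCheckInput` and its field names are hypothetical and exist purely to express the restrictions listed in the trade-off table.

```typescript
// Hypothetical summary of the features a document is going to use.
interface LinearizeCheckInput {
  objectStreams: boolean;
  encrypted: boolean;
  detachedSignature: boolean;
  pageCount: number;
}

// Returns the features that conflict with linearized output; an empty
// array means linearization is possible under the current limitations.
function linearizeConflicts(input: LinearizeCheckInput): string[] {
  const conflicts: string[] = [];
  if (input.objectStreams) conflicts.push("object streams");
  if (input.encrypted) conflicts.push("encryption");
  if (input.detachedSignature) conflicts.push("detached signatures");
  if (input.pageCount > 1) conflicts.push("more than one page");
  return conflicts;
}
```

Only when the returned list is empty would `linearize: true` be a sensible option to pass to `createDocument`.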

Memory profile by page count

Estimated memory for a document with 24 lines of text per page, one embedded Latin TTF (subsetted to 8 KB), no images:

| Pages | In-memory (toUint8Array) | Streaming (NodeStreamSink) |
| --- | --- | --- |
| 1 | ~14 KB | ~2 KB peak |
| 100 | ~1.1 MB | ~2 KB peak |
| 10,000 | ~110 MB | ~2 KB peak |
| 100,000 | ~1.1 GB | ~2 KB peak |

These are estimates; actual numbers depend on content density. Images and embedded fonts increase memory proportionally.
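
The in-memory column is close to linear in page count: about 11 KB per page plus roughly 3 KB of fixed overhead. The sketch below encodes that fit. Both constants are derived from the table for this specific text-only scenario, not measured from the library, and `estimateInMemoryBytes` is a hypothetical helper.

```typescript
// Constants fitted to the table above (decimal KB); treat as rough guidance.
const BASE_BYTES = 3_000; // fixed overhead, including the subsetted font
const BYTES_PER_PAGE = 11_000; // 24 lines of text per page, no images

function estimateInMemoryBytes(pages: number): number {
  return BASE_BYTES + BYTES_PER_PAGE * pages;
}

// True when the estimate fits within a memory budget given in bytes.
function fitsInBudget(pages: number, budgetBytes: number): boolean {
  return estimateInMemoryBytes(pages) <= budgetBytes;
}
```

A check such as `fitsInBudget(pageCount, budget)` is one way to decide between `toUint8Array()` and a streaming sink before generation starts.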

Parallel execution

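The serializer is synchronous, so each createDocument/toUint8Array call blocks its thread until the document is complete. Wrapping synchronous calls in Promise.all does not run them in parallel; the example below generates documents one after another on the calling thread. For real parallelism in bulk generation, distribute the records across worker threads.

ts
const documents = records.map((record) => {
  const doc = createDocument({ /* ... */ });
  doc.addPage().text(record.summary, { x: 56, y: 760 });
  // Synchronous: runs to completion before the next record starts.
  return doc.toUint8Array();
});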

Each document is independent and does not share mutable state.
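
When spreading bulk generation across worker threads, the records first have to be split into one batch per worker. A minimal sketch of that step follows; `partition` is a hypothetical helper, and the worker setup itself is deliberately left out.

```typescript
// Splits items into at most `parts` contiguous batches of near-equal size.
function partition<T>(items: readonly T[], parts: number): T[][] {
  const count = Math.max(1, Math.min(Math.floor(parts), items.length));
  const size = Math.ceil(items.length / count);
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}
```

Each batch could then be handed to its own worker (for example via `workerData` from `node:worker_threads`), with the number of parts chosen from the available CPU cores.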

Spill-to-Disk Strategy

When a document exceeds available RAM, use TempFileSink to spill serialized data directly to a temporary file. This avoids holding the entire PDF in memory:

ts
import { TempFileSink } from "@criston/zeropdf";

const sink = new TempFileSink(); // manages its own temp file
await doc.writeTo(sink);
const bytes = await sink.getBytes(); // reads the whole file back into memory; skip if it will not fit
await sink.close(); // deletes the temp file

writeToFile convenience

For the common case, the writeToFile shorthand writes directly to a path:

ts
await doc.writeToFile("/tmp/report.pdf");

Internally, this uses a TempFileSink for documents over a threshold size and a BufferSink for smaller ones.

Progressive Page Generation

For documents with thousands of pages, use writeNextPage() to generate and flush pages one at a time. This avoids holding all page objects in the serializer's working set simultaneously:

ts
const doc = createDocument({ objectStreams: true });
const sink = new NodeStreamSink(createWriteStream("report.pdf"));

for (let i = 0; i < 5000; i++) {
  const page = doc.addPage();
  page.text(`Page ${i + 1}`, { x: 56, y: 760 });
  await doc.writeNextPage(sink);
}
await doc.finish(sink);

  • writeNextPage() serializes the most recently added page and releases its resources
  • After the loop, finish() writes the cross-reference table and trailer
  • Compatible with streaming sinks (NodeStreamSink, WebStreamSink, TempFileSink)

Lazy Object Loading

When parsing an existing PDF, stream content (page descriptions, images, embedded fonts) is not decompressed until first access. This defers CPU and memory costs to the point of use:

ts
const srcDoc = await parseDocument(inputBytes);
const page = srcDoc.getPage(0);
// Page stream decompressed lazily on first text/image extraction
const text = await page.extractText();

Objects that are never accessed during the editing session are never decompressed, reducing parse-time overhead for large documents where only a subset of pages is modified.
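
The deferred-decompression idea can be illustrated independently of the parser. The sketch below is not the library's implementation: `LazyStream` is a hypothetical class that inflates zlib-compressed data on first access and caches the result, using only `node:zlib`.

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// Hypothetical wrapper: holds compressed bytes and inflates them on demand.
class LazyStream {
  private readonly compressed: Uint8Array;
  private decoded: Uint8Array | undefined;

  constructor(compressed: Uint8Array) {
    this.compressed = compressed;
  }

  get isDecoded(): boolean {
    return this.decoded !== undefined;
  }

  // Inflates on the first call; later calls return the cached result.
  bytes(): Uint8Array {
    if (this.decoded === undefined) {
      this.decoded = inflateSync(this.compressed);
    }
    return this.decoded;
  }
}

// Usage: nothing is inflated until bytes() is called.
const content = new TextEncoder().encode("BT /F1 12 Tf (Hello) Tj ET");
const pageStream = new LazyStream(deflateSync(content));
const untouched = pageStream.isDecoded; // false
const text = new TextDecoder().decode(pageStream.bytes());
```

A stream that is never read stays compressed for its whole lifetime, which is where the savings for partially edited documents come from.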

Document Clone

clone() creates a deep copy of a document without re-rendering, ideal for template-based generation where many variations share the same base content:

ts
const template = createDocument({ objectStreams: true });
template.addPage().text("Header", { x: 56, y: 760 });

for (const record of records) {
  const doc = template.clone();
  doc.getPage(0).text(record.body, { x: 56, y: 720 });
  await doc.writeToFile(`output/${record.id}.pdf`);
}

  • Clone copies all objects, fonts, and cross-reference tables
  • Font data is shared by reference where possible (no re-subsetting)
  • Cloned documents are independent; mutating one does not affect the original or other clones

Cross-Document Caching

When generating multiple documents that share fonts or images, the library automatically caches and reuses resources across documents:

ts
import { readFile } from "node:fs/promises";

const fontBytes = await readFile("fonts/SourceSans3-Regular.ttf");
const imageBytes = await readFile("logo.png");

for (const record of records) {
  const doc = createDocument();
  // Font subsetted once, reused on subsequent embeds
  const font = doc.embedTrueTypeFont(fontBytes, { family: "SS3" });
  const page = doc.addPage();
  page.png(imageBytes, { x: 56, y: 700, width: 64 }); // Image data reused
  page.text(record.caption, { x: 56, y: 680, font });
  await doc.writeToFile(`out/${record.id}.pdf`);
}

  • Font byte arrays with identical content share a single parsed font representation
  • Image byte arrays with identical content are encoded once and referenced by subsequent embeds
  • Cache keys are content-hash based, not reference based, so two identical byte arrays from different sources both benefit from caching
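
To make the content-hash keying concrete, here is a minimal sketch of such a cache. It is an illustration under stated assumptions: `ContentCache` is a hypothetical class, SHA-256 is an arbitrary choice of hash, and the library's internal cache may be structured differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache keyed by the SHA-256 digest of the input bytes.
class ContentCache<T> {
  private readonly entries = new Map<string, T>();

  // Returns the cached value for these bytes, creating it on first use.
  getOrCreate(bytes: Uint8Array, create: (bytes: Uint8Array) => T): T {
    const key = createHash("sha256").update(bytes).digest("hex");
    if (!this.entries.has(key)) {
      this.entries.set(key, create(bytes));
    }
    return this.entries.get(key) as T;
  }
}
```

Two separately loaded copies of the same font file hash to the same key, so the expensive `create` step (parsing or encoding) runs once. The trade-off is that every lookup has to hash the full input.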

Released under the ISC license.