zeropdf v1.3.0


Class: ParsedPdfDocument

Defined in: src/parser.ts:314

A read-only parsed representation of a PDF document. It provides helpers for inspecting document-level metadata, object references, pages, forms, outlines, structure trees, and source bytes.

Constructors

Constructor

ts
new ParsedPdfDocument(
   version, 
   xrefOffsets, 
   trailer, 
   objects, 
   sourceBytes, 
   startXrefOffset, 
   encryption?, 
   recoveryMode?, 
   repairWarnings?): ParsedPdfDocument;

Defined in: src/parser.ts:328

Creates a parsed PDF document wrapper around the trailer, cross-reference data, object table, source bytes, and optional encryption context.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `version` | `string` | The parsed PDF header version. |
| `xrefOffsets` | `ReadonlyMap<number, number>` | The parsed cross-reference offsets by object id. |
| `trailer` | `PdfDict` | The parsed trailer dictionary. |
| `objects` | `ReadonlyMap<number, ParsedPdfIndirectObject>` | The parsed indirect objects by object id. |
| `sourceBytes` | `Uint8Array` | The original source PDF bytes. |
| `startXrefOffset` | `number` | The byte offset of the startxref pointer. |
| `encryption?` | `ParsedDocumentEncryption` | The optional parsed encryption context. |
| `recoveryMode?` | `boolean` | Whether the document was parsed in recovery mode. |
| `repairWarnings?` | `readonly string[]` | Detailed repair warnings collected during recovery parsing. |

Returns

ParsedPdfDocument

Properties

| Property | Modifier | Type | Description | Defined in |
| --- | --- | --- | --- | --- |
| `lazyObjects` | `readonly` | `Set<number>` | Set of object ids whose stream data has not yet been decoded. | src/parser.ts:352 |
| `recoveryMode?` | `readonly` | `boolean` | Whether the document was parsed in recovery mode. | src/parser.ts:336 |
| `repairWarnings?` | `readonly` | `readonly string[]` | Detailed repair warnings collected during recovery parsing. | src/parser.ts:337 |
| `startXrefOffset` | `readonly` | `number` | The byte offset of the startxref pointer. | src/parser.ts:334 |
| `trailer` | `readonly` | `PdfDict` | The parsed trailer dictionary. | src/parser.ts:331 |
| `version` | `readonly` | `string` | The parsed PDF header version. | src/parser.ts:329 |
| `xrefOffsets` | `readonly` | `ReadonlyMap<number, number>` | The parsed cross-reference offsets by object id. | src/parser.ts:330 |

Methods

extractText()

ts
extractText(options?): readonly PageTextExtraction[];

Defined in: src/parser.ts:663

Extracts positioned Unicode text from every page of the document. Extraction is best-effort: text in each font is decoded through that font's /ToUnicode CMap when one is present, and falls back to Latin-1 otherwise. Positions are given in user-space coordinates (origin at the lower left), computed from the active text and graphics matrices.

Parameters

| Parameter | Type |
| --- | --- |
| `options` | `ExtractTextOptions` |

Returns

readonly PageTextExtraction[]

Per-page text and positioned runs.
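As a rough illustration of consuming the result, the sketch below joins the per-page output into one string. It types the document structurally instead of importing the class, because the import path is not shown on this page. The fields of PageTextExtraction are also not listed here, so the text is read through a caller-supplied accessor; `TextExtractable`, `joinPageText`, and `getText` are hypothetical names introduced only for this example.

```typescript
// Structural stand-in for the documented method. T plays the role of
// PageTextExtraction, whose fields are not listed on this page.
interface TextExtractable<T> {
  extractText(): readonly T[];
}

// Joins the text of every page with a separator. `getText` is a hypothetical,
// caller-supplied accessor, so no field name is assumed.
function joinPageText<T>(
  doc: TextExtractable<T>,
  getText: (page: T) => string,
  separator: string = "\n\n",
): string {
  return doc.extractText().map(getText).join(separator);
}
```

Passing the accessor keeps the helper independent of the exact record shape; once the field that holds the page text is known, the accessor becomes a one-line lambda.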


getConformanceProfile()

ts
getConformanceProfile(): DocumentConformanceProfile | undefined;

Defined in: src/parser.ts:600

Detects the supported PDF/A or PDF/UA conformance profile from catalog and XMP metadata.

Returns

DocumentConformanceProfile | undefined

The detected supported conformance profile, if any.


getEffectiveVersion()

ts
getEffectiveVersion(): string;

Defined in: src/parser.ts:468

Returns the effective PDF version, taking into account a Catalog /Version override that may raise the version above the file header.

Returns

string
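The override rule described above can be sketched as a standalone function. This is not the library's implementation, only one plausible reading of it: the catalog /Version wins when it parses as a strictly newer `major.minor` value than the header. `parsePdfVersion` and `pickEffectiveVersion` are hypothetical helpers, and keeping the header for unparsable input is an assumption.

```typescript
// Parses "major.minor" into numbers; returns undefined for anything else.
function parsePdfVersion(v: string): [number, number] | undefined {
  const m = /^(\d+)\.(\d+)$/.exec(v.trim());
  return m ? [Number(m[1]), Number(m[2])] : undefined;
}

// Returns the catalog version only when it is strictly newer than the header.
function pickEffectiveVersion(header: string, catalog?: string): string {
  if (catalog === undefined) return header;
  const h = parsePdfVersion(header);
  const c = parsePdfVersion(catalog);
  // Assumption: if either value cannot be parsed, keep the header version.
  if (!h || !c) return header;
  const newer = c[0] > h[0] || (c[0] === h[0] && c[1] > h[1]);
  return newer ? catalog.trim() : header;
}
```

A catalog value that is older than or equal to the header is ignored in this sketch, which matches the "may raise" wording: the override never lowers the version.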


getEncryptionContext()

ts
getEncryptionContext(): PdfEncryptionContext | undefined;

Defined in: src/parser.ts:515

Returns the parsed encryption context when the source PDF is encrypted.

Returns

PdfEncryptionContext | undefined

The encryption context, if present.


getEncryptionObjectId()

ts
getEncryptionObjectId(): number | undefined;

Defined in: src/parser.ts:524

Returns the Encrypt dictionary object id when one was parsed.

Returns

number | undefined

The encryption dictionary object id, if present.
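The two encryption getters can be combined into a small status check. The sketch below is typed structurally, since the import path for the class is not shown on this page, and it treats the context as opaque because the fields of PdfEncryptionContext are not documented here. `EncryptionInfoSource`, `EncryptionStatus`, and `describeEncryption` are hypothetical names. The sketch also does not assume that an object id is always available when a context is present.

```typescript
// Structural stand-in for the two documented getters.
interface EncryptionInfoSource {
  getEncryptionContext(): unknown;
  getEncryptionObjectId(): number | undefined;
}

type EncryptionStatus =
  | { encrypted: false }
  | { encrypted: true; encryptObjectId: number | undefined };

// Reports whether an encryption context exists and, if so, which object id
// (if any) holds the Encrypt dictionary.
function describeEncryption(doc: EncryptionInfoSource): EncryptionStatus {
  if (doc.getEncryptionContext() === undefined) return { encrypted: false };
  return { encrypted: true, encryptObjectId: doc.getEncryptionObjectId() };
}
```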


getInfoObjectId()

ts
getInfoObjectId(): number | undefined;

Defined in: src/parser.ts:443

Returns the document information dictionary object id when one is present.

Returns

number | undefined

The information dictionary object id, if present.


getLanguage()

ts
getLanguage(): string | undefined;

Defined in: src/parser.ts:574

Returns the catalog language value when present.

Returns

string | undefined

The document language, if present.


getMaxObjectId()

ts
getMaxObjectId(): number;

Defined in: src/parser.ts:434

Returns the highest parsed object id, or zero for an empty object table.

Returns

number

The highest object id.
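One common use for the highest object id is choosing ids for objects that will be appended later. The sketch below reserves a run of unused ids starting just above the maximum. It is typed structurally because the import path is not shown on this page; `ObjectIdSource` and `reserveObjectIds` are hypothetical names, and nothing here implies the library offers such a helper.

```typescript
// Structural stand-in for the documented getter.
interface ObjectIdSource {
  getMaxObjectId(): number;
}

// Returns `count` consecutive ids above the current maximum. For an empty
// object table the maximum is zero, so the first reserved id is 1.
function reserveObjectIds(doc: ObjectIdSource, count: number): number[] {
  const first = doc.getMaxObjectId() + 1;
  const ids: number[] = [];
  for (let i = 0; i < count; i++) {
    ids.push(first + i);
  }
  return ids;
}
```

Starting at 1 for an empty table fits the PDF convention that object number 0 is reserved as the head of the free list.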


getMetadata()

ts
getMetadata(): ParsedPdfMetadata;

Defined in: src/parser.ts:533

Returns document information dictionary metadata.

Returns

ParsedPdfMetadata

The parsed document metadata.


getObject()

ts
getObject(id): ParsedPdfIndirectObject | undefined;

Defined in: src/parser.ts:361

Returns a parsed indirect object by object id.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `id` | `number` | The object id to look up. |

Returns

ParsedPdfIndirectObject | undefined

The parsed object, if present.


getObjectIds()

ts
getObjectIds(): readonly number[];

Defined in: src/parser.ts:425

Returns all parsed object ids sorted in ascending order.

Returns

readonly number[]

The sorted object ids.
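getObjectIds() and getObject() together allow a full walk of the object table in ascending id order. The sketch below collects id and object pairs, skipping any id whose lookup returns undefined. It is typed structurally, with T standing in for ParsedPdfIndirectObject, because the import path and the object's fields are not shown on this page; `ObjectTable` and `collectObjects` are hypothetical names.

```typescript
// Structural stand-in for the two documented lookups.
interface ObjectTable<T> {
  getObjectIds(): readonly number[];
  getObject(id: number): T | undefined;
}

// Walks the ids in the order returned (documented as ascending) and pairs
// each id with its object. Ids that resolve to undefined are skipped.
function collectObjects<T>(doc: ObjectTable<T>): Array<[number, T]> {
  const pairs: Array<[number, T]> = [];
  for (const id of doc.getObjectIds()) {
    const obj = doc.getObject(id);
    if (obj !== undefined) {
      pairs.push([id, obj]);
    }
  }
  return pairs;
}
```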


getPageCount()

ts
getPageCount(): number;

Defined in: src/parser.ts:636

Returns the number of pages in the document.

Returns

number

The page count.


getPagesObjectId()

ts
getPagesObjectId(): number;

Defined in: src/parser.ts:497

Returns the root Pages tree object id.

Returns

number

The Pages tree object id.


getRepairWarnings()

ts
getRepairWarnings(): readonly string[];

Defined in: src/parser.ts:345

Returns the repair warnings collected during recovery parsing.

Returns

readonly string[]

The repair warnings, or an empty array.
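A caller might want a one-line summary of how the parse went. The sketch below combines the `recoveryMode` property with getRepairWarnings(). It is typed structurally because the import path is not shown on this page; `RepairInfoSource` and `summarizeRepair` are hypothetical names, and the wording of the summary strings is purely illustrative.

```typescript
// Structural stand-in for the documented property and getter.
interface RepairInfoSource {
  readonly recoveryMode?: boolean;
  getRepairWarnings(): readonly string[];
}

// Produces a short human-readable summary of the parse outcome.
function summarizeRepair(doc: RepairInfoSource): string {
  const warnings = doc.getRepairWarnings();
  if (!doc.recoveryMode && warnings.length === 0) {
    return "parsed cleanly";
  }
  const mode = doc.recoveryMode ? "recovery mode" : "normal mode";
  return `${mode}, ${warnings.length} repair warning(s)`;
}
```

Because getRepairWarnings() is documented to return an empty array when there is nothing to report, the helper never has to handle an undefined result.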


getRootObjectId()

ts
getRootObjectId(): number;

Defined in: src/parser.ts:453

Returns the catalog object id from the trailer root entry.

Returns

number

The catalog object id.


getSourceBytes()

ts
getSourceBytes(): Uint8Array;

Defined in: src/parser.ts:506

Returns a defensive copy of the original source bytes.

Returns

Uint8Array

A copy of the source bytes.
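Since each call returns a copy, it is usually worth fetching the bytes once and reusing them for any scan. The sketch below does that while counting `%%EOF` markers, which gives a rough hint of how many incremental-update sections a file contains. This is a heuristic, not a library feature: the same byte sequence can also occur inside stream data. `SourceBytesSource` and `countEofMarkers` are hypothetical names, and the document is typed structurally because the import path is not shown on this page.

```typescript
// Structural stand-in for the documented getter.
interface SourceBytesSource {
  getSourceBytes(): Uint8Array;
}

// Counts occurrences of the ASCII sequence "%%EOF" in the source bytes.
function countEofMarkers(doc: SourceBytesSource): number {
  const bytes = doc.getSourceBytes(); // fetch the copy once and reuse it
  const marker = [0x25, 0x25, 0x45, 0x4f, 0x46]; // "%%EOF"
  let count = 0;
  for (let i = 0; i + marker.length <= bytes.length; i++) {
    let match = true;
    for (let j = 0; j < marker.length; j++) {
      if (bytes[i + j] !== marker[j]) {
        match = false;
        break;
      }
    }
    if (match) count++;
  }
  return count;
}
```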


getXmpMetadata()

ts
getXmpMetadata(): string | undefined;

Defined in: src/parser.ts:557

Returns decoded XMP metadata from the catalog metadata stream when present.

Returns

string | undefined

The decoded XMP metadata, if present.


inspectStructureElements()

ts
inspectStructureElements(): readonly ParsedPdfInspectedStructureElement[];

Defined in: src/parser.ts:825

Returns a nested inspection view of tagged structure elements.

Returns

readonly ParsedPdfInspectedStructureElement[]

The nested structure inspection records.


isTagged()

ts
isTagged(): boolean;

Defined in: src/parser.ts:583

Reports whether the document declares tagged PDF structure.

Returns

boolean

True when the document is tagged.
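isTagged() pairs naturally with getLanguage() for a quick accessibility pre-check before running a full validator. The sketch below only illustrates how the two getters might be combined; it is not a PDF/UA check, and the issue strings are illustrative. `AccessibilityInfoSource` and `accessibilityPrecheck` are hypothetical names, and the document is typed structurally because the import path is not shown on this page.

```typescript
// Structural stand-in for the two documented getters.
interface AccessibilityInfoSource {
  isTagged(): boolean;
  getLanguage(): string | undefined;
}

// Returns a list of basic findings; an empty list means both checks passed.
function accessibilityPrecheck(doc: AccessibilityInfoSource): string[] {
  const issues: string[] = [];
  if (!doc.isTagged()) {
    issues.push("document is not tagged");
  }
  const lang = doc.getLanguage();
  if (lang === undefined || lang.trim() === "") {
    issues.push("catalog language is missing");
  }
  return issues;
}
```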


listAttachments()

ts
listAttachments(): readonly ParsedPdfAttachment[];

Defined in: src/parser.ts:754

Returns embedded file attachment metadata.

Returns

readonly ParsedPdfAttachment[]

The attachment metadata records.


listEditableStructureElements()

ts
listEditableStructureElements(): readonly ParsedPdfEditableStructureElement[];

Defined in: src/parser.ts:848

Returns editable handles for tagged structure elements.

Returns

readonly ParsedPdfEditableStructureElement[]

The editable structure element handles.


listFormFields()

ts
listFormFields(): readonly ParsedPdfFormField[];

Defined in: src/parser.ts:777

Returns AcroForm field metadata.

Returns

readonly ParsedPdfFormField[]

The parsed form fields.


listNamedDestinations()

ts
listNamedDestinations(): readonly ParsedPdfNamedDestination[];

Defined in: src/parser.ts:683

Returns named destinations discovered from the document name tree.

Returns

readonly ParsedPdfNamedDestination[]

The named destinations.


listOutlines()

ts
listOutlines(): readonly ParsedPdfOutlineItem[];

Defined in: src/parser.ts:706

Returns the document outline tree.

Returns

readonly ParsedPdfOutlineItem[]

The outline items.


listPageLabels()

ts
listPageLabels(): readonly ParsedPdfPageLabelRange[];

Defined in: src/parser.ts:733

Returns page-label ranges discovered from the page labels number tree.

Returns

readonly ParsedPdfPageLabelRange[]

The page-label ranges.


listPages()

ts
listPages(): readonly ParsedPdfPage[];

Defined in: src/parser.ts:648

Returns parsed page records discovered from the page tree.

Returns

readonly ParsedPdfPage[]

The parsed page records.


listStructureElements()

ts
listStructureElements(): readonly ParsedPdfStructureElement[];

Defined in: src/parser.ts:803

Returns a flat list of tagged structure elements.

Returns

readonly ParsedPdfStructureElement[]

The parsed structure elements.


preloadObjects()

ts
preloadObjects(ids): void;

Defined in: src/parser.ts:412

Eagerly loads and decodes the objects with the specified ids so that later access does not trigger lazy parsing.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `ids` | `readonly number[]` | The object ids to preload. |

Returns

void
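The `lazyObjects` set can be used to avoid preloading objects that are already decoded. The sketch below filters a list of ids down to those still marked lazy and preloads only those. It is typed structurally because the import path is not shown on this page; `LazyObjectSource` and `preloadIfLazy` are hypothetical names. Whether preloadObjects() removes ids from `lazyObjects` afterwards is not stated here, so the sketch does not rely on it.

```typescript
// Structural stand-in for the documented property and method.
interface LazyObjectSource {
  readonly lazyObjects: ReadonlySet<number>;
  preloadObjects(ids: readonly number[]): void;
}

// Preloads only the requested ids that are still lazy and returns them.
function preloadIfLazy(doc: LazyObjectSource, ids: readonly number[]): number[] {
  const pending = ids.filter((id) => doc.lazyObjects.has(id));
  if (pending.length > 0) {
    doc.preloadObjects(pending);
  }
  return pending;
}
```

Returning the filtered list lets the caller log or measure how much work was actually triggered.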

Released under the ISC license.