Class: ParsedPdfDocument
Defined in: src/parser.ts:314
A read-only parsed representation of a PDF document. It provides helpers for inspecting document-level metadata, object references, pages, forms, outlines, structure trees, and source bytes.
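As a rough illustration of how the read-only helpers might be combined, the sketch below summarizes a document through a locally declared structural type. `DocumentSummarySource` and `summarizeDocument` are hypothetical names introduced here; only the method names come from this page, and how a `ParsedPdfDocument` instance is obtained is not shown.

```ts
// Hypothetical structural subset of ParsedPdfDocument, limited to
// methods documented on this page.
interface DocumentSummarySource {
  getEffectiveVersion(): string;
  getPageCount(): number;
  isTagged(): boolean;
  getLanguage(): string | undefined;
  getRepairWarnings(): readonly string[];
}

// Builds a one-line, human-readable summary of a parsed document.
function summarizeDocument(doc: DocumentSummarySource): string {
  const parts = [
    `PDF ${doc.getEffectiveVersion()}`,
    `${doc.getPageCount()} page(s)`,
    doc.isTagged() ? "tagged" : "untagged",
    `language: ${doc.getLanguage() ?? "unspecified"}`,
  ];
  const warningCount = doc.getRepairWarnings().length;
  if (warningCount > 0) {
    parts.push(`${warningCount} repair warning(s)`);
  }
  return parts.join(", ");
}
```

Typing the helper against a small structural interface keeps it testable with a plain stub object, without constructing a real parsed document.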
Constructors
Constructor
```ts
new ParsedPdfDocument(
  version,
  xrefOffsets,
  trailer,
  objects,
  sourceBytes,
  startXrefOffset,
  encryption?,
  recoveryMode?,
  repairWarnings?): ParsedPdfDocument;
```
Defined in: src/parser.ts:328
Creates a parsed PDF document wrapper around the trailer, cross-reference data, object table, source bytes, and optional encryption context.
Parameters
| Parameter | Type | Description |
|---|---|---|
| `version` | `string` | The parsed PDF header version. |
| `xrefOffsets` | `ReadonlyMap<number, number>` | The parsed cross-reference offsets by object id. |
| `trailer` | `PdfDict` | The parsed trailer dictionary. |
| `objects` | `ReadonlyMap<number, ParsedPdfIndirectObject>` | The parsed indirect objects by object id. |
| `sourceBytes` | `Uint8Array` | The original source PDF bytes. |
| `startXrefOffset` | `number` | The byte offset of the startxref pointer. |
| `encryption?` | `ParsedDocumentEncryption` | The optional parsed encryption context. |
| `recoveryMode?` | `boolean` | Whether the document was parsed in recovery mode. |
| `repairWarnings?` | `readonly string[]` | Detailed repair warnings collected during recovery parsing. |
Returns
ParsedPdfDocument
Properties
| Property | Modifier | Type | Description | Defined in |
|---|---|---|---|---|
| `lazyObjects` | `readonly` | `Set<number>` | Set of object ids whose stream data has not yet been decoded. | src/parser.ts:352 |
| `recoveryMode?` | `readonly` | `boolean` | Whether the document was parsed in recovery mode. | src/parser.ts:336 |
| `repairWarnings?` | `readonly` | `readonly string[]` | Detailed repair warnings collected during recovery parsing. | src/parser.ts:337 |
| `startXrefOffset` | `readonly` | `number` | The byte offset of the startxref pointer. | src/parser.ts:334 |
| `trailer` | `readonly` | `PdfDict` | The parsed trailer dictionary. | src/parser.ts:331 |
| `version` | `readonly` | `string` | The parsed PDF header version. | src/parser.ts:329 |
| `xrefOffsets` | `readonly` | `ReadonlyMap<number, number>` | The parsed cross-reference offsets by object id. | src/parser.ts:330 |
Methods
extractText()
```ts
extractText(options?): readonly PageTextExtraction[];
```
Defined in: src/parser.ts:663
Extracts positioned Unicode text from every page of the document. Best-effort: decodes each font via its /ToUnicode CMap when present and falls back to Latin-1 otherwise. Positions are user-space coordinates (origin lower-left) computed from the active text and graphics matrices.
Parameters
| Parameter | Type |
|---|---|
| `options?` | `ExtractTextOptions` |
Returns
readonly PageTextExtraction[]
Per-page text and positioned runs.
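The decoding fallback and position calculation described above can be approximated as follows. This is a minimal sketch, not the library's implementation: `decodeCodes`, `multiply`, and `textOrigin` are hypothetical helpers, single-byte character codes are assumed, and the current transformation matrix is assumed to start as the identity at the top of the page so that the result lands in the page's default user space.

```ts
// A PDF transformation matrix in the six-number form [a b c d e f].
type Matrix = readonly [number, number, number, number, number, number];

// Decodes single-byte character codes: use the ToUnicode mapping when it
// has an entry, otherwise treat the code as a Latin-1 code point.
function decodeCodes(
  codes: Uint8Array,
  toUnicode?: ReadonlyMap<number, string>,
): string {
  let text = "";
  codes.forEach((code) => {
    text += toUnicode?.get(code) ?? String.fromCharCode(code);
  });
  return text;
}

// Concatenates two matrices as m x n (m is applied first, then n).
function multiply(m: Matrix, n: Matrix): Matrix {
  return [
    m[0] * n[0] + m[1] * n[2],
    m[0] * n[1] + m[1] * n[3],
    m[2] * n[0] + m[3] * n[2],
    m[2] * n[1] + m[3] * n[3],
    m[4] * n[0] + m[5] * n[2] + n[4],
    m[4] * n[1] + m[5] * n[3] + n[5],
  ];
}

// Maps the text-space origin through the text matrix and then the
// current transformation matrix (CTM).
function textOrigin(textMatrix: Matrix, ctm: Matrix): { x: number; y: number } {
  const combined = multiply(textMatrix, ctm);
  return { x: combined[4], y: combined[5] };
}
```

Latin-1 code points coincide with the first 256 Unicode code points, which is why `String.fromCharCode` is enough for the fallback path in this sketch.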
getConformanceProfile()
```ts
getConformanceProfile(): DocumentConformanceProfile | undefined;
```
Defined in: src/parser.ts:600
Detects the supported PDF/A or PDF/UA conformance profile from catalog and XMP metadata.
Returns
DocumentConformanceProfile | undefined
The detected supported conformance profile, if any.
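For the XMP half of that detection, PDF/A files conventionally declare `pdfaid:part` and `pdfaid:conformance`, and PDF/UA files declare `pdfuaid:part`. The sketch below reads those properties with regular expressions; it is an assumption about one workable approach, not the library's code, and `DetectedProfile`, `readXmpProperty`, and `detectProfile` are hypothetical names. It ignores catalog-level checks and reports only the first profile it finds.

```ts
// Hypothetical result shape; the library's DocumentConformanceProfile may differ.
interface DetectedProfile {
  readonly standard: "PDF/A" | "PDF/UA";
  readonly part: number;
  readonly level: string | undefined;
}

// Reads an XMP property written either as an element or as an attribute.
function readXmpProperty(xmp: string, name: string): string | undefined {
  const element = new RegExp(`<${name}>\\s*([^<]*?)\\s*</${name}>`).exec(xmp);
  if (element) {
    return element[1];
  }
  const attribute = new RegExp(`${name}\\s*=\\s*["']([^"']*)["']`).exec(xmp);
  return attribute ? attribute[1] : undefined;
}

// Returns the first declared profile, checking PDF/A before PDF/UA.
function detectProfile(xmp: string): DetectedProfile | undefined {
  const pdfaPart = readXmpProperty(xmp, "pdfaid:part");
  if (pdfaPart !== undefined) {
    return {
      standard: "PDF/A",
      part: Number(pdfaPart),
      level: readXmpProperty(xmp, "pdfaid:conformance"),
    };
  }
  const pdfuaPart = readXmpProperty(xmp, "pdfuaid:part");
  if (pdfuaPart !== undefined) {
    return { standard: "PDF/UA", part: Number(pdfuaPart), level: undefined };
  }
  return undefined;
}
```

A production implementation would more likely use a real XML parser, since regular expressions do not account for namespace prefix remapping.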
getEffectiveVersion()
```ts
getEffectiveVersion(): string;
```
Defined in: src/parser.ts:468
Returns the effective PDF version, taking into account a Catalog /Version override that may raise the version above the file header.
Returns
string
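One plausible way to resolve the effective version is to compare the header version with the catalog `/Version` value numerically and keep the higher one. The helper names below are hypothetical, and the sketch assumes both values use the `major.minor` form.

```ts
// Splits a "major.minor" version string into numeric components.
function parseVersion(version: string): [number, number] {
  const [major = "0", minor = "0"] = version.split(".");
  return [Number(major), Number(minor)];
}

// Returns the catalog version only when it is higher than the header version.
function effectiveVersion(headerVersion: string, catalogVersion?: string): string {
  if (catalogVersion === undefined) {
    return headerVersion;
  }
  const [headerMajor, headerMinor] = parseVersion(headerVersion);
  const [catalogMajor, catalogMinor] = parseVersion(catalogVersion);
  const catalogIsNewer =
    catalogMajor > headerMajor ||
    (catalogMajor === headerMajor && catalogMinor > headerMinor);
  return catalogIsNewer ? catalogVersion : headerVersion;
}
```

Comparing the components as numbers rather than comparing the strings directly avoids ordering mistakes if a minor version ever reaches two digits.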
getEncryptionContext()
```ts
getEncryptionContext(): PdfEncryptionContext | undefined;
```
Defined in: src/parser.ts:515
Returns the parsed encryption context when the source PDF is encrypted.
Returns
PdfEncryptionContext | undefined
The encryption context, if present.
getEncryptionObjectId()
```ts
getEncryptionObjectId(): number | undefined;
```
Defined in: src/parser.ts:524
Returns the Encrypt dictionary object id when one was parsed.
Returns
number | undefined
The encryption dictionary object id, if present.
getInfoObjectId()
```ts
getInfoObjectId(): number | undefined;
```
Defined in: src/parser.ts:443
Returns the document information dictionary object id when one is present.
Returns
number | undefined
The information dictionary object id, if present.
getLanguage()
```ts
getLanguage(): string | undefined;
```
Defined in: src/parser.ts:574
Returns the catalog language value when present.
Returns
string | undefined
The document language, if present.
getMaxObjectId()
```ts
getMaxObjectId(): number;
```
Defined in: src/parser.ts:434
Returns the highest parsed object id, or zero for an empty object table.
Returns
number
The highest object id.
getMetadata()
```ts
getMetadata(): ParsedPdfMetadata;
```
Defined in: src/parser.ts:533
Returns document information dictionary metadata.
Returns
ParsedPdfMetadata
The parsed document metadata.
getObject()
```ts
getObject(id): ParsedPdfIndirectObject | undefined;
```
Defined in: src/parser.ts:361
Returns a parsed indirect object by object id.
Parameters
| Parameter | Type | Description |
|---|---|---|
| `id` | `number` | The object id to look up. |
Returns
ParsedPdfIndirectObject | undefined
The parsed object, if present.
getObjectIds()
```ts
getObjectIds(): readonly number[];
```
Defined in: src/parser.ts:425
Returns all parsed object ids sorted in ascending order.
Returns
readonly number[]
The sorted object ids.
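For illustration, ascending ids can be derived from a map-backed object table as below. `sortedObjectIds` and `maxObjectId` are hypothetical helpers, not the library's code. Note the explicit numeric comparator: the default `Array.prototype.sort` compares elements as strings, which would place 10 before 9.

```ts
// Returns the keys of an object table sorted numerically in ascending order.
function sortedObjectIds(objects: ReadonlyMap<number, unknown>): readonly number[] {
  return Array.from(objects.keys()).sort((a, b) => a - b);
}

// Returns the highest id, or zero when the table is empty.
function maxObjectId(objects: ReadonlyMap<number, unknown>): number {
  let max = 0;
  objects.forEach((_value, id) => {
    if (id > max) {
      max = id;
    }
  });
  return max;
}
```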
getPageCount()
```ts
getPageCount(): number;
```
Defined in: src/parser.ts:636
Returns the number of pages in the document.
Returns
number
The page count.
getPagesObjectId()
```ts
getPagesObjectId(): number;
```
Defined in: src/parser.ts:497
Returns the root Pages tree object id.
Returns
number
The Pages tree object id.
getRepairWarnings()
```ts
getRepairWarnings(): readonly string[];
```
Defined in: src/parser.ts:345
Returns the repair warnings collected during recovery parsing.
Returns
readonly string[]
The repair warnings, or an empty array.
getRootObjectId()
```ts
getRootObjectId(): number;
```
Defined in: src/parser.ts:453
Returns the catalog object id from the trailer root entry.
Returns
number
The catalog object id.
getSourceBytes()
```ts
getSourceBytes(): Uint8Array;
```
Defined in: src/parser.ts:506
Returns a defensive copy of the original source bytes.
Returns
Uint8Array
A copy of the source bytes.
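A defensive copy means callers can mutate the returned array without affecting the parsed document. The pattern can be sketched with a hypothetical `ByteHolder` class; this illustrates the idea and is not the library's implementation.

```ts
// Hypothetical holder that never exposes its internal buffer directly.
class ByteHolder {
  private readonly bytes: Uint8Array;

  constructor(bytes: Uint8Array) {
    // Copy on the way in so later changes by the caller are not observed.
    this.bytes = bytes.slice();
  }

  // Copy on the way out so callers cannot modify the internal state.
  getSourceBytes(): Uint8Array {
    return this.bytes.slice();
  }
}
```

Because every call allocates a full copy, it may be worth calling `getSourceBytes()` once and reusing the result when working with large files.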
getXmpMetadata()
```ts
getXmpMetadata(): string | undefined;
```
Defined in: src/parser.ts:557
Returns decoded XMP metadata from the catalog metadata stream when present.
Returns
string | undefined
The decoded XMP metadata, if present.
inspectStructureElements()
```ts
inspectStructureElements(): readonly ParsedPdfInspectedStructureElement[];
```
Defined in: src/parser.ts:825
Returns a nested inspection view of tagged structure elements.
Returns
readonly ParsedPdfInspectedStructureElement[]
The nested structure inspection records.
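A nested view can be turned into a flat, depth-annotated list with a simple depth-first walk. The `NestedElement` shape below (a `role` and a `children` array) is an assumption for illustration only; the actual fields of `ParsedPdfInspectedStructureElement` are not documented in this section.

```ts
// Hypothetical nested element shape; the real record type may differ.
interface NestedElement {
  readonly role: string;
  readonly children: readonly NestedElement[];
}

interface FlatElement {
  readonly role: string;
  readonly depth: number;
}

// Depth-first, pre-order traversal that records each element's depth.
function flattenElements(
  roots: readonly NestedElement[],
  depth = 0,
  out: FlatElement[] = [],
): FlatElement[] {
  for (const element of roots) {
    out.push({ role: element.role, depth });
    flattenElements(element.children, depth + 1, out);
  }
  return out;
}
```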
isTagged()
```ts
isTagged(): boolean;
```
Defined in: src/parser.ts:583
Reports whether the document declares tagged PDF structure.
Returns
boolean
True when the document is tagged.
listAttachments()
```ts
listAttachments(): readonly ParsedPdfAttachment[];
```
Defined in: src/parser.ts:754
Returns embedded file attachment metadata.
Returns
readonly ParsedPdfAttachment[]
The attachment metadata records.
listEditableStructureElements()
```ts
listEditableStructureElements(): readonly ParsedPdfEditableStructureElement[];
```
Defined in: src/parser.ts:848
Returns editable handles for tagged structure elements.
Returns
readonly ParsedPdfEditableStructureElement[]
The editable structure element handles.
listFormFields()
```ts
listFormFields(): readonly ParsedPdfFormField[];
```
Defined in: src/parser.ts:777
Returns AcroForm field metadata.
Returns
readonly ParsedPdfFormField[]
The parsed form fields.
listNamedDestinations()
```ts
listNamedDestinations(): readonly ParsedPdfNamedDestination[];
```
Defined in: src/parser.ts:683
Returns named destinations discovered from the document name tree.
Returns
readonly ParsedPdfNamedDestination[]
The named destinations.
listOutlines()
```ts
listOutlines(): readonly ParsedPdfOutlineItem[];
```
Defined in: src/parser.ts:706
Returns the document outline tree.
Returns
readonly ParsedPdfOutlineItem[]
The outline items.
listPageLabels()
```ts
listPageLabels(): readonly ParsedPdfPageLabelRange[];
```
Defined in: src/parser.ts:733
Returns page-label ranges discovered from the page labels number tree.
Returns
readonly ParsedPdfPageLabelRange[]
The page-label ranges.
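Page-label ranges are typically resolved by finding the last range that starts at or before a page and counting forward from that range's first number. The sketch below shows that lookup for decimal numbering only. The `LabelRange` fields are hypothetical stand-ins modeled on the PDF page-label dictionary; the actual fields of `ParsedPdfPageLabelRange` are not documented in this section.

```ts
// Hypothetical range shape: a starting page index, an optional prefix,
// and the first number to use (defaulting to 1).
interface LabelRange {
  readonly startPageIndex: number;
  readonly prefix?: string;
  readonly firstNumber?: number;
}

// Resolves the label for a zero-based page index using decimal numbering only.
function labelForPage(
  ranges: readonly LabelRange[],
  pageIndex: number,
): string | undefined {
  let active: LabelRange | undefined;
  for (const range of ranges) {
    // The applicable range has the greatest start that is not after the page.
    if (
      range.startPageIndex <= pageIndex &&
      (active === undefined || range.startPageIndex > active.startPageIndex)
    ) {
      active = range;
    }
  }
  if (active === undefined) {
    return undefined;
  }
  const pageNumber = (active.firstNumber ?? 1) + (pageIndex - active.startPageIndex);
  return `${active.prefix ?? ""}${pageNumber}`;
}
```

Roman and alphabetic numbering styles would need an extra formatting step that this sketch leaves out.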
listPages()
```ts
listPages(): readonly ParsedPdfPage[];
```
Defined in: src/parser.ts:648
Returns parsed page records discovered from the page tree.
Returns
readonly ParsedPdfPage[]
The parsed page records.
listStructureElements()
```ts
listStructureElements(): readonly ParsedPdfStructureElement[];
```
Defined in: src/parser.ts:803
Returns a flat list of tagged structure elements.
Returns
readonly ParsedPdfStructureElement[]
The parsed structure elements.
preloadObjects()
```ts
preloadObjects(ids): void;
```
Defined in: src/parser.ts:412
Eagerly loads and decodes the specified object ids so they are available without further lazy-parsing.
Parameters
| Parameter | Type | Description |
|---|---|---|
| `ids` | `readonly number[]` | The object ids to preload. |
Returns
void
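The relationship between `lazyObjects` and `preloadObjects()` can be pictured as a decode-on-demand cache. The sketch below is an assumption about the general pattern, not the library's implementation; `LazyObjectTable` and its `decode` callback are hypothetical.

```ts
// Hypothetical lazy table: objects are decoded on first access and cached.
class LazyObjectTable<T> {
  // Ids that have not been decoded yet.
  readonly lazyObjects: Set<number>;
  private readonly cache = new Map<number, T>();
  private readonly decode: (id: number) => T;

  constructor(ids: readonly number[], decode: (id: number) => T) {
    this.lazyObjects = new Set(ids);
    this.decode = decode;
  }

  // Decodes the object on first access, then serves it from the cache.
  get(id: number): T | undefined {
    if (this.lazyObjects.has(id)) {
      this.cache.set(id, this.decode(id));
      this.lazyObjects.delete(id);
    }
    return this.cache.get(id);
  }

  // Eagerly decodes the given ids so later lookups need no further parsing.
  preload(ids: readonly number[]): void {
    for (const id of ids) {
      this.get(id);
    }
  }
}
```

Preloading a known working set up front, for example the objects belonging to one page, moves the decoding cost to a predictable point instead of spreading it across later lookups.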