PDF Extraction Fixture Matrix

Use the PDF matrix to choose among text-heavy, layout-driven, form-like, and damaged fixtures when testing preview and extraction pipelines.

How to Use This Matrix

  • Covers single-page, multi-page, layout-heavy, and damaged PDF cases.
  • Built for preview rendering, text extraction, field mapping, and parser error paths.
  • Useful for invoice, report, and document-processing workflows where layout matters.
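
One way to put the matrix to work is to encode it as data in a test suite and select fixtures by the outcome you expect. The sketch below is illustrative only: the `Fixture` class, the `expect_valid` flag, and the `fixtures_for` helper are hypothetical names, and treating every profile other than "Broken fixture" as parseable is an assumption rather than something the matrix states. File names and profiles are copied from the fixture rows in this matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixture:
    variant: str
    profile: str
    filename: str
    expect_valid: bool  # assumption: only the "Broken fixture" profile should fail to parse


# Names and profiles copied from the fixture rows; expect_valid is our own annotation.
FIXTURES = [
    Fixture("Single-Page Text", "Valid baseline", "pdf_single_page_text_sample.pdf", True),
    Fixture("Multi-Page Report", "Valid document", "pdf_multi_page_report_sample.pdf", True),
    Fixture("Invoice Layout", "Layout-driven fixture", "pdf_invoice_layout_sample.pdf", True),
    Fixture("Form-Like PDF", "Structured layout", "pdf_form_like_sample.pdf", True),
    Fixture("Landscape Report", "Orientation variant", "pdf_landscape_report_sample.pdf", True),
    Fixture("Truncated PDF", "Broken fixture", "pdf_truncated_edge_case_sample.pdf", False),
]


def fixtures_for(expect_valid: bool) -> list[str]:
    """Return the file names whose expected outcome matches expect_valid."""
    return [f.filename for f in FIXTURES if f.expect_valid is expect_valid]
```

A happy-path test can then iterate over `fixtures_for(True)`, while an error-path test uses `fixtures_for(False)` to confirm that the parser fails cleanly.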

Open Primary Library

This matrix is anchored to the PDF library page and its manifest.

Fixture Rows

Single-Page Text
Best default sanity check for renderers and PDF text extraction.
  • Profile: Valid baseline
  • Test focus: Simple rendering and extraction
  • File: pdf_single_page_text_sample.pdf
  • Size: 725 B

Multi-Page Report
Useful for multi-page previews, extraction batching, and document splitting.
  • Profile: Valid document
  • Test focus: Pagination and page count
  • File: pdf_multi_page_report_sample.pdf
  • Size: 1.3 KB

Invoice Layout
Targets invoice parsers and structured extraction pipelines.
  • Profile: Layout-driven fixture
  • Test focus: Field extraction from fixed layouts
  • File: pdf_invoice_layout_sample.pdf
  • Size: 774 B

Form-Like PDF
Useful for OCR-adjacent field mapping and fixed-position extraction logic.
  • Profile: Structured layout
  • Test focus: Form field and box detection
  • File: pdf_form_like_sample.pdf
  • Size: 773 B

Landscape Report
Targets preview rotation, table extraction, and page-fit UI handling.
  • Profile: Orientation variant
  • Test focus: Wide-table rendering
  • File: pdf_landscape_report_sample.pdf
  • Size: 743 B

Truncated PDF
Good for parser failures, preview fallback, and corrupt-download handling.
  • Profile: Broken fixture
  • Test focus: Damaged file recovery
  • File: pdf_truncated_edge_case_sample.pdf
  • Size: 701 B
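
For the damaged-file case, a cheap byte-level check can flag a likely truncation before a full parser runs. The sketch below relies on two general facts about the PDF format: a file begins with a `%PDF-` version header, and it ends with an `%%EOF` marker that readers look for near the end of the file. The function name `pdf_structure_issues` is hypothetical. How pdf_truncated_edge_case_sample.pdf was actually damaged is not documented here, so treat this as a heuristic that may or may not trip on that specific fixture.

```python
def pdf_structure_issues(data: bytes) -> list[str]:
    """Return structural problems found by cheap byte-level checks (heuristic only)."""
    issues = []
    # A PDF file starts with a "%PDF-" version header. This sketch is strict and
    # requires it at offset 0, although some readers tolerate leading junk.
    if not data.startswith(b"%PDF-"):
        issues.append("missing %PDF- header")
    # Readers locate the trailer through an "%%EOF" marker near the end of the file;
    # a download cut off mid-stream usually loses it.
    if b"%%EOF" not in data[-1024:]:
        issues.append("missing %%EOF marker")
    return issues
```

An empty list does not prove that the file is valid. It only means these two markers are present, so a preview fallback path should still handle parser errors raised later.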

Related Workflows

Upload Validation Fixtures

Sample files and packs for upload-limit checks, MIME validation, archive intake, and mixed-content workflow testing.


Parser Regression Fixtures

Stable and edge-case fixtures for document, data, and archive parsers that need deterministic regression coverage.
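
Deterministic regression coverage depends on the fixture bytes staying the same between runs. One common way to guard against silent changes is to pin a SHA-256 digest per file. This is a minimal sketch with hypothetical helper names (`sha256_hex`, `changed_fixtures`), and the pinned digests would be recorded by your own project. It is not a description of how the library's manifest works.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def changed_fixtures(pinned: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """Return names whose bytes are missing or no longer match the pinned digest."""
    return sorted(
        name
        for name, digest in pinned.items()
        if name not in current or sha256_hex(current[name]) != digest
    )
```

A regression suite can run this check once at startup and fail fast when the returned list is not empty, before any parser assertions execute.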
