Document Extraction Fixture Pack

Bundle of real PDF and TXT fixtures for extraction, layout parsing, OCR-style validation, protected-document handling, and damaged-file workflows.
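For the protected-document and damaged-file workflows, a cheap first pass is to look for a few byte-level markers before handing a file to a real parser. The sketch below is an illustrative heuristic, not part of the pack. `triage_pdf` and `PdfTriage` are hypothetical names. It only checks for three things: the `%PDF-` header in the first 1024 bytes, a `%%EOF` marker near the end of the file, and an `/Encrypt` key anywhere in the raw bytes.

```python
from dataclasses import dataclass


@dataclass
class PdfTriage:
    has_header: bool  # "%PDF-" signature found in the first 1024 bytes
    has_eof: bool     # "%%EOF" marker found in the last `tail_window` bytes
    encrypted: bool   # an "/Encrypt" key appears somewhere in the raw bytes


def triage_pdf(data: bytes, tail_window: int = 1024) -> PdfTriage:
    """Cheap byte-level heuristics; a real PDF parser should confirm the result."""
    return PdfTriage(
        has_header=b"%PDF-" in data[:1024],
        has_eof=b"%%EOF" in data[-tail_window:],
        encrypted=b"/Encrypt" in data,
    )


if __name__ == "__main__":
    # Synthetic bytes for illustration; read a real fixture with open(path, "rb") instead.
    complete = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n"
    print(triage_pdf(complete))
    print(triage_pdf(complete[:12]))  # simulate a truncated download
```

A truncated file may lack the trailing `%%EOF`, and an encrypted file normally references an `/Encrypt` dictionary from its trailer. Still, confirm these flags against the actual fixtures rather than relying on the heuristics alone.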

Download the Pack

document_extraction_fixture_pack.zip · 16.8 KB

Best For

  • Field extraction and fixed-layout parsing across clean, scan-style, and protected PDFs.
  • Text extraction and encoding validation using UTF-8, UTF-16LE, and minimal TXT fixtures.
  • Repeatable setup for OCR, parser, and document-extraction QA workflows.
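The encoding fixtures target a common pitfall. Decoding UTF-16LE bytes as UTF-8 can silently succeed for mostly-ASCII text, producing interleaved NUL characters instead of an error. The minimal decoding sketch below makes no assumption about whether the TXT fixtures carry a byte-order mark. It checks for a BOM first, then applies a NUL-byte heuristic for BOM-less UTF-16LE, and otherwise falls back to strict UTF-8. `decode_text_fixture` is a hypothetical helper, not part of the pack.

```python
import codecs


def decode_text_fixture(data: bytes) -> tuple[str, str]:
    """Return (encoding_name, decoded_text) for a TXT fixture's raw bytes."""
    # 1. Trust an explicit byte-order mark when one is present.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", data.decode("utf-8-sig")
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # 2. Heuristic for BOM-less UTF-16LE: mostly-ASCII text leaves a NUL in
    #    at least half of the odd-indexed (high) bytes.
    high_bytes = data[1::2]
    if len(data) % 2 == 0 and high_bytes and 2 * high_bytes.count(0) >= len(high_bytes):
        return "utf-16-le", data.decode("utf-16-le")
    # 3. Otherwise decode strictly as UTF-8; invalid input raises UnicodeDecodeError.
    return "utf-8", data.decode("utf-8")
```

Passing the raw bytes from `open(path, "rb").read()` avoids depending on the platform's default text encoding.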

Included Fixtures

Filename                            Format  Size
pdf_invoice_layout_sample.pdf       PDF     774 B
pdf_form_like_sample.pdf            PDF     773 B
pdf_scan_like_image_sample.pdf      PDF     3.7 KB
pdf_ocr_noise_sample.pdf            PDF     7.9 KB
pdf_multi_column_report_sample.pdf  PDF     3.3 KB
pdf_password_protected_sample.pdf   PDF     3.2 KB
pdf_truncated_edge_case_sample.pdf  PDF     701 B
txt_utf8_multilingual_sample.txt    TXT     94 B
txt_utf16le_sample.txt              TXT     176 B
txt_minimal_readme_sample.txt       TXT     100 B
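To confirm that a downloaded copy of the pack contains every file listed above, you can compare the archive's member names against the table. The folder layout inside the zip is not documented here, so this sketch matches on basenames. `EXPECTED_FIXTURES` and `check_pack` are hypothetical names. Sizes are not checked because the table rounds them.

```python
import zipfile

# Filenames and formats transcribed from the "Included Fixtures" table.
EXPECTED_FIXTURES = {
    "pdf_invoice_layout_sample.pdf": "PDF",
    "pdf_form_like_sample.pdf": "PDF",
    "pdf_scan_like_image_sample.pdf": "PDF",
    "pdf_ocr_noise_sample.pdf": "PDF",
    "pdf_multi_column_report_sample.pdf": "PDF",
    "pdf_password_protected_sample.pdf": "PDF",
    "pdf_truncated_edge_case_sample.pdf": "PDF",
    "txt_utf8_multilingual_sample.txt": "TXT",
    "txt_utf16le_sample.txt": "TXT",
    "txt_minimal_readme_sample.txt": "TXT",
}

# Sanity check: every file extension agrees with its Format column.
assert all(name.rsplit(".", 1)[1].upper() == fmt for name, fmt in EXPECTED_FIXTURES.items())


def check_pack(zip_source) -> list[str]:
    """Return expected fixture names missing from the archive.

    `zip_source` may be a path or a binary file object. Members are compared
    by basename because the folder layout inside the zip is an assumption.
    """
    with zipfile.ZipFile(zip_source) as zf:
        present = {
            name.rsplit("/", 1)[-1]
            for name in zf.namelist()
            if not name.endswith("/")  # skip directory entries
        }
    return sorted(set(EXPECTED_FIXTURES) - present)
```

For example, `check_pack("document_extraction_fixture_pack.zip")` returns an empty list when every listed fixture is present in the archive.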

Primary Fixture Matrix

Use the curated PDF matrix to go from this pack to the individual single-fixture variants it is built from.

Primary Library

This pack is anchored to the PDF sample library and works best when paired with individual fixture downloads.
