Document Extraction Fixtures
PDF and TXT fixtures for layout parsing, OCR-style extraction, protected-document handling, and text normalization workflows.
Use workflow pages to move from a job to the exact fixtures, packs, and supporting references.
Why This Workflow Matters
About This Workflow
- Mix clean layout PDFs, scan-style pages, protected files, and damaged documents in one extraction suite.
- Pair PDF extraction cases with TXT encoding fixtures to validate plain-text fallback and normalization.
- Use the extraction pack for repeatable parser, OCR, and field-mapping setup.
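The plain-text fallback and normalization step mentioned above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the fixture pack's own tooling: the function names `decode_text_fixture` and `normalize_text` are hypothetical, and the detection order (BOM first, then a NUL-byte hint for BOM-less UTF-16LE, then UTF-8, then Latin-1 as a last resort) is an assumption about what a reasonable fallback chain might look like.

```python
import codecs
import unicodedata


def decode_text_fixture(data: bytes) -> tuple[str, str]:
    """Decode raw bytes and report which encoding was used (hypothetical helper)."""
    # 1. An explicit byte-order mark is the strongest signal.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le"), "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be"), "utf-16-be"

    # 2. Assumption: NUL bytes in an even-length payload hint at BOM-less UTF-16LE.
    if b"\x00" in data and len(data) % 2 == 0:
        try:
            return data.decode("utf-16-le"), "utf-16-le"
        except UnicodeDecodeError:
            pass

    # 3. Strict UTF-8, then Latin-1, which accepts any byte sequence.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"


def normalize_text(text: str) -> str:
    """Unify line endings and compose characters to NFC."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return unicodedata.normalize("NFC", text)
```

A test could read each TXT fixture as bytes, pass it through `decode_text_fixture`, and compare the normalized result against the text recovered from the matching PDF case. Real suites may prefer a dedicated detection library; the heuristic in step 2 can misfire on binary data.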
Recommended Packs
Fixture Packs
Document Extraction Fixture Pack
Image Extraction Fixture Pack
Fixture Matrices
PDF Extraction Fixture Matrix
TXT Encoding Fixture Matrix
Suggested Fixtures
Files
| Filename | Format | Size |
|---|---|---|
| pdf_invoice_layout_sample.pdf | PDF | 774 B |
| pdf_scan_like_image_sample.pdf | PDF | 3.7 KB |
| pdf_ocr_noise_sample.pdf | PDF | 7.9 KB |
| pdf_multi_column_report_sample.pdf | PDF | 3.3 KB |
| pdf_password_protected_sample.pdf | PDF | 3.2 KB |
| txt_utf8_multilingual_sample.txt | TXT | 94 B |
| txt_utf16le_sample.txt | TXT | 176 B |
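
Before running a full extractor over the PDF fixtures listed above, a suite might triage each file with a cheap byte-level check so that protected or damaged inputs are routed to the right test path. The sketch below is only a heuristic under stated assumptions: `triage_pdf_bytes` is a hypothetical helper, the category labels are invented for illustration, and scanning for `/Encrypt` or `%%EOF` can produce false positives or negatives on unusual files. It does not replace a real PDF parser.

```python
def triage_pdf_bytes(data: bytes) -> str:
    """Return a coarse label for a PDF payload (hypothetical helper, heuristic only)."""
    # A PDF normally begins with the "%PDF-" header marker.
    if not data.startswith(b"%PDF-"):
        return "not-pdf"

    # Encrypted documents reference an encryption dictionary via an
    # /Encrypt entry; a raw byte search is a rough stand-in for parsing.
    if b"/Encrypt" in data:
        return "encrypted"

    # A well-formed file ends with an "%%EOF" marker; its absence near the
    # end suggests truncation or other damage.
    if b"%%EOF" not in data[-1024:]:
        return "possibly-truncated"

    return "plain"
```

For example, a test could assert that the bytes of `pdf_password_protected_sample.pdf` are labeled `"encrypted"` and then verify that the extractor fails gracefully instead of returning empty text. Whether that fixture actually carries an `/Encrypt` entry is an assumption to confirm against the file itself.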
Related Strategy Pages