PDF use case
PDF Sample File for OCR and Text Extraction
Validate OCR, text extraction, and layout-aware parsing with clean text PDFs, scan-like pages, and noisy image-based documents.
Starter file: pdf_scan_like_image_sample.pdf (3.7 KB)
Checklist
Testing Steps
- Compare extracted text across scan-like, OCR-noise, and clean-text control PDFs.
- Check how tables, multi-column layouts, and multi-page reports affect text order and field extraction.
- Verify fallback messaging when extraction quality drops on image-heavy PDF inputs.
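
The comparison and fallback checks above can be sketched in Python as a rough starting point. The sketch assumes the text has already been extracted from each PDF by whatever OCR or parsing tool is under test. The helper names (`normalize`, `similarity`, `needs_fallback`) and the 0.8 threshold are illustrative assumptions, not part of this fixture set.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not dominate."""
    return re.sub(r"\s+", " ", text).strip().lower()


def similarity(control_text: str, candidate_text: str) -> float:
    """Return a 0..1 ratio comparing candidate text against the clean control."""
    a = normalize(control_text)
    b = normalize(candidate_text)
    return difflib.SequenceMatcher(None, a, b).ratio()


def needs_fallback(control_text: str, candidate_text: str,
                   threshold: float = 0.8) -> bool:
    """True when similarity drops below the threshold (0.8 is an assumed value)."""
    return similarity(control_text, candidate_text) < threshold


if __name__ == "__main__":
    # Hypothetical strings standing in for real extraction output.
    control = "Invoice total: 42.00"
    scan_like = "lnvoice tota1: 42.OO"
    print(f"similarity: {similarity(control, scan_like):.2f}")
    print(f"show fallback message: {needs_fallback(control, scan_like)}")
```

A character-level ratio is a blunt measure. For the table and multi-column cases it may be more useful to compare the order of extracted words or fields, since a reordered page can still score well on overall similarity.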