MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published Mar 23 • 135
CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20, 2025 • 22
GutenOCR: A Grounded Vision-Language Front-End for Documents Paper • 2601.14490 • Published Jan 20 • 37
RICO Collection A collection of RICO screenshot-based datasets for training and evaluation. We've attempted to compile all surrounding metadata for the relevant tasks. • 8 items • Updated Jan 16 • 5
Gemma 3 Collection All versions of Google's new multimodal models, including QAT, in 1B, 4B, 12B, and 27B sizes. Available in GGUF, dynamic 4-bit and 16-bit formats. • 54 items • Updated 4 days ago • 114