LlamaIndex highlights that frontier models like Gemini, Opus, and GPT struggle with visual grounding in document OCR, and claims significant improvements in their own models for spatial positioning on pages.
Original Post
One of the biggest requirements for document OCR is visual grounding, and frontier models (gemini, opus, gpt-5.4) suck at it by default.
In other words they don't have a great sense of the positions of things on a page.
We've made massive strides in making sure our models are https://t.co/c8rjkDwaQ4