Industry News ocr document_parsing llm visual_grounding

LlamaIndex says that frontier models such as Gemini, Opus, and GPT struggle with visual grounding in document OCR, and claims significant improvements in how well its own models locate content on a page.
One of the biggest requirements for document OCR is visual grounding, and frontier models (Gemini, Opus, GPT-5.4) suck at it by default. In other words, they don't have a great sense of the positions of things on a page. We've made massive strides in making sure our models are https://t.co/c8rjkDwaQ4
