Industry News · agents · llm · industry_trends

A skeptical HN post questioning whether 'long time horizon' capability in AI agents is a meaningful metric or easily gameable, arguing context window size is the actual constraint.
Ask HN: Why would we care about "extended time horizons" and LLMs?

Is it more impressive to take longer to answer 2 + 2? It's not. The longer one takes, the less intelligent we would rate that person.

Somehow, for AI agents, taking longer is getting praise under the framing of "maintaining attention over long time horizons."

Have we collectively gone down to room temperature IQs with COVID?

Why would the time dimension matter for a tool that is limited by its context window? It doesn't matter whether you fill up the window in 1 second or 60 minutes. It's also super easy to game: insert random lags or reduce tokens/sec, and there you have a model that "maintains attention over long time horizons."
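The gaming argument above can be made concrete. Below is a minimal sketch (all names hypothetical, not from any real benchmark) of a wall-clock "time horizon" metric: an agent that merely sleeps before answering scores a longer horizon than one that returns the identical answer immediately.

```python
import time

def task_duration(agent_fn, task):
    # Hypothetical wall-clock "time horizon" metric:
    # how long the agent stays "on task" before returning.
    start = time.monotonic()
    agent_fn(task)
    return time.monotonic() - start

def honest_agent(task):
    return task.upper()      # answers almost instantly

def gamed_agent(task, lag=0.2):
    time.sleep(lag)          # insert an artificial lag...
    return task.upper()      # ...then return the exact same answer

fast = task_duration(honest_agent, "2 + 2")
slow = task_duration(gamed_agent, "2 + 2")
assert slow > fast  # identical output, "better" time-horizon score
```

Any metric that rewards elapsed time alone is satisfied by the `time.sleep` call; nothing about the agent's actual capability changed.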

Maybe more importantly, how do people in this field buy into these easily gameable non-indicators so readily? How did they not develop the instinct to call out metrics like lines of code, number of tokens burned, or time taken to process a task as BS the instant they hear them?

How do they benchmark their own code? The longer it runs, the better? By number of CPU cycles spent?
