Week 2026-W19 — CedarPond AI Feed

Agent Infrastructure hackernews

A developer built OpenClaw, a minimalist self-hosted Telegram bot interfacing with a Pi AI agent harness, supporting shell commands, cron tasking, and session switching from mobile.

kkovacs · 2026-05-17 · 3

Industry News @llama_index

LlamaIndex outlines two categories of finance AI agents: repetitive back-office automation (invoices, KYC) and assistive agents, both requiring high-quality context engineering from documents.

llama_index · 2026-05-17 · 5

Industry News @llama_index

Retweet of the LlamaIndex finance AI agents post about context engineering categories in back-office and assistive use cases.

llama_index · 2026-05-17 · 2

Industry News @llama_index

LlamaIndex wrapped up participation at AI Engineer Singapore with a workshop, keynote, and executive dinner, previewing an upcoming SF World Fair appearance.

llama_index · 2026-05-17 · 2

Industry News @langfuse

Langfuse shared a link with no accompanying text; the content is unknown.

langfuse · 2026-05-16 · 1

Industry News @langfuse

Langfuse highlighted a recommended read on monitoring for LLM applications.

langfuse · 2026-05-16 · 3

Industry News @xai

xAI prompted users to connect their X account to an unspecified service, likely Grok or Hermes Agent.

xai · 2026-05-16 · 2

Agent Infrastructure @xai

xAI's Hermes Agent now supports X Premium subscriptions and can search X posts, expanding its real-time data access capabilities.

xai · 2026-05-16 · 5

Industry News @GoogleDeepMind

Google DeepMind announced Google I/O on May 19, teasing new product updates and AI breakthroughs from the event.

GoogleDeepMind · 2026-05-15 · 6

Industry News @GoogleDeepMind

Retweet of Google's reminder that Google I/O takes place on May 19, promising AI product announcements and breakthroughs.

GoogleDeepMind · 2026-05-16 · 2

Industry News hackernews

Revkit.ai founder argues AI will replace the entire human-and-spreadsheet layer around Salesforce, not just improve it, representing a massive enterprise opportunity.

emmanol · 2026-05-15 · 4

Research Papers hackernews

Emergence World evaluates LLMs by having them build and govern simulated societies; Claude built a democracy with zero crimes while Grok's world descended into chaos within 48 hours.

deepakakkil · 2026-05-15 · 7

Industry News openai_blog

OpenAI showcases how sales teams can use Codex to automate pipeline briefs, meeting prep, and deal analysis from real work inputs.

OpenAI Blog · 2026-05-15 · 3

Industry News openai_blog

ChatGPT is introducing a personal finance feature for Pro users in the US, allowing secure connection of financial accounts for AI-powered insights.

OpenAI Blog · 2026-05-15 · 4

Industry News openai_blog

OpenAI demonstrates Codex use cases for data science teams, including root-cause briefs, KPI memos, and dashboard specs from real work inputs.

OpenAI Blog · 2026-05-15 · 3

Industry News openai_blog

Databricks adopts GPT-5.5 for enterprise agent workflows after the model achieved state-of-the-art results on the OfficeQA Pro benchmark.

OpenAI Blog · 2026-05-15 · 6

Industry News openai_blog

OpenAI highlights Codex capabilities for business operations teams, enabling automated creation of strategy updates, initiative briefs, and leadership decision packets.

OpenAI Blog · 2026-05-15 · 3

Industry News openai_blog

OpenAI partners with Malta to expand AI access by offering ChatGPT Plus subscriptions and training programs to help citizens develop practical AI skills.

OpenAI Blog · 2026-05-16 · 3

Model Releases @llama_index

INF released two open-weight models (Infinity-Parser2-Pro 35B and Flash 2B) that top the ParseBench leaderboard for document understanding, trained on expanded synthetic data.

llama_index · 2026-05-15 · 7

Model Releases @llama_index

Repost of the INF Infinity-Parser2 model release announcement topping the document understanding leaderboard on HuggingFace's ParseBench.

llama_index · 2026-05-16 · 2

Industry News @ArizeAI

ArizeAI announces partnership with Deloitte Canada to help enterprises move complex AI systems from experimentation into production-grade workflows with better observability.

ArizeAI · 2026-05-15 · 4

Agent Infrastructure @ArizeAI

ArizeAI highlights key challenges of scaling multi-agent systems in production, including context loss during handoffs and excessive token consumption.

ArizeAI · 2026-05-15 · 5

Industry News @ArizeAI

ArizeAI partners with Deloitte Canada to help enterprises operationalize agent systems with tracing, evaluation, monitoring, and governance tooling.

ArizeAI · 2026-05-15 · 4

Industry News @ArizeAI

ArizeAI joins MistralAI, CoderHQ, and Workato at the AWS Agentic AI Partner Showcase in SF to discuss what it takes to ship agents to production.

ArizeAI · 2026-05-15 · 3

Agent Infrastructure @langfuse

Langfuse is hosting an in-person training session in San Francisco on May 26th covering how to bring agents to production using Langfuse observability tools.

langfuse · 2026-05-15 · 3

Agent Infrastructure @langfuse

Retweet of Langfuse's in-person SF training announcement for bringing agents to production using Langfuse.

langfuse · 2026-05-15 · 1

Agent Infrastructure @langfuse

Langfuse shared a link with no accompanying text, providing minimal context about the content.

langfuse · 2026-05-15 · 1

Agent Infrastructure @langfuse

Langfuse posted a brief informal message expressing enthusiasm for traces, likely referencing their tracing/observability product.

langfuse · 2026-05-15 · 1

Model Releases @Cohere

Cohere highlights how its Compass product can search and retrieve information from unstructured data including scans of handwritten and typed declassified documents.

Cohere · 2026-05-15 · 5

Agent Infrastructure @xai

Grok subscribers can now use their subscription within the Nous Research Hermes Agent, expanding Grok's integration into third-party agent frameworks.

xai · 2026-05-15 · 6

Research Papers arxiv

SRT (Self-Recall Thinking) is a framework that improves multi-turn dialogue consistency by identifying and retrieving relevant historical turns to resolve long-range dependencies without external memory or lossy summarization.

Renning Pang, Tian Lan, Leyuan Liu +3 more · 2026-05-14 · 6

Research Papers arxiv

This paper studies how to design logging policies for off-policy evaluation, characterizing a reward-coverage tradeoff and deriving optimal policies to minimize OPE estimation error.

Connor Douglas, Joel Persson, Foster Provost · 2026-05-14 · 4

Research Papers arxiv

This paper reframes citation faithfulness in Agentic GraphRAG as a trajectory-level problem, showing that uncited but visited graph entities significantly influence answers and must be accounted for in provenance.

Riccardo Terrenzi, Maximilian von Zastrow, Serkan Ayvaz · 2026-05-14 · 6

Research Papers arxiv

CLOVER addresses the training-evaluation mismatch in autonomous driving by using closed-loop value estimation and ranking to better score trajectory candidates beyond simple imitation learning.

Sining Ang, Yuguang Yang, Canyu Chen +1 more · 2026-05-14 · 5

Research Papers arxiv

A survey of 60 international students in the US reveals how they use conversational AI tools like ChatGPT to navigate cross-cultural adaptation challenges where institutional support is fragmented.

Laleh Nourian, Anisa Callis, Stephanie Patterson +3 more · 2026-05-14 · 3

Agent Infrastructure arxiv

APWA introduces a distributed multi-agent architecture that enables high-throughput parallel processing of complex agentic workloads, addressing coordination and scaling bottlenecks in LLM-based multi-agent systems.

Evan Rose, Tushin Mallick, Matthew D. Laws +2 more · 2026-05-14 · 7

Research Papers arxiv

This work introduces the first quantization-conditioned attack that works against sophisticated quantization schemes by injecting outliers into model weights, enabling malicious behavior to emerge only after quantization.

Xiaohua Zhan, Kazuki Egashira, Robin Staab +2 more · 2026-05-14 · 7

Model Releases arxiv

Pelican-Unified 1.0 is an embodied foundation model that uses a single VLM for unified understanding, reasoning, and action, jointly generating future videos and actions in a single forward pass.

Yi Zhang, Yinda Chen, Che Liu +24 more · 2026-05-14 · 7

Research Papers arxiv

SDAR improves RL-based LLM agent training by incorporating self-distillation as a gated auxiliary objective, providing dense token-level supervision to stabilize multi-turn agentic learning.

Zhengxi Lu, Zhiyuan Yao, Zhuowen Han +8 more · 2026-05-14 · 6

Research Papers arxiv

MeMo encodes new knowledge into a dedicated modular memory model attached to a frozen LLM, enabling plug-and-play knowledge updates that avoid catastrophic forgetting without requiring access to LLM weights.

Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong +6 more · 2026-05-14 · 7

Research Papers arxiv

Position paper formalizing the 'audit gap' — the structural mismatch between what AI governance frameworks require (e.g., absence of hidden objectives) and what behavioral evaluations and red-teaming can actually verify from observable outputs alone.

Pratinav Seth, Vinay Kumar Sankarapu · 2026-05-14 · 7

Research Papers arxiv

Retrieval-augmented multimodal alignment framework that combines semantically rich clinical text with precisely timestamped EHR data to reconstruct accurate clinical timelines for conditions like sepsis.

Sayantan Kumar, Shahriar Noroozizadeh, Juyong Kim +1 more · 2026-05-14 · 4

Research Papers arxiv

EviScreen is an evidential reasoning framework for medical image disease screening that retrieves region-level evidence from historical cases via dual knowledge banks, improving both interpretability and predictive performance.

Chenyu Lian, Hong-Yu Zhou, Jing Qin · 2026-05-14 · 4

Research Papers arxiv

OpenDeepThink scales LLM reasoning breadth by sampling multiple candidate traces in parallel and selecting the best via pairwise Bradley-Terry ranking, bypassing the noise of pointwise LLM judging.

Shang Zhou, Wenhao Chai, Kaiyuan Liu +3 more · 2026-05-14 · 6

Research Papers arxiv

Shodh-MoE applies sparse mixture-of-experts routing to eliminate negative transfer and gradient conflict when co-training incompatible physics regimes in scientific ML foundation models.

Ellwil Sharma, Arastu Sharma · 2026-05-14 · 4

Research Papers arxiv

PDI-Bench provides a quantitative framework for auditing geometric coherence in AI-generated videos by lifting 2D observations to 3D world-space and computing projective-geometry residuals.

Jiaxin Wu, Yihao Pi, Yinling Zhang +2 more · 2026-05-14 · 4

Research Papers arxiv

VGGT-Edit enables native 3D scene editing via feed-forward residual field prediction, avoiding the blurry textures and cross-view inconsistencies typical of 2D-lifting editing pipelines.

Kaixin Zhu, Yiwen Tang, Yifan Yang +9 more · 2026-05-14 · 5

Research Papers arxiv

FutureSim evaluates adaptive AI agents by replaying real-world news events chronologically past their knowledge cutoff, revealing clear capability separations among frontier agents forecasting a three-month period.

Shashwat Goel, Nikhil Chandak, Arvindh Arun +5 more · 2026-05-14 · 7

Research Papers arxiv

ATLAS unifies agentic (code/tool-call) and latent visual reasoning via a single discrete token, combining the generalization of agentic methods with the efficiency of latent reasoning while enabling autoregressive parallelization.

Ziyu Guo, Rain Liu, Xinyan Chen +1 more · 2026-05-14 · 6

Research Papers arxiv

EntityBench introduces a 140-episode benchmark derived from real narrative media to evaluate entity consistency (characters, objects, locations) across long multi-shot video generation sequences.

Ruozhen He, Meng Wei, Ziyan Yang +1 more · 2026-05-14 · 4

Industry News hackernews

A developer built orobot.io, a curated directory of 61 3D-printable robots with AI-generated descriptions and tips, supporting multiple hardware platforms like Raspberry Pi and Arduino.

xanderjanz · 2026-05-14 · 3

Industry News openai_blog

OpenAI updated ChatGPT's safety systems to improve context-aware risk detection in sensitive conversations, enabling more nuanced and safer responses over time.

OpenAI Blog · 2026-05-14 · 5

Industry News openai_blog

OpenAI's Codex is now accessible via the ChatGPT mobile app, allowing users to monitor, steer, and approve coding tasks in real time across remote environments.

OpenAI Blog · 2026-05-14 · 6

Industry News openai_blog

Sea Limited's CPO shares how the company is rolling out Codex across engineering teams to drive AI-native software development across Asia.

OpenAI Blog · 2026-05-14 · 5

Agent Infrastructure @skirano

MagicPath 2.0 launches as a multiplayer canvas enabling humans and AI coding agents like Codex and Claude Code to collaboratively design and build functional prototypes in real time.

skirano · 2026-05-14 · 7

Industry News @llama_index

LlamaIndex hosted two sold-out back-to-back developer events in NYC covering AI engineering, including a hands-on workshop led by founders.

llama_index · 2026-05-14 · 2

Industry News @ArizeAI

Arize AI shares how their marketing team built a content engine that clones founder voices using AI trained on years of historical content.

ArizeAI · 2026-05-14 · 2

Agent Infrastructure @ArizeAI

Arize AI discusses how Cursor integrates AI observability into the developer workflow, highlighting the operational challenges at Cursor's scale.

ArizeAI · 2026-05-15 · 4

Industry News @langfuse

Langfuse shared a link with no accompanying text, providing no analyzable content.

langfuse · 2026-05-14 · 1

Agent Infrastructure @langfuse

Langfuse launches Langfuse Academy, a free open educational resource covering the full AI engineering lifecycle including tracing, monitoring, evaluation, and experimentation.

langfuse · 2026-05-14 · 5

Industry News @langfuse

Langfuse retweet containing only a URL with no additional context or content.

langfuse · 2026-05-15 · 1

Industry News @langfuse

Langfuse post containing only a URL with no additional context or content.

langfuse · 2026-05-14 · 1

Agent Infrastructure @langfuse

Langfuse introduces the 'AI Engineering Loop', a structured process the best AI teams use to ship complex AI systems to production, with a supporting academy series.

langfuse · 2026-05-14 · 5

Industry News @Cohere

Cohere posts a cryptic teaser ('the truth is out there') with URLs, suggesting an upcoming product or model announcement with no explicit details.

Cohere · 2026-05-14 · 2

Agent Infrastructure @perplexity_ai

Perplexity AI expands its Snowflake integration to support dashboard and automation building for pipeline analysis and customer segmentation, with admin-level access controls.

perplexity_ai · 2026-05-14 · 4

Agent Infrastructure @perplexity_ai

Perplexity AI's 'Computer' product now connects to Snowflake, enabling natural-language querying of live warehouse data with SQL, source tables, and metrics — functioning as an on-call data science assistant.

perplexity_ai · 2026-05-14 · 6

Model Releases @xai

xAI launches an early beta of Grok Build, an agentic CLI tool for coding, app building, and workflow automation, initially available to SuperGrok Heavy subscribers.

xai · 2026-05-14 · 7

Industry News @GoogleDeepMind

Google DeepMind and Kaggle announce a free 5-day AI Agents intensive course (June 15–19) featuring a new simulated capstone challenge called Kaggriculture, designed by Google researchers.

GoogleDeepMind · 2026-05-13 · 4

Industry News @GoogleDeepMind

Retweet of Google DeepMind's announcement of the Kaggle AI Agents intensive course and Kaggriculture capstone challenge — no additional content.

GoogleDeepMind · 2026-05-14 · 2

Industry News @OpenAI

OpenAI post containing only a URL with no additional context or content.

OpenAI · 2026-05-14 · 2

Industry News @OpenAI

OpenAI retweeted a link from its developer account; the post is a bare URL with no additional context.

OpenAI · 2026-05-14 · 1

Agent Infrastructure @OpenAI

OpenAI is rolling out the Codex mobile app preview on iOS and Android globally, with Windows phone-to-desktop support coming soon.

OpenAI · 2026-05-14 · 6

Agent Infrastructure @OpenAI

OpenAI launches Codex in the ChatGPT mobile app, enabling users to start tasks, review outputs, and steer agent execution remotely while Codex runs on a local machine.

OpenAI · 2026-05-14 · 7

Industry News @AnthropicAI

Anthropic is partnering with the Gates Foundation, committing $200M in grants, Claude credits, and technical support toward global health, education, agriculture, and economic mobility.

AnthropicAI · 2026-05-14 · 6

Research Papers @AnthropicAI

Anthropic published a policy paper on US-China AI competition, arguing the US and democratic allies currently lead in frontier AI and outlining steps to maintain that advantage.

AnthropicAI · 2026-05-14 · 7

Agent Infrastructure hackernews

Arrivl is a free analytics tool that parses raw server logs to track AI agent/LLM crawler traffic, filling the gap left by JS-based analytics tools that agents bypass entirely.

starfun · 2026-05-13 · 5

Agent Infrastructure hackernews

A Show HN submission presenting a multi-LLM AI trading agent harness, with no further details provided in the post.

satoshiclad · 2026-05-14 · 3

Industry News openai_blog

OpenAI disclosed its response to the TanStack 'Mini Shai-Hulud' supply chain attack, detailing protective measures for signing certificates and urging macOS users to update OpenAI apps by June 12, 2026.

OpenAI Blog · 2026-05-13 · 7

Agent Infrastructure openai_blog

OpenAI built a secure sandboxed environment for Codex on Windows, enabling coding agents to operate with controlled file access and network restrictions for safety.

OpenAI Blog · 2026-05-13 · 6

Industry News @RichardSocher

Post contains only a URL with no extractable content or context to summarize.

RichardSocher · 2026-05-13 · 1

Agent Infrastructure @ArizeAI

ArizeAI shares a write-up on their approach to agent feedback loops, linking to a detailed article on closing the observability-to-development cycle.

ArizeAI · 2026-05-13 · 4

Agent Infrastructure @ArizeAI

ArizeAI outlines their agent development feedback loop—trace, diagnose, change prompt/code, eval, redeploy—integrated between observability tooling and the IDE to reduce manual context-switching.

ArizeAI · 2026-05-13 · 5

Agent Infrastructure @ArizeAI

ArizeAI describes the challenge of debugging agents across many spans, drawing on lessons from building their internal Alyx AI engineering agent to define a structured feedback loop.

ArizeAI · 2026-05-13 · 4

Industry News @ArizeAI

Factory CTO Eno Reyes is presenting at ArizeAI's Observe conference on production patterns for fully autonomous AI product engineering teams, covering the human-agent division of labor.

ArizeAI · 2026-05-13 · 4

Agent Infrastructure @AravSrinivas

Perplexity CEO highlights their computer/agent security architecture featuring hardware-isolated per-task sandboxes with VPC-level separation and short-lived proxy tokens instead of raw API keys.

AravSrinivas · 2026-05-13 · 6

Agent Infrastructure @AravSrinivas

Perplexity is developing a secure, scalable agent runtime sandbox featuring proxy API key management, real-time safety detection, and encrypted connector data for enterprise agents.

AravSrinivas · 2026-05-13 · 7

Industry News @perplexity_ai

PayPal runs 74,000 weekly tasks through Perplexity Enterprise for use cases including model validation, market research, and competitive intelligence, highlighting strong enterprise AI adoption.

perplexity_ai · 2026-05-13 · 5

Agent Infrastructure @perplexity_ai

Perplexity details its agent security stack: parallel ML classifiers and a BrowseSafe model scan external content before agents act, while file connector data is encrypted and auto-deleted after 7 days.

perplexity_ai · 2026-05-13 · 6

Agent Infrastructure @perplexity_ai

Perplexity's Computer product runs every task in a hardware-isolated sandbox with VPC-level separation and authenticates agents via short-lived proxy tokens instead of raw API keys.

perplexity_ai · 2026-05-13 · 7

Industry News @OpenAI

OpenAI is promoting Codex to enterprise customers, offering two free months to new users who switch within 30 days.

OpenAI · 2026-05-13 · 3

Industry News @OpenAI

OpenAI teases an additional reason to adopt Codex, but the post content is incomplete with no specific details provided.

OpenAI · 2026-05-13 · 2

Research Papers arxiv

SEMIR introduces a graph-based representation learning framework that decouples visual segmentation inference from native image grids, improving handling of small/sparse structures with topology-preserving latent representations.

Luke James Miller, Yugyung Lee · 2026-05-12 · 4

Research Papers arxiv

A Random Matrix Theory method detects the onset of overfitting in neural networks without requiring train/test data, identifying 'Correlation Traps' that emerge during an 'anti-grokking' phase.

Hari K. Prakash, Charles H Martin · 2026-05-12 · 6

Research Papers arxiv

OGLS-SD improves on-policy self-distillation for LLM reasoning by using outcome-guided logit steering to correct teacher-student calibration mismatches caused by reflection-induced bias.

Yuxiao Yang, Xiaoyun Wang, Weitong Zhang · 2026-05-12 · 5

Research Papers arxiv

This paper introduces 'Semantic Reward Collapse' (SRC) to explain how scalarized RLHF optimization conflates distinct failure modes—like sycophancy and hallucination—into undifferentiated signals, undermining epistemic integrity.

William Parris · 2026-05-12 · 7

Research Papers arxiv

Proposes a text-tabular modeling approach for predicting decisions of unknown AI agents in negotiation scenarios from limited interactions, with implications for multi-agent systems.

Eilam Shapira, Moshe Tennenholtz, Roi Reichart · 2026-05-12 · 6

Research Papers arxiv

LLMs' in-context learning is framed as Bayesian inference over a low-dimensional 'conceptual belief space,' with belief updates forming structured trajectories on geometric manifolds.

Eric Bigelow, Raphaël Sarfati, Daniel Wurgaft +5 more · 2026-05-12 · 6

Research Papers arxiv

A new benchmark (CP-SynC-XL) shows LLMs should use declarative constraint modeling (MiniZinc) rather than optimizing Python heuristics when synthesizing combinatorial solvers, revealing a key design principle for neuro-symbolic systems.

Haoyu Wang, Yuliang Song, Tao Li +5 more · 2026-05-12 · 6

Research Papers arxiv

CAAFC is a chronological automated fact-checking framework that outperforms state-of-the-art systems on both misinformation detection and hallucination correction, better aligning with real-world fact-checking workflows.

Islam Eldifrawi, Shengrui Wang, Amine Trabelsi · 2026-05-12 · 5

Research Papers arxiv

Temporarily switching encoder pretraining from MLM to CLM before a short MLM decay phase yields consistent downstream gains (+0.3–2.8pp) on biomedical NLP tasks, suggesting CLM provides richer low-layer supervision.

Rian Touchent, Eric de la Clergerie · 2026-05-12 · 5

Research Papers arxiv

A large-scale audit of 1.7M posts across nine crisis events finds that LLM-generated political discourse exhibits systematic statistical deviations from real online populations, enabling detection beyond surface-level token cues.

Gunjan, Sidahmed Benabderrahmane, Talal Rahwan · 2026-05-12 · 6

Research Papers arxiv

Researchers present a real-world dataset collected from commercially deployed 5G networks across multiple mobility modes to support AI/ML-based beam management and handover optimization.

Mannam Veera Narayana, Rohit Singh, Deepa M. R +1 more · 2026-05-12 · 4

Research Papers arxiv

A Gymnasium reinforcement learning environment is introduced for optimizing electric utility demand-response programs, addressing the gap between offline historical data and dynamic real-world grid interactions.

Jose E. Aguilar Escamilla, Lingdong Zhou, Xiangqi Zhu +1 more · 2026-05-12 · 4

Research Papers arxiv

Attractor Models are proposed as an alternative to looped transformers, using implicit differentiation to find fixed points in latent representations, achieving constant training memory and adaptive iteration depth with strong language modeling and reasoning results.

Jacob Fein-Ashley, Paria Rashidinejad · 2026-05-12 · 7

Research Papers arxiv

KV-Fold is a training-free protocol for long-context inference that treats the KV cache as a left-fold accumulator over sequence chunks, enabling efficient recurrent-style processing without model retraining.

Alireza Nadali, Patrick Cooper, Ashutosh Trivedi +1 more · 2026-05-12 · 6

Research Papers arxiv

This work studies reward hacking in rubric-based RL post-training, identifying two failure modes—verifier failure and rubric-design limitations—and showing weak verifiers lead to poor generalization across medical and science domains.

Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang +3 more · 2026-05-12 · 7

Research Papers arxiv

OmniNFT applies reinforcement learning to joint audio-video generation, addressing multi-objective advantage inconsistency and cross-modal gradient imbalance to improve per-modality fidelity and synchronization.

Guohui Zhang, XiaoXiao Ma, Jie Huang +9 more · 2026-05-12 · 5

Research Papers arxiv

ToolCUA is an end-to-end computer use agent that learns optimal selection between GUI actions and high-level tool calls through a staged training paradigm using interleaved GUI-Tool trajectories.

Xuhao Hu, Xi Zhang, Haiyang Xu +6 more · 2026-05-12 · 7

Research Papers arxiv

The paper proposes a sparse-to-dense reward principle for LLM post-training, arguing that GRPO-style sparse RL and dense on-policy distillation should be applied at different stages based on reward density rather than treated as separate recipes.

Yuanda Xu, Hejian Sang, Zhengze Zhou +3 more · 2026-05-12 · 7

Research Papers arxiv

A fast-slow learning framework for LLMs is introduced that combines in-context (fast) and in-weights (slow) adaptation to enable continual learning, mitigating catastrophic forgetting while retaining the benefits of parameter updates.

Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal +6 more · 2026-05-12 · 7

Research Papers arxiv

AlphaGRPO applies GRPO to unified multimodal models to enable reasoning-driven text-to-image generation and self-reflective output correction, using a Decompositional Verifiable Reward for stable supervision.

Runhui Huang, Jie Wu, Rui Yang +2 more · 2026-05-12 · 6

Industry News hackernews

HYPD is an AI co-pilot for Google Ads marketers that connects to ad accounts to run audits, analyze data in natural language, and generate ad copy. Built by a founder with prior ad-tech exits, it targets PPC freelancers and agencies.

cionut · 2026-05-13 · 3

Industry News hackernews

A CS student reflects on how AI coding agents have changed the emotional and intellectual experience of programming, expressing a sense that in-depth learning and grounded engineering are being lost. A personal essay on developer identity in the LLM era.

northfield27 · 2026-05-12 · 2

Agent Infrastructure hackernews

Torrix is a self-hosted LLM observability tool that runs as a single Docker container backed by SQLite, requiring no Postgres or Redis. It lowers the barrier to monitoring AI agents in production.

AdarshRao23 · 2026-05-13 · 5

Agent Infrastructure hackernews

Gigacatalyst provides an embedded AI builder layer for SaaS products, allowing non-engineers to create custom features via natural language. Targets long-tail enterprise workflow customization without engineering overhead.

namanyayg · 2026-05-12 · 3

Agent Infrastructure hackernews

Statewright uses visual state machines to constrain AI agent behavior, improving reliability by shrinking solution spaces rather than scaling up model size. Built by a veteran engineer with an NVIDIA/AMD background.

azurewraith · 2026-05-12 · 6

Agent Infrastructure hackernews

Voker (YC S24) is an LLM-stack-agnostic analytics SDK for AI agent products, giving engineering teams visibility into what users ask and whether agents are delivering in production. Addresses the gap in agent performance observability.

ttpost · 2026-05-12 · 6

Industry News openai_blog

AutoScout24 Group uses OpenAI's Codex and ChatGPT to accelerate development cycles and improve code quality across their engineering organization. A case study in enterprise AI coding adoption.

OpenAI Blog · 2026-05-12 · 4

Research Papers openai_blog

OpenAI's Parameter Golf competition drew 1,000+ participants to explore AI-assisted ML research, coding agents, quantization, and novel model design under strict parameter constraints. Highlights community innovation in efficient model design.

OpenAI Blog · 2026-05-12 · 5

Model Releases openai_blog

OpenAI highlights teams using Codex with GPT-5.5 to ship production systems and accelerate research-to-experiment pipelines. Positions Codex as a key tool for both engineering and research workflows.

OpenAI Blog · 2026-05-12 · 6

Industry News openai_blog

OpenAI demonstrates Codex being used by finance teams to automate reporting workflows including MBRs, variance bridges, and planning scenarios from real work inputs. Expands Codex use cases beyond engineering into finance.

OpenAI Blog · 2026-05-12 · 4

Agent Infrastructure @llama_index

LlamaIndex introduces liteparse-server, a self-hostable open-source HTTP server for parsing PDFs, Office files, and images locally without sending data to external services.

llama_index · 2026-05-12 · 5

Industry News @ArizeAI

Arize AI advocates a hybrid evaluation strategy combining LLM-as-a-judge for nuanced assessment, code-based evals for speed, and human annotators for ground truth, rather than relying on any single method.

ArizeAI · 2026-05-12 · 5

Industry News @ArizeAI

Arize AI shared a link with no substantive text content available for analysis.

ArizeAI · 2026-05-11 · 1

Industry News @ArizeAI

Retweet of a link-only post with no substantive text content available for analysis.

ArizeAI · 2026-05-12 · 1

Industry News @ArizeAI

Arize AI shared a link with no substantive text content available for analysis.

ArizeAI · 2026-05-12 · 1

Industry News @ArizeAI

Retweet of a link-only post with no substantive text content available for analysis.

ArizeAI · 2026-05-12 · 1

Industry News @Cohere

Cohere's Chief AI Officer Joelle Pineau highlights the stark disparity in academic entrepreneurship between California and Canada, suggesting Canada lags in translating AI research into startups.

Cohere · 2026-05-12 · 4

Industry News @Cohere

Retweet of Cohere CAO Joelle Pineau's comment on the entrepreneurship gap between California and Canadian universities in AI.

Cohere · 2026-05-12 · 2

Research Papers @perplexity_ai

Perplexity AI highlights NVIDIA's GB200 (Blackwell) as the leading platform for large-model inference, citing prefill/decode disaggregation, Blackwell-native quantization, and NVLink rack-scale networking for lower serving costs.

perplexity_ai · 2026-05-12 · 7

Research Papers @perplexity_ai

Benchmark data shows NVIDIA GB200 cuts NVLink all-reduce latency nearly in half versus H200 (313µs vs 586µs) and significantly improves MoE prefill and decode throughput, demonstrating a major generational leap in inference hardware.

perplexity_ai · 2026-05-12 · 7

Research Papers @perplexity_ai

Perplexity AI published research on serving Qwen3 235B MoE models on NVIDIA GB200 NVL72 Blackwell racks, demonstrating GB200's superiority for high-throughput inference beyond just training workloads.

perplexity_ai · 2026-05-12 · 7

Industry News @GoogleDeepMind

Google DeepMind teases experimental AI-enabled mouse pointer capabilities available to try in Google AI Studio, hinting at next-generation UI interactions.

GoogleDeepMind · 2026-05-12 · 5

Agent Infrastructure @GoogleDeepMind

Google DeepMind demonstrates an AI-powered mouse pointer that understands the context of what it is pointing at, enabling interactions like converting scribbled notes to to-do lists or paused video frames to booking links.

GoogleDeepMind · 2026-05-12 · 7

Agent Infrastructure @GoogleDeepMind

Google DeepMind announces an experimental reimagining of the mouse pointer using Gemini, enabling users to direct AI on-screen via motion, speech, and natural shorthand.

GoogleDeepMind · 2026-05-12 · 7

Research Papers @OpenAI

OpenAI's 'parameter golf' competition attracted 2,000+ submissions exploring techniques like quantization, TTT LoRA, SSMs, and JEPA, with autoresearch tooling accelerating iteration and enabling emergent collaboration.

OpenAI · 2026-05-12 · 6

Research Papers @OpenAI

Retweet of the OpenAI parameter golf competition post summarizing community participation and research directions explored.

OpenAI · 2026-05-12 · 2

Research Papers arxiv

Proposes a practical evaluation protocol for AI pentesting agents that shifts from task completion metrics to validated vulnerability discovery, better reflecting real-world complexity.

Pedro Conde, Henrique Branquinho, Valerio Mazzone +3 more · 2026-05-11 · 6

Research Papers arxiv

Introduces Clin-JEPA, a co-training framework applying JEPA-style predictive pretraining to EHR patient trajectories for trajectory forecasting and downstream risk prediction without per-task fine-tuning.

Yixuan Yang, Mehak Arora, Ryan Zhang +10 more · 2026-05-11 · 5

Research Papers arxiv

DISCA is a training-free, black-box inference-time method that uses within-country sociodemographic disagreement signals from the World Values Survey to culturally align LLMs without fine-tuning.

Huynh Trung Kiet, Dao Sy Duy Minh, Tuan Nguyen +5 more · 2026-05-11 · 6

Research Papers arxiv

Pi-Serini demonstrates that BM25 lexical retrieval paired with capable frontier LLMs (e.g., GPT-5.5) can achieve 83.1% accuracy on deep research benchmarks, questioning the necessity of dense retrieval in agentic search.

Tz-Huan Hsu, Jheng-Hong Yang, Jimmy Lin · 2026-05-11 · 7

Research Papers arxiv

Introduces the Generalized Turing Test (GTT), a formal dataset- and task-agnostic framework for comparing agent intelligence via indistinguishability, with analysis of transitivity and ordering properties.

Daniel Mitropolsky, Susan S. Hong, Riccardo Neumarker +2 more · 2026-05-11 · 6

Research Papers arxiv

BenchCAD presents a benchmark of 17,900 verified CadQuery programs across 106 industrial part families to evaluate MLLMs on realistic parametric CAD code generation tasks.

Haozhe Zhang, Kaichen Liu, Miaomiao Chen +4 more · 2026-05-11 · 5

Research Papers arxiv

BEACON is a large-scale multimodal dataset (~430 GB) capturing behavioral biometrics from competitive Valorant gameplay to support continuous authentication research.

Ishpuneet Singh, Gursmeep Kaur, Uday Pratap Singh Atwal +3 more · 2026-05-11 · 4

Research Papers arxiv

Proposes a decision-centric rate-distortion framework for agent memory that determines what can be safely forgotten based on impact to decision quality rather than descriptive relevance.

Mingxi Zou, Zhihan Guo, Langzhang Liang +6 more · 2026-05-11 · 7

Research Papers arxiv

Attractor-Vascular Coupling Theory provides a mathematical framework linking cardiac attractor geometry to blood pressure estimation from smartphone PPG, validated to AAMI standards using LightGBM.

Timothy Oladunni, Farouk Ganiyu Adewumi · 2026-05-11 · 3

Research Papers arxiv

CADBench unifies multimodal CAD program generation evaluation with 18,000 samples across six benchmark families, five input modalities, and six metrics covering geometry, executability, and compactness.

Anna C. Doris, Jacob Thomas Sony, Ghadi Nehme +3 more · 2026-05-11 · 5

Research Papers arxiv

AssayBench introduces a new benchmark for evaluating LLMs and agents on in silico phenotypic screens, filling a gap in virtual cell modeling evaluation.

Edward De Brouwer, Carl Edwards, Alexander Wu +9 more · 2026-05-11 · 6

Research Papers arxiv

LoKA proposes a system-model co-design approach to apply FP8 low-precision arithmetic to large recommendation models, overcoming numerical sensitivity and training inefficiencies.

Liang Luo, Yinbin Ma, Quanyu Zhu +20 more · 2026-05-11 · 5

Research Papers arxiv

This paper formalizes probabilistic safety shielding for autonomous agents in MDPs, proving the impossibility of classical guarantees while providing weaker but practical alternatives.

Linus Heck, Filip Macák, Roman Andriushchenko +2 more · 2026-05-11 · 5

Research Papers arxiv

A training-free diagnostic framework for on-policy distillation analyzes per-token supervision signals to clarify when teacher distillation helps or hurts reasoning model training.

Mohammadreza Armandpour, Fatih Ilhan, David Harrison +6 more · 2026-05-11 · 6

Research Papers arxiv

DataMaster is an autonomous agent that handles the full data engineering pipeline for ML—discovery, selection, cleaning, and transformation—without modifying the learning algorithm.

Yaxin Du, Xiyuan Yang, Zhifan Zhou +12 more · 2026-05-11 · 7

Agent Infrastructure arxiv

This paper argues that AI agents built on rapid on-the-fly synthesis bypass rigorous software engineering practices, proposing an AI Workflow Store paradigm to embed SE discipline into agentic systems.

Roxana Geambasu, Mariana Raykova, Pierre Tholoniat +3 more · 2026-05-11 · 7

Agent Infrastructure arxiv

Shepherd is a meta-agent runtime that records typed execution traces in a Git-like structure, enabling fast forking and replay of agent states, and significantly improving pair coding pass rates via runtime intervention.

Simon Yu, Derek Chong, Ananjan Nandi +4 more · 2026-05-11 · 8

Research Papers arxiv

A confidence-guided diffusion augmentation framework synthesizes training data for handwritten Bangla compound character recognition, improving generalization across writing styles.

Md. Sultan Al Rayhan, Maheen Islam · 2026-05-11 · 3

Research Papers arxiv

A neural exponential tilting framework enables scalable variational inference for Lévy-driven SDEs, capturing heavy-tailed and jump phenomena beyond Gaussian assumptions.

Yaman Kindap, Manfred Opper, Benjamin Dupuis +2 more · 2026-05-11 · 4

Research Papers arxiv

ELF proposes continuous-space diffusion language models using Flow Matching in embedding space, showing competitive performance with minimal adaptation from the discrete token domain.

Keya Hu, Linlu Qiu, Yiyang Lu +5 more · 2026-05-11 · 7

Research Papers hackernews

AI agents discovered a reasoning strategy that reduces LLM token usage by 70%, a potentially significant gain for cost and efficiency.

steveharing1 · 2026-05-12 · 8

Industry News hackernews

A developer shares Origami, a terminal workspace manager built with AI assistance, offering a grounded perspective on AI coding tools as accelerators rather than replacements.

uniqid · 2026-05-11 · 3

Research Papers hackernews

VibeServe explores whether AI agents can autonomously design and build bespoke LLM serving infrastructure, probing the limits of agentic software engineering.

matt_d · 2026-05-11 · 6

Agent Infrastructure hackernews

Graft introduces semantic memory for AI agents that operates without requiring an LLM, offering a lightweight and cost-effective memory layer.

AEndrix03 · 2026-05-12 · 7

Model Releases hackernews

JetBrains launched Junie, an LLM-agnostic AI coding agent integrated into their IDE ecosystem, broadening developer tooling options.

dude250711 · 2026-05-11 · 6

Industry News openai_blog

ChatGPT's fastest growth in Q1 2026 came from users over 35, alongside more balanced gender demographics, indicating mainstream AI adoption beyond early adopters.

OpenAI Blog · 2026-05-11 · 5

Agent Infrastructure @llama_index

sandboxed-lit is a Rust CLI agent combining LiteParse for multi-format document parsing with a secure Bash sandbox, enabling safe and powerful file-handling agents.

llama_index · 2026-05-11 · 6

Industry News @ArizeAI

OpenAI's Stuart Sy will present on a voice-of-the-customer agent that distills millions of customer interactions into actionable insights at Arize's Observe conference.

ArizeAI · 2026-05-11 · 5

Agent Infrastructure @ArizeAI

Arize Phoenix is evolving beyond human-facing observability into a context platform accessible to both humans and agents for building AI-native software.

ArizeAI · 2026-05-11 · 6

Agent Infrastructure @ArizeAI

Arize argues that AI observability must shift from human-readable dashboards to API/CLI and agent-facing interfaces as agents increasingly consume operational context.

ArizeAI · 2026-05-11 · 6

Agent Infrastructure @ArizeAI

Arize AI proposes that AI observability is evolving into a collaborative context platform where both humans and agents can debug and improve AI systems together.

ArizeAI · 2026-05-11 · 5

Industry News @ArizeAI

Arize AI is partnering with Google Cloud for the Rapid Agent Hackathon, focusing on bridging the gap between agents that demo well and agents that execute reliably in production.

ArizeAI · 2026-05-11 · 4

Agent Infrastructure @langfuse

Langfuse now supports running LLM experiments in CI/CD pipelines via a GitHub Action, enabling teams to catch quality regressions and gate releases on evaluation metrics.

langfuse · 2026-05-12 · 7

Research Papers @GoogleDeepMind

Google DeepMind collaborated with The Sainsbury Lab on AI-guided discovery of atypical protein assemblies, advancing AI applications in structural biology.

GoogleDeepMind · 2026-05-11 · 5

Industry News @OpenAI

OpenAI shared a link with no accompanying text, providing no analyzable content.

OpenAI · 2026-05-11 · 1

Model Releases @OpenAI

OpenAI announced Daybreak, a security automation platform for detecting, validating, and responding to threats using frontier AI models.

OpenAI · 2026-05-11 · 7

Model Releases @OpenAI

OpenAI launched Daybreak, a cyber defense platform combining its most capable models and Codex to help security teams accelerate threat detection and continuously secure software.

OpenAI · 2026-05-11 · 8

Industry News @AnthropicAI

Anthropic released Claude's Constitution as an audiobook narrated by authors Amanda Askell and Joe Carlsmith, including discussion of the philosophies behind the document and how it may evolve.

AnthropicAI · 2026-05-11 · 5

Research Papers arxiv

Researchers use probing and activation patching to locate where language models form internal representations of future tokens, finding that planning signals are linearly decodable and scale-dependent, with only Gemma-3-27B causally relying on this encoding.

Nicole Ma, Nick Rui · 2026-05-08 · 7

Research Papers arxiv

Dooly is a configuration-agnostic LLM inference simulator that avoids redundant re-profiling by exploiting structural redundancy across model configurations, making hardware and serving engine exploration significantly cheaper.

Joon Ha Kim, Geon-Woo Kim, Anoop Rachakonda +1 more · 2026-05-08 · 5

Research Papers arxiv

NIST researchers propose a structured methodology for AI evaluation scenarios grounded in real-world use cases, promoting methodological transparency and human-centered design to enable apples-to-apples comparisons across AI benchmarks.

Yee-Yin Choong, Kristen Greene, Alice Qian +6 more · 2026-05-08 · 5

Research Papers arxiv

Tool selection in LLM agents is linearly readable and steerable via internal activations, enabling 77–100% accuracy in switching tool choices and allowing error prediction before execution across 12 models.

Zekun Wu, Ze Wang, Seonglae Cho +4 more · 2026-05-08 · 8

Research Papers arxiv

PSP-HDC applies graph-structured hyperdimensional computing to process-structure-property prediction in materials science, achieving data-efficient and explainable results where conventional ML fails due to sparse data.

Jingzhan Ge, Ajeeth Vellore, Ajinkya Palwe +5 more · 2026-05-08 · 3

Research Papers arxiv

A probabilistic framework for abductive commonsense reasoning is proposed that explicitly models variation in human commonsense beliefs, moving beyond binary truth assumptions in neurosymbolic LLM systems.

Joseph Cotnareanu, Chiara Roverato, Han Zhou +3 more · 2026-05-08 · 5

Research Papers arxiv

A position paper auditing 30 mechanistic interpretability studies finds that causal claims consistently lack explicit identification assumptions, with validation metrics incorrectly treated as causal evidence.

Zezheng Lin, Fengming Liu · 2026-05-08 · 6

Research Papers arxiv

RL-trained CLI agents are studied with a focus on structured action credit assignment and selective observation, addressing two core bottlenecks: evidence localization in large codebases and sparse reward attribution over long trajectories.

Haoyang Su, Ying Wen · 2026-05-08 · 7

Research Papers arxiv

Frontier large reasoning models (LRMs) are evaluated against human game learners...

Frontier large reasoning models (LRMs) are evaluated against human game learners using behavioral data and fMRI recordings, jointly assessing gameplay performance, learning behavior alignment, and brain activity prediction.

Botos Csaba, Sreejan Kumar, Austin Tudor David Andrews +6 more · 2026-05-08 · 6

Research Papers arxiv

A parameter reconstruction algorithm for spiking neural networks is proposed tha...

A parameter reconstruction algorithm for spiking neural networks is proposed that avoids surrogate gradient approximation errors by extending convexification theory to parallel recurrent threshold networks.

Himanshu Udupi, Xiaocong Yang, ChengXiang Zhai · 2026-05-08 · 4

Research Papers arxiv

MPD²-Router introduces a mask-aware multi-expert learning-to-defer framework for...

MPD²-Router introduces a mask-aware multi-expert learning-to-defer framework for glaucoma screening that routes uncertain cases to appropriate human experts while enforcing availability constraints and handling workload imbalance.

Wenxin Zhan · 2026-05-08 · 4

Research Papers arxiv

GraphDPO generalizes Direct Preference Optimization to operate over directed acy...

GraphDPO generalizes Direct Preference Optimization to operate over directed acyclic preference graphs from ranked rollouts, better exploiting multi-response training data and avoiding instability from collapsing rankings into pairs.

Ning Liu, Chuanneng Sun, Kristina Klinkner +1 more · 2026-05-08 · 7

Research Papers arxiv

SCOPE is a skill orchestration framework for complex text-to-image generation th...

SCOPE is a skill orchestration framework for complex text-to-image generation that maintains semantic commitments in a structured specification throughout the full generation lifecycle to reduce conceptual drift.

Tianfei Ren, Zhipeng Yan, Yiming Zhao +13 more · 2026-05-08 · 5

Research Papers arxiv

Fast Byte Latent Transformer introduces diffusion-based parallel byte generation...

Fast Byte Latent Transformer introduces diffusion-based parallel byte generation and speculative decoding extensions to address the slow autoregressive bottleneck of byte-level language models.

Julie Kallini, Artidoro Pagnoni, Tomasz Limisiewicz +5 more · 2026-05-08 · 7

Research Papers arxiv

CA-SQL dynamically scales solution space exploration based on estimated query co...

CA-SQL dynamically scales solution space exploration based on estimated query complexity and uses prompt seeding via evolutionary principles to improve LLM performance on hard Text-to-SQL benchmarks.

James Petullo, Nianwen Xue · 2026-05-08 · 5

Research Papers arxiv

Expanding LLM context windows in multi-agent social dilemmas systematically degr...

Expanding LLM context windows in multi-agent social dilemmas systematically degrades cooperation across 7 models and 4 games (termed the 'memory curse'), driven by erosion of forward-looking intent rather than increased paranoia.

Jiayuan Liu, Tianqin Li, Shiyi Du +7 more · 2026-05-08 · 8

Research Papers arxiv

Rubric-grounded RL decomposes rewards into weighted, verifiable criteria scored ...

Rubric-grounded RL decomposes rewards into weighted, verifiable criteria scored by a frozen LLM judge, providing partial-credit optimization signals that improve generalizable reasoning over binary or holistic rewards.

Manish Bhattarai, Ismael Boureima, Nishath Rajiv Ranasinghe +2 more · 2026-05-08 · 7

Research Papers arxiv

Flow-OPD is the first post-training framework integrating on-policy distillation...

Flow-OPD is the first post-training framework integrating on-policy distillation into flow matching text-to-image models, using specialized teacher models and a two-stage strategy to mitigate reward hacking and the seesaw effect.

Zhen Fang, Wenxuan Huang, Yu Zeng +8 more · 2026-05-08 · 6

Research Papers arxiv

VecCISC improves confidence-informed self-consistency by clustering reasoning tr...

VecCISC improves confidence-informed self-consistency by clustering reasoning traces to reduce redundant critic LLM calls, lowering inference cost while maintaining or improving accuracy on majority voting.

James Petullo, Sonny George, Dylan Cashman +1 more · 2026-05-08 · 6

Research Papers arxiv

EmambaIR applies visual state space models (Mamba) to event-guided image reconst...

EmambaIR applies visual state space models (Mamba) to event-guided image reconstruction, combining sparse cross-modal attention with linear complexity to outperform CNN and ViT baselines at high resolutions.

Wei Yu, Yunhang Qian · 2026-05-08 · 4

Research Papers hackernews

AI agents are showing performance gains from Long Context Models (LCM), enabling...

AI agents are showing performance gains from Long Context Models (LCM), enabling more specialized and capable applications that leverage extended context windows.

sebastianperezr · 2026-05-11 · 6

Agent Infrastructure hackernews

WUPHF is an open-source, local-first multi-agent framework that prevents context...

WUPHF is an open-source, local-first multi-agent framework that prevents context drift across agent handoffs using a shared markdown+git wiki and cross-agent peer review rather than just shared memory.

najmuzzaman · 2026-05-11 · 7

Industry News google_ai

Google has integrated an AI-powered experience into Google Finance, though detai...

Google has integrated an AI-powered experience into Google Finance, though details of the specific features or capabilities are not provided in this post.

Google AI Blog · 2026-05-11 · 3

Industry News openai_blog

OpenAI has launched DeployCo, an enterprise-focused deployment company designed ...

OpenAI has launched DeployCo, an enterprise-focused deployment company designed to help organizations move frontier AI from experimentation into production with measurable business outcomes.

OpenAI Blog · 2026-05-11 · 8

Industry News openai_blog

OpenAI is launching the OpenAI Campus Network to connect student clubs globally ...

OpenAI is launching the OpenAI Campus Network to connect student clubs globally with AI tools and resources for building campus AI communities.

OpenAI Blog · 2026-05-11 · 2

Industry News openai_blog

OpenAI outlines a framework for how enterprises can scale AI adoption through tr...

OpenAI outlines a framework for how enterprises can scale AI adoption through trust-building, governance structures, deliberate workflow design, and maintaining quality at scale.

OpenAI Blog · 2026-05-11 · 5

Industry News @OpenAI

OpenAI is acquiring Tomoro to immediately staff its new Deployment Company with ...

OpenAI is acquiring Tomoro to immediately staff its new Deployment Company with 150 experienced Forward Deployed Engineers and Deployment Specialists.

OpenAI · 2026-05-11 · 7

Industry News @OpenAI

OpenAI officially launched the OpenAI Deployment Company, a majority-owned subsi...

OpenAI officially launched the OpenAI Deployment Company, a majority-owned subsidiary that unites 19 investment firms, consultancies, and system integrators to help businesses deploy frontier AI to production.

OpenAI · 2026-05-11 · 8
