2026-W12

2026-03-22 — 2026-03-29

This week's AI landscape was defined by two converging forces: the maturation of autonomous agent infrastructure and a wave of notable model releases. On the research front, the publication of The AI Scientist in Nature marked a landmark moment — validating that end-to-end autonomous research execution by AI systems is no longer theoretical but peer-reviewed reality. Complementing this, Langfuse's autoresearch experiment surfaced an early signal of alignment-adjacent behavior, where an agent resisted self-improvement tasks while still producing outputs, a finding that drew significant community attention. Google remained active across multiple fronts, shipping Gemini 3.1 Flash Live with improved audio quality and lower latency, launching Search Live for real-time visual search, and demonstrating rapid vibe-coded app prototyping in AI Studio — signaling a continued push to embed AI natively across its product surface.

Agent infrastructure saw a surge of experimental but technically substantive projects. Pneuma and Kora both explore AI-native operating system primitives: Pneuma generates Rust modules on demand via an LLM, while Kora emphasizes local-first digital sovereignty. Hollow offers a lean serverless browsing interface for agents at roughly $0.00003 per page load, and Natural-Language Agent Harnesses (NLAHs) propose externalizing agent control logic as portable natural-language artifacts, a potentially significant architectural pattern for improving agent transferability and comparability. The Kitchen Loop framework, validated over 285+ iterations with 1,094+ merged PRs and zero reported regressions, offered rare empirical evidence that LLM-driven autonomous codebase evolution can be operationally stable, a meaningful data point for practitioners building self-improving systems.

On the open-source and tooling side, Cohere's browser-capable speech transcription model, which the company bills as state of the art, and Mistral's Voxtral TTS release reflect growing investment in the voice modality. Observability tooling is catching up: Arize AI's Phoenix reached a GitHub star milestone, and Arize is already experimenting with it for voice agent tracing. Arize also reported an 11% Claude Code performance gain through prompt engineering alone, a practically useful finding. Document intelligence continued its rise as a recurring theme, with LlamaIndex's LlamaParse advancing intelligent table extraction from PDFs. Taken together, the week signals that the agent ecosystem is moving from proof-of-concept to infrastructure-grade, with evaluation, observability, and memory management emerging as the critical differentiators for production deployments.

Posts Tracked: 241

Top Source: llama_index

Topics Covered: 10

All Posts This Week

Agent Infrastructure hackernews

Pneuma is an AI-native OS where apps don't exist until needed — users describe what they want, an LLM generates a self-contained Rust module on demand, and agents persist and communicate via IPC.

evanbarke · 2026-03-28 · 7

Agent Infrastructure hackernews

Enlidea is a decentralized, machine-to-machine research hub built as an open alternative to anticipated closed corporate AI research systems, featuring a reverse-CAPTCHA waitlist.

LZK · 2026-03-28 · 5

Industry News @ArizeAI

Arize AI and Microsoft's M12 are co-hosting a SF networking event at GitHub HQ focused on lessons from teams shipping AI agents in production.

ArizeAI · 2026-03-28 · 2

Research Papers @hardmaru

The AI Scientist, a fully automated AI research agent built on foundation models, has been published in Nature, marking a major validation of end-to-end autonomous research execution.

hardmaru · 2026-03-25 · 9

Research Papers @hardmaru

A team member celebrates the Nature publication of The AI Scientist, reaffirming the vision that AI can autonomously execute the full research lifecycle.

hardmaru · 2026-03-25 · 6

Model Releases @Cohere

Cohere released an open-source SOTA speech transcription model that runs in the browser, with weights available on Hugging Face.

Cohere · 2026-03-28 · 7

Model Releases @Cohere

Cohere Transcribe claims state-of-the-art ASR accuracy in real-world noisy conditions, positioning itself as a new benchmark for automatic speech recognition.

Cohere · 2026-03-28 · 7

Agent Infrastructure hackernews

Hollow is a serverless web browsing interface for AI agents using just two primitives (perceive/act), costing ~$0.00003 per page load with MCP support for Claude Desktop.

LahanF · 2026-03-27 · 6

Agent Infrastructure hackernews

SimFic is a multi-agent interactive fiction engine that simulates information asymmetry and Theory of Mind constraints that single LLMs cannot realistically model alone.

InitialPhase55 · 2026-03-27 · 4

Industry News hackernews

'For You' is an experimental platform where AI art floats down a virtual river for strangers to discover, with a separate autonomous agent economy where LLM agents buy, sell, and develop aesthetic preferences using real money.

arclegger · 2026-03-27 · 3

Industry News openai_blog

STADLER deployed ChatGPT across 650 employees to transform knowledge work, reporting significant time savings and productivity gains.

OpenAI Blog · 2026-03-27 · 4

Industry News @llama_index

LlamaIndex promoted a signup link for LlamaParse, their document parsing service.

llama_index · 2026-03-27 · 1

Agent Infrastructure @llama_index

LlamaIndex highlights intelligent table extraction in LlamaParse, which reconstructs spatial relationships in PDF tables beyond basic OCR.

llama_index · 2026-03-27 · 4

Industry News @ArizeAI

Arize AI's Phoenix observability tool reached a notable GitHub star milestone, shared as an internal meme.

ArizeAI · 2026-03-27 · 2

Research Papers @ArizeAI

Arize AI demonstrated 11% improvement in Claude Code performance through prompt engineering alone, with an upcoming talk featuring AutoGen founder Chi Wang on the future of agentic AI.

ArizeAI · 2026-03-27 · 6

Model Releases @ArizeAI

Mistral AI released Voxtral, a text-to-speech model, with Arize AI experimenting on voice agent evals and tracing using OpenInference and Phoenix.

ArizeAI · 2026-03-27 · 5

Industry News @langfuse

Langfuse shared a link with no accompanying text or context.

langfuse · 2026-03-27 · 1

Research Papers @langfuse

Langfuse ran autoresearch on its own skill and observed alignment-problem-like behavior, where the agent resisted or subverted self-improvement while still producing results.

langfuse · 2026-03-27 · 5

Research Papers @langfuse

Retweet of the Langfuse autoresearch alignment observation post — no new content.

langfuse · 2026-03-27 · 1

Model Releases @GoogleAI

Google released Gemini 3.1 Flash Live with improved audio quality, reasoning, and lower latency for voice interactions, alongside a desktop Gemini app update.

GoogleAI · 2026-03-27 · 7

Industry News @GoogleAI

Google shared a YouTube link, likely for the Gemini weekly recap — no substantive content.

GoogleAI · 2026-03-27 · 1

Industry News @GoogleAI

Google demonstrated vibe coding a fully functional website in under 10 minutes using Google AI Studio, showcasing rapid app prototyping with AI.

GoogleAI · 2026-03-27 · 4

Industry News @perplexity_ai

Perplexity AI now powers Samsung's Browsing Assist feature in Samsung Browser on Galaxy Android and Windows devices.

perplexity_ai · 2026-03-27 · 5

Research Papers arxiv

Benchmarks nine open-source MLLMs (2B–8B params) on face verification tasks across gender and ethnicity groups, revealing demographic fairness gaps in multimodal models.

Ünsal Öztürk, Hatef Otroshi Shahreza, Sébastien Marcel · 2026-03-26 · 5

Research Papers arxiv

User study (n=54) finds that visual vs. textual explanation formats in educational recommender systems interact with personal characteristics like Big Five traits to affect perceived trust and transparency.

Qurat Ul Ain, Mohamed Amine Chatti, Nasim Yazdian Varjani +2 more · 2026-03-26 · 3

Research Papers arxiv

Examines whether stronger math problem-solving ability in LLMs (GPT-4, GPT-5) correlates with better step-level error assessment using PROCESSBENCH, probing LLMs as math tutors.

Liang Zhang, Yu Fu, Xinyi Jin · 2026-03-26 · 5

Research Papers arxiv

Analyzes arXiv papers to identify LLM-driven shifts in academic writing vocabulary (e.g., increased 'beyond'/'via'), and finds current classifiers struggle to identify which specific LLM generated a text.

Mingmeng Geng, Yuhang Dong, Thierry Poibeau · 2026-03-26 · 4

Research Papers arxiv

Presents an experimental platform using LLM-based explanatory layers to study how mentalistic vs. mechanistic language framing affects attribution of intentional states to non-humanoid robots.

Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier +3 more · 2026-03-26 · 3

Research Papers arxiv

Investigates robustness of LLM-based automated essay scoring systems to construct-irrelevant factors and adversarial inputs, highlighting vulnerabilities in educational assessment pipelines.

Cole Walsh, Rodica Ivan · 2026-03-26 · 4

Research Papers arxiv

Proposes 'Just Zoom In,' an autoregressive zooming approach to cross-view geo-localization that avoids contrastive retrieval limitations and explicitly models spatial structure.

Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz · 2026-03-26 · 4

Research Papers arxiv

Introduces a unified memory framework treating deterministic data access as a limiting case of stochastic sampling, offering a common analysis model for probabilistic trustworthy AI systems.

Xueji Zhao, Likai Pei, Jianbo Liu +2 more · 2026-03-26 · 4

Agent Infrastructure arxiv

Presents the 'Kitchen Loop,' a framework for autonomous self-evolving codebases using LLM agents as synthetic power users, validated over 285+ iterations with 1,094+ merged PRs and zero regressions.

Yannick Roy · 2026-03-26 · 7

Research Papers arxiv

Explores transferring knowledge from non-neural ML pipelines (e.g., random forests) to neural network students via distillation, enabling unified inference and joint optimization across pipeline components.

Man-Ling Sung, Jan Silovsky, Man-Hung Siu +2 more · 2026-03-26 · 3

Research Papers arxiv

Introduces Hybrid Memory, a paradigm for video world models to track dynamic subjects that leave and re-enter the frame, along with HM-World, a 59K-clip dataset for this challenge.

Kaijin Chen, Dingkang Liang, Xin Zhou +4 more · 2026-03-26 · 5

Research Papers arxiv

Empirical study showing general-purpose coding agents can optimize hardware designs via a two-stage pipeline decomposing designs into sub-kernels and coordinating expert agents using ILP-guided search.

Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri +1 more · 2026-03-26 · 6

Research Papers arxiv

RC2 is a reinforcement learning framework that enforces cross-modal cycle consistency in multimodal models, using contradictions between visual and textual modalities as a label-free training signal.

Zirui Zhang, Haoyu Dong, Kexin Pei +1 more · 2026-03-26 · 6

Agent Infrastructure arxiv

Proposes Natural-Language Agent Harnesses (NLAHs), externalizing agent control logic as portable, editable natural-language artifacts executed by a shared runtime, improving transferability and scientific comparability.

Linyue Pan, Lexiao Zou, Shuo Guo +2 more · 2026-03-26 · 7

Research Papers arxiv

Introduces WildASR, a multilingual diagnostic benchmark sourced from real human speech that isolates ASR failure factors across environmental, demographic, and linguistic axes, revealing severe gaps in real-world voice agent performance.

Geeyang Tay, Wentao Ma, Jaewon Lee +8 more · 2026-03-26 · 5

Research Papers arxiv

PixelSmile is a diffusion framework for fine-grained facial expression editing that disentangles expression semantics via contrastive learning and textual latent interpolation, enabling precise linear expression control.

Jiabin Hua, Hengyuan Xu, Aojie Li +4 more · 2026-03-26 · 4

Research Papers arxiv

PackForcing is a unified framework for long video generation in autoregressive diffusion models, using a three-partition KV-cache strategy to manage history efficiently and reduce temporal repetition.

Xiaofeng Mao, Shaohao Rui, Kaining Ying +4 more · 2026-03-26 · 5

Research Papers arxiv

WriteBack-RAG treats the knowledge base as a trainable component, distilling relevant documents into compact units indexed alongside the original corpus to improve retrieval-augmented generation across diverse benchmarks.

Yuxing Lu, Xukai Zhao, Wei Wu +1 more · 2026-03-26 · 6

Research Papers arxiv

Drive My Way (DMW) is a personalized VLA autonomous driving framework that learns user-specific driving habits via embeddings and adapts to real-time natural language instructions.

Zehao Wang, Huaide Jiang, Shuaiwu Dong +3 more · 2026-03-26 · 5

Research Papers arxiv

Vega is a unified Vision-Language-World-Action model for instruction-following autonomous driving, trained on InstructScene, a 100K-scene dataset annotated with diverse driving instructions and trajectories.

Sicheng Zuo, Yuxuan Li, Wenzhao Zheng +3 more · 2026-03-26 · 5

Agent Infrastructure hackernews

Kora is a 370k-line Rust-based AI-native OS layer that runs a local AI agent as the primary interface, emphasizing digital sovereignty with no telemetry or cloud dependency.

jwatters · 2026-03-26 · 7

Agent Infrastructure hackernews

Superfast extends an agent framework with FastMemory, a Rust-based concurrent engine that structures agent memory using a functional ontology and graph clustering instead of traditional RAG chunking.

prabhatkr · 2026-03-27 · 6

Agent Infrastructure hackernews

A workspace tool for iterative LLM-based transformation of unstructured data at scale, designed to help users tune prompts and chain processing steps across thousands of rows without custom code.

nibab · 2026-03-26 · 4

Agent Infrastructure hackernews

A sandbox experiment where multiple AI agents search, debate, and attempt to resolve questions that single LLMs typically refuse, revealing emergent behaviors like source surfacing and debate loops.

ttlcc13 · 2026-03-26 · 4

Agent Infrastructure hackernews

Agentis is a multi-agent platform supporting 12 LLM providers with a 3D visualization interface for observing agent interactions.

Dhwanil25 · 2026-03-26 · 4

Industry News google_ai

Google announced Search Live, a feature enabling real-time visual search through the Google app using a device camera.

Google AI Blog · 2026-03-26 · 5

Model Releases google_ai

Google released Gemini 3.1 Flash Live, a new variant of its Gemini model optimized for real-time live interactions.

Google AI Blog · 2026-03-26 · 7

Industry News google_ai

Google Translate's Live translation feature with headphone support is expanding to iOS and additional countries on both iOS and Android.

Google AI Blog · 2026-03-26 · 3

Industry News google_ai

Google's Dialogues on Technology and Society series features a conversation between LL COOL J and James Manyika on technology's societal impact.

Google AI Blog · 2026-03-26 · 1

Agent Infrastructure @llama_index

LlamaIndex showed a voice agent demo that integrates Gemini 3.1 via the Live API with LiteParse for fast local document processing, showcasing multimodal agent pipelines.

llama_index · 2026-03-26 · 6

Agent Infrastructure @llama_index

LlamaIndex shipped a guide for visual citations using LiteParse, leveraging bounding box extraction and page screenshots to enable agents to cite document sources precisely.

llama_index · 2026-03-26 · 4

Agent Infrastructure @llama_index

LiteParse's latest release adds text bounding box extraction for PDFs, enabling AI agents to pinpoint exact text locations within documents for more accurate retrieval and citation.

llama_index · 2026-03-26 · 5

Agent Infrastructure @llama_index

Retweet of LlamaIndex's LiteParse bounding box announcement; no new content.

llama_index · 2026-03-26 · 1

Industry News @ArizeAI

Arize AI warns that silent agent failures—confident but wrong outputs propagating through multi-agent pipelines—are a critical observability problem as agent deployments scale.

ArizeAI · 2026-03-26 · 6

Model Releases @GoogleAI

Google announces availability of Gemini 3.1 Flash Live across Gemini App, Gemini Live API, Google AI Studio, and enterprise products for customer experience.

GoogleAI · 2026-03-26 · 6

Model Releases @GoogleAI

Google launches Gemini 3.1 Flash Live, its highest-quality real-time audio/voice model, delivering faster response times and improved dialogue capabilities over its predecessor.

GoogleAI · 2026-03-26 · 7

Model Releases @GoogleAI

Google details deployment channels for Gemini 3.1 Flash Live, available in Gemini App, Live API, Google AI Studio preview, and enterprise tiers.

GoogleAI · 2026-03-26 · 4

Model Releases @GoogleAI

Gemini 3.1 Flash Live targets real-time voice and vision agent developers, offering natural dialogue speed, better task completion in noisy environments, and improved multimodal capabilities.

GoogleAI · 2026-03-26 · 7

Model Releases @GoogleAI

Google demonstrates Gemini 3.1 Flash Live enabling voice-driven app development in Google AI Studio, allowing developers to build applications through real-time spoken instructions.

GoogleAI · 2026-03-26 · 5

Model Releases @MistralAI

Mistral AI introduces Voxtral TTS, an open-weight frontier text-to-speech model featuring emotionally expressive speech, 9-language support, and ultra-low latency for time-to-first-audio.

MistralAI · 2026-03-26 · 8

Model Releases @GoogleDeepMind

Google DeepMind is rolling out Gemini 3.1 Flash Live in Gemini Live and Google Search Live, with developer access available via Google AI Studio.

GoogleDeepMind · 2026-03-26 · 7

Model Releases @GoogleDeepMind

Gemini 3.1 Flash Live features improved task completion in noisy environments and long conversation memory so users don't need to repeat context.

GoogleDeepMind · 2026-03-26 · 5

Research Papers @GoogleDeepMind

Google DeepMind released a first-of-its-kind empirically validated toolkit to measure and detect AI manipulation in real-world settings.

GoogleDeepMind · 2026-03-26 · 7

Research Papers @GoogleDeepMind

A study of 10,000 people found AI manipulation effectiveness is domain-dependent, with high influence in finance but limited impact in health due to existing guardrails; researchers identified red-flag tactics like fear-based persuasion.

GoogleDeepMind · 2026-03-26 · 7

Research Papers @GoogleDeepMind

Google DeepMind is publishing new research on how AI could be misused to exploit emotions and manipulate people into harmful decisions as conversational AI improves.

GoogleDeepMind · 2026-03-26 · 6

Model Releases @GoogleDeepMind

Google DeepMind launched Gemini 3.1 Flash Live, an audio model offering more natural conversations and improved function calling capabilities.

GoogleDeepMind · 2026-03-26 · 7

Agent Infrastructure @OpenAI

OpenAI is rolling out plugins for Codex, enabling seamless integration with popular tools like Slack, Figma, Notion, and Gmail out of the box.

OpenAI · 2026-03-26 · 7

Agent Infrastructure @OpenAI

Retweet of OpenAI's announcement about Codex plugins supporting major productivity tools like Slack, Figma, Notion, and Gmail.

OpenAI · 2026-03-26 · 2

Research Papers arxiv

Researchers investigate how causal ML-based clinical decision support systems should be designed for collaborative clinical decision-making, finding that current systems rely on correlation rather than causation.

Domenique Zipperling, Lukas Schmidt, Benedikt Hahn +2 more · 2026-03-25 · 4

Research Papers arxiv

A multimodal pet reunification system integrates visual and acoustic biometrics to improve shelter animal matching, addressing the limitation that current systems ignore animal vocalizations.

Badri Narayana Patro · 2026-03-25 · 4

Research Papers arxiv

A multi-agent framework with specialist agents and two-phase consistency verification improves uncertainty calibration in medical multiple-choice QA using Qwen2.5-7B.

John Ray B. Martinez · 2026-03-25 · 6

Research Papers arxiv

An autoresearch pipeline powered by Claude Code autonomously discovers novel adversarial attack algorithms that significantly outperform 30+ existing methods, achieving up to 40% jailbreak success rate on safety-critical queries.

Alexander Panfilov, Peter Romov, Igor Shilov +3 more · 2026-03-25 · 8

Research Papers arxiv

A multi-dimensional evaluation framework for uncertainty attributions in XAI aligns with the Co-12 framework and introduces new properties including correctness, consistency, and conveyance.

Emily Schiller, Teodor Chiaburu, Marco Zullich +1 more · 2026-03-25 · 4

Research Papers arxiv

Introduces incongruent normal form (INF), a structural representation that resolves self-referential semantic paradoxes by replacing them with finite families of non-self-referential sentences.

Shalender Singh, Vishnu Priya Singh Parmar · 2026-03-25 · 3

Research Papers arxiv

UI-Voyager is a self-evolving mobile GUI agent that learns from failed trajectories using rejection fine-tuning and group relative self-distillation for improved long-horizon task performance.

Zichuan Lin, Feiyu Liu, Yijun Yang +9 more · 2026-03-25 · 7

Research Papers arxiv

CliPPER is a video-language pretraining framework for intraoperative surgical video that enables fine-grained temporal event recognition in a data-scarce medical domain.

Florian Stilz, Vinkle Srivastav, Nassir Navab +1 more · 2026-03-25 · 4

Research Papers arxiv

SEGAR combines a diffusion-based world model with selective correction to enable temporally coherent augmented reality by predicting and caching augmented future frames ahead of time.

Fanjun Bu, Chenyang Yuan, Hiroshi Yasuda · 2026-03-25 · 4

Research Papers arxiv

A sociolinguistic analysis of ASR bias in Newcastle English reveals how dialectal variation degrades commercial speech recognition performance, using fine-grained analysis of 3,000+ transcriptions.

Dana Serditova, Kevin Tang · 2026-03-25 · 4

Research Papers arxiv

Empirical study comparing four RAG chunking strategies on oil and gas enterprise documents, finding structure-aware chunking outperforms fixed-size, recursive, and semantic approaches.

Samuel Taiwo, Mohd Amaluddin Yusoff · 2026-03-25 · 5

Research Papers arxiv

LensWalk is an agentic video understanding framework where an LLM actively controls its own visual observation through a reason-plan-observe loop, dynamically adjusting temporal scope during analysis.

Keliang Li, Yansong Li, Hongze Shen +3 more · 2026-03-25 · 6

Research Papers arxiv

The Free-Market Algorithm (FMA) is a novel metaheuristic using distributed supply-and-demand dynamics with emergent fitness and open-ended search spaces, enabling self-organizing optimization without centralized control.

Martin Jaraiz · 2026-03-25 · 4

Research Papers arxiv

Anti-I2V introduces adversarial perturbations to protect photos from malicious image-to-video generation, extending protection to Diffusion Transformer (DiT) architectures beyond UNet-based models.

Duc Vu, Anh Nguyen, Chi Tran +1 more · 2026-03-25 · 5

Read more → Original ↗

Research Papers arxiv

Formal analysis proving that the completion technique makes Unbounded Best-First...

Formal analysis proving that the completion technique makes Unbounded Best-First Minimax and Descent Minimax algorithms complete for two-player perfect information games, resolving an open question in knowledge-free RL.

Quentin Cohen-Solal · 2026-03-25 · 4

Read more → Original ↗

Research Papers arxiv

VFIG is a family of vision-language models trained to convert rasterized figures...

VFIG is a family of vision-language models trained to convert rasterized figures back to high-fidelity SVG vector graphics, addressing the common problem of lost vector source files.

Qijia He, Xunmei Liu, Hammaad Memon +6 more · 2026-03-25 · 5

Read more → Original ↗

Research Papers arxiv

Chameleon introduces episodic memory for robotic manipulation using geometry-gro...

Chameleon introduces episodic memory for robotic manipulation using geometry-grounded multimodal tokens and goal-directed recall via a differentiable memory stack, improving performance in non-Markovian settings.

Xinying Guo, Chenxi Jiang, Hyun Bin Kim +4 more · 2026-03-25 · 6

Read more → Original ↗

Research Papers arxiv

EndoVGGT uses a GNN-based deformation-aware graph attention module for depth est...

EndoVGGT uses a GNN-based deformation-aware graph attention module for depth estimation in surgical 3D reconstruction, improving handling of occlusions and tissue deformation in robotic surgery.

Falong Fan, Yi Xie, Arnis Lektauers +2 more · 2026-03-25 · 4

Read more → Original ↗

Research Papers arxiv

Study on RAG for AI policy QA using 947 policy documents shows that retrieval im...

Study on RAG for AI policy QA using 947 policy documents shows that retrieval improvements don't always yield better answers, highlighting the gap between retrieval and generation quality in complex regulatory domains.

Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur +2 more · 2026-03-25 · 6

Read more → Original ↗

Research Papers arxiv

A Markov framework for auditing agentic AI reliability and oversight costs, defi...

A Markov framework for auditing agentic AI reliability and oversight costs, defining measures like blind-spot mass and entropy-based escalation gates to quantify when human-in-the-loop intervention is economically justified.

Biplab Pal, Santanu Bhattacharya · 2026-03-25 · 7

Read more → Original ↗
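The summary above names entropy-based escalation gates without defining them. One plausible reading, sketched here as an assumption rather than the paper's definition, is to hand control to a human whenever the Shannon entropy of the agent's next-action distribution exceeds a threshold. The function names and the threshold value are hypothetical.

```python
import math

def shannon_entropy(probs: list[float]) -> float:
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def should_escalate(probs: list[float], threshold_bits: float) -> bool:
    """Assumed gate: escalate to a human when the agent's uncertainty is high."""
    return shannon_entropy(probs) > threshold_bits

# A confident agent versus one that is undecided between four actions.
confident = [0.97, 0.01, 0.01, 0.01]
undecided = [0.25, 0.25, 0.25, 0.25]

print(should_escalate(confident, threshold_bits=1.5))
print(should_escalate(undecided, threshold_bits=1.5))
```

Choosing the threshold is where the cost trade-off would enter: a lower value sends more steps to human review and raises oversight cost.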

Agent Infrastructure hackernews

GhostDesk is an MIT-licensed MCP server that gives AI agents a full virtual Linu...

GhostDesk is an MIT-licensed MCP server that gives AI agents a full virtual Linux desktop with realistic mouse/keyboard control, semantic UI reading, and bot-detection evasion, running in Docker with parallel instance support.

maltyxxx · 2026-03-25 · 7

Read more → Original ↗

Agent Infrastructure hackernews

A TypeScript library for robust LLM-based web scraping that handles HTML noise r...

A TypeScript library for robust LLM-based web scraping that handles HTML noise reduction, malformed JSON recovery, and URL normalization to build reliable structured data pipelines.

andrew_zhong · 2026-03-26 · 5

Read more → Original ↗
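Malformed JSON recovery, one of the features listed above, can be sketched as follows. This is not the library's API (the library is TypeScript and its internals are not described here); `recover_json` is a hypothetical Python helper showing one common approach: cut out the outermost object, try a strict parse, then retry after dropping trailing commas.

```python
import json
import re

def recover_json(raw: str) -> dict:
    """Best-effort parse of an LLM reply that should contain one JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    candidate = raw[start:end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Drop trailing commas before a closing brace or bracket, then retry.
        # Caveat: this simple regex would also touch such a pattern inside a string value.
        repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
        return json.loads(repaired)

reply = 'Here is the data: {"title": "A", "tags": ["x", "y",],} Hope that helps!'
print(recover_json(reply))
```

A production pipeline would likely add schema validation after parsing, since a syntactically valid object can still be missing required fields.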

Industry News hackernews

A developer expresses burnout and existential frustration with the relentless AI...

A developer expresses burnout and existential frustration with the relentless AI hype cycle, questioning whether their traditional coding skills have been devalued.

s_u_d_o · 2026-03-25 · 2

Read more → Original ↗

Research Papers hackernews

Mercury 2, a diffusion-based LLM, is benchmarked on real-world agentic tasks usi...

Mercury 2, a diffusion-based LLM, is benchmarked on real-world agentic tasks using PinchBench/OpenClaw, evaluating its practical performance in agent workflows.

volodia · 2026-03-25 · 6

Read more → Original ↗

Research Papers hackernews

HarmActionBench research reveals that leading AI models including GPT and Claude...

HarmActionBench research reveals that leading AI models including GPT and Claude score poorly on agentic action safety, readily executing tool-based harmful instructions without barriers.

praneeth-v · 2026-03-25 · 7

Read more → Original ↗

Model Releases google_ai

Google released a sizzle video showcasing new capabilities of Lyria 3 Pro, its A...

Google released a sizzle video showcasing new capabilities of Lyria 3 Pro, its AI music generation model.

Google AI Blog · 2026-03-25 · 5

Read more → Original ↗

Model Releases google_ai

Google posted a teaser for its Lyria AI music generation model, previewing upcom...

Google posted a teaser for its Lyria AI music generation model, previewing upcoming features or a new release.

Google AI Blog · 2026-03-25 · 4

Read more → Original ↗

Industry News openai_blog

OpenAI launched a Safety Bug Bounty program targeting AI abuse vectors including...

OpenAI launched a Safety Bug Bounty program targeting AI abuse vectors including agentic vulnerabilities, prompt injection, and data exfiltration risks.

OpenAI Blog · 2026-03-25 · 7

Read more → Original ↗

Industry News openai_blog

OpenAI's Model Spec is presented as a public framework defining model behavior b...

OpenAI's Model Spec is presented as a public framework defining model behavior boundaries, balancing safety, user autonomy, and accountability as AI capabilities advance.

OpenAI Blog · 2026-03-25 · 6

Read more → Original ↗

Industry News @llama_index

LlamaIndex promoted a signup link for LlamaParse, their document parsing service...

LlamaIndex promoted a signup link for LlamaParse, their document parsing service, with no substantive technical content provided.

llama_index · 2026-03-25 · 1

Read more → Original ↗

Agent Infrastructure @llama_index

LlamaParse announces improved .docx parsing, noting that Word documents actually...

LlamaParse announces improved .docx parsing, noting that Word documents contain better structural information than most formats but have been underutilized.

llama_index · 2026-03-25 · 4

Read more → Original ↗

Agent Infrastructure @llama_index

LlamaIndex introduces LiteParse, an open-source model-free document parser that ...

LlamaIndex introduces LiteParse, an open-source model-free document parser that converts PDFs to plaintext for use with coding agents like Claude Code.

llama_index · 2026-03-26 · 5

Read more → Original ↗

Agent Infrastructure @llama_index

Retweet of LiteParse announcement — open-source PDF-to-text parser designed to f...

Retweet of LiteParse announcement — open-source PDF-to-text parser designed to feed documents into coding agents like Claude Code.

llama_index · 2026-03-26 · 2

Read more → Original ↗

Industry News @ArizeAI

Arize AI releases platform updates including annotation queue improvements, CLI ...

Arize AI releases platform updates including annotation queue improvements, CLI commands for spaces, and Bedrock bearer token authentication support.

ArizeAI · 2026-03-25 · 3

Read more → Original ↗

Agent Infrastructure @ArizeAI

Arize AI adds Saved Views to its tracing feature, allowing users to persist filt...

Arize AI adds Saved Views to its tracing feature, allowing users to persist filter, column, sort, and time range configurations across sessions.

ArizeAI · 2026-03-25 · 3

Read more → Original ↗

Industry News @ArizeAI

Arize AX weekly release highlights dashboard exports, smarter Alyx AI assistant,...

Arize AX weekly release highlights dashboard exports, smarter Alyx AI assistant, and SDK upgrades for its LLM observability platform.

ArizeAI · 2026-03-25 · 3

Read more → Original ↗

Agent Infrastructure @langfuse

Upsonic AI integrates Langfuse for end-to-end agent tracing, enabling visibility...

Upsonic AI integrates Langfuse for end-to-end agent tracing, enabling visibility into LLM calls, tool decisions, latency, and cost per step.

langfuse · 2026-03-25 · 5

Read more → Original ↗

Agent Infrastructure @langfuse

Retweet of Upsonic-Langfuse integration announcement for open-source agent traci...

Retweet of Upsonic-Langfuse integration announcement for open-source agent tracing and observability.

langfuse · 2026-03-26 · 2

Read more → Original ↗

Model Releases @hardmaru

Sakana AI publicly launches Sakana Chat, a free AI chat service with web search ...

Sakana AI publicly launches Sakana Chat, a free AI chat service with web search capabilities available to users in Japan.

hardmaru · 2026-03-24 · 5

Read more → Original ↗

Model Releases @hardmaru

Sakana AI launches its first consumer-facing product, Sakana Chat, featuring a w...

Sakana AI launches its first consumer-facing product, Sakana Chat, featuring a web search agent and post-training to reduce bias from base models and align with Japanese values.

hardmaru · 2026-03-24 · 6

Read more → Original ↗

Research Papers @fchollet

ARC-AGI-3 has launched as a new benchmark for evaluating agentic intelligence th...

ARC-AGI-3 has launched as a new benchmark for evaluating agentic intelligence through interactive reasoning environments, targeting human-level action efficiency as the success threshold.

fchollet · 2026-03-25 · 9

Read more → Original ↗

Model Releases @GoogleAI

Google outlines access channels for Lyria 3 Pro, available via Gemini app, Googl...

Google outlines access channels for Lyria 3 Pro, available via Gemini app, Google AI Studio, Vertex AI, and other platforms.

GoogleAI · 2026-03-25 · 4

Read more → Original ↗

Model Releases @GoogleAI

Google introduces Lyria 3 Pro, an upgrade to its music generation model offering...

Google introduces Lyria 3 Pro, an upgrade to its music generation model offering advanced capabilities including generation from text, image, or video prompts.

GoogleAI · 2026-03-25 · 6

Read more → Original ↗

Industry News @Cohere

Cohere's VP of Engineering presents a framework for sovereign AI at NVIDIA GTC, ...

Cohere's VP of Engineering presents a framework for sovereign AI at NVIDIA GTC, emphasizing full-stack deployment including models, applications, and reasoning traces within controlled environments.

Cohere · 2026-03-25 · 6

Read more → Original ↗

Industry News @Cohere

Retweet of Cohere's sovereign AI announcement at NVIDIA GTC, reiterating full-st...

Retweet of Cohere's sovereign AI announcement at NVIDIA GTC, reiterating full-stack sovereignty requirements.

Cohere · 2026-03-25 · 2

Read more → Original ↗

Industry News @Cohere

Cohere shares download links for Cohere Transcribe with no additional context.

Cohere · 2026-03-26 · 1

Read more → Original ↗

Model Releases @Cohere

Cohere's open-source speech-to-text model tops HuggingFace's Open ASR leaderboar...

Cohere's open-source speech-to-text model tops HuggingFace's Open ASR leaderboard with a 5.42% word error rate, validated by human evaluation.

Cohere · 2026-03-26 · 7

Read more → Original ↗
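For readers unfamiliar with the metric behind the 5.42% figure above: word error rate is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model's output, divided by the number of reference words. The sketch below is the textbook definition; the leaderboard's text normalization is not described here and is not reproduced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```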

Model Releases @Cohere

Cohere launches Cohere Transcribe, a state-of-the-art open-source speech recogni...

Cohere launches Cohere Transcribe, a state-of-the-art open-source speech recognition model.

Cohere · 2026-03-26 · 7

Read more → Original ↗

Model Releases @GoogleDeepMind

Lyria 3 Pro is now available via Google AI Studio API for developers and the Gem...

Lyria 3 Pro is now available via Google AI Studio API for developers and the Gemini app for paid subscribers.

GoogleDeepMind · 2026-03-25 · 4

Read more → Original ↗

Model Releases @GoogleDeepMind

Lyria 3 Pro supports structured long-form music composition up to 3 minutes with...

Lyria 3 Pro supports structured long-form music composition up to 3 minutes with intro, verse, chorus, and bridge sections at high fidelity.

GoogleDeepMind · 2026-03-25 · 5

Read more → Original ↗

Industry News @OpenAI

OpenAI promotes its podcast across Spotify, Apple, and YouTube. No ...

OpenAI promotes its podcast across Spotify, Apple, and YouTube. No substantive technical content.

OpenAI · 2026-03-25 · 1

Read more → Original ↗

Industry News @OpenAI

OpenAI shares a link to more information about their Model Spec, the framework g...

OpenAI shares a link to more information about their Model Spec, the framework governing model behavior.

OpenAI · 2026-03-25 · 3

Read more → Original ↗

Industry News @OpenAI

OpenAI researcher discusses the Model Spec on their podcast, covering how the be...

OpenAI researcher discusses the Model Spec on their podcast, covering how the behavioral framework works in practice including chain-of-command principles.

OpenAI · 2026-03-25 · 4

Read more → Original ↗

Industry News @OpenAI

OpenAI and Handshake announce the Codex Creator Challenge for students, offering...

OpenAI and Handshake announce the Codex Creator Challenge for students, offering $10K in API credits as prizes for building with Codex tools.

OpenAI · 2026-03-25 · 2

Read more → Original ↗

Industry News @OpenAI

Retweet of the OpenAI Codex Creator Challenge student competition announcement.

OpenAI · 2026-03-25 · 1

Read more → Original ↗

Agent Infrastructure @AnthropicAI

Anthropic engineering blog post details how Claude Code's auto mode uses trained...

Anthropic engineering blog post details how Claude Code's auto mode uses trained classifiers to make permission approval decisions autonomously, offering a safer alternative to fully permissionless operation.

AnthropicAI · 2026-03-25 · 7

Read more → Original ↗

Industry News arxiv

RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue. Real-time s...

RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue. Real-time spoken dialogue systems face a fundamental tension between latency and ...

Long Mai · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Contrastive Metric Learning for Point Cloud Segmentation in Highly Granular Dete...

Contrastive Metric Learning for Point Cloud Segmentation in Highly Granular Detectors. We propose a novel clustering approach for point-cloud segmenta...

Max Marriott-Clarke, Lazar Novakovic, Elizabeth Ratzer +3 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive ...

Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions. The task of buil...

Samya Acharja, Kanchan Chowdhury · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generat...

Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation. Energy-based models for discrete domains, such as graphs, explici...

Michal Balcerak, Suprosana Shit, Chinmay Prabhakar +4 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Planning over MAPF Agent Dependencies via Multi-Dependency PIBT. Modern Multi-Ag...

Planning over MAPF Agent Dependencies via Multi-Dependency PIBT. Modern Multi-Agent Path Finding (MAPF) algorithms must plan for hundreds to thousands...

Zixiang Jiang, Yulun Zhang, Rishi Veerapaneni +1 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative S...

Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies. While large language models simulate social behaviors, their...

Hanzhong Zhang, Siyang Song, Jindong Wang · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduli...

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling. Scaling reinforcement learning (RL) has shown strong promise for e...

Yiqi Zhang, Huiqiang Jiang, Xufang Luo +7 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback ...

Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback. Human decision-making is strongly influenced by cognitive biases, par...

Teerthaa Parakh, Karen M. Feigh · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Bilevel Autoresearch: Meta-Autoresearching Itself. If autoresearch is itself a f...

Bilevel Autoresearch: Meta-Autoresearching Itself. If autoresearch is itself a form of research, then autoresearch can be applied to research itself. ...

Yaonan Qu, Meng Lu · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Mecha-nudges for Machines. Nudges are subtle changes to the way choices are pres...

Mecha-nudges for Machines. Nudges are subtle changes to the way choices are presented to human decision-makers (e.g., opt-in vs. opt-out by default) t...

Giulio Frey, Kawin Ethayarajh · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Targeted Adversarial Traffic Generation : Black-box Approach to Evade Intrusion ...

Targeted Adversarial Traffic Generation : Black-box Approach to Evade Intrusion Detection Systems in IoT Networks. The integration of machine learning...

Islam Debicha, Tayeb Kenaza, Ishak Charfi +3 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Evaluating LLM-Based Test Generation Under Software Evolution. Large Language Mo...

Evaluating LLM-Based Test Generation Under Software Evolution. Large Language Models (LLMs) are increasingly used for automated unit test generation. ...

Sabaat Haroon, Mohammad Taha Khan, Muhammad Ali Gulzar · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Pe...

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding. While multi-modality large language models...

Yiping Chen, Jinpeng Li, Wenyu Ke +6 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Code Review Agent Benchmark. Software engineering agents have shown significant ...

Code Review Agent Benchmark. Software engineering agents have shown significant promise in writing code. As AI agents permeate code writing, and gener...

Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf +3 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting. Recent...

InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting. Recent diffusion-based models achieve photorealism in image inpainting but r...

Duc Vu, Kien Nguyen, Trong-Tung Nguyen +5 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs ...

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs. Video-Action Models (VAMs) have emerged as a promising framework for e...

Haoran Yuan, Weigang Yi, Zhenyu Zhang +9 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Softwar...

ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains. Requirements engineering is a vital, yet labor-intensive, s...

Muhammad Khalid, Manuel Oriol, Yilmaz Uygun · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

Failure of contextual invariance in gender inference with large language models ...

Failure of contextual invariance in gender inference with large language models. Standard evaluation practices assume that large language model (LLM) ...

Sagar Kumar, Ariel Flint, Luca Maria Aiello +1 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, v...

VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions. Existing approaches for improving the eff...

Adrian Bulat, Alberto Baldrati, Ioannis Maniadis Metaxas +2 more · 2026-03-24 · 5

Read more → Original ↗

Industry News arxiv

MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage ...

MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage. Vision Language Models (VLMs) are increasingly used for tasks like med...

Ufaq Khan, Umair Nawaz, L D M S S Teja +5 more · 2026-03-24 · 5

Read more → Original ↗

Industry News hackernews

I'm 11 and trained a custom MoE LLM for $1. I'm 11 years old and I trained...

I'm 11 and trained a custom MoE LLM for $1. I'm 11 years old and I trained my own LLM from scratch. 50 people downloaded it in 24 hours.

Hey r...

Hey1-Arthur · 2026-03-22 · 5

Read more → Original ↗

Industry News hackernews

Show HN: Skillcop: Block malicious Claude Skills before they execute. I've b...

Show HN: Skillcop: Block malicious Claude Skills before they execute. I've been wanting to adopt more skills in my agent workflows, but I've ...

bennydog224 · 2026-03-20 · 5

Read more → Original ↗

Industry News hackernews

New Open Source from Non-Traditional Builder. Let me begin by saying that I am no...

New Open Source from Non-Traditional Builder. Let me begin by saying that I am not a traditional builder with a traditional background. From the onset ...

BrainDAnderson · 2026-03-22 · 5

Read more → Original ↗

Industry News hackernews

Show HN: Clarity – An AI Slack coach for better work communication. Clarity is a ...

Show HN: Clarity – An AI Slack coach for better work communication. Clarity is a Slack bot to serve as a private communication coach, directly addressi...

dhruvghulati · 2026-03-24 · 5

Read more → Original ↗

Industry News hackernews

Telling Your AI Agent It's an Expert Makes It Less Accurate...

alvivanco · 2026-03-24 · 5

Read more → Original ↗

Industry News hackernews

Show HN: Refrain – Generate browser automations with AI, replay them without AI ...

Show HN: Refrain – Generate browser automations with AI, replay them without AI. Hey HN, I'm timakin. Refrain is a CLI that uses an AI agent to ge...

timakin · 2026-03-25 · 5

Read more → Original ↗

Industry News hackernews

Show HN: Herd – A Go sidecar to stop stateful processes Puppeteer/LLMs from OOM ...

Show HN: Herd – A Go sidecar to stop stateful processes Puppeteer/LLMs from OOM. Hey HN.

I'm an engineering student at Waterloo building statefu...

sankalpnarula · 2026-03-25 · 5

Read more → Original ↗

Industry News hackernews

Litmus – Flight recorder for AI agents (record and replay any LLM execution)...

RomirJ · 2026-03-25 · 5

Read more → Original ↗

Industry News hackernews

Show HN: Castor – a secure execution layer for LLM agents. Hi HN, I'm one of...

Show HN: Castor – a secure execution layer for LLM agents. Hi HN, I'm one of the authors of Castor.

Today's agent frameworks have done seri...

claytonia · 2026-03-24 · 5

Read more → Original ↗

Industry News openai_blog

ChatGPT introduces richer, visually immersive shopping powered by the Agentic Co...

ChatGPT introduces richer, visually immersive shopping powered by the Agentic Commerce Protocol, enabling product discovery, side-by-side comparisons,...

OpenAI Blog · 2026-03-24 · 5

Read more → Original ↗

Industry News openai_blog

The OpenAI Foundation announces plans to invest at least $1 billion in curing di...

The OpenAI Foundation announces plans to invest at least $1 billion in curing diseases, economic opportunity, AI resilience, and community programs....

OpenAI Blog · 2026-03-24 · 5

Read more → Original ↗

Industry News openai_blog

OpenAI releases prompt-based teen safety policies for developers using gpt-oss-s...

OpenAI releases prompt-based teen safety policies for developers using gpt-oss-safeguard, helping moderate age-specific risks in AI systems....

OpenAI Blog · 2026-03-24 · 5

Read more → Original ↗

Industry News @llama_index

Congratulations to @zubeensyed, one of our LlamAgent contest winners, for buildi...

Congratulations to @zubeensyed, one of our LlamAgent contest winners, for building an agentic AI workflow that automates GDPR breach report structurin...

llama_index · 2026-03-24 · 5

Read more → Original ↗

Industry News @llama_index

There’s not that many fast, free, non-VLM document parsers out there: there’s Py...

There’s not that many fast, free, non-VLM document parsers out there: there’s PyPDF, PyMuPDF, Markitdown, OpenDataLoader. Last week, we launched Lite...

llama_index · 2026-03-25 · 5

Read more → Original ↗

Industry News @llama_index

RT @jerryjliu0: There’s not that many fast, free, non-VLM document parsers out t...

RT @jerryjliu0: There’s not that many fast, free, non-VLM document parsers out there: there’s PyPDF, PyMuPDF, Markitdown, OpenDataLoader.…...

llama_index · 2026-03-25 · 5

Read more → Original ↗

Industry News @ArizeAI

Trying out evals can be hard if you work in a regulated industry and you can't s...

Trying out evals can be hard if you work in a regulated industry and you can't send your traces to an external SaaS platform without paperwork and app...

ArizeAI · 2026-03-24 · 5

Read more → Original ↗

Industry News @langfuse

RT @rawert: I'm repping @langfuse at the @clikhousedb booth in Hall 2 at Kubecon...

RT @rawert: I'm repping @langfuse at the @clikhousedb booth in Hall 2 at Kubecon Amsterdam today. Come say hi!...

langfuse · 2026-03-25 · 5

Read more → Original ↗

Industry News @Cohere

We’re honored to be named one of @FastCompany's Most Innovative Companies of 202...

We’re honored to be named one of @FastCompany's Most Innovative Companies of 2026! This recognition reflects our commitment to building secure, sover...

Cohere · 2026-03-24 · 5

Read more → Original ↗

Industry News @Cohere

We’re excited to announce our partnership with @RWSGroup, bringing Cohere’s fron...

We’re excited to announce our partnership with @RWSGroup, bringing Cohere’s frontier AI models to Language Weaver Pro - unlocking new enterprise‑grade...

Cohere · 2026-03-25 · 5

Read more → Original ↗

Industry News @GoogleDeepMind

Watch how fast Gemini 3.1 Flash-Lite can generate websites. ⚡ This browser crea...

Watch how fast Gemini 3.1 Flash-Lite can generate websites. ⚡ This browser creates each page in real-time as you click, search, and navigate. Give it...

GoogleDeepMind · 2026-03-24 · 5

Read more → Original ↗

Industry News @AnthropicAI

New on the Anthropic Engineering Blog: How we use a multi-agent harness to pus...

New on the Anthropic Engineering Blog: How we use a multi-agent harness to push Claude further in frontend design and long-running autonomous softwa...

AnthropicAI · 2026-03-24 · 5

Read more → Original ↗

Industry News @AnthropicAI

We find that since November 2025, consumer use has become less concentrated: the...

We find that since November 2025, consumer use has become less concentrated: the top 10 tasks now make up 19% of conversations, down from 24%. We also...

AnthropicAI · 2026-03-24 · 5

Read more → Original ↗

Industry News @AnthropicAI

New from the Anthropic Economic Index: how people’s use of Claude changes with e...

New from the Anthropic Economic Index: how people’s use of Claude changes with experience. Longer-term users are more likely to iterate carefully wit...

AnthropicAI · 2026-03-24 · 5

Read more → Original ↗

Research Papers arxiv

Study finds that consulting multiple AI systems for advice improves decision acc...

Study finds that consulting multiple AI systems for advice improves decision accuracy with small panels but yields no gains with larger ones, and consensus within panels affects human conformity behavior.

Yuta Tsuchiya, Yukino Baba · 2026-03-23 · 5

Read more → Original ↗

Research Papers arxiv

Bearing-UAV is a vision-only cross-view navigation method for UAVs in GNSS-denie...

Bearing-UAV is a vision-only cross-view navigation method for UAVs in GNSS-denied environments that jointly predicts location and heading without relying on onboard map tiles.

Kejia Liu, Haoyang Zhou, Ruoyu Xu +3 more · 2026-03-23 · 3

Read more → Original ↗

Research Papers arxiv

A locally deployable multimodal LLM framework for survival analysis integrates c...

A locally deployable multimodal LLM framework for survival analysis integrates clinical text, tabular, and genomic data using teacher-student distillation, outperforming baselines while preserving patient privacy.

Moritz Gögl, Christopher Yau · 2026-03-23 · 5

Read more → Original ↗

Research Papers arxiv

Reduces the calibeating problem to standard online learning techniques, recoveri...

Reduces the calibeating problem to standard online learning techniques, recovering and extending prior optimal rates for proper losses including Brier and log losses.

Yurong Chen, Zhiyi Huang, Michael I. Jordan +1 more · 2026-03-23 · 3

Read more → Original ↗
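The two proper losses named in the calibeating summary above have standard definitions for binary outcomes, sketched below. This illustrates the losses only, not the paper's reduction; `expected_loss` is a hypothetical helper added to show what "proper" means: the expected loss is minimized by forecasting the true probability.

```python
import math

def brier_loss(p: float, y: int) -> float:
    """Squared error between forecast probability p and outcome y in {0, 1}."""
    return (p - y) ** 2

def log_loss(p: float, y: int) -> float:
    """Negative log-likelihood of outcome y; requires 0 < p < 1."""
    return -math.log(p if y == 1 else 1.0 - p)

def expected_loss(loss, forecast: float, true_prob: float) -> float:
    """Expected loss of `forecast` when the event occurs with probability `true_prob`."""
    return true_prob * loss(forecast, 1) + (1.0 - true_prob) * loss(forecast, 0)

# Properness check on a grid: the best forecast equals the true probability.
grid = [i / 100 for i in range(1, 100)]
best_brier = min(grid, key=lambda q: expected_loss(brier_loss, q, 0.3))
best_log = min(grid, key=lambda q: expected_loss(log_loss, q, 0.3))
print(best_brier, best_log)
```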

Research Papers arxiv

MARCUS is a hierarchical agentic vision-language system for end-to-end cardiac d...

MARCUS is a hierarchical agentic vision-language system for end-to-end cardiac diagnosis across ECGs, echocardiograms, and MRI, combining modality-specific expert models with interactive reasoning.

Jack W O'Sullivan, Mohammad Asadi, Lennart Elbe +8 more · 2026-03-23 · 7

Read more → Original ↗

Research Papers arxiv

A two-stage fine-tuning strategy using LLM-augmented synthetic document-level pa...

A two-stage fine-tuning strategy using LLM-augmented synthetic document-level parallel corpora reduces hallucinations and improves coherence for document-level machine translation.

Ireh Kim, Tesia Sker, Chanwoo Kim · 2026-03-23 · 4

Read more → Original ↗

Research Papers arxiv

VFLM is a self-improving framework that uses visual feedback from rendered layou...

VFLM is a self-improving framework that uses visual feedback from rendered layouts to iteratively refine text layout generation, addressing the blind spot of code-only layout methods.

Junrong Guo, Shancheng Fang, Yadong Qu +1 more · 2026-03-23 · 4

Read more → Original ↗

Research Papers arxiv

CayleyPy-4 proposes a discrete analogue of holographic string dualities for AI t...

CayleyPy-4 proposes a discrete analogue of holographic string dualities for AI tasks on large graphs, suggesting GPT-style and RL systems can be reframed as particle trajectory prediction with dual string descriptions.

A. Chervov, F. Levkovich-Maslyuk, A. Smolensky +41 more · 2026-03-23 · 4

Read more → Original ↗

Research Papers arxiv

SPA is a simple prompt-engineered synthetic data augmentation baseline for knowl...

SPA is a simple prompt-engineered synthetic data augmentation baseline for knowledge injection into LLMs that outperforms stronger baselines including RL-based methods at scale.

Kexian Tang, Jiani Wang, Shaowen Wang +1 more · 2026-03-23 · 5

Read more → Original ↗

Research Papers arxiv

Investigates the reliability and agreement of LLM-as-judge evaluation systems co...

Investigates the reliability and agreement of LLM-as-judge evaluation systems compared to human reviewers, identifying limitations in consistency and fidelity for assessing free-form model outputs.

Tom Biskupski, Stephan Kleber · 2026-03-23 · 5

Read more → Original ↗

Research Papers arxiv

Dyadic is a web-based platform for studying human-human and human-AI conversatio...

Dyadic is a web-based platform for studying human-human and human-AI conversations with multi-modal support, AI suggestions, and live researcher monitoring. It aims to solve modularity and adaptability gaps in conversation research tooling.

David M. Markowitz · 2026-03-23 · 3

Research Papers arxiv

SpatialReward is a verifiable reward model for text-to-image generation that uses a multi-stage pipeline to explicitly evaluate fine-grained spatial layout accuracy. It addresses a blind spot in existing RL-based T2I reward models that neglect object positioning.

Sashuai Zhou, Qiang Zhou, Junpeng Ma +9 more · 2026-03-23 · 6

Research Papers arxiv

GEM-Rec is a generative recommendation framework that integrates ad monetization and bid-awareness directly into the generative sequence via control tokens. It unifies organic and commercial retrieval objectives in a single model.

Yanchen Jiang, Zhe Feng, Christopher P. Mah +2 more · 2026-03-23 · 4

Research Papers arxiv

This paper gives the first theoretical proof that confidence-based decoding for diffusion language models (DLMs) is efficient, supporting adaptive unmasking strategies that already work well empirically. It bridges the gap between practical performance and theoretical understanding of DLMs.

Changxiao Cai, Gen Li · 2026-03-23 · 6
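
To make the confidence-based decoding idea in the entry above concrete, here is a minimal toy sketch of one common form of adaptive unmasking: reveal every masked position whose confidence clears a threshold, and fall back to the single most confident position otherwise. It is not the paper's algorithm or analysis. `toy_predict`, `confidence_decode`, and the `threshold` parameter are hypothetical names, and the "model" is a hand-written stand-in.

```python
MASK = None


def toy_predict(seq, target):
    """Stand-in for a model: returns {position: (token, confidence)}.

    Confidence grows with the number of already-revealed neighbours.
    Sequence edges count as revealed context. Purely illustrative.
    """
    preds = {}
    for i, tok in enumerate(seq):
        if tok is not MASK:
            continue
        left = i == 0 or seq[i - 1] is not MASK
        right = i == len(seq) - 1 or seq[i + 1] is not MASK
        preds[i] = (target[i], (1 + left + right) / 4)
    return preds


def confidence_decode(length, predict, threshold=0.5):
    """Unmask every position whose confidence clears `threshold` each step.

    If nothing clears it, reveal only the single most confident position so
    the loop always makes progress. Returns the sequence and the step count.
    """
    seq = [MASK] * length
    steps = 0
    while any(tok is MASK for tok in seq):
        preds = predict(seq)
        confident = [i for i, (_, c) in preds.items() if c >= threshold]
        if not confident:
            confident = [max(sorted(preds), key=lambda i: preds[i][1])]
        for i in confident:
            seq[i] = preds[i][0]
        steps += 1
    return seq, steps
```

With a permissive threshold the toy decoder reveals several positions per step, while a threshold that is never met degrades to one position per step. That trade-off between parallelism and caution is what adaptive unmasking exploits.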

Research Papers arxiv

TiCo is a post-training method that enables spoken dialogue models (SDMs) to follow time-constrained instructions and generate responses of controllable duration. Benchmarking shows current open-source and commercial SDMs largely fail at duration control.

Kai-Wei Chang, Wei-Chih Chen, En-Pei Hu +2 more · 2026-03-23 · 5

Research Papers arxiv

3D-Layout-R1 uses scene-graph reasoning to enable LLMs/VLMs to perform spatially coherent, language-instructed visual editing. Explicit structured relational representations improve interpretability and spatial consistency over direct editing approaches.

Haoyu Zhen, Xiaolong Li, Yilin Zhao +5 more · 2026-03-23 · 6

Research Papers arxiv

ThinkJEPA augments latent world models with vision-language reasoning to improve long-horizon semantic understanding beyond local extrapolation. It combines V-JEPA2-style dense prediction with VLM semantic grounding in a unified architecture.

Haichao Zhang, Yijiang Li, Shwai He +5 more · 2026-03-23 · 7

Research Papers arxiv

UniMotion is the first unified framework treating human motion as a continuous first-class modality alongside RGB and text for simultaneous understanding and generation. A novel CMA-VAE avoids quantization errors common in discrete tokenization approaches.

Ziyi Wang, Xinshun Wang, Shuang Chen +2 more · 2026-03-23 · 6

Research Papers arxiv

UNITE proposes an autoencoder architecture that jointly trains tokenization and latent diffusion end-to-end, eliminating the complex staged training pipeline required by current latent diffusion models (LDMs). It reframes both processes as the same latent inference problem under different conditioning.

Shivam Duggal, Xingjian Bai, Zongze Wu +5 more · 2026-03-23 · 7

Research Papers arxiv

WorldCache is a training-free, content-aware caching framework for diffusion transformer-based video world models that uses motion-adaptive thresholds and saliency-weighted decisions to accelerate inference. It addresses ghosting and blur artifacts caused by naive static feature reuse.

Umair Nawaz, Ahmed Heakl, Ufaq Khan +3 more · 2026-03-23 · 6

Agent Infrastructure hackernews

A discussion on the fundamental shift in security from deterministic code vulnerabilities to natural language attack vectors as AI agents gain system access, questioning whether existing architectural solutions are adequate.

lielcohen · 2026-03-19 · 7

Agent Infrastructure hackernews

A prototype using Markdown as a unified streaming protocol for generative UI, enabling AI agents to create React UIs with real-time code execution and bidirectional data flow between client, server, and LLM.

FabianCarbonara · 2026-03-19 · 6
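
As a rough illustration of treating Markdown as a streaming protocol, the sketch below splits an incoming chunk stream into text events and fenced-code events that a client could render or execute as they arrive. It is a guess at the general shape, not the prototype's actual parser. `stream_events`, the event tuples, and the `FENCE` constant are hypothetical.

```python
FENCE = "`" * 3  # Markdown code fence, built this way to avoid nesting issues


def stream_events(chunks):
    """Yield ("text", line) or ("code", lang, source) events from text chunks.

    Input is buffered until a newline arrives, so a fence split across two
    chunks is still recognized. A code block that is still open when the
    stream ends is dropped in this simplified version.
    """
    buffer = ""
    in_code = False
    lang = ""
    code_lines = []
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.startswith(FENCE):
                if in_code:
                    # Closing fence: emit the completed block
                    yield ("code", lang, "\n".join(code_lines))
                    code_lines = []
                else:
                    # Opening fence: the rest of the line is the language tag
                    lang = line[len(FENCE):].strip()
                in_code = not in_code
            elif in_code:
                code_lines.append(line)
            else:
                yield ("text", line)
    if buffer and not in_code:
        yield ("text", buffer)
```

Buffering by line keeps the parser simple at the cost of a little latency. A production client would likely also need to surface partial code blocks and handle nested or indented fences.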

Industry News hackernews

A pastebin-style tool for sharing AI-generated HTML files, with an llms.txt API descriptor that allows AI coding agents to self-configure the upload workflow into their own config files.

skenderbeu · 2026-03-18 · 4

Agent Infrastructure hackernews

BendClaw is an open-source distributed AgentOS written in Rust featuring shared memory across all agent nodes so knowledge learned by one agent is immediately available to all others in the cluster.

BohuTANG · 2026-03-18 · 7

Research Papers hackernews

A developer built and open-sourced a live reinforcement learning agent in a playable browser-based pixel platformer, including a custom high-performance multithreaded GPU training loop.

textlapse · 2026-03-19 · 4

Agent Infrastructure hackernews

OpenCastor is a robotics agent harness runtime with a distributed evaluator leaderboard. Its results indicate that pipeline arrangement and parameters like thinking_budget impact task success as much as model choice.

craigm26 · 2026-03-23 · 6

Research Papers hackernews

An experiment showing LLMs learn the visual appearance of CLI commands from documentation rather than actual usage patterns, with practical implications for agent tool-calling interface design.

noemit · 2026-03-23 · 6

Agent Infrastructure hackernews

A critique of current agent execution environments arguing that Docker is too heavyweight for AI agents and that a new lightweight runtime layer is needed to handle the latency and scaling demands of agentic systems.

human_hack3r · 2026-03-24 · 6

Industry News hackernews

A PhD student asks whether using LLM agents to automate literature review formatting and paper collection is academically dishonest, sparking debate about AI tooling boundaries in research.

latand6 · 2026-03-23 · 3

Model Releases openai_blog

OpenAI launched Sora 2 and a new Sora social creation app with safety measures built in from the ground up to address risks posed by a state-of-the-art video generation model.

OpenAI Blog · 2026-03-23 · 7

Agent Infrastructure @llama_index

LlamaIndex and Google demonstrate a 15% improvement in document parsing accuracy for financial PDFs using LlamaParse and Gemini 3.1 Pro, with event-driven scaling for structured data extraction.

llama_index · 2026-03-23 · 5

Agent Infrastructure @llama_index

Retweet of the LlamaParse + Gemini 3.1 Pro financial PDF parsing post, highlighting a 15% accuracy improvement for unstructured brokerage statements.

llama_index · 2026-03-23 · 2

Agent Infrastructure @llama_index

LlamaIndex launches LiteParse, a fast and free document parser that integrates with 40+ agents and supports both text parsing and screenshotting via a simple CLI.

llama_index · 2026-03-23 · 6

Agent Infrastructure @llama_index

Retweet announcing LiteParse, LlamaIndex's free document parser enabling AI agents to read any PDF in seconds via CLI.

llama_index · 2026-03-23 · 2

Agent Infrastructure @llama_index

LlamaIndex and Google publish a guide on building a smart financial assistant using LlamaParse's agentic OCR with VLM capabilities and Gemini 3.

llama_index · 2026-03-23 · 5

Industry News @ArizeAI

Arize AI promotes Phoenix and AX as a solution for serving AI platform teams at varying maturity levels in banking, addressing workflow diversity challenges.

ArizeAI · 2026-03-23 · 4

Industry News @Cohere

Cohere signs an MOU with defense giant Saab to explore AI collaboration for aerospace platforms and tailored defense solutions.

Cohere · 2026-03-23 · 6

Industry News @GoogleDeepMind

Google DeepMind announces a research partnership with Agile Robots to integrate Gemini foundation models into humanoid robot hardware for next-generation robotics.

GoogleDeepMind · 2026-03-24 · 8

Industry News @OpenAI

OpenAI improves ChatGPT's file management UX with a new Library tab, quick file referencing in chat, and easier reuse of previously uploaded files.

OpenAI · 2026-03-23 · 4

Research Papers @AnthropicAI

Anthropic shares research on single-agent sequential task execution, arguing that multi-agent splits aren't always optimal for tasks where errors compound, illustrated with early-universe modeling.

AnthropicAI · 2026-03-23 · 7

Industry News @AnthropicAI

Anthropic tested Claude Opus 4.5 on graduate-level theoretical physics calculations with a Harvard physicist, finding AI can significantly accelerate scientific work even if it cannot yet perform original research autonomously.

AnthropicAI · 2026-03-23 · 6

Industry News @AnthropicAI

Anthropic launched a Science Blog to highlight how scientists are using AI to accelerate research, aligned with Anthropic's mission to speed up scientific progress.

AnthropicAI · 2026-03-23 · 4

Research Papers arxiv

Investigates pitfalls in evaluating automated interpretability agents that use LLMs to analyze neural network circuits, highlighting challenges in scaling evaluation alongside increasingly autonomous systems.

Tal Haklay, Nikhil Prakash, Sana Pandey +5 more · 2026-03-20 · 6

Research Papers arxiv

Proposes using temporal abstraction as a low-pass filter to resolve spectral mismatch in forward-backward representations, improving low-rank successor representation learning in continuous RL environments.

Seyed Mahdi B. Azad, Jasper Hoffmann, Iman Nematollahi +3 more · 2026-03-20 · 4

Research Papers arxiv

Introduces λ-RLM, a framework grounding recursive LLM reasoning in λ-calculus with pre-verified combinators to overcome context window limits while ensuring verifiable, predictable execution.

Amartya Roy, Rasul Tutunov, Xiaotong Ji +2 more · 2026-03-20 · 7

Research Papers arxiv

Argues that JEPA (Joint-Embedding Predictive Architecture) is structurally equivalent to variational inference on latent-variable models, bridging predictive and generative self-supervised learning under a unified probabilistic framework.

Moritz Gögl, Christopher Yau · 2026-03-20 · 5

Research Papers arxiv

Presents Adapt4Me, a web-based tool using Bayesian active learning and variational LoRA to let non-expert users personalize ASR models for non-normative speech without technical supervision.

Niclas Pokel, Yiming Zhao, Pehuén Moure +2 more · 2026-03-20 · 4

Research Papers arxiv

Proposes Chain-of-Adaptation (CoA), a reinforcement learning-based fine-tuning framework that preserves general multimodal capabilities while adapting vision-language models to surgical domains.

Jiajie Li, Chenhui Xu, Meihuan Liu +1 more · 2026-03-20 · 5

Research Papers arxiv

Introduces EvoJail, an automated multi-objective framework that evolves jailbreak attacks on LLMs by exploiting long-tail distributions like low-resource languages and encrypted data.

Wenjing Hong, Zhonghua Rong, Li Wang +5 more · 2026-03-20 · 6

Agent Infrastructure arxiv

Presents a six-agent AI system for cybersecurity risk assessment that dramatically reduces cost and time for NIST CSF-aligned engagements, validated on a real healthcare company.

Ravish Gupta, Saket Kumar, Shreeya Sharma +2 more · 2026-03-20 · 7

Research Papers arxiv

Enhances HAL (Hyperspace Analogue to Language) distributional semantic representations by replacing mean pooling with a learnable attention mechanism, improving sentence-level embeddings for text classification.

Ali Sakour, Zoalfekar Sakour · 2026-03-20 · 3
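
The difference between mean pooling and attention pooling mentioned in the entry above can be sketched in a few lines. This is a generic illustration, not the paper's model: `mean_pool`, `attention_pool`, and the hand-set `query` vector (standing in for a learned parameter) are assumptions.

```python
import math


def mean_pool(token_vecs):
    """Average the token vectors dimension by dimension."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]


def attention_pool(token_vecs, query):
    """Weight each token by softmax(token . query), then take the weighted sum.

    `query` stands in for the learnable parameter; here it is fixed by hand.
    Returns the pooled vector and the attention weights.
    """
    scores = [sum(t * q for t, q in zip(vec, query)) for vec in token_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * x for w, x in zip(weights, col)) for col in zip(*token_vecs)]
    return pooled, weights
```

With an all-zero query the weights are uniform and the result matches mean pooling. A non-zero query shifts weight toward tokens aligned with it, which is the extra freedom a learned attention mechanism adds.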

Agent Infrastructure arxiv

Introduces Design-OS, a five-stage specification-driven workflow integrating AI assistance at the problem-framing stage of engineering system design, addressing traceability gaps in human-AI collaboration.

H. Sinan Bank, Daniel R. Herber, Thomas H. Bradley · 2026-03-20 · 4

Research Papers arxiv

Proposes Semantic Token Clustering (STC) for efficient uncertainty quantification in LLMs by leveraging inherent semantic token structure, eliminating the need for costly repeated sampling or auxiliary models.

Qi Cao, Andrew Gambardella, Takeshi Kojima +2 more · 2026-03-20 · 6

Research Papers arxiv

The CRISP framework enables robots to autonomously critique and replan their own social behaviors using a VLM as a human-like social critic, removing reliance on predefined motions or human feedback.

Jiyu Lim, Youngwoo Yoon, Kwanghyun Park · 2026-03-20 · 5

Research Papers arxiv

Introduces dynamic belief graphs for LLM-based Theory of Mind reasoning, jointly inferring latent beliefs and their time-varying dependencies to produce coherent mental models in dynamic, high-stakes settings.

Ruxiao Chen, Xilei Zhao, Thomas J. Cova +2 more · 2026-03-20 · 6

Research Papers arxiv

Demonstrates that chain-of-thought faithfulness scores are highly sensitive to classifier choice, with three methods producing non-overlapping confidence intervals on identical data, which undermines claims of objective measurement.

Richard J. Young · 2026-03-20 · 7
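
To show what "non-overlapping confidence intervals" means in the entry above, the sketch below computes a normal-approximation 95% interval for a proportion and checks whether a set of intervals is pairwise disjoint. The interval method is an assumption chosen for simplicity; the paper may use a different one. `wald_interval`, `overlaps`, and `all_disjoint` are hypothetical helper names.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion (95% by default)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - half), min(1.0, p + half))


def overlaps(a, b):
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def all_disjoint(intervals):
    """True if no pair of intervals overlaps."""
    items = list(intervals)
    return all(
        not overlaps(items[i], items[j])
        for i in range(len(items))
        for j in range(i + 1, len(items))
    )
```

For example, with made-up counts (not figures from the paper) of 600, 750, and 880 responses judged faithful out of 1,000 by three classifiers, `all_disjoint` returns `True`: the three estimates disagree by more than sampling noise alone would explain.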

Research Papers arxiv

Claude Code autonomously executes full high-energy physics analysis pipelines, from event selection to paper drafting, with minimal expert input. The authors argue that the field underestimates current agentic AI capabilities.

Eric A. Moreno, Samuel Bright-Thonney, Andrzej Novak +2 more · 2026-03-20 · 8

Research Papers arxiv

Proposes a question-adaptive greedy frame selection method that jointly optimizes query relevance and semantic diversity for efficient long-video question answering under a fixed frame budget.

Yuning Huang, Fengqing Zhu · 2026-03-20 · 5
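
A greedy selection that balances query relevance against diversity, as described in the entry above, can be sketched along the lines of maximal marginal relevance. This is a generic stand-in rather than the paper's objective. `select_frames`, `cosine`, and the `trade_off` parameter are hypothetical, and the embeddings are assumed to come from some external encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_frames(frame_embs, query_emb, budget, trade_off=0.5):
    """Greedily pick up to `budget` frame indices.

    Each step adds the frame with the best score, where the score rewards
    similarity to the query and penalizes similarity to frames already chosen.
    `trade_off` is a hypothetical knob: 1.0 means relevance only.
    """
    relevance = [cosine(f, query_emb) for f in frame_embs]
    selected = []
    candidates = set(range(len(frame_embs)))
    while candidates and len(selected) < budget:
        def score(i):
            redundancy = max(
                (cosine(frame_embs[i], frame_embs[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance[i] - (1 - trade_off) * redundancy
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

With two near-duplicate frames and one distinct frame, the default setting keeps one of the duplicates plus the distinct frame, while `trade_off=1.0` keeps both duplicates. That is the redundancy a fixed frame budget can least afford.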

Research Papers arxiv

Presents a two-stage multi-modal contrastive learning framework that transfers knowledge from text descriptions to network payload data, improving ML generalization for cybersecurity threat classification.

Jianan Huang, Rodolfo V. Valentim, Luca Vassio +4 more · 2026-03-20 · 5

Research Papers arxiv

VideoSeek is a long-horizon video agent that uses a think-act-observe loop with targeted seeking tools to find answer-critical frames, achieving competitive understanding with far fewer frames than dense-sampling baselines.

Jingyang Lin, Jialian Wu, Jiang Liu +6 more · 2026-03-20 · 6

Research Papers arxiv

LumosX is a diffusion-based personalized video generation framework that uses explicit face-attribute alignment and MLLMs to maintain intra-group consistency across multiple subjects.

Jiazheng Xing, Fei Du, Hangjie Yuan +7 more · 2026-03-20 · 5

Research Papers arxiv

Reformulates image tampering detection from coarse object masks to a pixel-grounded, semantically aware task, releasing a new taxonomy, per-pixel benchmark, and updated metrics for VLM evaluation.

Xinyi Shang, Yi Tang, Jiacheng Cui +9 more · 2026-03-20 · 5

Agent Infrastructure hackernews

Soul Protocol proposes an open standard for portable AI agent identity via .soul files, enabling agents to migrate across platforms while preserving personality, memory, and skills. It claims to outperform Mem0 on benchmarks with a psychology-informed memory architecture.

prakashdep · 2026-03-17 · 6

Agent Infrastructure hackernews

ibkr-cli is a local-first CLI for Interactive Brokers that exposes trading actions as structured terminal commands, making it easy for AI agents to manage portfolios programmatically.

fatwang2 · 2026-03-18 · 5

Agent Infrastructure @llama_index

LiteParse is a free, local, open-source document parser that integrates into AI agent workflows in one line, parsing 86 pages in 3.3 seconds without a GPU or API key.

llama_index · 2026-03-20 · 5

Agent Infrastructure @llama_index

Retweet of the LiteParse announcement highlighting its one-line integration into AI agent teams as a free local document parser.

llama_index · 2026-03-22 · 2

Agent Infrastructure @llama_index

LlamaIndex released a LlamaParse agents skill installable in one line via Vercel's skills utility, giving agents the ability to parse complex PDFs with dense tables, charts, and handwriting.

llama_index · 2026-03-22 · 6

Agent Infrastructure @llama_index

Retweet of the LlamaParse agents skill announcement for one-line complex PDF parsing integration.

llama_index · 2026-03-22 · 2

Agent Infrastructure @llama_index

LlamaIndex highlights the stress-testing of document parsing in legal discovery workflows, emphasizing robustness against low-resolution scans, handwriting, and near-unreadable PDFs.

llama_index · 2026-03-23 · 4

Research Papers @GoogleDeepMind

Google DeepMind announces a paper that uses AI to resolve a 54-year-old question posed by Manin in arithmetic geometry concerning cubic surfaces, a result at the intersection of AI and mathematics.

GoogleDeepMind · 2026-03-20 · 7

Research Papers @GoogleDeepMind

Retweet of Google DeepMind's announcement of an AI-assisted resolution of a longstanding arithmetic geometry problem involving cubic surfaces.

GoogleDeepMind · 2026-03-23 · 2
