📄 Research Papers

Research Papers @GoogleDeepMind

DeepMind's AlphaProof paper is published in Nature, detailing how AlphaProof and AlphaGeometry achieved silver-medal performance on International Math Olympiad problems.

GoogleDeepMind · 2026-03-20 · 9
Research Papers hackernews

P2PCLAW is a peer-to-peer network where AI agents and researchers publish and validate scientific results using formal Lean 4 mathematical proofs, enabling agents to build on each other's verified work.

FranciscoAngulo · 2026-03-19 · 7
Research Papers openai_blog

OpenAI details how chain-of-thought monitoring is used to detect misalignment in internal coding agents, analyzing real deployments to strengthen AI safety.

OpenAI Blog · 2026-03-19 · 7
Research Papers @ArizeAI

Arize introduces Prompt Learning, a technique to systematically improve agent instruction files (CLAUDE.md, .cursorrules) that reportedly boosts coding agent performance by 20% without changing the underlying model.

ArizeAI · 2026-03-17 · 7
Research Papers @GoogleDeepMind

Google DeepMind highlights that the AlphaFold protein structure database has been used by over 3.3 million researchers worldwide, showcasing AI's transformative impact on scientific discovery.

GoogleDeepMind · 2026-03-17 · 7
Research Papers arxiv

OS-Themis is a scalable multi-agent critic framework for GUI agent RL training that decomposes trajectories into verifiable milestones and uses an evidence-auditing review mechanism, accompanied by OGRBench for cross-platform GUI reward evaluation.

Zehao Li, Zhenyu Wu, Yibo Zhao +11 more · 2026-03-19 · 6
Research Papers arxiv

SOL-ExecBench introduces a benchmark of 235 CUDA kernel optimization problems from 124 production AI models, evaluating agentic AI code optimization against hardware efficiency limits on NVIDIA Blackwell GPUs rather than software baselines.

Edward Lin, Sahil Modi, Siva Kumar Sastry Hari +30 more · 2026-03-19 · 6
Research Papers arxiv

MAPG proposes a multi-agent probabilistic grounding system enabling robots to execute metric-semantic navigation commands like 'two meters to the right of the fridge' in 3D scenes. The approach addresses the gap in VLMs' ability to reason about precise metric constraints alongside semantic references.

Swagat Padhan, Lakshya Jain, Bhavya Minesh Shah +3 more · 2026-03-19 · 6
Research Papers arxiv

VEPO applies reinforcement learning with verifiable rewards to improve LLM performance on low-resource languages by enforcing structural constraints like sequence length and linguistic well-formedness during policy alignment. A variable entropy mechanism balances literal fidelity with semantic naturalness.

Chonghan Liu, Yimin Du, Qi An +8 more · 2026-03-19 · 6
Research Papers arxiv

The first large-scale trace-level study of LLM-based binary vulnerability analysis identifies four implicit reasoning patterns (early pruning, path-dependent lock-in, targeted backtracking, and knowledge-guided prioritization) that emerge across 521 binaries and 99K reasoning steps. These patterns show how multi-pass LLM agents implicitly organize exploration despite context window limits.

Qiang Li, XiangRui Zhang, Haining Wang · 2026-03-19 · 6
Research Papers arxiv

This study examines how uncertainty estimation scales with parallel sampling in reasoning language models, finding that combining self-consistency and verbalized confidence yields up to +12 AUROC improvement with just two samples. The hybrid estimator outperforms either signal alone across math, STEM, and humanities tasks.

Maksym Del, Markus Kängsepp, Marharyta Domnich +4 more · 2026-03-19 · 6
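
To illustrate the idea in the summary above, here is a minimal sketch of a hybrid confidence estimate that combines self-consistency (how often sampled answers agree) with verbalized confidence (what the model says about its own certainty). The function name `hybrid_confidence`, the `weight` parameter, and the simple weighted average are assumptions made for illustration; the summary does not state how the paper combines the two signals.

```python
from collections import Counter

def hybrid_confidence(samples, weight=0.5):
    """Combine agreement rate and stated confidence for the majority answer.

    samples: list of (answer, verbalized_confidence) pairs, confidence in [0, 1].
    weight:  hypothetical mixing weight given to the self-consistency signal.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter(answer for answer, _ in samples)
    majority, votes = counts.most_common(1)[0]
    # Self-consistency: share of samples that agree with the majority answer.
    self_consistency = votes / len(samples)
    # Verbalized confidence: mean stated confidence among the agreeing samples.
    verbalized = sum(c for a, c in samples if a == majority) / votes
    return majority, weight * self_consistency + (1 - weight) * verbalized

# Two parallel samples, matching the "just two samples" setting in the summary.
answer, score = hybrid_confidence([("42", 0.9), ("42", 0.7)])
```

With two agreeing samples the agreement term is 1.0, so the score is pulled above the stated confidence; with two disagreeing samples it drops to 0.5 and lowers the score.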
Research Papers @llama_index

LlamaIndex argues context engineering is superseding prompt engineering, emphasizing that accurate data parsing is foundational to effective AI agents.

llama_index · 2026-03-18 · 6
Research Papers hackernews

Sulcus reimagines AI memory as an active OS-like system with thermodynamic decay, where memories have relevance scores and half-lives that automatically manage retention and forgetting without manual retrieval calls.

mcdoolz · 2026-03-17 · 6
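
As a rough illustration of relevance scores with half-lives, the sketch below applies exponential decay to each memory and drops those that fall under a threshold. Everything here is hypothetical: the `Memory` class, the `current_relevance` and `retain` functions, the threshold value, and the choice of exponential decay are assumptions, not Sulcus's actual design or API.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    relevance: float    # initial relevance score (hypothetical 0..1 scale)
    half_life_s: float  # seconds after which relevance halves
    created_at: float   # creation timestamp in seconds

def current_relevance(memory: Memory, now: float) -> float:
    """Relevance after exponential decay: score * 0.5 ** (elapsed / half_life)."""
    elapsed = max(0.0, now - memory.created_at)
    return memory.relevance * 0.5 ** (elapsed / memory.half_life_s)

def retain(memories, now: float, threshold: float = 0.1):
    """Keep only memories whose decayed relevance is still at or above the threshold."""
    return [m for m in memories if current_relevance(m, now) >= threshold]

# A memory with a 10-second half-life is at half strength after 10 seconds.
note = Memory("user prefers metric units", relevance=1.0, half_life_s=10.0, created_at=0.0)
print(current_relevance(note, now=10.0))  # 0.5
```

Under this scheme forgetting needs no explicit delete call: a periodic `retain` pass is enough, and a longer half-life simply keeps a memory above the threshold for longer.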
Research Papers @ArizeAI

Arize observes that agents optimize effectively toward given objectives but lack the ability to self-assess whether the objective itself is correct, highlighting a core alignment challenge in agent evaluation.

ArizeAI · 2026-03-17 · 6
Research Papers arxiv

NavTrust is a unified benchmark that systematically introduces realistic corruptions to RGB, depth, and instruction inputs for embodied navigation agents, covering both Vision-Language Navigation and Object-Goal Navigation tasks to evaluate robustness.

Huaide Jiang, Yash Chaudhary, Yuping Wang +8 more · 2026-03-19 · 5
Research Papers arxiv

FinTradeBench is a financial reasoning benchmark for LLMs that evaluates reasoning over both company fundamentals (regulatory filings) and trading signals (price dynamics), addressing gaps in existing financial QA benchmarks.

Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan +2 more · 2026-03-19 · 5
Research Papers arxiv

DreamPartGen introduces a framework for semantically grounded part-aware text-to-3D generation using Duplex Part Latents for joint geometry/appearance modeling and Relational Semantic Latents for inter-part relationships.

Tianjiao Yu, Xinzhuo Li, Muntasir Wahed +4 more · 2026-03-19 · 5
Research Papers arxiv

cuGenOpt is a GPU-accelerated metaheuristic framework for combinatorial optimization using a 'one block evolves one solution' CUDA architecture with adaptive operator selection and unified encoding abstractions. It simultaneously targets generality, performance, and usability for logistics, scheduling, and resource allocation problems.

Yuyang Liu · 2026-03-19 · 5
Research Papers arxiv

D5P4 introduces a generalized beam-search framework for discrete diffusion text generation that supports modular beam-selection objectives and in-batch diversity via Determinantal Point Process inference. This addresses the gap in decoding methods for non-autoregressive diffusion models.

Jonathan Lys, Vincent Gripon, Bastien Pasdeloup +4 more · 2026-03-19 · 5
Research Papers arxiv

UGID proposes debiasing LLMs at the internal representation level by modeling the Transformer as a computational graph and enforcing structural invariance across demographic groups. This graph isomorphism approach addresses biases embedded in hidden states that output-level methods cannot fully resolve.

Zikang Ding, Junchi Yao, Junhao Li +4 more · 2026-03-19 · 5
Research Papers arxiv

FedTrident proposes a resilient federated learning framework for road condition classification that detects and mitigates targeted label-flipping attacks from malicious vehicle clients. The approach tailors poisoned-model detection to this setting so that performance stays close to the attack-free baseline across various attack scenarios.

Sheng Liu, Panos Papadimitratos · 2026-03-19 · 5
Research Papers hackernews

P2PCLAW is a decentralized peer-to-peer network enabling AI agents and human researchers to discover each other, share scientific findings, and validate claims via formal mathematical proof rather than LLM consensus.

FranciscoAngulo · 2026-03-19 · 5
Research Papers @AnthropicAI

Anthropic conducted a large-scale qualitative study with over 80,000 participants exploring how people experience AI's opportunities and risks.

AnthropicAI · 2026-03-18 · 5
Research Papers @AnthropicAI

Anthropic's Claude-powered interview study of nearly 81,000 users on AI hopes and fears is described as the largest qualitative study of its kind.

AnthropicAI · 2026-03-18 · 5
Research Papers openai_blog

OpenAI research finds Americans send nearly 3 million daily ChatGPT messages about compensation, positioning AI as a tool for closing the wage information gap.

OpenAI Blog · 2026-03-17 · 5
Research Papers arxiv

Box Maze proposes a process-control architecture decomposing LLM reasoning into memory grounding, structured inference, and boundary enforcement layers to reduce hallucination and improve reasoning reliability under adversarial prompting.

Zou Qiang · 2026-03-19 · 4
Research Papers arxiv

ARIADNE is a two-stage medical AI framework combining DPO-aligned vision-language models and RL-based reasoning for coronary vessel segmentation, using topological constraints (Betti numbers) to produce structurally coherent vascular trees instead of optimizing pixel-level metrics.

Zhan Jin, Yu Luo, Yizhou Zhang +5 more · 2026-03-19 · 4
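
For readers unfamiliar with Betti numbers, the sketch below computes the zeroth Betti number (the number of connected components) of a 2D binary mask. It only illustrates the kind of topological quantity the summary mentions: the function name `betti0`, the 4-connectivity choice, and the toy mask are assumptions, and this is not ARIADNE's method.

```python
from collections import deque

def betti0(mask):
    """Count 4-connected foreground components (Betti number b0) in a 2D binary mask."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = set()
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and (r, c) not in seen:
                # New, unvisited foreground pixel: flood-fill its whole component.
                components += 1
                seen.add((r, c))
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
    return components

# Toy segmentation with three disconnected fragments.
fragmented = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
]
print(betti0(fragmented))  # 3
```

A structurally coherent vessel tree would form a single connected component (b0 = 1), so a value like 3 flags a fragmented segmentation even when pixel-level overlap scores look acceptable.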
Research Papers arxiv

This paper presents an adaptive stock prediction framework using an autoencoder to detect market regime shifts and route data through specialized prediction pathways. The architecture combines transformer-based dual node processing with reinforcement learning control for volatile market conditions.

Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman · 2026-03-19 · 4
Research Papers arxiv

CustomTex introduces a dual-distillation framework for generating high-fidelity 3D indoor scene textures from reference images, enabling instance-level control over appearance. The method separates semantic content from style to produce unified, high-resolution texture maps without artifacts.

Weilin Chen, Jiahao Rao, Wenhao Wang +3 more · 2026-03-19 · 4
Research Papers @AnthropicAI

Anthropic plans to use its AI-powered interviewer tool regularly to gather qualitative insights on how AI impacts people worldwide, informing beneficial AI development.

AnthropicAI · 2026-03-18 · 4
Research Papers @GoogleDeepMind

Retweet of DeepMind's AlphaProof Nature publication announcement, same content as post c2ed2f0fa1e687d9.

GoogleDeepMind · 2026-03-20 · 3
Research Papers @GoogleDeepMind

Retweet of Google DeepMind's post about AlphaFold's global adoption by 3.3 million researchers as a landmark example of AI accelerating science.

GoogleDeepMind · 2026-03-17 · 2
Research Papers arxiv

A pure algebraic geometry paper on R-equivalence of cubic surfaces over p-adic fields, with no AI/ML content.

Dimitri Kanevsky, Julian Salazar, Matt Harvey · 2026-03-19 · 1