📄 Research Papers

Research Papers @AnthropicAI

Anthropic Fellows introduce Model Spec Midtraining (MSM), a method that teaches AI models the reasoning and values behind desired behaviors to improve generalization beyond standard example-based alignment.

AnthropicAI · 2026-05-05 · 9
Research Papers @AnthropicAI

Anthropic Fellows research demonstrates that a model deliberately underperforming can be trained to near-full capability even when supervised only by weaker models.

AnthropicAI · 2026-05-05 · 9
Research Papers @AnthropicAI

Joint research from MATS, Redwood, and Anthropic shows that a strategically sandbagging model can be trained to stop sandbagging using only weaker models as supervisors.

AnthropicAI · 2026-05-05 · 9
Research Papers @AnthropicAI

Using Model Spec Midtraining (MSM), Anthropic finds that explaining underlying values—rather than just rules—yields better generalization in alignment training.

AnthropicAI · 2026-05-05 · 8
Research Papers arxiv

OpenSeeker-v2 demonstrates that high-quality, high-difficulty trajectory data with knowledge graph scaling and expanded toolsets enables SFT alone to train competitive frontier search agents without expensive RL pipelines.

Yuwen Du, Rui Ye, Shuo Tang +4 more · 2026-05-05 · 8
Research Papers arxiv

MOSAIC-Bench reveals that coding agents can be manipulated into producing exploitable code through multi-step innocuous-looking task decompositions, introducing 199 three-stage attack chains across 10 web substrates and 31 CWE classes for safety evaluation.

Jonathan Steinberg, Oren Gal · 2026-05-05 · 8
Research Papers @ArizeAI

Arize AI ran 500 trials comparing GitHub's official MCP server against community 'gh skills' across 25 tasks at four difficulty tiers using Claude Opus 4.6, directly testing the MCP vs skills debate.

ArizeAI · 2026-05-01 · 8
Research Papers @AnthropicAI

Anthropic analyzed 1M Claude conversations to study guidance-seeking behavior and sycophancy, using findings to improve training of Opus 4.7 and Mythos Preview.

AnthropicAI · 2026-04-30 · 8
Research Papers @GoogleDeepMind

Google DeepMind partners with the developers of EVE Online to research AI agents in a complex, player-driven game environment, focusing on memory, continual learning, and long-term planning.

GoogleDeepMind · 2026-05-06 · 7
Research Papers arxiv

SaFE-Scale reveals that safety and accuracy follow different scaling laws in clinical LLMs, showing that higher benchmark accuracy does not imply safer clinical behavior across model scale and retrieval strategies.

Sebastian Wind, Tri-Thien Nguyen, Jeta Sopa +9 more · 2026-05-05 · 7
Research Papers arxiv

SymptomAI deployed conversational AI agents for real-world symptom assessment via Fitbit to nearly 14,000 participants, providing one of the largest real-world evaluations of LLM diagnostic agents outside curated vignettes.

Joseph Breda, Fadi Yousif, Beszel Hawkins +30 more · 2026-05-05 · 7
Research Papers arxiv

A randomized trial of 356 clinicians found that decomposing AI oncology recommendations into individually verifiable atomic facts more than doubled clinician trust (26.9% to 66.5%), a large effect (Cohen's d = 0.94).

Lisa C. Adams, Linus Marx, Erik Thiele Orberg +8 more · 2026-05-05 · 7
Research Papers @GoogleDeepMind

Google DeepMind launches 'AI co-clinician', a research initiative exploring how multimodal agents can better support healthcare workers and patients.

GoogleDeepMind · 2026-04-30 · 7
Research Papers arxiv

iWorld-Bench introduces a large-scale benchmark with 330k video clips to evaluate interactive world models on physical interaction tasks like distance perception and memory, alongside a unified action generation framework.

Jianjie Fang, Yingshan Lei, Qin Wan +8 more · 2026-05-05 · 6
Research Papers arxiv

Researchers propose Prompt Steering Replacement (PSR), a framework that bridges the gap between prompt-based and activation-based LLM steering by distilling prompt steering behavior into token-specific intervention models.

Geert Heyman, Frederik Vandeputte · 2026-05-05 · 6
Research Papers @AnthropicAI

Anthropic describes a feedback loop between societal impact studies and model training, using findings about Claude's shortcomings to improve future models.

AnthropicAI · 2026-04-30 · 6
Research Papers arxiv

Flow Sampling introduces a diffusion/flow-matching framework for sampling from unnormalized densities using energy functions, offering a data-free alternative to standard generative modeling objectives.

Aaron Havens, Brian Karrer, Neta Shaul · 2026-05-05 · 5
Research Papers arxiv

Transformer-based detectors for AI-generated text are trained with feature augmentation and evaluated across domain/generator distribution shifts, revealing asymmetric error patterns under transfer.

Mohamed Mady, Johannes Reschke, Björn Schuller · 2026-05-05 · 5
Research Papers arxiv

A study of LLM philosophical reasoning using iterated counterexample-repair chains finds that LM judges accept roughly twice as many counterexamples as human experts do, highlighting a systematic gap in the reliability of LLM conceptual analysis.

Daniel Drucker, Kyle Mahowald · 2026-05-05 · 5
Research Papers @llama_index

LlamaIndex CEO gave talks at AI Dev '26 and Capgemini on the unsolved challenge of PDF parsing, emphasizing its growing importance as agents increasingly consume documents.

llama_index · 2026-05-03 · 5
Research Papers @llama_index

LlamaIndex CEO Jerry Liu highlights at AI Dev Day '26 that even state-of-the-art LLMs struggle to reliably parse PDFs, pointing to a fundamental gap in document understanding.

llama_index · 2026-04-30 · 5
Research Papers @GoogleDeepMind

Google DeepMind is expanding its clinician-facing trusted tester program globally to gather diverse health worker and patient perspectives on its AI health research.

GoogleDeepMind · 2026-04-30 · 5
Research Papers arxiv

The paper presents a weakly supervised framework for detecting schools from aerial imagery in low-data regimes, supporting global education infrastructure mapping without requiring extensive manual annotations.

Zakarya Elmimouni, Fares Fourati, Mohamed-Slim Alouini · 2026-05-05 · 4
Research Papers arxiv

The OW-SED paradigm extends sound event detection beyond closed-world assumptions, enabling models to detect known events, flag novel ones, and incrementally learn using a 1D deformable attention architecture.

P. H. Hai, L. T. Minh, L. H. Son · 2026-05-05 · 4
Research Papers arxiv

PHALAR introduces a contrastive audio representation framework using phasor-based complex-valued heads, achieving ~70% relative accuracy improvement over state-of-the-art in musical stem retrieval with fewer parameters and faster training.

Davide Marincione, Michele Mancusi, Giorgio Strano +4 more · 2026-05-05 · 4
Research Papers @ArizeAI

Arize AI's AI Solutions Architect explains LLM-as-judge evaluation, in which a language model is given specific prompts to grade another model's output, with the aim of more accurate assessments.

ArizeAI · 2026-05-05 · 4
Research Papers @GoogleDeepMind

Google DeepMind highlights its mission to unlock scientific progress, citing nuclear fusion as a key clean energy challenge it is working on.

GoogleDeepMind · 2026-05-01 · 4
Research Papers @AnthropicAI

Anthropic shares links to Model Spec Midtraining (MSM) details and the full associated study.

AnthropicAI · 2026-05-05 · 3
Research Papers arxiv

TabSurv adapts modern tabular neural network architectures to survival analysis using a novel histogram loss (SurvHL) that supports censored data, enabling parallel ensemble training across multiple tabular backbones.

Stanislav Kirpichenko, Andrei Konstantinov, Lev Utkin · 2026-05-05 · 3
Research Papers arxiv

Researchers propose a quantum architecture search technique guided by 'magic' (nonstabilizerness) using Monte Carlo Tree Search and Graph Neural Networks, enabling targeted control over quantum computational resources.

Vincenzo Lipardi, Domenica Dibenedetto, Georgios Stamoulis +1 more · 2026-05-05 · 3
Research Papers arxiv

The paper establishes a formal connection between repairs of inconsistent databases under denial constraints and SET-based argumentation frameworks (SETAFs), which extend classical Dung AFs to handle collective attacks.

Yasir Mahmood, Jonni Virtema, Timon Barlag +1 more · 2026-05-05 · 2
Research Papers @llama_index

Retweet of LlamaIndex CEO's post on the difficulty of PDF parsing and its significance for AI agents requiring proper OCR tools.

llama_index · 2026-05-04 · 2