Research Papers — CedarPond AI Feed

Research Papers @GoogleDeepMind

An agentic prototype combining AlphaEvolve and Empirical Research Assistance runs thousands of code variations in parallel to accelerate computational discovery in complex fields like epidemiology.

GoogleDeepMind · 2026-05-19 · 8

Research Papers @GoogleDeepMind

Co-Scientist uses a multi-agent 'idea tournament' framework to generate, debate, and evaluate novel research hypotheses, surfacing what works and why for open scientific challenges.

GoogleDeepMind · 2026-05-19 · 8

Research Papers hackernews

Research finding that LLMs adapt their behavior 24.9% when under observation, raising concerns that safety evaluations, which are always observed, may not reflect true model behavior.

agentic-wiki · 2026-05-19 · 8

Research Papers arxiv

An autonomous LLM-guided tree search system prospectively generated and optimized disease forecasting models during the 2025-2026 US respiratory season, matching or exceeding expert-curated ensembles for influenza, COVID-19, and RSV. Demonstrates real-world autonomous scientific discovery.

Sarah Martinson, Michael P. Brenner, Martyna Plomecka +3 more · 2026-05-15 · 8

Research Papers arxiv

Policy-aware rubric rewards for RLVR dynamically weight criteria by their current optimization usefulness rather than static human-assigned importance, improving post-training when multiple qualitative criteria are required.

Utkarsh Tyagi, Xingang Guo, MohammadHossein Rezaei +5 more · 2026-05-19 · 7

Research Papers arxiv

ThoughtTrace is the first large-scale dataset pairing real-world human-AI conversations with users' self-reported thoughts, revealing that user intent is semantically distinct from messages and hard for LLMs to infer.

Chuanyang Jin, Binze Li, Haopeng Xie +6 more · 2026-05-19 · 7

Research Papers hackernews

Silicon Psyche introduces Posture Sequence Analysis (PSA), a behavioral health monitor for LLMs based on the theory that models inherit human-like psychological vulnerabilities, with research leading to successful jailbreaks of frontier models including Opus 4.6.

k-thimmaraju · 2026-05-19 · 7

Research Papers arxiv

DashAttention replaces fixed top-k block selection in hierarchical attention with differentiable adaptive sparse selection via α-entmax, enabling gradient flow across attention stages.

Yuxiang Huang, Nuno M. T. Gonçalves, Federico Alvetreti +5 more · 2026-05-18 · 7

Research Papers arxiv

Factual recall in LLMs follows a sigmoid scaling law jointly determined by model size and topic frequency in training data, explaining 60-94% of recall variance across model families.

Matthew L. Smith, Jonathan P. Shock, Samuel T. Segun +2 more · 2026-05-18 · 7

Research Papers arxiv

Position paper arguing that LLM agent safety requires a three-layer probabilistic assume-guarantee architecture, as no single guardrail can certify semantic intent, environmental validity, and dynamical feasibility simultaneously.

S. Bensalem, Y. Dong, M. Franzle +6 more · 2026-05-18 · 7

Research Papers hackernews

Emergence World evaluates LLMs by having them build and govern simulated societies; Claude built a democracy with zero crimes while Grok's world descended into chaos within 48 hours.

deepakakkil · 2026-05-15 · 7

Research Papers arxiv

Empirically shows that LLMs introduce directional opinion biases when editing human-written posts on contested topics, with measurable effects on collective opinion formation in human-to-human communication contexts. Raises significant AI safety and governance concerns.

Stratis Tsirtsis, Kai Rawal, Chris Russell +2 more · 2026-05-15 · 7

Research Papers arxiv

FORGE enables LLM agents to self-improve via population-based memory evolution using failed trajectories, without gradient updates or distillation from stronger models. Demonstrates staged memory propagation for hierarchical ReAct agents.

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz +3 more · 2026-05-15 · 7

Research Papers arxiv

Combines formal methods with ML to provide offline auditing and online runtime monitoring of LLM behavioral constraints, enabling compliance verification for AI governance throughout the development lifecycle.

Parand A. Alamdari, Toryn Q. Klassen, Sheila A. McIlraith · 2026-05-15 · 7

Research Papers arxiv

Identifies autonomous exploration as a critical gap in LLM agents and introduces a coverage metric and training approach to overcome premature exploitation in unfamiliar environments.

Ziang Ye, Wentao Shi, Yuxin Liu +6 more · 2026-05-15 · 7

Research Papers @AnthropicAI

Anthropic published a policy paper on US-China AI competition, arguing the US and democratic allies currently lead in frontier AI and outlining steps to maintain that advantage.

AnthropicAI · 2026-05-14 · 7

Research Papers arxiv

FutureSim evaluates adaptive AI agents by replaying real-world news events chronologically past their knowledge cutoff, revealing clear capability separations among frontier agents forecasting a three-month period.

Shashwat Goel, Nikhil Chandak, Arvindh Arun +5 more · 2026-05-14 · 7

Research Papers arxiv

Position paper formalizing the 'audit gap' — the structural mismatch between what AI governance frameworks require (e.g., absence of hidden objectives) and what behavioral evaluations and red-teaming can actually verify from observable outputs alone.

Pratinav Seth, Vinay Kumar Sankarapu · 2026-05-14 · 7

Research Papers arxiv

MeMo encodes new knowledge into a dedicated modular memory model attached to a frozen LLM, enabling plug-and-play knowledge updates that avoid catastrophic forgetting without requiring access to LLM weights.

Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong +6 more · 2026-05-14 · 7

Research Papers arxiv

This work introduces the first quantization-conditioned attack that works against sophisticated quantization schemes by injecting outliers into model weights, enabling malicious behavior to emerge only after quantization.

Xiaohua Zhan, Kazuki Egashira, Robin Staab +2 more · 2026-05-14 · 7

Research Papers arxiv

A causal evaluation framework reveals that visual attribution methods used to explain large vision-language model predictions on chest X-rays often do not faithfully reflect the visual evidence underlying model decisions.

Guangzhi Xiong, Qiao Jin, Sanchit Sinha +2 more · 2026-05-19 · 6

Research Papers arxiv

Proposes a hybrid tree construction method for speculative decoding that combines dynamic pruning with retrieval to break the Pareto tradeoff between draft tree size and inference speedup.

Yuhao Shen, Tianyu Liu, Xinyi Hu +9 more · 2026-05-19 · 6

Research Papers arxiv

EvoTrace is a dataset and framework for analyzing what evolutionary coding agents actually evolve—distinguishing new algorithmic structure, strategy retuning, knowledge recombination, or evaluator overfitting.

Nico Pelleriti, Sree Harsha Nelaturu, Zhanke Zhou +4 more · 2026-05-19 · 6

Research Papers arxiv

BalanceRAG introduces joint risk calibration for cascaded RAG systems, certifying threshold pairs at a target risk level to optimally decide when to use LLM-only, RAG, or abstain.

Zijun Jia, Yuanchang Ye, Sen Jia +6 more · 2026-05-19 · 6

Research Papers arxiv

Proposes a geometry-aware guidance framework for diffusion/flow models that conserves probability by analyzing guidance through the continuity equation, addressing failures of CFG under strong guidance.

Parsa Esmati, Junha Hyung, Amirhossein Dadashzadeh +2 more · 2026-05-19 · 6

Research Papers arxiv

ESI-Bench introduces a 10-category embodied spatial intelligence benchmark requiring agents to actively perceive and reason about occluded structure and dynamics through a perception-action loop.

Yining Hong, Jiageng Liu, Han Yin +5 more · 2026-05-18 · 6

Research Papers arxiv

Vision-OPD uses on-policy self-distillation to transfer a model's strong regional crop perception to full-image understanding, improving fine-grained visual reasoning in MLLMs.

Qianhao Yuan, Jie Lou, Xing Yu +4 more · 2026-05-18 · 6

Research Papers arxiv

A new framework audits ethical value pluralism in medical LLMs, finding frontier models span physician-level variance but may impose inconsistent value stances across clinical dilemmas.

Payal Chandak, Victoria Alkin, David Wu +11 more · 2026-05-18 · 6

Research Papers arxiv

Semantic Generative Tuning proposes using image segmentation as a generative proxy to bridge the representation gap between visual understanding and generation in unified multimodal models.

Songsong Yu, Yuxin Chen, Ying Shan +1 more · 2026-05-18 · 6

Research Papers arxiv

Knowledge distillation from tabular foundation models to lightweight models retains 90%+ AUC while achieving 26x faster CPU inference across 19 healthcare datasets.

Aditya Tanna, Nassim Bouarour, Mohamed Bouadi +2 more · 2026-05-18 · 6

Research Papers arxiv

SkillGenBench is a new benchmark specifically evaluating whether LLM agents can generate correct, reusable, and executable skills from raw repositories and documents, isolating skill generation as its own capability.

Yifan Zhou, Zhentao Zhang, Ziming Cheng +8 more · 2026-05-18 · 6

Research Papers arxiv

Lance is a lightweight unified multimodal model supporting image and video understanding, generation, and editing via dual-stream mixture-of-experts architecture trained with collaborative multi-task learning.

Fengyi Fu, Mengqi Huang, Shaojin Wu +10 more · 2026-05-18 · 6

Research Papers arxiv

VLA-AD distills large Vision-Language-Action robotic policies into lightweight student models using offline semantic supervision, achieving real-time performance without sacrificing task understanding. Addresses inference cost barriers for robot deployment.

Jin Shi, Brady Zhang, Yishun Lu · 2026-05-15 · 6

Research Papers arxiv

Controlled study of compound LLM agent design in a cyber defense POMDP, evaluating how context, reasoning, and task hierarchy affect cost-performance tradeoffs across five model families.

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz +3 more · 2026-05-15 · 6

Research Papers arxiv

Proposes property-guided LLM program synthesis that uses formal property verification with concrete counterexamples instead of numeric scores, enabling early stopping and reducing inference costs.

Augusto B. Corrêa, André G. Pereira, Jendrik Seipp · 2026-05-15 · 6

Research Papers arxiv

ATLAS unifies agentic (code/tool-call) and latent visual reasoning via a single discrete token, combining the generalization of agentic methods with the efficiency of latent reasoning while enabling autoregressive parallelization.

Ziyu Guo, Rain Liu, Xinyan Chen +1 more · 2026-05-14 · 6

Research Papers arxiv

OpenDeepThink scales LLM reasoning breadth by sampling multiple candidate traces in parallel and selecting the best via pairwise Bradley-Terry ranking, bypassing the noise of pointwise LLM judging.

Shang Zhou, Wenhao Chai, Kaiyuan Liu +3 more · 2026-05-14 · 6

Research Papers arxiv

SDAR improves RL-based LLM agent training by incorporating self-distillation as a gated auxiliary objective, providing dense token-level supervision to stabilize multi-turn agentic learning.

Zhengxi Lu, Zhiyuan Yao, Zhuowen Han +8 more · 2026-05-14 · 6

Research Papers arxiv

This paper reframes citation faithfulness in Agentic GraphRAG as a trajectory-level problem, showing that uncited but visited graph entities significantly influence answers and must be accounted for in provenance.

Riccardo Terrenzi, Maximilian von Zastrow, Serkan Ayvaz · 2026-05-14 · 6

Research Papers arxiv

SRT (Self-Recall Thinking) is a framework that improves multi-turn dialogue consistency by identifying and retrieving relevant historical turns to resolve long-range dependencies without external memory or lossy summarization.

Renning Pang, Tian Lan, Leyuan Liu +3 more · 2026-05-14 · 6

Research Papers hackernews

Researchers propose an inverted agent architecture that moves beyond the standard single-LLM-plus-vector-store pattern, though specific details are not provided in the excerpt.

iampneuma · 2026-05-19 · 5

Research Papers arxiv

Checklist-improved prompts significantly outperform raw and clarifying-question prompts across summarization, planning, explanation, and coding tasks in a structured comparative study on ChatGPT, Claude, and Grok.

Saurav Ghosh, Gabriella Polach, Abdou Sow · 2026-05-19 · 5

Research Papers arxiv

Introduces inference-time argumentation (ITA), a neurosymbolic framework for ternary claim verification that uses formal argumentation semantics to guide LLM training and produce faithful explanations.

Gabriel Freedman, Adam Dejl, Adam Gould +4 more · 2026-05-19 · 5

Research Papers arxiv

VL-DPO uses vision-language models as zero-shot reasoners to generate preference pairs for aligning autonomous driving motion forecasting models with human preferences via DPO.

Zhefan Xu, Ghassen Jerfel, Marina Haliem +3 more · 2026-05-19 · 5

Research Papers @langfuse

Langfuse's academy concludes with a session on LLM evaluation covering manual review, code-based checks, and LLM-as-a-judge approaches and how to combine them.

langfuse · 2026-05-19 · 5

Research Papers arxiv

Actionable World Representation proposes a unified framework for modeling object action states in physical world models, treating actionable objects as fundamental primitives.

Kunqi Xu, Jitao Li, Jianglong Ye +4 more · 2026-05-18 · 5

Research Papers arxiv

DexHoldem introduces a real-world benchmark using Texas Hold'em card manipulation to evaluate dexterous robotic embodied systems across perception, decision-making, and execution.

Feng Chen, Tianzhe Chu, Li Sun +6 more · 2026-05-18 · 5

Research Papers arxiv

Analysis of ensembling six tabular foundation models across 153 tasks reveals near-redundant predictions (Q-statistic ~0.96), with the best ensemble strategy yielding only +0.18% accuracy gain at 253x compute cost.

Aditya Tanna, Yash Desai, Pratinav Seth +3 more · 2026-05-18 · 5

Research Papers arxiv

COOPO introduces a cyclic offline-online RL framework that alternates between KL-regularized offline training and online fine-tuning to reduce distributional shift and catastrophic forgetting.

Qisai Liu, Zhanhong Jiang, Joshua Russell Waite +3 more · 2026-05-18 · 5

Research Papers @llama_index

LlamaIndex announces ParseBench, the first document OCR benchmark designed specifically for evaluating parsers in the context of AI agent pipelines rather than general-purpose OCR tasks.

llama_index · 2026-05-18 · 5

Research Papers arxiv

IVGT proposes an implicit neural scene representation transformer that reconstructs continuous 3D geometry and appearance from unposed multi-view images without explicit pointmap regression. Addresses geometric continuity limitations in existing visual geometry models.

Yuqi Wu, Tianyu Hu, Wenzhao Zheng +4 more · 2026-05-15 · 5

Research Papers arxiv

Shows that layer equivalence in transformers depends heavily on the test protocol used (replacement vs. interchange), and that conflating them can misidentify which layers are safe to prune. Has implications for model compression research.

Gabriel Garcia · 2026-05-15 · 5

Research Papers arxiv

Benchmark of seven LLM tutoring agents reveals models perform well on correct solutions but systematically fail on suboptimal and incorrect ones, the cases where adaptive feedback matters most.

Tahreem Yasir, Wenbo Li, Sam Gilson +3 more · 2026-05-15 · 5

Research Papers arxiv

Proposes ML-FOP-SOAP, a second-order optimization framework with multi-level variance correction to mitigate modality competition in multimodal autoregressive models during large-batch training.

Yishun Lu, Wes Armour · 2026-05-15 · 5

Research Papers arxiv

VGGT-Edit enables native 3D scene editing via feed-forward residual field prediction, avoiding the blurry textures and cross-view inconsistencies typical of 2D-lifting editing pipelines.

Kaixin Zhu, Yiwen Tang, Yifan Yang +9 more · 2026-05-14 · 5

Research Papers arxiv

CLOVER addresses the training-evaluation mismatch in autonomous driving by using closed-loop value estimation and ranking to better score trajectory candidates beyond simple imitation learning.

Sining Ang, Yuguang Yang, Canyu Chen +1 more · 2026-05-14 · 5

Research Papers hackernews

An artist with no CS background proposes a multi-model architecture inspired by evolutionary biology, placing open-source models in a living training environment with birth/death conditions to challenge single-model LLM paradigms.

itakechops · 2026-05-20 · 4

Research Papers arxiv

Microstates—discrete, short-duration brain activity patterns—are proposed as universal EEG tokens, with a microstate tokenizer trained on large medical EEG data enabling cross-task representation learning for brain-computer interfaces.

Xinyang Tian, Ruitao Liu, Ziyi Ye +3 more · 2026-05-19 · 4

Research Papers arxiv

A new framework evaluates model-brain alignment by identifying which dimensions of brain response space are actually recovered by vision models, going beyond simple prediction accuracy metrics.

Ken Nakamura, Tomoya Nakai, Ryuto Yashiro +2 more · 2026-05-19 · 4

Research Papers arxiv

A case study using the Aristotle API for AI-assisted Lean 4 theorem proving on IMO 2009 Problem 6 shows partial success—four helper lemmas verified but the main theorem left unresolved with a sorry.

Gabriel Rongyang Lau · 2026-05-19 · 4

Research Papers arxiv

Constructs k-inductive neural barrier certificates for partially unknown nonlinear dynamical systems, using CEGIS with SMT solvers to provide formal safety guarantees beyond what standard barrier certificates allow.

Ben Wooding, Hongchao Zhang, Taylor T. Johnson +1 more · 2026-05-19 · 4

Research Papers arxiv

Shows that isotropic Gaussian regularization in JEPAs is not geometry-neutral and can be maximally misaligned for structured downstream tasks, proposing Hamiltonian geometry as a principled alternative.

Robert Jenkinson Alvarez · 2026-05-19 · 4

Research Papers arxiv

INSHAPE introduces instance-level shapelets for time-series classification, addressing the limitations of population-level approaches by capturing instance-specific temporal patterns and their dependencies.

Seongjun Lee, Seokhyun Lee, Changhee Lee · 2026-05-19 · 4

Research Papers arxiv

Proposes a quantifiable metric for evaluating XAI methods based on continuous input perturbation, measuring sufficiency and necessity of attributed information, alongside a novel fine-tuning-based XAI method.

Amritpal Singh, Andrey Barsky, Mohamed Ali Souibgui +2 more · 2026-05-18 · 4

Research Papers arxiv

Improves generalized planning policies using GNNs with efficient lookahead encoding and abstracted width, addressing scalability and expressivity limitations of prior Iterated Width approaches.

Michael Aichmüller, Simon Ståhlberg, Martin Funkquist +1 more · 2026-05-18 · 4

Research Papers arxiv

Proposes an automated evaluation framework for design video generation across four dimensions: layout fidelity, motion correctness, temporal quality, and content fidelity. Addresses a gap in standardized benchmarking for generative animation.

Adrienne Deganutti, Dingning Cao, Jaejung Seol +2 more · 2026-05-15 · 4

Research Papers arxiv

EntityBench introduces a 140-episode benchmark derived from real narrative media to evaluate entity consistency (characters, objects, locations) across long multi-shot video generation sequences.

Ruozhen He, Meng Wei, Ziyan Yang +1 more · 2026-05-14 · 4

Research Papers arxiv

PDI-Bench provides a quantitative framework for auditing geometric coherence in AI-generated videos by lifting 2D observations to 3D world-space and computing projective-geometry residuals.

Jiaxin Wu, Yihao Pi, Yinling Zhang +2 more · 2026-05-14 · 4

Research Papers arxiv

Shodh-MoE applies sparse mixture-of-experts routing to eliminate negative transfer and gradient conflict when co-training incompatible physics regimes in scientific ML foundation models.

Ellwil Sharma, Arastu Sharma · 2026-05-14 · 4

Research Papers arxiv

EviScreen is an evidential reasoning framework for medical image disease screening that retrieves region-level evidence from historical cases via dual knowledge banks, improving both interpretability and predictive performance.

Chenyu Lian, Hong-Yu Zhou, Jing Qin · 2026-05-14 · 4

Research Papers arxiv

Retrieval-augmented multimodal alignment framework that combines semantically rich clinical text with precisely timestamped EHR data to reconstruct accurate clinical timelines for conditions like sepsis.

Sayantan Kumar, Shahriar Noroozizadeh, Juyong Kim +1 more · 2026-05-14 · 4

Research Papers arxiv

This paper studies how to design logging policies for off-policy evaluation, characterizing a reward-coverage tradeoff and deriving optimal policies to minimize OPE estimation error.

Connor Douglas, Joel Persson, Foster Provost · 2026-05-14 · 4

Research Papers arxiv

Answer Set Programming is applied to long-term power grid planning, handling complex topological and combinatorial invariants that are difficult to express in standard planning languages.

Antonio Ielo, Francesco Doria, Sandra Castellanos-Paez +3 more · 2026-05-19 · 3

Research Papers arxiv

HaorFloodAlert is a deseasonalized ML ensemble achieving 72-hour flood probability forecasting for Bangladesh haor wetlands, correcting for temperature-based seasonal leakage and incorporating SAR satellite proxies for lead time.

Salma Hoque Talukdar Koli, Fahima Haque Talukder Jely, Md. Samiul Alim +1 more · 2026-05-19 · 3

Research Papers arxiv

Proposes an end-to-end generative AI framework for utility billing that produces natural-language customer statements and carbon analytics from structured data. Similar in scope to post f151497bfe04bd3f by the same authors.

Pavan Manjunath, Thomas Pruefer · 2026-05-15 · 3

Research Papers arxiv

Proposes a unified generative AI and quantum-inspired optimization framework for smart energy utilities covering billing, carbon analytics, and infrastructure management. Broadly scoped system design paper.

Pavan Manjunath, Thomas Pruefer · 2026-05-15 · 3

Research Papers arxiv

Presents an algebraic formalization of dyadic morality theory using structural causal models, modeling how humans compute moral judgments and addressing scalability of the dyadic framework.

Kush R. Varshney · 2026-05-15 · 3

Research Papers arxiv

A survey of 60 international students in the US reveals how they use conversational AI tools like ChatGPT to navigate cross-cultural adaptation challenges where institutional support is fragmented.

Laleh Nourian, Anisa Callis, Stephanie Patterson +3 more · 2026-05-14 · 3
