📅 Today's Highlights
DeepMind's AlphaProof paper is published in Nature, detailing how AlphaProof and AlphaGeometry achieved silver-medal-level performance on International Mathematical Olympiad problems.
Nemotron-Cascade 2 is an open 30B MoE model (3B activated params) achieving Gold Medal-level performance on IMO, IOI, and ICPC using cascade RL and multi-domain on-policy distillation, matching frontier models with 20x fewer parameters.
OpenAI now monitors 99.9% of internal coding agent traffic for misalignment by reviewing full trajectories with frontier models, escalating suspicious behavior.
OpenAI releases GPT-5.4 mini across ChatGPT, Codex, and the API, optimized for coding and multimodal tasks and 2x faster than GPT-5 mini.
OpenAI releases GPT-5.4 mini and nano, smaller and faster models optimized for coding, tool use, multimodal reasoning, and high-volume sub-agent workloads.
🚀 Model Releases
Nemotron-Cascade 2 is an open 30B MoE model (3B activated params) achieving Gold Medal-level performance on IMO, IOI, and ICPC using cascade RL and multi-domain on-policy distillation, matching frontier models with 20x fewer parameters.
OpenAI releases GPT-5.4 mini across ChatGPT, Codex, and the API, optimized for coding and multimodal tasks and 2x faster than GPT-5 mini.
OpenAI releases GPT-5.4 mini and nano, smaller and faster models optimized for coding, tool use, multimodal reasoning, and high-volume sub-agent workloads.
Microsoft AI released MAI-Image-2, a new image generation model now available on the MAI Playground; its model family ranks #3 on the Chatbot Arena leaderboard. The release positions Microsoft competitively in the image generation space.
F2LLM-v2 is a family of 8 multilingual embedding models (80M–14B parameters) supporting 200+ languages including low-resource ones, trained with a two-stage pipeline combining matryoshka learning, pruning, and distillation, ranking first on 11 MTEB benchmarks.
Google is graduating Stitch from Google Labs into a full AI design canvas that converts natural language and multimodal references into production-ready frontend code. The move represents a significant upgrade to the AI-powered UI development tool.
🔧 Agent Infrastructure
Arize shares part 2 of building their Alyx agent, focusing on context window management strategies including middle truncation and retrieval-based memory to handle context bottlenecks.
Google AI Studio launched a full-stack vibe coding platform integrating the Antigravity coding agent and Firebase backends, enabling multiplayer app creation with complex features. The launch marks a significant step in AI-assisted full-stack development.
Google AI Studio gains major upgrades including real-time multiplayer collaboration, live data service connections, persistent builds, and professional UI library support via shadcn, Framer Motion, and npm.
Zora is an AI agent framework that stores safety policies in persistent files loaded before every action, preventing constraint loss during context compaction — inspired by a real incident where an agent deleted 200+ emails after forgetting user instructions.
N0x runs the full AI stack—LLM inference via WebGPU, ReAct agents, RAG, and sandboxed Python execution—entirely in the browser with no backend, accounts, or API keys required.
Votal AI open-sourced a white-box agentic red-teaming framework that uses an agent's architecture, tool definitions, and role config to generate targeted multi-turn attack sequences.
📄 Research Papers
DeepMind's AlphaProof paper is published in Nature, detailing how AlphaProof and AlphaGeometry achieved silver-medal-level performance on International Mathematical Olympiad problems.
P2PCLAW is a peer-to-peer network where AI agents and researchers publish and validate scientific results using formal Lean 4 mathematical proofs, enabling agents to build on each other's verified work.
OpenAI details how chain-of-thought monitoring is used to detect misalignment in internal coding agents, analyzing real deployments to strengthen AI safety.
Arize introduces Prompt Learning, a technique to systematically improve agent instruction files (CLAUDE.md, .cursorrules) that reportedly boosts coding agent performance by 20% without changing the underlying model.
Google DeepMind highlights that the AlphaFold protein structure database has been used by over 3.3 million researchers worldwide, showcasing AI's transformative impact on scientific discovery.
OS-Themis is a scalable multi-agent critic framework for GUI agent RL training that decomposes trajectories into verifiable milestones and uses an evidence-auditing review mechanism, accompanied by OGRBench for cross-platform GUI reward evaluation.
📰 Industry News
OpenAI now monitors 99.9% of internal coding agent traffic for misalignment by reviewing full trajectories with frontier models, escalating suspicious behavior.
LiteParse is a fully open-source, model-free document parsing tool that requires no GPU and processes ~500 pages in 2 seconds on commodity hardware, outperforming PyPDF and PyMuPDF in accuracy.
LlamaIndex is open-sourcing a lightweight core of LlamaParse technology as LiteParse, distilling years of production document parsing experience into a free, accessible tool.
DeepMind and Kaggle launch a global hackathon with $200k in prizes to crowdsource new cognitive evaluation benchmarks for measuring progress toward AGI.
Researchers demonstrate LLM agents capable of SIEM and EDR evasion, signaling a new security frontier where adversaries may soon leverage AI for bypassing enterprise security monitoring.
Cohere announced at NVIDIA GTC that it is building NVIDIA ecosystem-native models and an optimized version of its North agentic platform for secure, private enterprise AI deployments. The effort targets demand for on-premise AI optimized for NVIDIA hardware.