🔧 Agent Infrastructure
An AI red teaming agent built on the Dreadnode SDK automates adversarial workflow construction using 45+ attacks and 450+ transforms, reducing manual red teaming from weeks to hours for agentic systems.
A framework for automated multi-agent system composition that replaces manual planning and agent selection with an LLM-driven planner, dynamic call graphs, and automated orchestration.
ArizeAI's launch of Alyx v2 revealed that small changes to prompts, tool descriptions, or model behavior can cause regressions multiple steps later in agent workflows, forcing a rethink of testing strategy.
Probus is a multi-agent vulnerability scanner whose findings led to real security fixes being merged in Vercel AI SDK, n8n, and LangGraph, demonstrating the practical value of agentic security research.
OpenAI open-sources MRC (Multipath Reliable Connection) via OCP, a new supercomputer networking protocol designed to boost resilience and performance in large-scale AI training clusters.
Inerrata proposes a collective knowledge layer for coding agents, enabling them to share and reuse solutions across sessions via an Ontological Knowledge Network and MCP-based graph search. Addresses the persistent problem of agents losing learned context on session reset.
A swarm manager pattern is described where a top-level orchestrator loops over running agent harnesses to ensure progress, providing the coordination layer that turns subagents into a manageable fleet.
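As a rough illustration of the loop described above, here is a minimal sketch of one supervision pass. It is not taken from the original post: the `Harness` record, the `supervise` function, and the "no new steps for N passes means stalled" rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Harness:
    """Hypothetical record for one running agent harness."""
    name: str
    steps_done: int = 0        # progress counter reported by the harness
    last_seen_steps: int = 0   # progress observed on the previous pass
    stalled_passes: int = 0    # consecutive passes without new progress
    finished: bool = False


def supervise(harnesses: List[Harness], max_stalled_passes: int = 2) -> List[str]:
    """Run one supervision pass and return the names of stalled harnesses."""
    stalled = []
    for h in harnesses:
        if h.finished:
            continue
        if h.steps_done > h.last_seen_steps:
            # The harness made progress since the last pass: reset the counter.
            h.stalled_passes = 0
            h.last_seen_steps = h.steps_done
        else:
            h.stalled_passes += 1
        if h.stalled_passes >= max_stalled_passes:
            stalled.append(h.name)
    return stalled
```

A top-level orchestrator would call `supervise` on a timer and decide what to do with the returned names (restart, reassign, or escalate); that policy is left out of the sketch.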
Analysis of OpenClaw's swarm management system reveals that once agents spawn other agents, the runtime must own the swarm lifecycle, including identity, queuing, routing, and recovery.
Cofounder 2 is announced as agent infrastructure designed to run an entire company autonomously, orchestrating agents across engineering, sales, marketing, ops, and design for the 'one person billion dollar company.'
Aurra introduces bi-temporal memory for AI agents, allowing them to track when facts were known and when they changed, with LLM-powered auto-supersede for outdated memories. Addresses a core limitation in agent memory management.
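To make the bi-temporal idea concrete, here is a minimal sketch that keeps two timelines per fact: when it became true and when the agent recorded it. This is not Aurra's API; `Fact`, `BiTemporalMemory`, and their methods are hypothetical names, and the supersede step here is a plain key match standing in for the LLM-powered auto-supersede the item mentions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Fact:
    key: str
    value: str
    valid_from: datetime                       # when the fact became true
    recorded_at: datetime                      # when the agent learned it
    superseded_at: Optional[datetime] = None   # when a newer record replaced it


class BiTemporalMemory:
    """Hypothetical in-memory store; supersedes by exact key match."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def remember(self, key: str, value: str,
                 valid_from: datetime, recorded_at: datetime) -> None:
        # Close out the currently active record for this key, if any.
        for fact in self._facts:
            if fact.key == key and fact.superseded_at is None:
                fact.superseded_at = recorded_at
        self._facts.append(Fact(key, value, valid_from, recorded_at))

    def current(self, key: str) -> Optional[str]:
        for fact in self._facts:
            if fact.key == key and fact.superseded_at is None:
                return fact.value
        return None

    def as_known_at(self, key: str, when: datetime) -> Optional[str]:
        """Return what the agent believed about `key` at time `when`."""
        for fact in self._facts:
            active = fact.superseded_at is None or when < fact.superseded_at
            if fact.key == key and fact.recorded_at <= when and active:
                return fact.value
        return None
```

Old records are never deleted, only closed out, which is what lets the store answer both "what is true now" and "what did the agent believe back then".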
OpenAI details how it rebuilt its WebRTC infrastructure to support real-time Voice AI with low latency, global scalability, and natural conversational turn-taking. A significant technical deep-dive into production voice AI systems.
Google DeepMind details a dual-agent safety architecture for clinical AI, where a Planner agent continuously monitors a Talker agent to enforce safe clinical boundaries.
MAKA is a physics-grounded multi-agent architecture for CNC machining decision support that enforces physical plausibility, safety bounds, and full provenance traceability in high-stakes manufacturing workflows.
Experience-RAG Skill is a pluggable agent layer that dynamically selects retrieval strategies based on task type and experience memory, achieving strong nDCG scores across diverse retrieval benchmarks.
ChatGPT gains improved memory and personalization by leveraging saved memories, past chats, files, and Gmail context, with transparency via 'memory sources' indicators.
Lessons from shipping Alyx v2: production traces became regression datasets, evals became the shared language for agent behavior, and CI/CD gates guarded against prompt, tool, and model changes.
Perplexity launches a professional finance product enabling teams to connect licensed data from Morningstar, PitchBook, and others, with 35 dedicated finance workflows.
Defines an evaluation harness for agentic systems as infrastructure that continuously selects, scores, and routes evaluation results into alerts, CI, or annotation pipelines.
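The select, score, and route steps can be sketched in a few lines. This is only an illustration under stated assumptions: the trace shape (`id`, `used_tool`), the selection rule, the thresholds, and the destination names are all hypothetical and do not come from the original post.

```python
from typing import Callable, Dict, List

Trace = Dict[str, object]


def run_eval_pass(
    traces: List[Trace],
    scorer: Callable[[Trace], float],
    fail_below: float = 0.5,     # hypothetical threshold: below this, alert
    review_below: float = 0.8,   # hypothetical threshold: below this, annotate
) -> Dict[str, List[str]]:
    """Select traces, score each one, and route its id to a destination."""
    routed: Dict[str, List[str]] = {"alert": [], "annotate": [], "pass": []}

    # Select: only evaluate traces that involved a tool call (assumed rule).
    selected = [t for t in traces if t.get("used_tool")]

    for trace in selected:
        score = scorer(trace)
        # Route: low scores raise alerts, borderline ones go to human review.
        if score < fail_below:
            destination = "alert"
        elif score < review_below:
            destination = "annotate"
        else:
            destination = "pass"
        routed[destination].append(str(trace["id"]))
    return routed
```

In a CI setting, a non-empty `alert` bucket could fail the build, while the `annotate` bucket feeds a labeling queue; running the same pass on a schedule over production traces is what makes the harness continuous.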
Duralang is a Python decorator library that wraps every LangChain LLM, tool, and MCP call as a Temporal Activity, enabling durable, fault-tolerant execution of LLM workflows.
Agent-desktop is a CLI tool for AI agents that uses native OS accessibility APIs (instead of screenshot-based pixel clicking) for faster, cheaper, and more robust desktop automation.
Standardized agent telemetry enables a powerful feedback loop: instrument once, route traces anywhere, debug step by step, run evals on production behavior, and improve from real agent trajectories.
Portable traces are proposed as the key mechanism for understanding complex agent behavior, capturing the full chain of request rewrites, retrievals, tool calls, model invocations, and handoffs behind a simple-looking output.
ArizeAI shares a writeup and video on testing their Alyx agent using traces, evals, experiments, and CI/CD pipelines.
Perplexity Computer highlights full source traceability, allowing users to click citations and access underlying SEC filings, earnings transcripts, and licensed data sources.
Perplexity Computer offers a sourcing/screening feature that takes target criteria and returns a matched company list with reasoning and signals used.
Describes an evaluation harness as a continuous system that catches regressions early and integrates results into engineering workflows like CI/CD.
Perplexity Computer integrates with Microsoft Teams, enabling users to run research, analysis, and document creation directly within their Teams workspace.
Highlights the critical observability gaps in multi-agent systems, specifically around tracking running agents, ownership, result routing, and session recovery.
Vdiff is a CLI tool that combines deterministic analysis with LLM reasoning to help developers prioritize and review AI-generated code changes in PRs, reducing the review bottleneck.
Arize AI's CEO discussed the critical importance of shared standards in agent development at Google Cloud NEXT, highlighting interoperability as a foundation for scalable agent systems.
Git Shield is a local pre-commit/pre-push hook tool combining gitleaks for secret scanning and an OpenAI Privacy Filter for PII detection, designed to prevent data leaks during AI-assisted coding sessions.
OpenAI showcases Codex's ability to iteratively edit files (e.g., presentations) within a single thread, enabling a draft-to-deck workflow with in-context revisions.
OpenAI highlights Codex's ease of use for everyday tasks — research, planning, docs, slides, and spreadsheets — via role-based onboarding and app integrations.
LlamaIndex and Render partnered to build a scalable distributed document processing pipeline using LlamaParse for parsing, classification, extraction, and retrieval.
A CLI tool (npx llm-safe-haven) that hardens AI coding agents with security configurations in under a minute. Targets developers running local AI agents who want quick security hardening.
Speq is a collaborative web-based product specification tool that uses AI to interrogate project requirements and outputs structured specs compatible with MCP-based agent handoffs.
Codex now supports importing settings, plugins, agents, and project configurations to streamline workflow migration.
Teaser post arguing that properly built evaluation systems shift teams from model evaluation to full system operation.
Retweet of the Cofounder 2 announcement for AI-driven full-company agent orchestration infrastructure.