Agent-desktop is a CLI tool for AI agents that uses native OS accessibility APIs (instead of screenshot-based pixel clicking).
Over the last few months, a lot of computer-use agents have come out: Codex, Claude Code, CUA, and others. Most of them seem to work roughly like this:

1. Take a screenshot
2. Have the model predict pixel coordinates
3. Click x,y
4. Take another screenshot
5. Repeat
That works, but it's slow, expensive in tokens, and fragile. If the UI shifts a few pixels, things break. And the model still doesn't know what any element actually is.
But the OS already exposes structured UI information:
- macOS: Accessibility API
- Windows: UI Automation
- Linux: AT-SPI
Screen readers have used these APIs for years. On the web, Playwright beat screenshot scraping for the same reason: structured access is simply a better abstraction than pixels.

So I built a desktop equivalent: agent-desktop.
It's a cross-platform CLI for structured desktop automation through the accessibility tree. One Rust binary, about 15 MB, no runtime dependencies. It exposes 53 commands with JSON output, so an LLM can inspect and operate native apps without screenshots or vision models. Inspired by agent-browser by Vercel Labs.
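Because the output is JSON, an agent can select elements by role and name rather than by coordinates. Here is a hedged sketch of what filtering a snapshot might look like; the schema below (`elements`, `ref`, `role`, `name`) is my own guess at the shape, not the tool's documented format.

```python
import json

# Hypothetical snapshot output -- the real field names may differ.
snapshot = json.loads("""
{
  "elements": [
    {"ref": "@e5",  "role": "TextField", "name": "Message input"},
    {"ref": "@e12", "role": "Button",    "name": "Send"}
  ]
}
""")

# Pick an element by semantic role/name instead of pixel coordinates.
send = next(e for e in snapshot["elements"]
            if e["role"] == "Button" and e["name"] == "Send")
print(send["ref"])  # → @e12
```

The resulting ref (`@e12`) is stable across layout shifts in a way a pixel coordinate is not.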
A typical loop looks like this:

```shell
agent-desktop snapshot --app Slack -i --compact
agent-desktop click @e12
agent-desktop type @e5 "ship it"
agent-d
```
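The loop above can be driven from a thin harness. Below is a minimal Python sketch: the element refs (`@e5`, `@e12`) and flags are taken straight from the example, but the assumption that every command prints JSON on stdout is mine, and the `runner` parameter is injectable purely so the harness can be exercised without the binary installed.

```python
import json
import subprocess

def run(args, runner=subprocess.run):
    """Invoke agent-desktop and parse its JSON stdout.
    `runner` is injectable for testing without the real binary."""
    proc = runner(["agent-desktop", *args], capture_output=True, text=True)
    return json.loads(proc.stdout)

def send_message(app, text, runner=subprocess.run):
    # 1. Inspect: grab the accessibility tree as structured JSON.
    tree = run(["snapshot", "--app", app, "-i", "--compact"], runner)
    # 2. Act on element refs from the snapshot (hypothetical refs here),
    #    not on pixel coordinates.
    run(["type", "@e5", text], runner)
    run(["click", "@e12"], runner)
    return tree
```

In a real agent, the model would read the snapshot between steps and choose the refs itself rather than hard-coding them.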