Show HN: Understudy – Teach a desktop agent by demonstrating a task once

53 points - today at 5:04 PM

I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces.

Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.

Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0

In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps and route options, and keeps GUI hints only as a fallback. In this example it can also take faster routes when they are available instead of repeating every GUI step.
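To make the "intent steps, route options, GUI hints only as a fallback" idea concrete, here is a minimal TypeScript sketch. Everything in it (`IntentStep`, `Route`, `replay`, the `cost` field, the route kinds) is hypothetical and is not Understudy's actual skill format or API. It only shows one way a replay could try faster non-GUI routes first and consult the recorded GUI hint only when it falls back to the GUI.

```typescript
// ASSUMPTION: all types, fields, and functions below are hypothetical and
// illustrative; this is not Understudy's published skill schema or API.

type RouteKind = "shell" | "api" | "applescript" | "gui";

interface Route {
  kind: RouteKind;
  cost: number; // hypothetical relative cost; lower means faster
  available: () => boolean; // can this route run in the current environment?
  // Performs the step and reports success; guiHint is passed only to GUI routes.
  run: (args: Record<string, string>, guiHint?: string) => boolean;
}

interface IntentStep {
  intent: string; // what to achieve, not where to click
  routes: Route[];
  guiHint?: string; // recorded GUI detail, consulted only on GUI fallback
}

interface StepResult {
  intent: string;
  usedRoute: RouteKind | null; // null means every route failed
}

// Non-GUI routes first (cheapest first); GUI routes always last.
function orderRoutes(routes: Route[]): Route[] {
  return [...routes].sort((a, b) => {
    const guiA = a.kind === "gui" ? 1 : 0;
    const guiB = b.kind === "gui" ? 1 : 0;
    return guiA - guiB || a.cost - b.cost;
  });
}

function executeStep(step: IntentStep, args: Record<string, string>): StepResult {
  for (const route of orderRoutes(step.routes)) {
    if (!route.available()) continue;
    const hint = route.kind === "gui" ? step.guiHint : undefined;
    if (route.run(args, hint)) {
      return { intent: step.intent, usedRoute: route.kind };
    }
  }
  return { intent: step.intent, usedRoute: null };
}

// Replays a skill with new parameters and stops at the first failed step,
// leaving the caller free to ask the user or attempt recovery.
function replay(skill: IntentStep[], args: Record<string, string>): StepResult[] {
  const results: StepResult[] = [];
  for (const step of skill) {
    const result = executeStep(step, args);
    results.push(result);
    if (result.usedRoute === null) break;
  }
  return results;
}

// Usage with stubbed routes that just record what they were asked to do.
const log: string[] = [];
const stub = (kind: RouteKind, cost: number, ok: boolean, avail = true): Route => ({
  kind,
  cost,
  available: () => avail,
  run: (args, hint) => {
    log.push(`${kind}:${args.subject}${hint ? ` (${hint})` : ""}`);
    return ok;
  },
});

// The API route for step 1 is unavailable, so that step falls back to the GUI.
const demoSkill: IntentStep[] = [
  {
    intent: "find a photo of {subject}",
    routes: [stub("gui", 10, true), stub("api", 1, true, false)],
    guiHint: "image search results grid",
  },
  { intent: "remove the photo background", routes: [stub("gui", 10, true), stub("applescript", 3, true)] },
  { intent: "send the result via messaging", routes: [stub("gui", 10, true)] },
];

const results = replay(demoSkill, { subject: "Elon Musk" });
console.log(results.map((r) => r.usedRoute)); // [ 'gui', 'applescript', 'gui' ]
```

Ordering GUI routes last regardless of cost mirrors the description of GUI hints as a fallback; a real runtime would presumably also re-plan or hand control back to the user when a step ends with `usedRoute === null`.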

Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.

    npm install -g @understudy-ai/understudy
    understudy wizard

GitHub: https://github.com/understudy-ai/understudy

Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.

Comments

rybosworld today at 8:54 PM

I have a hard time believing this is robust.

mahendra0203 today at 8:48 PM

Most interesting thing in your comment: the agent falling back to AppleScript when the GUI gets unreliable. That's the real product insight. Don't be a "GUI automation" tool. Be a "get the task done by any means necessary" tool.

wuweiaxin today at 6:07 PM

The demonstration-based approach is interesting for the handoff problem. The hardest part of agentic automation isn't the first run; it's making the agent robust to the cases the demonstrator never showed it. How do you handle edge cases or failures mid-task? Does it fall back to asking the user, or does it have some recovery heuristic? Asking because we found that the failure-mode surface matters more than happy-path coverage when you actually deploy these in production.

sethcronin today at 8:32 PM

Cool idea. The Claude Chrome extension has something like this implemented, but obviously it's restricted to the Chrome browser.

abraxas today at 6:24 PM

One more tool targeting OSX only. That platform is overserved with desktop agents already while others are underserved, especially Linux.

jedreckoning today at 6:42 PM

Cool idea. Good idea doing a demo as well.

sukhdeepprashut today at 6:01 PM

2026 and we still pretend not to understand how LLMs work, huh.