Show HN: PageAgent, A GUI agent that lives inside your web app

62 points - today at 5:01 PM


Hi HN,

I'm building PageAgent, an open-source (MIT) library that embeds an AI agent directly into your frontend.

I built this because I believe there's a massive design space for deploying general agents natively inside the web apps we already use, rather than treating the web merely as a dumb target for isolated bots.

Currently, most AI agents operate from external clients or server-side programs, effectively leaving web development out of the AI ecosystem. I'm experimenting with an "inside-out" paradigm instead. By dropping the library into a page, you get a client-side agent that interacts natively with the live DOM tree and inherits the user's active session out of the box, which makes it a natural fit for single-page apps (SPAs).
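As an illustration of what "interacting natively with the live DOM tree" can look like without screenshots, here is a minimal sketch of one common technique: walk the page, keep only interactive elements, and give each a numeric index that the model can refer to in plain text. This is an assumption for illustration, not PageAgent's actual code; `NodeLike`, `serialize`, and the interactive-tag heuristic are hypothetical, and a plain object tree stands in for the real DOM so the sketch runs outside a browser.

```typescript
// Hypothetical, DOM-free stand-in for an element so the sketch runs anywhere.
interface NodeLike {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: NodeLike[];
}

// Assumed heuristic: real harnesses use much richer checks (visibility, ARIA, listeners).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

function isInteractive(node: NodeLike): boolean {
  return (
    INTERACTIVE_TAGS.has(node.tag) ||
    node.attrs?.["role"] === "button" ||
    node.attrs?.["onclick"] !== undefined
  );
}

interface IndexedDom {
  lines: string[];                // compact text view the LLM is shown
  byIndex: Map<number, NodeLike>; // index -> node, used later to execute actions
}

// Depth-first walk that numbers interactive nodes in document order.
function serialize(root: NodeLike): IndexedDom {
  const lines: string[] = [];
  const byIndex = new Map<number, NodeLike>();
  const walk = (node: NodeLike): void => {
    if (isInteractive(node)) {
      const index = byIndex.size;
      byIndex.set(index, node);
      const label = (node.text ?? node.attrs?.["placeholder"] ?? "").trim();
      lines.push(`[${index}]<${node.tag}>${label}</${node.tag}>`);
    }
    for (const child of node.children ?? []) walk(child);
  };
  walk(root);
  return { lines, byIndex };
}

// Usage: a tiny checkout page.
const page: NodeLike = {
  tag: "body",
  children: [
    { tag: "h1", text: "Checkout" },
    { tag: "input", attrs: { placeholder: "Email" } },
    { tag: "div", children: [{ tag: "button", text: "Pay now" }] },
  ],
};
const dom = serialize(page);
console.log(dom.lines.join("\n"));
// [0]<input>Email</input>
// [1]<button>Pay now</button>
```

A browser version would walk real `Element` nodes starting from `document.body`, and a model reply such as "click 1" would be resolved through `byIndex` back to the live element.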

To handle cross-page tasks, I built an optional browser extension that acts as a "bridge". This allows the web-page agent to control the entire browser with explicit user authorization. Instead of a desktop app controlling your browser, your web app is empowered to act as a general agent that can navigate the broader web.
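A minimal sketch of the authorization gate such a bridge needs, assuming the extension keeps a user-approved allowlist of web-app origins and checks every command against it before touching any tab. The message shape (`BridgeCommand`, `BridgeRequest`) and the `BridgeGate` class are hypothetical, not PageAgent's actual extension protocol.

```typescript
// Hypothetical commands a page-side agent might send to the bridge extension.
type BridgeCommand =
  | { kind: "openTab"; url: string }
  | { kind: "switchTab"; tabId: number }
  | { kind: "closeTab"; tabId: number };

interface BridgeRequest {
  // In a real extension this should come from browser-provided sender
  // metadata, never from the message body, or a page could spoof it.
  origin: string;
  command: BridgeCommand;
}

type BridgeResult = { ok: true } | { ok: false; reason: string };

// Extension-side gate: only origins the user explicitly approved may act.
class BridgeGate {
  private readonly approved = new Set<string>();

  approve(origin: string): void {
    this.approved.add(origin);
  }

  revoke(origin: string): void {
    this.approved.delete(origin);
  }

  handle(req: BridgeRequest): BridgeResult {
    if (!this.approved.has(req.origin)) {
      return { ok: false, reason: `origin ${req.origin} not authorized` };
    }
    const cmd = req.command;
    if (cmd.kind === "openTab" && !/^https?:\/\//.test(cmd.url)) {
      return { ok: false, reason: "only http(s) URLs may be opened" };
    }
    // A real extension would call the browser's tabs API here; this sketch only validates.
    return { ok: true };
  }
}

// Usage: the same request is rejected until the user approves the origin.
const gate = new BridgeGate();
const req: BridgeRequest = {
  origin: "https://app.example.com",
  command: { kind: "openTab", url: "https://example.org" },
};
const denied = gate.handle(req);
gate.approve("https://app.example.com");
const allowed = gate.handle(req);
```

Keeping the allowlist inside the extension, rather than letting the page assert its own permissions, is what makes "explicit user authorization" enforceable.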

I'd love to start a conversation about the viability of this architecture, and what you all think about the future of in-app general agents. Happy to answer any questions!


Comments

simon_luv_pho today at 5:07 PM

This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:

- GitHub: https://github.com/alibaba/page-agent

- Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)

- Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...

I'd be really interested in feedback on the security model of letting client-side agents control the browser through the extension bridge, and I'm happy to take questions on the implementation!

arjunchint today at 11:05 PM

Oh whoa, we are working in parallel on a similar angle!

We just launched Rover (https://rover.rtrvr.ai/) as the first Embeddable Web Agent.

Similar principles: just embed a script tag and you get an agent that can type/click/select to onboard/demo/checkout users.

I tried it on your website and it was reeaaaally slow. Quick question:

- You're injecting numbers onto the UI. Are you taking screenshots? I don't see any screenshots in the requests being sent, so what is the point of the numbering?

I don't think building on browser-use is the way to go; it was the worst-performing harness of all the ones we tested [https://www.rtrvr.ai/blog/web-bench-results]. We wrote our own logic to build custom Action Trees that don't require any ARIA or accessibility setup from websites.

Would love to meet and trade notes, if possible (rtrvr.ai/request-demo)!

moehj today at 10:49 PM

Interesting architecture. Embedding the agent inside the app context rather than outside it makes sense for session-aware tasks. One question: how do you handle output validation before the agent acts on the DOM? Client-side agents acting on live state without a certification layer seem like a reliability risk in production. We've been building ARU (aru-runtime.com) as a runtime certification layer for exactly this; curious if you've thought about that boundary.
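One way to approach the validation question is to treat the model's output as untrusted input: parse it, check it against a fixed action schema, and reject any element index that was not in the snapshot the model was shown. The sketch below is a hypothetical illustration of that idea; `AgentAction`, `validateAction`, the action names, and the JSON shape are assumptions, and this is neither PageAgent's implementation nor ARU's API.

```typescript
// Hypothetical action schema the model is asked to emit as JSON.
type AgentAction =
  | { type: "click"; index: number }
  | { type: "input"; index: number; text: string }
  | { type: "done"; summary: string };

type Validation = { ok: true; action: AgentAction } | { ok: false; error: string };

// knownIndices: element indices present in the DOM snapshot the model saw.
function validateAction(raw: string, knownIndices: ReadonlySet<number>): Validation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "expected a JSON object" };
  }
  const a = parsed as Record<string, unknown>;
  switch (a.type) {
    case "done": {
      const summary = a.summary;
      return typeof summary === "string"
        ? { ok: true, action: { type: "done", summary } }
        : { ok: false, error: "done needs a summary" };
    }
    case "click":
    case "input": {
      // Reject stale or hallucinated targets before anything touches the DOM.
      const index = a.index;
      if (typeof index !== "number" || !knownIndices.has(index)) {
        return { ok: false, error: `unknown element index ${String(index)}` };
      }
      if (a.type === "click") {
        return { ok: true, action: { type: "click", index } };
      }
      const text = a.text;
      return typeof text === "string"
        ? { ok: true, action: { type: "input", index, text } }
        : { ok: false, error: "input needs text" };
    }
    default:
      // Anything outside the allowlisted action types is refused.
      return { ok: false, error: `unsupported action ${String(a.type)}` };
  }
}

// Usage: only the first call yields an executable action.
const known: ReadonlySet<number> = new Set([0, 1]);
const good = validateAction('{"type":"click","index":1}', known);
const stale = validateAction('{"type":"click","index":7}', known);
const bogus = validateAction('{"type":"delete_account"}', known);
```

This only covers structural checks; semantic checks (for example, asking the user to confirm before a "Pay now" click) would sit on top of it.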

mentalgear today at 6:59 PM

> Data processed via servers in Mainland China

Appreciate the transparency, but could you maybe add some alternatives (preferably European)?

general_reveal today at 7:18 PM

I’ve been thinking about something like this. If it’s just a one-line script import, how the heck are you trusting natural language to translate into commands for an arbitrary UI?

The only thing I can think of is that you had the AI rewrite the entire build file, embed selectors into it, and work from that?

dzink today at 6:53 PM

Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?

pscanf today at 5:59 PM

Very cool!

I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!

jadbox today at 10:21 PM

Firefox support?

Mnexium today at 7:21 PM

Curious - how does it perform with captchas and other "are you human" stuff on the web?

coreylane today at 6:52 PM

Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?

MeteorMarc today at 6:41 PM

Confusing name, given that Pageant, PuTTY's SSH authentication agent, already exists.

popalchemist today at 7:32 PM

Does it support long-click / click-and-drag?


jauntywundrkind today at 5:43 PM

Not exactly the same, but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. It's a very nice browser extension:

> Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.

https://github.com/PaulKinlan/NotebookLM-Chrome

https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...