I love your content, but I wish you'd make your blog theme responsive for wider screens/non-mobile. I prefer to read content like this on a large screen.
Showboat seems like it could actually be quite useful for humans too, just for making quick notes from a CLI without opening an editor. The "pop" command makes me wonder if there would be a benefit to also having an array-like in addition to the stack-like interface. It seems like it would be fairly trivial to generate an index of markdown blocks so that they could be edited individually.
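As a rough illustration of the array-like idea, here is a minimal Python sketch. It assumes blocks are separated by blank lines and that fenced code stays whole; the function names are hypothetical and this is not how Showboat itself stores or edits documents.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def split_blocks(markdown: str) -> list[str]:
    """Split Markdown into blank-line-separated blocks, keeping fenced code whole."""
    blocks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if line.strip() == "" and not in_fence:
            # A blank line outside a fence ends the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def replace_block(markdown: str, index: int, new_block: str) -> str:
    """Array-like edit: swap out the block at `index` and rebuild the document.

    Note: rebuilding normalizes the spacing between blocks to one blank line.
    """
    blocks = split_blocks(markdown)
    blocks[index] = new_block
    return "\n\n".join(blocks) + "\n"


def pop_block(markdown: str) -> str:
    """Stack-like edit: drop the last block, for comparison."""
    blocks = split_blocks(markdown)
    blocks.pop()
    return "\n\n".join(blocks) + "\n"
```

With an index like this, an individual block could be addressed by position, for example `replace_block(doc, 1, "edited note")`, while `pop_block` keeps the existing stack-like behavior.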
I like the idea of Rodney, but I wonder if you might actually have better results by asking the agent to generate equivalent Selenium scripts instead. I'm specifically suggesting Selenium because it's been around so long that I assume there's a lot of Selenium in the LLMs' training data, but there are other options that might work too.
cadamsdotcom today at 10:04 PM
Great to see you doing red/green TDD, Simon!
Passing tests in your repo are great documentation of the tool at a microscopic level. And rerunning tests only burns tokens on failures (since passed tests just print a dot) so it’s token efficient too.
Some other neat tricks:
- For greater efficiency, configure your test runner to print nothing (not even a dot/filename) for test successes. Agents don’t need progress dots, only the exit code and failure details.
- Have your agent implement a 10ms timeout per test; pytest has hooks to do this. The agent will see tests time out and mock out all I/O and third-party code (why test what one assumes third parties have already tested?). The result is a test suite that is CPU-bound, with no shared database, no shared data, and no tests that interfere with or depend on each other, so tests can run in parallel.
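Both tricks can be sketched in a `conftest.py`. This is a minimal sketch, not a definitive setup: it assumes Unix and tests running in the main thread (it relies on `SIGALRM`), and the names `time_limit`, `PerTestTimeout`, and `TEST_TIMEOUT_SECONDS` are hypothetical. The two `pytest_*` functions are standard pytest hook names.

```python
import signal
from contextlib import contextmanager

try:
    import pytest  # only needed for the hook decorator below
except ImportError:  # lets the timing helper be tried without pytest installed
    pytest = None

TEST_TIMEOUT_SECONDS = 0.010  # the 10ms budget suggested above


class PerTestTimeout(Exception):
    """Raised inside a test that exceeds its time budget."""


@contextmanager
def time_limit(seconds):
    """Raise PerTestTimeout if the enclosed code runs longer than `seconds`.

    Assumes Unix and the main thread, because it relies on SIGALRM.
    """
    def _on_alarm(signum, frame):
        raise PerTestTimeout(f"exceeded {seconds * 1000:.0f}ms")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)


if pytest is not None:

    @pytest.hookimpl(hookwrapper=True)
    def pytest_runtest_call(item):
        # Wrap only the test body; setup and teardown are not timed.
        with time_limit(TEST_TIMEOUT_SECONDS):
            yield


def pytest_report_teststatus(report, config):
    # An empty short letter suppresses the progress dot for passing tests.
    if report.when == "call" and report.passed:
        return report.outcome, "", ""
```

Failures and timeouts still show up in the report and in the exit code. Running pytest with `-q` alongside this should cut the remaining header noise further.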
Hansenq today at 7:10 PM
I was a bit confused as to how everything works until I read it in detail. Really cool tools, but I think one thing that would help in the introduction is: saying explicitly that the generated .md document is for you (the user) to read through, observe the output of the CLI call, and ensure that the output matches what you would expect.
It's basically an automated test, but at a higher abstraction level and with manual verification--using CLI tools rather than a test harness. Really great work!
giancarlostoro today at 7:06 PM
I'll be sure to try these out. I've been building my own alternative to Beads with a concept called "gates" which do not let you close tasks as complete until a gate passes. Would love to throw these in as "gates" for my current workflow.
johnfn today at 7:39 PM
Out of curiosity, what is the advantage of using Rodney when Playwright has the same set of features and AI understands how to write a Playwright script very well?
Sharlin today at 8:52 PM
I can't wait for tools that allow agents to hold stand-ups, retrospectives and sprint planning sessions, all facilitated by an agentic scrum master.
eliben today at 6:35 PM
Very interesting! I encountered the problems these tools are trying to tackle just recently while trying to guide an agent into creating an in-browser tool for me. Closing the loop on a web interface isn't as simple as CLI-only tools. I should give this a try.
It's also interesting that you've shifted to Go for your agent-coded CLI tools, Simon.
sNyZZzzz today at 8:35 PM
Using Markdown as both docs and executable output is cool, but I’m curious how it scales when agents hit more complex UIs.
Different from the CLI used for running tests, etc., that comes bundled with Playwright.
Sample use:
playwright-cli open https://demo.playwright.dev/todomvc/ --headed
playwright-cli type "Buy groceries"
playwright-cli press Enter
playwright-cli type "Water flowers"
playwright-cli press Enter
playwright-cli check e21
playwright-cli check e35
playwright-cli screenshot
nzoschke today at 7:11 PM
go-rod has been instrumental to my agentic coding loops too. Some uses:
- E2E testing of browser components
- Taking screenshots before and after and having Claude look at them to double check things
- Driving it with an API and CLI as a headless browser
Will definitely give Rodney a look.
water-drummer today at 8:02 PM
Wait, why shouldn't an LLM just write directly to the markdown file instead of going through the extra step of using a CLI tool, which is basically `echo 'something' >> file.md` but with templates that should really be in a prompt instead of being in a compiled binary? Did Claude come up with the idea for this as well?
Also, I am sure you must already know about Playwright MCP, so why this? If your goal isn't to make the CLI human-friendly, which is the only advantage CLIs have over MCPs doing the same thing, then why not just use the MCP? It doesn't even handle multiple sessions and has a single global state file. This is slop.
measurablefunc today at 7:14 PM
Google's Antigravity does this automatically by creating Task & Walkthrough artifacts.
saberience today at 6:39 PM
Sounds like both of these tools could be one shot by either Claude or Codex.
Or alternatively, just be a skill versus a tool.
My “agents” already demo stuff all the time by just being prompted to do so. I have notations in my standard Agents.md for how I want my documentation, testing etc.
toastal today at 7:01 PM
If agents can generate text so easily, why would they be limited to Markdown instead of reStructuredText, AsciiDoc, or LaTeX, which have rich features that help users understand text? I can understand developers refusing to adopt proper formats for documentation, but this seems odd for the bots. It doesn’t even generate the correct syntax block in Markdown, using “bash” instead of “sh-session”.