Components of a Coding Agent

162 points - yesterday at 1:16 PM

Source

Comments

beshrkayali yesterday at 4:52 PM
> long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info)

I think spec-driven generation is the antithesis of chat-style coding for this reason. With tools like Claude Code, you are the one tracking what was already built, what interfaces exist, and why something was generated a certain way.

I built Ossature[1] around the opposite model. You write specs describing behavior, it audits them for gaps and contradictions before any code is written, then produces a build-plan TOML where each task declares exactly which spec sections and upstream files it needs. The LLM never sees more than that, and there is no accumulated conversation history to drift from. Every prompt and response is saved to disk, so traceability is built in rather than something you reconstruct by scrolling back through a chat. I used it over the last couple of days to build a CHIP-8 emulator entirely from specs[2]. I have some more example projects on GitHub[3].

1: https://github.com/ossature/ossature

2: https://github.com/beshrkayali/chomp8

3: https://github.com/ossature/ossature-examples
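
The per-task context assembly described above can be sketched roughly like this (all names and the plan structure are illustrative, not Ossature's actual format; the plan is shown as a parsed dict rather than raw TOML):

```python
# Hypothetical sketch: each task declares exactly which spec sections and
# upstream files it needs, and the prompt is assembled from only those
# pieces -- no accumulated conversation history.
plan = {
    "tasks": [
        {
            "id": "cpu-fetch-decode",
            "spec_sections": ["03-instruction-set", "04-timing"],
            "upstream_files": ["src/memory.py"],
        },
    ],
}

specs = {
    "03-instruction-set": "Each opcode is two bytes, big-endian...",
    "04-timing": "The CPU executes roughly 500 instructions per second...",
}

files = {"src/memory.py": "class Memory: ..."}


def build_prompt(task):
    """Assemble a prompt from only the declared spec sections and files."""
    parts = [f"Task: {task['id']}"]
    for section in task["spec_sections"]:
        parts.append(f"## Spec {section}\n{specs[section]}")
    for path in task["upstream_files"]:
        parts.append(f"## File {path}\n{files[path]}")
    return "\n\n".join(parts)


prompt = build_prompt(plan["tasks"][0])
```

The point of the structure is that the prompt for any task is a pure function of the plan, so every generation is reproducible from what's on disk.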

gburgett today at 12:41 AM
Loved this writeup. I have built an agent for a specific niche use case for my clients (not a coding agent), but the principles are similar. I've only implemented 1-4 so far. I'm going to work on long-term memory next, but I worry about prompt-injection issues when allowing the LLM to write its own notes.

Since my agent works over email, the core agent loop processes one message, then hits the send_reply tool to craft a response. The next incoming email starts the loop again from scratch, injecting only the actual replies sent between user and agent. This naturally prunes the context, preventing the long-context-window problem.
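
A rough sketch of that stateless per-email loop (`call_llm` and `send_reply` are hypothetical stand-ins, not the commenter's actual code):

```python
# Each incoming email starts a fresh run seeded only with the replies
# previously exchanged -- the agent's internal tool chatter never
# survives into the next run's context.
def handle_email(incoming, reply_history, call_llm, send_reply):
    messages = list(reply_history) + [{"role": "user", "content": incoming}]
    while True:
        action = call_llm(messages)
        if action["tool"] == "send_reply":
            send_reply(action["text"])
            # Only the visible exchange is carried forward.
            reply_history.append({"role": "user", "content": incoming})
            reply_history.append({"role": "assistant", "content": action["text"]})
            return action["text"]
        # Intermediate tool results stay local to this run.
        messages.append({"role": "tool", "content": action["result"]})
```

The pruning falls out of the design: `reply_history` grows only by two entries per email, regardless of how many tool calls the run used internally.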

I also found it challenging to decide what context to inject into the initial prompt vs. what to put behind tools. It's a tradeoff between context bloat and the cost of tool lookups, which can get expensive when you're paying per token. There's also caching to consider here.

Full writeup is here if anyone is interested: https://www.healthsharetech.com/blog/building-alice-an-empow...

armcat yesterday at 3:42 PM
I still find it incredible how much power was unleashed by surrounding an LLM with a simple state machine and giving it access to bash.
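
That whole "state machine around an LLM with bash" idea fits in a few lines. A minimal sketch (`call_llm` is a hypothetical stand-in that returns either a shell command or a final answer):

```python
import subprocess


def agent_loop(task, call_llm, max_steps=10):
    """Alternate between two states: ask the model, or run its command."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)  # {"bash": "..."} or {"done": "..."}
        if "done" in reply:
            return reply["done"]
        # Execute the requested command and feed the output back in.
        out = subprocess.run(
            reply["bash"], shell=True, capture_output=True, text=True
        )
        messages.append({"role": "tool", "content": out.stdout + out.stderr})
    return None
```

Everything beyond this in a production agent (truncation, permissions, sandboxing, parallel tools) is refinement of the same two-state loop.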
Yokohiii yesterday at 4:54 PM
The example is really lean and straightforward. I don't use coding agents, but this is a good overview and should help everyone understand that coding agents may have sophisticated outcomes, but the raw interaction isn't magical at all.

It's also a good example of how you can turn a useful code component that requires 1k LOC into a mess of 500k LOC.

IceWreck yesterday at 9:22 PM
> This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code.

People have been doing that for over a year already? GLM officially recommends plugging into Claude Code (https://docs.z.ai/devpack/tool/claude), and any model can be plugged into Codex CLI (it's open source, and the model can be set via a config file).

hsaliak yesterday at 10:36 PM
Tool output truncation helps a lot and is one of the best ways to reduce context bloat. In my coding agent the context is assembled from SQLite. I suffix the message ID so the truncated tool call can be rehydrated if it's needed, and it works great. My exploration on context management is mostly documented here: https://github.com/hsaliak/std_slop/blob/main/docs/CONTEXT_M...

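
The truncate-and-rehydrate trick might look something like this (schema and names are illustrative, not std_slop's actual code):

```python
import sqlite3

# Full tool outputs live in SQLite; the context only carries a
# truncated copy suffixed with the message ID needed to fetch the rest.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tool_output (id INTEGER PRIMARY KEY, body TEXT)")


def store_and_truncate(body, limit=200):
    """Persist the full output; return a truncated, ID-tagged copy."""
    cur = db.execute("INSERT INTO tool_output (body) VALUES (?)", (body,))
    msg_id = cur.lastrowid
    if len(body) <= limit:
        return body
    return f"{body[:limit]}... [truncated, msg_id={msg_id}]"


def rehydrate(msg_id):
    """Fetch the full output back when the agent asks for it."""
    row = db.execute(
        "SELECT body FROM tool_output WHERE id = ?", (msg_id,)
    ).fetchone()
    return row[0]
```

The nice property is that truncation is lossless from the agent's perspective: the ID in the suffix is a pointer it can follow, so only outputs that actually matter get paid for twice.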
zbyforgotpass yesterday at 7:56 PM
Isn't there a better word than harness? I understand the metaphor of leading and constraining a raw power - but I don't like it.

rbren yesterday at 10:09 PM
Strong article! I’ve been using the engine/car analogy for a while now.

If you want to play with the basic building blocks of coding agents, check out https://github.com/OpenHands/software-agent-sdk

MrScruff yesterday at 4:29 PM
> This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code.

Unless I'm misunderstanding what's being described here, running Claude Code with different backend models is pretty common.

https://docs.z.ai/scenario-example/develop-tools/claude

It doesn't perform on par with Anthropic's models in my experience.

crustycoder yesterday at 4:41 PM
A timely link - I've just spent the last week failing to get a ChatGPT Skill to produce a reproducible management-reporting workflow. I've figured out why, and this article pretty much confirms my conclusions about the strengths and weaknesses of "pure" LLMs, and how to work around them. The article is for a slightly different problem domain, but the general problems and the architecture needed to address them seem very similar.