All your agents are going async

68 points - last Monday at 11:18 AM

Comments

edg5000 today at 10:09 AM

There is nothing wrong with the HTTP layer, it's just a way to get a string into the model.

The problem is the industry obsession on concatenating messages into a conversation stream. There is no reason to do it this way. Every time you run inference on the model, the client gets to compose the context in any way they want; there are more things than just concatenating prompts and LLM ouputs. (A drawback is caching won't help much if most of the context window is composed dynamically)

Coding CLIs as well as web chat works well because the agent can pull in information into the session at will (read a file, web search). The pain point is that if you're appending messages a stream, you're just slowly filling up the context.

The fix is to keep the message stream concept for informal communication with the prompter, but have an external, persistent message system that the agent can interact with (a bit like email). The agent can decide which messages they want to pull into the context, and which ones are no longer relevant.

The key is to give the agent not just the ability to pull things into context, but also remove from it. That gives you the eternal context needed for permanent, daemonized agents.

_pdp_ today at 9:45 AM

Here is an interesting find.

Let's say that you have two agents running concurrently: A & B. Agent A decides to push a message into the context of agent B. It does that and the message ends up somewhere in the list of the message right at the bottom of the conversation.

The question is, will agent B register that a new message was inserted and will it act on it?

If you do this experiment you will find out that this architecture does not work very well. New messages that are recent but not the latest have little effect for interactive session. In other words, Agent A will not respond and say, "and btw, this and that happened" unless perhaps instructed very rigidly or perhaps if there is some other instrumentation in place.

Your mileage may vary depending on the model.

A better architecture is pull-based. In other words, the agent has tools to query any pending messages. That way whatever needs to be communicated is immediately visible as those are right at the bottom of the context so agents can pay attention to them.

An agent in that case slightly more rigid in a sense that the loop needs to orchestrate and surface information and there is certainly not one-size-fits-all solution here.

I hope this helps. We've learned this the hard way.

scotty79 today at 12:40 PM

It seems that people started spontaneously using chat apps (telegram and such) for durable channel between them and their async agents.

Maybe better somebody standardize that because well end up with agents sending rich payloads between themselves via telegram.

aledevv today at 9:56 AM

> All of these features are about breaking the coupling between a human sitting at a terminal or chat window and interacting turn-by-turn with the agent.

This means:

- less and less "man-in-the-loop"

- less and less interaction between LLMs and humans

- more and more automation

- more and more decision-making autonomy for agents

- more and more risk (i.e., LLMs' responsibility)

- less and less human responsibility

Problem:

Tasks that require continuous iteration and shared decision-making with humans have two possible options:

- either they stall until human input

- or they decide autonomously at our risk

Unfortunately, automation comes at a cost: RISK.

artisin today at 10:37 AM

So reinventing terminal multiplexing, except over proprietary chat/realtime transports instead of PTYs?

Yokohiii last Monday at 11:36 AM

this is a commercial sales pitch for something that doesn't exist

Havoc today at 9:58 AM

Struggling with this at the moment too - the second you have a task that is a blend of CI style pipeline, LLM processing and openclaw handing that data back and forth, maintaining state and triggering next step gets tricky. They're essentially different paradigms of processing data and where they meet there are impedance mismatches.

Even if I can string it together it's pretty fragile.

That said I don't really want to solve this with a SaaS. Trying really hard to keep external reliance to a minimum (mostly the llm endpoint)

mettamage today at 9:58 AM

> The interesting thing is what agents can do while not being synchronously supervised by a human.

I vibe coded a message system where I still have all the chat windows open but my agents run a command that finished once a message meant for them comes along and then they need to start it back up again themselves. I kept it semi-automatic like that because I'm still experimenting whether this is what I want.

But they get plenty done without me this way.

sebastiennight today at 9:33 AM

The idea of the "session" is an interesting solution, I'll be looking forward to new developments from you on this.

I don't think it solves the other half of the problem that we've been working on, which is what happens if you were not the one initiating the work, and therefore can't "connect back into a session" since the session was triggered by the agent in the first place.

serbrech today at 9:30 AM

I recognize the problem statement and decomposition of it. But not the solution. Especially saying that he sees the same problem being worked on by N people. And now that makes in N+1? I’ve been more interested by the protocols and standard that could truly solve this for everyone in a cross-compatible way. Some people have dabbled with atproto as the transport and “memory” storage for example.

sonink today at 11:37 AM

I was of the same view - but then there is this other trend which is putting sync back in favor. And that is that agents are becoming faster. If they are faster - it makes sense to stick around and maintain your 'context' about the task and supervise in real time. The other thing which might keep sync in fashion is that LLM providers are cutting back on cheap tokens. So you have a bigger incentive to stick around and make sure that your agent is not going astray.

The only place I use async now is when I am stepping away and there are a bunch of longer tasks on my plate. So i kick them off and then get to review them when ever I login next. However I dont use this pattern all that much and even then I am not sure if the context switching whenever I get back is really worth it.

Unless the agents get more reliable on long horizon tasks, it seems that async will have limited utility. But can easily see this going into videos feeding the twitter ai launch hype train.

htahir111 today at 9:41 AM

How would you differentiate between other tools like Temporal or Kitaru (https://kitaru.ai/) ?

TacticalCoder today at 9:55 AM

> ... and streaming the tokens back on the HTTP response as an SSE stream

> So how are folks solving this?

$5 per month dedicated server, SSH, tmux.

dist-epoch today at 10:20 AM

Can anybody explain why many times if you switch away from the chat app on the phone, the conversation can get broken?

Having long living requests, where you submit one, you get back a request_id, and then you can poll for it's status is a 20 year old solved problem.

Why is this such a difficult thing to do in practice for chat apps? Do we need ASI to solve this problem?

petesergeant today at 9:55 AM

at https://agentblocks.ai we just use Google-style LROs for this, do we really need a "durable transport for AI agents built around the idea of a session"?

potter098 today at 10:39 AM

[dead]

maxbeech today at 9:20 AM

[dead]