Idempotency Is Easy Until the Second Request Is Different
72 points - last Thursday at 11:02 AM
SourceComments
Here x is interpreted as state and f an action acting on the state.
State is in practice always subjected to side effects and concurrency. That's why if x is state then f can never be purely idempotent and the term has to be interpreted in a hand-wavy fashion which leads to confusions regarding attempts to handle that mismatch which again leads to rather meandering and confusing and way too long blog posts as the one we are seeing here.
*: I wonder how you can write such a lengthy text and not once even mention this. If you want to understand idempotency in a meaningful way then you have to reduce the scenario to a mathematical function. If you don't then you are left with a fuzzy concept and there isn't much point about philosophizing over just accepting how something is practically implemented; like this idempotency-key.
I’ve seen two separate engineers implement a “generic idempotent operation” library which used separate transactions to store the idempotency details without realizing the issues it had. That was in an organization of less than 100 engineers less than 5 years apart.
One other thing I would augment this with is Antithesis’ Definite vs Indefinite error definition (https://antithesis.com/docs/resources/reliability_glossary/#...). It helps to classify your failures in this way when considering replay behavior.
Idempotency is about state, not communication. Send the same payment twice and one of them should respond "payment already exists".
This rubs me the wrong way. It's stated as fact without any trace of evidence, it is probably false, and it seems to serve no purpose but to make struggling students feel worse (and make the author feel superior).
The user wants something + the system might fail = the user must be able to try again.
If the system does not try again, but instead parrots the text of the previous failure, why bother? You didn't build reliability into the system, you built a deliberately stale cache.
A lot little things you need to think of. For example.
Client sends a request. The database is temporarily down. The server catches the exception and records the key status as FAILED. The client retries the request (as they should for a 500 error). The server sees the key exists with status FAILED and returns the error again-forever. Effectively "burned" the key on a transient error.
others like:
- you may have Namespace Collisions for users... (data leaks) - when not using transactions only redis locking you have different set of problem - the client needs to be implmented correctly. Like client sees timout and generates a new key, and exactly once processing is broken - you may have race conditions with resource deletes - using UUID vs keys build from object attributes (different set of issues)
I mean the list can get very long with little details..
Auth, logging, and atomicity are all isolated concerns that should not affect the domain specific user contract with your API.
How you handle unique keys is going to vary by domain and tolerance-- and its probably not going to be the same in every table.
It's important to design a database schema that can work independently of your middleware layer.
From a cursory read, only the part up to "what if the second request comes while the first is running" is an idempotency problem, in which case all subsequent responses need to wait until the first one is generated.
Everything else is an atomicity issue, which is fine, let's just call it what it is.