Surprising economics of load-balanced systems

108 points - yesterday at 8:30 PM

Comments

bijowo1676 yesterday at 11:46 PM

The article offers a simplified world model: Poisson arrivals and an infinite queue, which is fine as a math model.

In the real world, however, bursts can be correlated, due to factors like timeouts/retries and thundering herds.

So the real economics of load-balanced systems is a simple reliability story: being able to reasonably serve peak traffic, which leads to over-provisioning of those systems.

Using the cloud allows some scaling up and down of resources, but doesn't completely solve the problem. I think migrating away from synchronous systems towards async systems, and letting clients gradually absorb the delays, is a better approach (rather than forcing infrastructure to be dynamically scaled up/down and being billed per request-second by your cloud provider).
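To put a rough number on the retry effect mentioned above, here is a minimal sketch. It assumes each attempt fails independently with a fixed probability, which is exactly the kind of simplification that breaks down in a real overload (failure probability rises with load), so treat it as a lower bound on the amplification. The function name and parameters are hypothetical.

```python
def expected_attempts(p_fail: float, max_retries: int) -> float:
    """Expected attempts per logical request.

    Assumption (not from the article): every attempt fails independently
    with probability p_fail, and the client gives up after max_retries retries.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    # Attempt k (k = 0 is the original request) is only sent
    # if the previous k attempts all failed, which has probability p_fail**k.
    return sum(p_fail ** k for k in range(max_retries + 1))


if __name__ == "__main__":
    for p in (0.01, 0.2, 0.5):
        print(f"p_fail={p}: load multiplier {expected_attempts(p, 3):.3f}")
```

The multiplier applies on top of the organic arrival rate, so a backend that was sized for the organic peak sees more load precisely when it is already failing.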

mjb yesterday at 11:41 PM

A dead comment says:

> Of course, this assumes independent events. World Cup, super bowls, etc break these assumptions.

Yes, this is very true. The model here works for Poisson arrivals and exponential service time (the M/M), which are poor approximations of real-world traffic patterns (which tend to be non-stationary and non-ergodic, and include substantial seasonality). However, the frequency of that seasonality is typically rather low (e.g. daily cycles), and so these stronger assumptions are quite defensible for short time periods.

A better approach is to do simulation with real traffic patterns, or even with more sophisticated parametric models, and get better answers (e.g. https://stability-sim.systems/). The good news is that kind of simulation is cheaper to do than ever before.
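A minimal sketch of what such a simulation can look like, assuming a single shared FIFO queue in front of identical workers. This is not the linked simulator's code; `simulate_fifo` and the synthetic trace below are hypothetical, and in practice the arrival timestamps and service times would be replayed from real logs.

```python
import heapq
import random


def simulate_fifo(arrivals, services, servers):
    """Queueing delay of each request in a FIFO queue with `servers` workers.

    arrivals: arrival timestamps, sorted ascending.
    services: service time of each request, same length as arrivals.
    """
    free_at = [0.0] * servers  # min-heap: time at which each worker is next free
    heapq.heapify(free_at)
    waits = []
    for t, s in zip(arrivals, services):
        earliest = heapq.heappop(free_at)   # worker that frees up first
        start = max(t, earliest)            # request waits if no worker is idle
        waits.append(start - t)
        heapq.heappush(free_at, start + s)
    return waits


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a real trace: Poisson arrivals, log-normal service.
    t, arrivals = 0.0, []
    for _ in range(10_000):
        t += rng.expovariate(8.0)
        arrivals.append(t)
    services = [rng.lognormvariate(-0.5, 1.0) for _ in arrivals]
    waits = sorted(simulate_fifo(arrivals, services, servers=10))
    print("mean wait:", sum(waits) / len(waits))
    print("p99 wait:", waits[int(0.99 * len(waits))])
```

Swapping the synthetic trace for recorded timestamps keeps the daily cycles and correlated bursts that the parametric model throws away.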

Ylano today at 9:27 AM

80% utilization is not a universal statement

fabijanbajo today at 8:02 AM

The footnote on exponential vs. log-normal service times is the part I'd push on. In production I almost never see exponential, and heavy tails change the picture. Curious if you've looked at how robust this is under realistic distributions.
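For a single server, the effect of service-time variability can be sketched with the Pollaczek-Khinchine formula for M/G/1, where mean queueing delay scales with (1 + CV²)/2 and CV² is the squared coefficient of variation of the service time. This only covers one server with Poisson arrivals; the multi-server case with general service times has no simple exact formula, so this is an illustration of the direction of the effect, not of the article's plots. The sigma values are hypothetical.

```python
from math import exp


def mg1_mean_wait(utilization: float, mean_service: float, scv: float) -> float:
    """Pollaczek-Khinchine mean time in queue for an M/G/1 system.

    scv is the squared coefficient of variation of service time:
    0 for deterministic, 1 for exponential, larger for heavy tails.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization) * (1.0 + scv) / 2.0 * mean_service


def lognormal_scv(sigma: float) -> float:
    """Squared coefficient of variation of a log-normal with shape sigma."""
    return exp(sigma ** 2) - 1.0


if __name__ == "__main__":
    rho, mean_service = 0.8, 1.0
    print("exponential:", mg1_mean_wait(rho, mean_service, 1.0))
    for sigma in (0.5, 1.0, 1.5):  # hypothetical shapes, same mean service time
        scv = lognormal_scv(sigma)
        print(f"log-normal sigma={sigma}:", mg1_mean_wait(rho, mean_service, scv))
```

At the same mean service time and utilization, a log-normal with a larger sigma waits longer purely because of the variance term.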

crypttales yesterday at 10:28 PM

Of course, this assumes independent events. World Cup, super bowls, etc break these assumptions.

Still, queuing theory is so cool.

resters today at 4:12 AM

It's not surprising if one has the mental model of the probability that the request gets enqueued. Then when you add variable time to process requests it becomes more clear why some requests can take unexpectedly long (there is a >0 probability that a request gets queued behind several of the slowest endpoints, for example). So even if 90% of the endpoints are fast and most of the requests aren't even queued, there will still be some that end up being quite slow.

megamalloc yesterday at 11:24 PM

What's conspicuously missing is the plot of performance when you do have a well-tuned queue in front of the service. Yes, having a queue becomes less important the more backend servers you have, but here even with 10 servers the plot shows your latency remains >25% worse than it would be with a queue. Also missing is a discussion of how the variance in processing times affects you when you rely on load balancing alone.

nilsherzig yesterday at 10:56 PM

Why would anyone think that it would get linearly worse? What's the (wrong) assumption there?

jiggawatts today at 6:24 AM

The problem with this kind of theoretical analysis is that most load balancers don't work this way, especially the typical "cloud" HTTP or TCP load balancers, which are stateless and avoid this kind of central queuing logic like the plague because it doesn't scale to their levels.

For example, most cloud load balancers I've worked with are stateless, non-queuing, and allocate work to back-ends strictly randomly.
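A rough way to see the gap between the two designs, under the same M/M assumptions the article uses: uniformly random assignment splits a Poisson stream into independent Poisson streams, so each backend behaves like its own M/M/1 queue, while a shared queue is M/M/c and follows the Erlang C formula. The function names are hypothetical, and real load balancers may use other policies (least-connections, power of two choices) that land somewhere in between.

```python
from math import factorial


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arrival has to wait in an M/M/c queue.

    offered_load is lambda / mu, in Erlangs.
    """
    rho = offered_load / servers
    if not 0.0 <= rho < 1.0:
        raise ValueError("system is unstable: need offered_load < servers")
    tail = offered_load ** servers / factorial(servers) / (1.0 - rho)
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    return tail / (head + tail)


def wait_shared_queue(servers: int, utilization: float, mu: float = 1.0) -> float:
    """Mean time in queue with one central FIFO queue (M/M/c)."""
    offered_load = servers * utilization
    arrival_rate = offered_load * mu
    return erlang_c(servers, offered_load) / (servers * mu - arrival_rate)


def wait_random_split(utilization: float, mu: float = 1.0) -> float:
    """Mean time in queue when each backend is an independent M/M/1 queue."""
    return utilization / (mu * (1.0 - utilization))


if __name__ == "__main__":
    for c in (1, 2, 10, 50):
        shared = wait_shared_queue(c, 0.8)
        split = wait_random_split(0.8)
        print(f"c={c}: shared queue {shared:.3f}, random split {split:.3f}")
```

Under these assumptions the random-split delay does not improve as servers are added at fixed utilization, which is the pooling benefit the stateless design gives up.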

Traditional non-cloud load balancers can implement this kind of perfect queuing, but these settings are generally off by default even when available.

- NetScaler: surgeProtection + maxClient=1

- F5 BIG-IP LTM: request queuing + pool/member connectionLimit=1

- HAProxy: server maxconn 1 + timeout queue

- NGINX Plus: server max_conns=1 + queue

Envoy, Apache, and Traefik have partial or limited support.

Conversely, most multi-threaded web server frameworks already do this by default! For example, ASP.NET has essentially an internal "load balancer" with a perfect queue if you pretend each core is a "node" and the whole server is the "scale out system".

ukanwat today at 5:46 AM

this whole result leans on one assumption he mentions and then sets aside: independent, stateless requests. that's really the load-bearing part. as soon as the c units share mutable state or have to coordinate, the M/M/c model stops applying and the pooling benefit goes with it. you trade it for coordination cost that grows with the number of pairs, not the number of workers. I hit this constantly with multi-agent LLM systems. people add agents expecting load-balancer style scaling and land in the opposite regime, because the work isn't independent, the agents are writing to shared state. so "pooling is cheap" is really "pooling independent work is cheap," and the independent part is where all the benefit actually comes from.
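To make the "pairs, not workers" point concrete, a tiny sketch under the worst-case assumption that every pair of workers may need to coordinate (the function name is hypothetical):

```python
def coordination_pairs(workers: int) -> int:
    """Number of distinct worker pairs, assuming any two workers may interact."""
    if workers < 0:
        raise ValueError("workers must be non-negative")
    return workers * (workers - 1) // 2


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(f"{n} workers -> {coordination_pairs(n)} pairs")
```

Capacity grows linearly with workers while the number of potential interactions grows quadratically, so the two curves cross quickly when the work is not independent.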

bigcat12345678 yesterday at 11:09 PM

This looked like a seemingly inconsequential article on Hacker News, so I assumed it was probably the kind that describes a profound idea under a naive title. It turns out to be very confusing instead, because it layers excessive drama over a mundane intuition. That type of writing belongs in literature, not in technology writing.