Surprising economics of load-balanced systems
108 points - yesterday at 8:30 PM
SourceComments
In the real world however, the bursts can be correlated, due to factors like timeouts/retries, thundering herd, correlated bursts.
so the real economics of load-balanced system is a simple reliability story: being able to reasonably serve the peak traffic, which leads to over-provisioning of those systems.
using cloud allows some form of scale up/down of resources, but doesn't completely solve the problem. I think the migration away from synchronyous systems towards async systems and letting clients gradually absorb the delays is a better approach (rather than forcing infrastructure to be dynamically scaled up/down and be billed per request-second by your cloud provider)
> Of course, this assumes independent events. World Cup, super bowls, etc break these assumptions.
Yes, this is very true. The model here works for Poisson arrivals and exponential service time (the M/M), which are poor approximations of real-world traffic patterns (which tend to be non-stationary and non-ergodic, and include substantial seasonality). However, the frequency of that seasonality is typically rather low (e.g. daily cycles), and so these stronger assumptions are quite defensible for short time periods.
A better approach is to do simulation with real traffic patterns, or even with more sophisticated parametric models, and get better answers (e.g. https://stability-sim.systems/). The good news is that kind of simulation is cheaper to do than ever before.
Still, queuing theory is so cool.
For example, most cloud load balancers I've worked with are stateless, non-queuing, and allocate work to back-ends strictly randomly.
Traditional non-cloud load balancers can implement this kind of perfect queuing, but these settings are generally off by default even when available.
- NetScaler: surgeProtection + maxClient=1
- F5 BIG-IP LTM: request queuing + pool/member connectionLimit=1
- HAProxy: server maxconn 1 + timeout queue
- NGINX Plus: server max_conns=1 + queue
Envoy, Apache, and Traefik have partial or limited support.
Conversely, most multi-threaded web server frameworks already do this by default! For example, ASP.NET has essentially an internal "load balancer" with a perfect queue if you pretend each core is a "node" and the whole server is the "scale out system".