FastCGI: 30 years old and still the better protocol for reverse proxies
360 points - yesterday at 4:16 PM
Though I'd like to make another protocol known: Web Application Socket (WAS). I designed it 16 years ago at my day job because I thought FastCGI still wasn't good enough.
Instead of packing bulk data inside frames on the main socket, WAS has a control socket plus two pipes (raw request+response body). Both the WAS application and the web server can use splice() to operate on a pipe, for example. No framing needed. Also, requests are cancellable and the three file descriptors can always be recovered.
Over the years, we used WAS for many of our internal applications, and for our web hosting environment, I even wrote a PHP SAPI for WAS. Quite a large number of web sites operate with WAS internally.
It's all open source:
- library: https://github.com/CM4all/libwas
- documentation: https://libwas.readthedocs.io/en/latest/
- non-blocking library: https://github.com/CM4all/libcommon/tree/master/src/was/asyn...
- our web server: https://github.com/CM4all/beng-proxy
- WebDAV: https://github.com/CM4all/davos
- PHP fork with WAS SAPI: https://github.com/CM4all/php-src
I remember the great FastCGI vs. SCGI vs. HTTP wars: I was founding a Web2.0 startup right at the time these technologies were gaining adoption, and so was responsible for setting up the frontend stack. HTTP won because of simplicity: instead of needing to introduce another protocol into your stack, you can just use HTTP, which you already needed to handle at the gateway. Now all sorts of complex network topologies became trivial: you could introduce multiple levels of reverse proxies if you ran out of capacity; you could have servers that specialized in authentication or session management or SSL termination or DDoS filtering or all the other cross-cutting concerns without them needing to know their position in the request chain; and you could use the same application servers for development, with a direct HTTP connection, as you did in production, where they'd sit behind a reverse proxy that handled SSL and authentication and abuse detection.
It also helped that nginx was lots faster than most FastCGI/SCGI modules of the time, and more robust. I'd initially set up my startup's stack as HTTP -> Lighttpd -> FastCGI -> Django, but it was way slower than just using nginx.
The use of HTTP was basically the web equivalent of the End-to-End Principle [1] for TCP/IP. It's the idea that the network and its protocols should be agnostic to what's being transmitted, and all application logic should be in nodes of the network that filter and redirect packets accordingly. This has been a very powerful principle and shouldn't be discarded lightly.
The observation the article makes is that for security, it's often better to follow the Principle of Least Privilege [2] rather than blindly passing information along. Allowlist your communications to only what you expect, so that you aren't unwittingly contributing to a compromise elsewhere in the network.
And the article is highlighting - not explicitly, but it's there - the tension between these two principles. E2E gives you flexibility, but with flexibility comes the potential for someone to use that flexibility to cause harm. PoLP gives you security, but at the cost of inflexibility, where your system can only do what you designed it to do and cannot easily adapt to new requirements.
[1] https://en.wikipedia.org/wiki/End-to-end_principle
[2] https://en.wikipedia.org/wiki/Principle_of_least_privilege
With widespread browser support for WHATWG streams, it's pretty easy to implement your own WebSockets over long-lived HTTP requests. Basically you just send a byte stream and prepend each message with a header, which can just be a size in many cases.
Advantages over WebSockets:
* No special path in your server layer like you need for WebSocket.
* Backpressure
* You get to take advantage of HTTP/2/3 improvements for free
* Lower framing overhead
Unfortunately, AFAIK browsers still don't support streaming the request body while receiving the response, so you need a pair of requests for full bidirectional streaming.
Or you could use something like haproxy's proxy protocol (although that may not support all the information you want, and doesn't work for multiplexing).
Edit: actually the "Forwarded" header kind of fills that niche. Although you may want extensions for things like the client certificate.
It is a tiny binary protocol, with frames just like FastCGI. The reference server works with several languages; I've used it over the years mostly with Python, but also Ruby and Perl. It is a small C executable with all the practical features one needs for web hosting: draining backends, autoscaling, logging, chrooted backends, everything.
Very few FastCGI servers are this mature. Unlike FastCGI, it has been extended to support websockets and async.
I have used it in production at several places for many years and have nothing but praise for it. It feels like this weird unknown secret for web operations. Unfortunately, it sees less use now in the cloud era, and development seems to have all but stopped. It still works and is still reliable, but the writing is probably on the wall. However, nothing comes close in terms of speed, simplicity, and features.
The scenario is that we have our first-party task lists and data viewers, but users often want to customize them heavily, say by building a Kanban view or a custom dashboard with data filters and charts.
The box has a coding agent, which means the user can code anything, rather than us building traditional report-builder tools.
Go's stdlib has good support on both the server side and user space. The coding agent makes a page-name/main.go that talks CGI, and the server delegates requests to it.
It's all "person scale" data and page views, so no real need to optimize with FastCGI even.
What's old is new again for agents!
I don't know if anything else in the RHEL distributions use FastCGI.
$ rpm -qi php-fpm | grep ^Summary
Summary : PHP FastCGI Process Manager

Most of the stuff I've done for reverse proxies has been pretty straightforward and just using the stuff built into Nginx, but I have to admit that it wouldn't have even occurred to me to use FastCGI if I needed something more elaborate.
I used FastCGI a bit about ten years ago to "convert" some C++ code I wrote to work on the web, but admittedly I haven't used it much since then.
Using fastcgi requires that you write your app to serve fastcgi.
The upside of serving http/1.1 instead of fastcgi is that devs can instantly use their browser to test things instead of having to set up a reverse proxy on their machine.
The bad parts of http/1.1 are fixed equally well by both http/2.0 and fastcgi. So just use http/2.0 and you get the proper framing as well as browser support.
Can we take a moment to appreciate the absurdity of HTTP headers? We have X-Forwarded-For, X-Real-IP, and each CDN has its own custom-flavored one. Some of them are comma-separated lists, and usually end up with an IP of your own LB uselessly added in there (I know why, it's just not helpful). All of them might be inserted by a malicious user-agent. I guess nobody could agree on how all the various trusted servers in the pipeline should convey the important bit.
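One common way out of that mess, not claimed by the comment above, is to trust X-Forwarded-For only as far as your own infrastructure: walk the list right to left and take the first address you didn't add yourself, since everything to its left is attacker-controllable. A sketch with made-up addresses:

```go
package main

import (
	"fmt"
	"strings"
)

// clientIP walks X-Forwarded-For right to left and returns the first
// hop not in trusted, i.e. the last value appended by a proxy we
// control. Entries left of that point cannot be trusted.
func clientIP(xff string, trusted map[string]bool) string {
	hops := strings.Split(xff, ",")
	for i := len(hops) - 1; i >= 0; i-- {
		ip := strings.TrimSpace(hops[i])
		if !trusted[ip] {
			return ip
		}
	}
	return "" // header contained only our own proxies
}

func main() {
	trusted := map[string]bool{"10.0.0.5": true, "10.0.0.6": true}
	// The client forged the first entry; our LB (10.0.0.5) appended
	// the real peer address 203.0.113.9, then got appended itself.
	xff := "1.2.3.4, 203.0.113.9, 10.0.0.5"
	fmt.Println(clientIP(xff, trusted)) // 203.0.113.9
}
```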
I guess it fits in quite well with the absurdity of the User-Agent header, which has come so far in absurdity that Apple decided to fully kill it by just sending utterly fake nonsense (false OS version, etc) in the name of "pRiVaCy."
It is less expressive than HTTP in ways that may or may not be important to your application; I prefer accurate URL handling.
I am doing a typical http thing, but I wonder, has anyone used fastcgi in Caddy?
https://caddyserver.com/docs/caddyfile/directives/reverse_pr...