SSH certificates: the better SSH experience
170 points - today at 9:52 AM
I'm guilty of it too. My blog post from 15 years ago is nowhere near as good as OP's post, but if I thought the me of 15 years ago lived up to my standards of today, I'd be really disappointed: https://blog.habets.se/2011/07/OpenSSH-certificates.html
Especially at a BigCo, where there are different environments, with different passwords, and password expiry/rotation/complexity rules.
Like, when asking for help, or working together... you say to them "ok, let's ssh to devfoo1234", and they do it, and then type in their password, and maybe get it wrong, then need to reset it, or whatever... and it takes half a minute or more just to ssh to some host. Maybe there are several hosts involved, and it all multiplies out.
I mention to them "you know... I never use ssh passwords, I don't actually know my devfoo1234 password... maybe you should google ssh-keygen, set it up, let me know if you have any problems?" and they're like "oh yeah, that's cool. I should do that sometime later!"... and then they never do, and they are forever messing with passwords.
I just don't get it.
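For the record, the setup being put off here is only a couple of commands. A sketch with throwaway paths (in real life you'd use ~/.ssh, a passphrase-protected key, and ssh-agent):

```shell
rm -f /tmp/demo_key /tmp/demo_key.pub
# Generate an Ed25519 keypair; -N "" skips the passphrase only so this
# demo runs non-interactively
ssh-keygen -t ed25519 -f /tmp/demo_key -N "" -q
cat /tmp/demo_key.pub   # this line goes into ~/.ssh/authorized_keys on the server
# or, against a live host, simply:  ssh-copy-id devfoo1234
```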
With my own machines I can just physically check that the server host key matches what the ssh client sees. Once TOFU looks good I'm all set with that host because I don't change any of the keys ever.
In a no-frills corporate unix environment it's enough to have a list of the internal servers' public keys published on an internal website, served over SSL so it's effectively signed by a known corporate identity. You only need to check this list once to validate the TOFU step, after which you can trust future connections.
In settings with huge fleets of machines, or in very dynamic environments where new machines are rolled out all the time, certificates probably make things easier. Of course, certificates come with some extra work and some extra features, so the amount of benefit depends on the case. But at that scale TOFU breaks down badly on multiple levels, so you can't really afford a strong opinion against certificates.
I wish web browsers could remember server TLS host keys easily too and at least notify me whenever they change even if they'd still accept the new keys via ~trusted CAs.
Also, I've never had a security issue due to TOFU, have you?
The system shifts from many small local states to one highly coupled control point. That control point has to be correct and reachable all the time. When it isn’t, failures go wide instead of narrow.
Example: a few boxes get popped and start hammering the CA. Now what? Throttle or pull the signer, and access is broken everywhere at once.
Common friction points:
1. your signer that has to be up and correct all the time
2. trust roots everywhere (and drifting)
3. TTL tuning nonsense (too short = random lockouts, too long = what was the point)
4. limited on-box state makes debugging harder than it should be
5. failures tend to fan out instead of staying contained
Revocation is also kind of a lie: mostly you're just waiting for expiry and hoping that's good enough. What actually happens is people reintroduce state anyway: sidecars, caches, agents… because you need it.
We went the opposite direction:
1. nodes pull over outbound HTTPS
2. local authorized_keys is the source of truth locally
3. users/roles are visible on the box
4. drift fixes itself quickly
5. no inbound ports, no CA signatures (WELL, not strictly true*!)
You still get central control, but operation and failure modes are local instead of "everyone is locked out right now." That's basically what we do at Userify (https://userify.com). Less elegant than certs, more survivable at 2am. Also actually handles authz, not just part of authn.
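A minimal sketch of that pull-and-install step, with a local file standing in for the HTTPS fetch (paths and the key line are illustrative; this is not Userify's actual mechanism):

```shell
# Stand-in for the key list fetched over outbound HTTPS (e.g. with curl)
staged=/tmp/demo_keys.staged
printf 'ssh-ed25519 AAAAC3NzaC1... alice@laptop\n' > "$staged"
chmod 600 "$staged"
# rename(2) on the same filesystem is atomic: sshd reads either the old
# file or the new one, never a half-written authorized_keys
mv "$staged" /tmp/demo_authorized_keys
```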
And the part that usually gets hand-waved with SSH CAs:
1. creating the user account
2. managing sudo roles
3. deciding what happens to home directories on removal
4. cleanup vs retention for compliance/forensics
Those don’t go away - they're just not part of the certificate solution.
* (TLS still exists here, just at the transport layer using the system trust store. That channel delivers users, keys, and roles. The rest is handled explicitly instead of implied.)
I've set up a couple of yubikeys as SSH CAs on hosts I manage. I use them to create short-lived certs (say 24h) at the start of the day. This way I only have to enter the yubikey PIN once a day.
I could not find an easy way to limit maximum certificate lifetime in OpenSSH, except for using AuthorizedPrincipalsCommand, which feels very fragile.
Does anyone else have any experience with a similar setup? How do you limit cert max lifetime?
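For reference, the signing mechanics are a couple of ssh-keygen invocations. A sketch with throwaway file-based keys (with a YubiKey, the CA key would live in hardware instead of a file on disk):

```shell
rm -f /tmp/demo_ca /tmp/demo_ca.pub /tmp/demo_id /tmp/demo_id.pub /tmp/demo_id-cert.pub
ssh-keygen -t ed25519 -f /tmp/demo_ca -N "" -q   # stand-in CA key (hardware-backed in practice)
ssh-keygen -t ed25519 -f /tmp/demo_id -N "" -q   # the user's key for the day
# Sign: key ID demo-user, principal demo-user, valid for 24 hours from now
ssh-keygen -s /tmp/demo_ca -I demo-user -n demo-user -V +24h /tmp/demo_id.pub
ssh-keygen -L -f /tmp/demo_id-cert.pub           # inspect principals and the validity window
```

As far as I know the lifetime cap has to be enforced at signing time like this; sshd only checks that a presented cert is currently valid, which matches the parent's experience.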
There really needs to be a definitive best practices guide published by a trusted authority.
OpenSSH supports checking the DNSSEC signature in the client, in theory, but it's a configure option and I'm not sure if distros build with it.
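The runtime side, at least, is just a client option (the compile-time DNSSEC/LDNS support is a separate question). A sketch of the relevant ssh_config fragment, with an illustrative host pattern:

```
# ~/.ssh/config -- look up SSHFP records for the host key;
# "ask" shows the DNS result but still prompts, "yes" trusts a
# securely (DNSSEC) validated answer outright
Host *.example.com
    VerifyHostKeyDNS ask
```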
Is that yet another problem that I need to solve with syncthing?
This sentence is a bit of a red flag: it looks like the author is making a (subtle) mistake in the category of too much security, or at least misjudging the amount of security (objectively measurable entropy) needed. That's a less consequential error than too little entropy, but anyone who wants to be a cybersecurity professional, especially one with influence, must know the right amount, because our resources are limited. Each additional bit of entropy and each extra security step costs time, both for the admin who implements it and for the users who have to follow it. That can even hurt security itself, by fatiguing users into circumventing measures or ignoring alerts.
On to specifically what's wrong:
Either a key file or a password can be used to log in to a server, or to authenticate to any service in general. Beyond the technical implementation, the main difference is whether the secret is stored on the device or in the user's brain. Neither is more correct than the other; there are tradeoffs. One can guarantee more bits and is more ergonomic; the other is not stored on the device, so it cannot be compromised that way.
That said, a 2FA approach, in whatever format, is (generally speaking) safer than either individual method, in that both secrets are necessary to gain access. In this scenario one needs both the file and the password to authenticate. Even a 4-digit password increases the security of the system compared to no password: an attacker would have to set up a brute-force attempt along with a way to verify that decryption succeeded. If local decryption confirmation is not possible, such a brute-force attack would require submitting erroneous logins to the server, potentially triggering tripwires or alerting a monitoring admin.
There's nothing special about the second factor needing to be equal or equivalent in entropy to the first, and there's certainly no requirement that a password have more entropy when it's a second factor; if anything, it's the other way around.
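A back-of-the-envelope for the 4-digit example above:

```shell
# Bits of entropy in a 4-digit PIN used as the second factor
awk 'BEGIN { printf "PIN: %.1f bits\n", log(10000)/log(2) }'
# ~13.3 bits is trivial to brute-force offline, but as a *second* factor
# it only matters to an attacker who already stole the key file -- and
# guessing 10,000 PINs online is exactly what tripwires catch
```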
tl;dr: Consider each security mechanism in its wider context rather than in isolation, and you will see security fatigue go down without compromising security.