SSH certificates: the better SSH experience
170 points - today at 9:52 AM
I'm guilty of it too. My blog post from 15 years ago is nowhere near as good as OP's post, but if I thought the me of 15 years ago lived up to my standards of today, I'd be really disappointed: https://blog.habets.se/2011/07/OpenSSH-certificates.html
Especially at a BigCo, where there are different environments, with different passwords, and password expiry/rotation/complexity rules.
Like, when asking for help, or working together... you say to them "ok, let's ssh to devfoo1234", and they do it, and then type in their password, and maybe get it wrong, then need to reset it, or whatever... and it takes half a minute or more just to ssh to some host. Maybe there are several hosts involved, and it all multiplies out.
I mention to them "you know... I never use ssh passwords, I don't actually know my devfoo1234 password... maybe you should google ssh-keygen, set it up, let me know if you have any problems?" and they're like "oh yeah, that's cool. I should do that sometime later!"... and then they never do, and they are forever messing with passwords.
I just don't get it.
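For the record, the setup being put off here is only a couple of commands. A sketch with throwaway paths (in real life you'd use ~/.ssh, a passphrase-protected key, and ssh-agent):

```shell
rm -f /tmp/demo_key /tmp/demo_key.pub
# Generate an Ed25519 keypair; -N "" skips the passphrase only so this
# demo runs non-interactively
ssh-keygen -t ed25519 -f /tmp/demo_key -N "" -q
cat /tmp/demo_key.pub   # this line goes into ~/.ssh/authorized_keys on the server
# or, against a live host, simply:  ssh-copy-id devfoo1234
```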
With my own machines I can just physically check that the server host key matches what the ssh client sees. Once TOFU looks good I'm all set with that host because I don't change any of the keys ever.
In a no-frills corporate unix environment it's enough to have a list of the internal servers' public keys published on an internal website, served over SSL so it's effectively signed by a known corporate identity. You only need to check this list once to validate the TOFU step, after which you can trust future connections.
In settings with huge fleets of machines, or in very dynamic environments where new machines are rolled out all the time, certificates probably make things easier. Of course, certificates come with some extra work and some extra features, so the amount of benefit depends on the case. But at that scale TOFU breaks down badly on multiple levels, so you can't really afford a strong opinion against certificates.
I wish web browsers could remember server TLS host keys easily too and at least notify me whenever they change even if they'd still accept the new keys via ~trusted CAs.
Also, I've never had a security issue due to TOFU, have you?
The system shifts from many small local states to one highly coupled control point. That control point has to be correct and reachable all the time. When it isn’t, failures go wide instead of narrow.
Example: a few boxes get popped and start hammering the CA. Now what? Throttle or pull the signer, and access is broken everywhere at once.
Common friction points:
1. your signer that has to be up and correct all the time
2. trust roots everywhere (and drifting)
3. TTL tuning nonsense (too short = random lockouts, too long = what was the point)
4. limited on-box state makes debugging harder than it should be
5. failures tend to fan out instead of staying contained
Revocation is also kind of a lie: mostly you're just waiting for expiry and hoping that's good enough. What actually happens is people reintroduce state anyway: sidecars, caches, agents… because you need it.
We went the opposite direction:
1. nodes pull over outbound HTTPS
2. local authorized_keys is the source of truth locally
3. users/roles are visible on the box
4. drift fixes itself quickly
5. no inbound ports, no CA signatures (WELL, not strictly true*!)
You still get central control, but operation and failure modes are local instead of "everyone is locked out right now." That's basically what we do at Userify (https://userify.com). Less elegant than certs, more survivable at 2am. Also actually handles authz, not just part of authn.
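A minimal sketch of that pull-and-install step, with a local file standing in for the HTTPS fetch (paths and the key line are illustrative; this is not Userify's actual mechanism):

```shell
# Stand-in for the key list fetched over outbound HTTPS (e.g. with curl)
staged=/tmp/demo_keys.staged
printf 'ssh-ed25519 AAAAC3NzaC1... alice@laptop\n' > "$staged"
chmod 600 "$staged"
# rename(2) on the same filesystem is atomic: sshd reads either the old
# file or the new one, never a half-written authorized_keys
mv "$staged" /tmp/demo_authorized_keys
```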
And the part that usually gets hand-waved with SSH CAs:
1. creating the user account
2. managing sudo roles
3. deciding what happens to home directories on removal
4. cleanup vs retention for compliance/forensics
Those don’t go away - they're just not part of the certificate solution.
* (TLS still exists here, just at the transport layer using the system trust store. That channel delivers users, keys, and roles. The rest is handled explicitly instead of implied.)
I've set up a couple of yubikeys as SSH CAs on hosts I manage. I use them to create short-lived certs (say 24h) at the start of the day. This way I only have to enter the yubikey PIN once a day.
I could not find an easy way to limit maximum certificate lifetime in OpenSSH, except for using AuthorizedPrincipalsCommand, which feels very fragile.
Does anyone else have any experience with a similar setup? How do you limit cert max lifetime?
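For reference, the signing mechanics are a couple of ssh-keygen invocations. A sketch with throwaway file-based keys (with a YubiKey, the CA key would live in hardware instead of a file on disk):

```shell
rm -f /tmp/demo_ca /tmp/demo_ca.pub /tmp/demo_id /tmp/demo_id.pub /tmp/demo_id-cert.pub
ssh-keygen -t ed25519 -f /tmp/demo_ca -N "" -q   # stand-in CA key (hardware-backed in practice)
ssh-keygen -t ed25519 -f /tmp/demo_id -N "" -q   # the user's key for the day
# Sign: key ID demo-user, principal demo-user, valid for 24 hours from now
ssh-keygen -s /tmp/demo_ca -I demo-user -n demo-user -V +24h /tmp/demo_id.pub
ssh-keygen -L -f /tmp/demo_id-cert.pub           # inspect principals and the validity window
```

As far as I know the lifetime cap has to be enforced at signing time like this; sshd only checks that a presented cert is currently valid, which matches the parent's experience.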
There really needs to be a definitive best practices guide published by a trusted authority.
OpenSSH supports checking the DNSSEC signature in the client, in theory, but it's a configure option and I'm not sure if distros build with it.
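The runtime side, at least, is just a client option (the compile-time DNSSEC/LDNS support is a separate question). A sketch of the relevant ssh_config fragment, with an illustrative host pattern:

```
# ~/.ssh/config -- look up SSHFP records for the host key;
# "ask" shows the DNS result but still prompts, "yes" trusts a
# securely (DNSSEC) validated answer outright
Host *.example.com
    VerifyHostKeyDNS ask
```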
Is that yet another problem that I need to solve with syncthing?
This sentence is a bit of a red flag: it looks like the author is making a (subtle) mistake in the category of too much security, or at least misjudging the amount of security (objectively measurable entropy) needed. That's a less consequential error than too little entropy, but anyone who wants to be a cybersecurity professional, especially one with influence, must know the right amount, because our resources are limited. Each additional bit of entropy and each extra security step costs time, both for the admin who implements it and for the users who have to follow it. That can even hurt security itself, by fatiguing users into circumventing measures or ignoring alerts.
On to specifically what's wrong:
Either a key file or a password can be used to log in to a server, or to authenticate to any service in general. Beyond the technical implementation, the main difference is whether the secret is stored on the device or in the user's brain. Neither is more correct than the other; there are tradeoffs. One can guarantee more bits and is more ergonomic; the other is not stored on the device, so it cannot be compromised that way.
That said, a 2FA approach, in whatever format, is (generally speaking) safer than either individual method, in that both secrets are necessary to gain access. In this scenario one needs both the file and the password to authenticate. Even a 4-digit password increases the security of the system compared to no password: an attacker would have to set up a brute-force attempt along with a way to verify that decryption succeeded. If local decryption confirmation is not possible, such a brute-force attack would require submitting erroneous logins to the server, potentially triggering tripwires or alerting a monitoring admin.
There's nothing special about the second factor needing to be equal or equivalent in entropy to the first, and there's certainly no requirement that a password have more entropy when it's a second factor; if anything, it's the other way around.
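A back-of-the-envelope for the 4-digit example above:

```shell
# Bits of entropy in a 4-digit PIN used as the second factor
awk 'BEGIN { printf "PIN: %.1f bits\n", log(10000)/log(2) }'
# ~13.3 bits is trivial to brute-force offline, but as a *second* factor
# it only matters to an attacker who already stole the key file -- and
# guessing 10,000 PINs online is exactly what tripwires catch
```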
tl;dr: Consider each security mechanism in its wider context rather than in isolation, and you will see security fatigue go down without compromising security.