SSH At Scale: CAs and Principals
If you manage Linux servers over the Internet, you use SSH to connect to them. SSH gives you a remote shell on a host over an encrypted channel, so that an attacker cannot watch what you are doing over the network. In this blog post, I’m going to talk about using SSH at scale across thousands of hosts.
Phase 0: Passwords
When you get started with SSH for the first time, you likely won’t have keys set up and will instead use passwords to authenticate to your servers. It will look something like this:
email@example.com’s password: ********
You use SSH to connect to the server, type in your password, and you’re good to go. That’s fine for small scale, such as managing a single server, but it doesn’t come without downsides. Specifically, you won’t be able to easily use a tool such as Ansible nor do code checkins with Git.
And that’s actually a bigger problem than it sounds, because if you make it harder to use a tool, that tool will be used far less often. This can lead to things such as configuration drift due to Ansible being run less often, or giant code pushes happening once a day if Git is being run less. And giant code pushes are a particular problem, because if other engineers have written code, you’ll have to do a merge, and if a bug presents itself, you’ll now have to think back to what you did 8 hours ago, not 8 minutes ago. Having to type in a password every single time will also slow down the rate of deployment, which in turn slows down the rate of product releases. Not good.
Seriously, don’t use SSH with a password for any reason other than as a stepping stone to using keys. And that brings us to…
Phase 1: SSH Keys
This is what 99% of the Internet uses. It involves creating a public/private keypair with ssh-keygen and placing your public key on the remote host. Then, when you SSH in, the SSH server challenges your client to sign something with your private key, which it does. The signature is checked against the public key stored on the server, and if it matches, you are logged in.
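As a rough sketch of the setup step, the commands below generate a keypair and append the public half to an authorized_keys file. Everything happens in a scratch directory, and server_home is a stand-in name I made up for the remote user’s home directory. On a real host the public key would end up in ~/.ssh/authorized_keys, typically by way of ssh-copy-id.

```shell
# Scratch directory so that no real keys or config are touched.
workdir=$(mktemp -d)

# Generate a keypair with no passphrase (fine for a demo, not for real use).
ssh-keygen -q -t ecdsa -C "demo key" -N "" -f "$workdir/id_demo"

# "server_home" is a hypothetical stand-in for the remote user's home directory.
mkdir -p "$workdir/server_home/.ssh"
chmod 700 "$workdir/server_home/.ssh"

# The server only ever sees the public key.
cat "$workdir/id_demo.pub" >> "$workdir/server_home/.ssh/authorized_keys"
chmod 600 "$workdir/server_home/.ssh/authorized_keys"

# Print the fingerprint of the key that was just authorized.
ssh-keygen -l -f "$workdir/id_demo.pub"
```

The private key (id_demo) never leaves your machine; only the .pub file is copied anywhere.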
This is what a sample session might look like:
$ ssh firstname.lastname@example.org
Last login: Sat Jan 25 11:04:18 2020 from [REDACTED]
# Welcome !
This approach is good for managing up to maybe a few dozen servers in a small organization. Beyond that number, or if you are in a larger organization, things start to get cumbersome as servers are added and removed and people join and leave the company.
So how do you manage SSH user keys in such an environment? That’s a trick question: you don’t. Instead, you have to take another approach. Namely…
Phase 2: Signed Keys
Something else that SSH can do is a chain of trust, wherein public key A can be signed by private key B, and if you have the public key of B, you can verify that signature and trust public key A.
Creating Your Own CA and Signing Keys With It
So let’s create a CA, create a key, and sign the key with the CA. We’ll start by creating the CA, which you should do on your own machine rather than on a server:
ssh-keygen -t ecdsa -C "The CA" -N "" -f ca
This will create the files ca and ca.pub, which are the private and public key respectively.
Now, create a keypair for yourself:
ssh-keygen -t ecdsa -C "My Key" -N "" -f my-key
This will create the files my-key and my-key.pub, which are your private and public keys. If we weren’t using a CA, my-key.pub is what would get copied to the server that you want to SSH into. But we’re going to do things a little differently here.
Now comes the important part: we are going to sign your key (my-key.pub) with the CA’s private key:
ssh-keygen -s ./ca -I testing-my-ca -n dmuth,splunk -V +1w -z 1 ./my-key.pub
There’s a lot going on there, and I want to briefly explain the options that were used in that command:
- -s: This is the private key that we are using to sign my-key.pub.
- -I: This is the “key identifier”, which can be any arbitrary string. The server logs it whenever the certificate is used to authenticate, so it’s handy for telling which certificate (and therefore which person or signing run) was behind a login.
- -n: This is one or more “principals” that are included in the certificate. I will explain this in more detail further down.
- -V: How long is the signature valid for? +1w means one week, but in a real world environment, the time could be much shorter, perhaps as short as a few hours. This would ensure that keys would have to be periodically renewed, and a compromised user key would cease to work after it has expired.
- -z: The serial number of the signed key. If there is infrastructure that handles key signings, it’s a good idea to increment this by one with each signing so that it is clear which version of a signed key is being used.
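To make the last two options a little more concrete, here is a minimal sketch of what a signing step with short-lived certificates and an incrementing serial might look like. The serial.txt counter file and the sign_key function are names I invented for illustration; real signing infrastructure would more likely keep the counter in a database. It runs in a scratch directory with a throwaway CA.

```shell
workdir=$(mktemp -d)

# Throwaway CA and user key, created the same way as earlier in this post.
ssh-keygen -q -t ecdsa -C "The CA" -N "" -f "$workdir/ca"
ssh-keygen -q -t ecdsa -C "My Key" -N "" -f "$workdir/my-key"

# serial.txt is a hypothetical counter file holding the last serial issued.
echo 0 > "$workdir/serial.txt"

sign_key() {
  # Read the last serial, add one, and store it for the next signing.
  serial=$(( $(cat "$workdir/serial.txt") + 1 ))
  echo "$serial" > "$workdir/serial.txt"
  # -V +8h: the certificate is valid for eight hours from now.
  ssh-keygen -q -s "$workdir/ca" -I testing-my-ca -n dmuth,splunk \
    -V +8h -z "$serial" "$1"
}

sign_key "$workdir/my-key.pub"   # first issue
sign_key "$workdir/my-key.pub"   # renewal, overwrites my-key-cert.pub

# Show the serial recorded in the current certificate.
ssh-keygen -L -f "$workdir/my-key-cert.pub" | grep "Serial"
```

Each renewal replaces my-key-cert.pub next to the public key, so the user’s SSH client picks up the fresh certificate without any change on the servers.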
You can now view the created signed public key with this command:
ssh-keygen -L -f ./my-key-cert.pub
Type: ecdsa-sha2-nistp256-cert-v01@openssh.com user certificate
Public key: ECDSA-CERT SHA256:IRxpHtLNIl1oNIVyEpNWhnkHKxQo76klbLzGFlgt8aM
Signing CA: ECDSA SHA256:sk9wvYVdg2mwqpYaMZVSc2IelgQHAVcUMQM8h12aqEc
Key ID: "testing-my-ca"
Valid: from 2020-01-25T12:49:00 to 2020-02-01T12:50:02
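One thing that output is useful for is checking which CA signed a certificate. The sketch below compares the fingerprint on the “Signing CA” line against the fingerprint of the CA’s public key. It assumes the field layout of ssh-keygen’s human-readable output, and it only compares fingerprints; it does not cryptographically verify the signature the way sshd does.

```shell
workdir=$(mktemp -d)

# Throwaway CA, user key, and certificate for the demo.
ssh-keygen -q -t ecdsa -C "The CA" -N "" -f "$workdir/ca"
ssh-keygen -q -t ecdsa -C "My Key" -N "" -f "$workdir/my-key"
ssh-keygen -q -s "$workdir/ca" -I testing-my-ca -n dmuth,splunk \
  -V +1w -z 1 "$workdir/my-key.pub"

# Fingerprint of the CA's public key (second field of `ssh-keygen -l`).
ca_fp=$(ssh-keygen -l -f "$workdir/ca.pub" | awk '{print $2}')

# Fingerprint on the certificate's "Signing CA:" line (fourth field).
cert_ca_fp=$(ssh-keygen -L -f "$workdir/my-key-cert.pub" |
  awk '/Signing CA:/ {print $4}')

if [ "$ca_fp" = "$cert_ca_fp" ]; then
  echo "certificate was signed by our CA"
else
  echo "certificate was signed by a different CA"
fi
```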
Go to The Principal’s Office
Let’s talk about principals in SSH, as they are a new concept. When keys are being signed, one or more principals can be specified. A principal is an arbitrary string that can allow access to a specific host, or even a specific user on a specific host.
Going by the example above, that key has two principals: “dmuth” and “splunk”. Those could allow access to users by those names on one or more hosts, or perhaps allow access to specific hosts. It’s really up to the configuration of sshd on each host how it reacts to specific principals.
If what I just said sounds a bit vague and hard to follow, that’s because it is! So instead of dwelling on this topic, let’s jump into sshd configuration followed by an actual demo!
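If you want to see which principals a certificate actually carries, you can pull them out of the ssh-keygen -L output. This is only a sketch: it parses the human-readable listing, assuming the principals sit between the “Principals:” and “Critical Options:” lines, and that layout could differ between OpenSSH versions.

```shell
workdir=$(mktemp -d)

# Throwaway CA, user key, and certificate for the demo.
ssh-keygen -q -t ecdsa -C "The CA" -N "" -f "$workdir/ca"
ssh-keygen -q -t ecdsa -C "My Key" -N "" -f "$workdir/my-key"
ssh-keygen -q -s "$workdir/ca" -I testing-my-ca -n dmuth,splunk \
  -V +1w -z 1 "$workdir/my-key.pub"

# Keep the lines between "Principals:" and "Critical Options:",
# stripping the leading indentation from each one.
principals=$(ssh-keygen -L -f "$workdir/my-key-cert.pub" |
  awk '/Principals:/ {p=1; next}
       /Critical Options:/ {p=0}
       p {gsub(/^[ \t]+/, ""); print}')

echo "$principals"
```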
Configure SSHD to Allow CA-signed Keys
Remember that CA keypair we created earlier? We will need to deploy the public key to every machine in our fleet. While this takes some effort, it’s a one-time thing (unless you create a new CA). For this example, let’s put the key into the file /etc/ssh/ca.pub. Then, we’ll need to add these lines into /etc/ssh/sshd_config:
# Any key signed with this key can log in
TrustedUserCAKeys /etc/ssh/ca.pub
# Tell the server where to get a list of authorized Principals for each user.
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
That second line tells sshd that if a key is properly signed by the CA, it should then look in the /etc/ssh/auth_principals directory for a file named after the user given in the SSH command, and load the list of authorized principals for that user from the file, one principal per line. Consider this example:
mkdir -p /etc/ssh/auth_principals/
echo -e "splunk\ndmuth\n" > /etc/ssh/auth_principals/splunk
echo -e "dmuth\n" > /etc/ssh/auth_principals/dmuth
After setting up that configuration, the key that we signed above could ssh into the splunk user or the dmuth user of the host that it was deployed on. All the user needs is a key signed by the CA with the principals set, and they’re good to go.
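If it helps, here is a simplified model of the decision sshd makes with those files, written as a shell function. can_login is a name I made up, the files live in a scratch directory rather than /etc/ssh, and the real sshd logic handles more than this (per-line options, for example). It only illustrates the rule that at least one principal on the certificate has to appear in the target user’s file.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/auth_principals"

# Same contents as the example files above, one principal per line.
printf 'splunk\ndmuth\n' > "$workdir/auth_principals/splunk"
printf 'dmuth\n' > "$workdir/auth_principals/dmuth"

# can_login USER PRINCIPAL...
# Succeeds if any of the certificate's principals is listed in USER's file.
can_login() {
  user=$1
  shift
  for principal in "$@"; do
    # -F: fixed string, -x: the whole line must match, -q: no output
    if grep -Fxq -- "$principal" "$workdir/auth_principals/$user" 2>/dev/null; then
      return 0
    fi
  done
  return 1
}

# A certificate carrying both principals can log in as splunk.
can_login splunk dmuth splunk && echo "splunk: allowed"

# A certificate carrying only "splunk" cannot log in as dmuth.
can_login dmuth splunk || echo "dmuth with only the splunk principal: denied"

# No principals file for root, so nothing matches.
can_login root dmuth splunk || echo "root: denied"
```

The nice property is that granting or revoking access to an account becomes a one-line change to a file on the host, with no per-person public keys involved.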
Testing This Out On Your Own
I covered a lot of stuff in this post, and I know that the first time I read about SSH CAs, key signing, and principals, it was mostly over my head. To better understand how things worked, I decided to simulate my own environment in Docker. That way, I could have one container with an SSH server in its “out of the box” configuration, and another container that checked for signed certificates.
I have open sourced my series of Docker containers, placed them up on GitHub, and included instructions on how to test the installation, as well as play around with it on your own. It can be found over here.
Putting It All Together
So what does this all mean? It makes sense to use CAs when you have the following sorts of setups:
- Many thousands of servers, where baking the CA’s public key into the disk image or initial setup scripts can save you from having to deploy/remove SSH public keys as they change. Facebook is a great example of this.
- A large organization with lots of engineers, where you want to save yourself the effort of deploying a new SSH public key every time someone needs access to a server.
- An organization with a very strong security posture, where you need to limit the length of time that someone can log into a server without reauthenticating themselves via something like 2FA.
I hope this post made the use of SSH CAs and principals clearer, or at least less unclear.
Do you have any thoughts on how to use CAs in SSH or how you use them in your organization? Let me know in the comments!