Homebrew SSH Certificate Authority for the Terraform Libvirt Provider

Ever SSH’ed into a freshly installed server and gotten the following annoying message?

The authenticity of host 'host.tld (1.2.3.4)' can't be established.
ED25519 key fingerprint is SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
Are you sure you want to continue connecting (yes/no)?

Or even more annoying:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
Please contact your system administrator.
Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
Offending ED25519 key in /home/user/.ssh/known_hosts:3
  remove with:
  ssh-keygen -f "/home/user/.ssh/known_hosts" -R "1.2.3.4"
ED25519 host key for 1.2.3.4 has changed and you have requested strict checking.
Host key verification failed.

Could it be that the OpenSSH developers simply like to annoy us with these confusing messages? Maybe, but these warnings also serve a purpose: they notify you of a potential man-in-the-middle (MITM) attack. I won’t go into the details of this problem, but I refer you to this excellent blog post. Instead, I would like to talk about ways to get rid of these warnings.

One obvious solution is to simply add each host to your known_hosts file. This works fine for a handful of servers, but becomes unbearable when managing many of them. In my case, I wanted to quickly spin up virtual machines using Duncan Mac-Vicar’s Terraform Libvirt provider, without having to accept their host keys before connecting. The solution? Issuing SSH host certificates from an SSH certificate authority.

SSH Certificate Authorities vs. the Web

The idea of an SSH certificate authority (CA) is easy to grasp if you understand the web’s Public Key Infrastructure (PKI). Just like on the web, a trusted party issues certificates that are presented when establishing a connection. By trusting that one party, you implicitly trust every certificate it issues. In the web’s PKI, the trusted parties come bundled with your browser or operating system. In the case of SSH, however, the trusted party is you! (Okay, you can also run your own web certificate authority.) With great power comes great responsibility, which we will abuse heavily in this article.
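To make the analogy concrete, here is a minimal sketch of both halves of that trust relationship, assuming OpenSSH’s ssh-keygen is installed. The CA is nothing more than an ordinary key pair, and a client trusts it through a @cert-authority line in its known_hosts file. The file name host_ca, the comment homelab-host-ca and the *.dmz host pattern (chosen to match the .dmz suffix that get_cert.sh appends) are illustrative assumptions, and everything happens in a throwaway directory.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway directory for the demo; a real CA private key stays on the CA host
DIR=$(mktemp -d)
trap 'rm -rf "$DIR"' EXIT

# The CA is just an ordinary key pair (no passphrase here, for brevity)
ssh-keygen -q -t ed25519 -N '' -C 'homelab-host-ca' -f "$DIR/host_ca"

# Clients trust the CA for every host matching *.dmz via a known_hosts line
TRUST_LINE="@cert-authority *.dmz $(cat "$DIR/host_ca.pub")"
echo "$TRUST_LINE"
```

On a real client, that line goes into ~/.ssh/known_hosts (or the system-wide /etc/ssh/ssh_known_hosts). From then on, any host under .dmz that presents a certificate signed by host_ca is accepted without a prompt.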

SSH Certificate Authority for Terraform

So, let’s start with a plan. I want to spawn virtual machines with Terraform that are automatically provisioned with an SSH host certificate issued by my CA. This CA will be another host on my private network, issuing certificates over SSH.

Fetching the SSH Host Certificate

First, we generate the virtual machine’s SSH host key pair in Terraform:

resource "tls_private_key" "debian" {
  algorithm = "ED25519"
}

data "tls_public_key" "debian" {
  private_key_pem = tls_private_key.debian.private_key_pem
}

Now that we have a key pair, Terraform somehow needs to send the public key to the CA. Luckily, Terraform’s external data source can run an arbitrary program. We use it to call a script, get_cert.sh:

data "external" "cert" {
  program = ["bash", "${path.module}/get_cert.sh"]

  query = {
    pubkey   = trimspace(data.tls_public_key.debian.public_key_openssh)
    host     = var.name
    cahost   = var.ca_host
    cascript = var.ca_script
    cakey    = var.ca_key
  }
}

Terraform passes these query parameters to the script’s stdin as a JSON object. The script reads them and forwards them to the CA over SSH. Its result must also be a JSON object, printed to stdout.
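This stdin/stdout protocol is easy to try outside Terraform. The sketch below assumes bash and jq, and uses a hypothetical fake_get_cert function that follows the same protocol as get_cert.sh but returns a made-up placeholder instead of contacting the CA. Keep in mind that the external data source only deals in strings, both in the query and in the result, which is why the certificate travels as a single JSON string.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for get_cert.sh: same stdin/stdout protocol,
# but it fabricates a placeholder string instead of contacting the CA
fake_get_cert() {
	local HOST
	# Terraform writes the query as a JSON object to stdin
	eval "$(jq -r '@sh "HOST=\(.host)"')"
	# The answer must be a JSON object whose values are all strings
	jq -n --arg cert "fake-cert-for-$HOST.dmz" '{"cert":$cert}'
}

# Simulate the query Terraform would send
RESULT=$(echo '{"pubkey":"ssh-ed25519 AAAAexample","host":"vm1"}' | fake_get_cert)
echo "$RESULT"
```

Piping a similar JSON object into the real get_cert.sh is a quick way to debug the round trip to the CA before involving Terraform at all.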

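#!/bin/bash
set -euo pipefail
IFS=$'\n\t'

# Read the query parameters that Terraform passes as JSON on stdin
eval "$(jq -r '@sh "PUBKEY=\(.pubkey) HOST=\(.host) CAHOST=\(.cahost) CASCRIPT=\(.cascript) CAKEY=\(.cakey)"')"

# Build the remote command line; printf %q quotes every argument so that
# the public key (which contains a space) survives the remote shell intact
printf -v REMOTE_CMD '%q ' "$CASCRIPT" host "$CAKEY" "$PUBKEY" "$HOST.dmz"

# Fetch the certificate from the CA
CERT=$(ssh -o ConnectTimeout=3 -o ConnectionAttempts=1 "root@$CAHOST" "$REMOTE_CMD")

# Terraform expects a JSON object with string values on stdout
jq -n --arg cert "$CERT" '{"cert":$cert}'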

On the CA host, get_cert.sh calls a script that issues the certificate. This script is just a simple wrapper around ssh-keygen:

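#!/bin/bash
set -euo pipefail
IFS=$'\n\t'

# Sign a host public key with one of the CA keys and print the certificate.
# Usage: <script> host <ca-key-name> <public-key> <hostname>
host() {
	CAKEY="$1"
	PUBKEY="$2"
	HOST="$3"

	# Private temporary directory: concurrent requests cannot clash,
	# and cleanup only touches this request's files
	WORKDIR=$(mktemp -d)
	trap 'rm -rf "$WORKDIR"' EXIT

	echo "$PUBKEY" > "$WORKDIR/$HOST.pub"
	# -h: host certificate, -s: CA signing key, -I: key identity, -n: principals
	ssh-keygen -h -s /root/ca/keys/"$CAKEY" -I "$HOST" -n "$HOST" "$WORKDIR/$HOST.pub"
	cat "$WORKDIR/$HOST-cert.pub"
}

# Dispatch: the first argument names the function, the rest are its arguments
CMD="$1"
shift
"$CMD" "$@"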

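To check what the CA script produces, you can repeat its ssh-keygen invocation locally with throwaway keys and inspect the result with ssh-keygen -L. This is a sketch that assumes OpenSSH’s ssh-keygen is available; the file names and the vm1.dmz hostname are made up for the example.

```shell
#!/bin/bash
set -euo pipefail

DIR=$(mktemp -d)
trap 'rm -rf "$DIR"' EXIT

# Throwaway stand-ins for the CA key in /root/ca/keys/ and the VM's host key
ssh-keygen -q -t ed25519 -N '' -f "$DIR/host_ca"
ssh-keygen -q -t ed25519 -N '' -f "$DIR/vm1"

# Same flags as the CA script: host certificate, identity and principal vm1.dmz
ssh-keygen -h -s "$DIR/host_ca" -I vm1.dmz -n vm1.dmz "$DIR/vm1.pub"

# Show type, key ID, principals and validity of the new certificate
INFO=$(ssh-keygen -L -f "$DIR/vm1-cert.pub")
echo "$INFO"
```

Without a -V option the certificate is valid forever, which is one of the caveats discussed at the end of this article.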
Appeasing the Terraform Gods

Nice, we can fetch the SSH host certificate from the CA. We should be able to just use it, right? We can, but it comes with a big annoyance: Terraform fetches a new certificate every time it runs. This is because external is a data source, and data sources are read again on every run. Since each signing produces a different certificate (every certificate contains a random nonce), any resource that uses the result sees a changed value and gets updated every time we run Terraform. I have not found a way to avoid fetching the certificate every time, short of writing my own Terraform provider, which I’d rather not do. I have, however, found a way to hack around the issue.

The idea is as follows: Terraform’s ignore_changes lifecycle argument lets us, well, ignore changes to a resource’s attributes. Unfortunately, it cannot be used on a data source, so we create a glue null_resource, which does support ignore_changes. This is shown in the code snippet below. We use the triggers argument simply to copy the certificate in; we don’t use it for its original purpose of forcing the resource to be replaced.

resource "null_resource" "cert" {
  triggers = {
    cert = data.external.cert.result["cert"]
  }

  lifecycle {
    ignore_changes = [
      triggers
    ]
  }
}

And voilà, we can now use null_resource.cert.triggers["cert"] as our certificate without it triggering replacements in Terraform.

Setting the Host Certificate with Cloud-Init

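Terraform’s Libvirt provider has native support for Cloud-Init, which is very handy: we can hand the host key and certificate directly to Cloud-Init, which places them on the virtual machine. Cloud-Init writes the certificate next to the host key and adds a HostCertificate line to the sshd configuration, so the SSH server presents it to clients. In the Cloud-Init configuration, rendered by Terraform as a template with private_key and host_cert as template variables, we set the ssh_keys key to do this: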

ssh_keys:
  ed25519_private: |
    ${indent(4, private_key)}
  ed25519_certificate: "${host_cert}"

I hardcoded this to ED25519 keys, because that is all I use.

This works perfectly, and I never have to accept host keys from virtual machines again.

Caveats

A sharp-eyed reader might have noticed that the lifecycle management of these host certificates is severely lacking: the deployed certificates never expire, and there is no way to revoke them. There are ways to implement both, but for my home lab I did not deem this necessary at this point. In a more professional environment, I would suggest using HashiCorp Vault.
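For reference, ssh-keygen itself can address both gaps. The sketch below, again with throwaway keys and made-up names, signs a certificate that expires after 52 weeks (-V +52w), then puts the VM’s public key into a key revocation list (KRL) with -k and checks it with -Q. It assumes OpenSSH’s ssh-keygen is available; distributing the KRL and renewing certificates before they expire are left out.

```shell
#!/bin/bash
set -euo pipefail

DIR=$(mktemp -d)
trap 'rm -rf "$DIR"' EXIT

ssh-keygen -q -t ed25519 -N '' -f "$DIR/host_ca"
ssh-keygen -q -t ed25519 -N '' -f "$DIR/vm1"

# Expiration: -V +52w makes the certificate valid from (roughly) now until 52 weeks later
ssh-keygen -q -h -s "$DIR/host_ca" -I vm1.dmz -n vm1.dmz -V +52w "$DIR/vm1.pub"
VALIDITY=$(ssh-keygen -L -f "$DIR/vm1-cert.pub" | grep 'Valid:')
echo "$VALIDITY"

# Revocation: create a key revocation list (KRL) containing the VM's public key
ssh-keygen -k -f "$DIR/revoked.krl" "$DIR/vm1.pub"

# Check the key against the KRL; -Q may exit non-zero for revoked keys,
# so keep set -e from aborting the script here
STATUS=$(ssh-keygen -Q -f "$DIR/revoked.krl" "$DIR/vm1.pub" || true)
echo "$STATUS"
```

On clients, pointing the RevokedHostKeys option in ssh_config at such a KRL makes ssh refuse the listed host keys.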

This project taught me a lot about both the limits and the flexibility of Terraform, so all in all, a success! All code can be found in the git repository here.