What happens when you have an unsecured database lying around somewhere? Bots will attack it. Everyone’s got to enable TLS to secure their internet-facing services. There’s simply no excuse now that certificates are cheap, can be bought with crypto, and are even free from Let’s Encrypt.
But for some reason, you go to these cloud frameworks of the future like Kubernetes, and securing the infrastructure is like trying to break ice with a screwdriver! I will probably never understand why enabling TLS on services inside pods is as hard as running a marathon. But the best I can do is make sure you don’t have to go through the torment I did trying to make this work.
Our victim (er, test subject) in this blog will be PostgreSQL, but this will work with most other K8s apps out there. This cluster will be just a bunch of servers, so sorry, it looks like we’re stuck with the local storage class and all its flaws for now.
First, We Must Pass Through Hell
A word of warning before we begin, because this is important to mention. Do not reboot your Kubernetes nodes at any point while your cluster is running.
If you do that, then your pods and cluster storage on that particular node will be wiped out, and then you have to re-create them all over again. Trust me, you do not want to experience the agony of figuring out how to restore your cluster. If you’ve got only one control plane, you can’t.
And if your cluster is just a bunch of separate servers on the interwebs joined together like mine, then it is guaranteed that you will only have one control plane. There is no point in trying to create more, because you need to buy a hardware load balancer for that.
Don’t even try to be smart and configure HAProxy or some other software load balancer to distribute traffic across your WAN (and that includes the public internet). You’re just wasting your time, and possibly money on more servers.
Let that sink in before reading on. And whatever you do, don’t run worker nodes on Windows, OK? That’s literally just asking for trouble.
Our Quest To Enable TLS
Our journey begins with Helm – not to be confused with Help, the despair DevOps people feel when their software fights against them. We’re clearly not masochists who’ll hand-write long YAML files for every last app we want to deploy, now are we?
I am going to assume your cluster is up and running, maybe has a few nodes joined to it too, and has kubectl installed. And if you’ve been doing funny stuff to your cluster and it’s only half-working, then go back and read the above section.
After installing Helm, we will add the repository of what is, surprisingly, one of the most hassle-free artifacts in this blog: Bitnami. It basically makes it easy to deploy stuff to Kubernetes – on paper – and then enable TLS on it.
Adding repos in Helm is similar to adding them in package managers. You just run this command:
helm repo add bitnami https://charts.bitnami.com/bitnami
And that’s it. Just like that, you can set up a wealth of packages that would’ve otherwise taken ages with a package manager. Pretty cool, huh? If only enabling TLS on all these services was that easy!
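If you want to sanity-check that the repo actually landed before installing anything, a couple of standard Helm commands will do it:

helm repo update                       # refresh the local chart index
helm search repo bitnami/postgresql    # confirm the PostgreSQL chart is visible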
How Not To Enable TLS
First, let me show you how not to enable TLS on your pods. I’m going to go through all the ways your deployment can go wrong if you try these approaches.
Failure Method 1: Read The Docs
Of course, nothing could go wrong with following the official instructions, right? If they didn’t work, why would Bitnami post them there? Think about that, will ya?
At least my cluster did not implode after following these instructions (thank goodness). But I did end up with a completely non-functioning PostgreSQL service. PostgreSQL could not verify the certificate I fed it.
This is not some self-signed certificate, mind you. This is a certificate bought from a proper Certificate Authority. If you’ve read the TLS certificate tutorial, you will know that I have an AlphaSSL cert from GlobalSign.
It makes you wonder what good certificates are if most stuff doesn’t work with them, doesn’t it?
For reference, these are the options to enable TLS that Bitnami tells you to add to the helm install command (see the sketch after the list for what the full invocation looks like):

- tls.certificatesSecret – Name of an existing secret that contains the certificates
- tls.certFilename – Certificate filename
- tls.certKeyFilename – Certificate key filename
- tls.certCAFilename – CA Certificate filename
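To save you some cross-referencing, here’s a minimal sketch of how those options fit together. The secret name postgres-tls is my own placeholder, and the file names match the certs you’ll meet later in this post:

# Put the cert, key, and intermediate CA into a Kubernetes secret first
kubectl create secret generic postgres-tls \
  --from-file=notatether.crt \
  --from-file=notatether.key \
  --from-file=gsalphasha2g2r1.crt

# Then point the chart at that secret
helm install postgres-notatether-api bitnami/postgresql \
  --set tls.enabled=true \
  --set tls.certificatesSecret=postgres-tls \
  --set tls.certFilename=notatether.crt \
  --set tls.certKeyFilename=notatether.key \
  --set tls.certCAFilename=gsalphasha2g2r1.crt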
This stuff will not work. Here’s the log output that you will get if you attempt to set these parameters:
[root@api1 ~]# kubectl logs postgres-notatether-api-postgresql-0
postgresql 13:05:55.03
postgresql 13:05:55.03 Welcome to the Bitnami postgresql container
postgresql 13:05:55.03 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
postgresql 13:05:55.03 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
postgresql 13:05:55.03
postgresql 13:05:55.04 DEBUG ==> Configuring libnss_wrapper...
postgresql 13:05:55.05 INFO ==> ** Starting PostgreSQL setup **
postgresql 13:05:55.07 INFO ==> Validating settings in POSTGRESQL_* env vars..
postgresql 13:05:55.08 INFO ==> Loading custom pre-init scripts...
postgresql 13:05:55.08 INFO ==> Initializing PostgreSQL database...
postgresql 13:05:55.08 DEBUG ==> Ensuring expected directories/files exist...
postgresql 13:05:55.09 INFO ==> pg_hba.conf file not detected. Generating it...
postgresql 13:05:55.09 INFO ==> Generating local authentication configuration
postgresql 13:05:55.10 INFO ==> Deploying PostgreSQL with persisted data...
postgresql 13:05:55.11 INFO ==> Configuring replication parameters
postgresql 13:05:55.13 INFO ==> Configuring fsync
postgresql 13:05:55.14 INFO ==> Configuring TLS
chmod: changing permissions of '/opt/bitnami/postgresql/certs/notatether.key': Read-only file system
postgresql 13:05:55.14 WARN ==> Could not set compulsory permissions (600) on file /opt/bitnami/postgresql/certs/notatether.key
postgresql 13:05:55.16 INFO ==> Configuring synchronous_replication
postgresql 13:05:55.17 INFO ==> Enabling TLS Client authentication
postgresql 13:05:55.21 INFO ==> Loading custom scripts...
postgresql 13:05:55.21 INFO ==> Enabling remote connections
postgresql 13:05:55.23 INFO ==> ** PostgreSQL setup finished! **
postgresql 13:05:55.25 INFO ==> ** Starting PostgreSQL **
2022-03-26 13:05:55.265 GMT [1] LOG: pgaudit extension initialized
2022-03-26 13:05:55.271 GMT [1] LOG: starting PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2022-03-26 13:05:55.271 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-03-26 13:05:55.271 GMT [1] LOG: listening on IPv6 address "::", port 5432
2022-03-26 13:05:55.272 GMT [1] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-03-26 13:05:55.274 GMT [108] LOG: database system was shut down at 2022-03-26 07:47:22 GMT
2022-03-26 13:05:55.277 GMT [1] LOG: database system is ready to accept connections
2022-03-26 13:06:03.221 GMT [121] LOG: could not accept SSL connection: certificate verify failed
2022-03-26 13:06:13.197 GMT [128] LOG: could not accept SSL connection: certificate verify failed
2022-03-26 13:06:23.207 GMT [135] LOG: could not accept SSL connection: certificate verify failed
2022-03-26 13:06:33.212 GMT [142] LOG: could not accept SSL connection: certificate verify failed
2022-03-26 13:06:33.267 GMT [149] LOG: could not accept SSL connection: certificate verify failed
2022-03-26 13:06:43.190 GMT [156] LOG: could not accept SSL connection: certificate verify failed
2022-03-26 13:06:43.231 GMT [163] LOG: could not accept SSL connection: certificate verify failed
2022-03-26 13:06:53.207 GMT [170] LOG: could not accept SSL connection: certificate verify failed
2022-03-26 13:06:53.263 GMT [177] LOG: could not accept SSL connection: certificate verify failed
And then the pod will restart itself forever in an attempt to get different results without changing anything. It’s the classic definition of insanity (Einstein would be proud).
Fanatic OpenSSL Verification After Enabling TLS
At this point, you must be wondering: why does it do this? Well, that’s because the OpenSSL library verifying this certificate doesn’t trust anything by default. It doesn’t even trust root certificates – it calls them self-signed. Heck, it doesn’t even bundle the root CA certificates with itself.
So the result is that even if you buy a fancy-schmancy certificate and make a pod use it, it will not be verified. Maybe the situation is different with a traditional install, but those are hard to move around and upgrade – that’s the whole reason we are using pods in the first place. Scalability is obviously a benefit as well, but you don’t get that with the local storage class, remember?
We are going to take a slight detour from enabling TLS to verifying X.509 certificates with OpenSSL. You have to see how picky it is to understand the verification process. In this experiment:
- notatether.crt – my certificate
- gsalphasha2g2r1.crt – intermediate certificate
- GlobalSign_Root_CA.crt – root certificate
# Naive attempt with just my cert
$ openssl verify notatether.crt
CN = *.notatether.com
error 20 at 0 depth lookup: unable to get local issuer certificate
error notatether.crt: verification failed

# Another naive attempt with my intermediate as the CAfile
$ openssl verify -CAfile gsalphasha2g2r1.crt notatether.crt
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2
error 2 at 1 depth lookup: unable to get issuer certificate
error notatether.crt: verification failed

# Root certificate as the CAfile
$ openssl verify -CAfile /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt notatether.crt
CN = *.notatether.com
error 20 at 0 depth lookup: unable to get local issuer certificate
error notatether.crt: verification failed

# Both my intermediate and my root as the CAfile
$ openssl verify -CAfile gsalphasha2g2r1.crt -CAfile /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt notatether.crt
CN = *.notatether.com
error 20 at 0 depth lookup: unable to get local issuer certificate
error notatether.crt: verification failed

# Intermediate as the CA file and root cert before my cert
# (I saw this on a blog)
$ openssl verify -CAfile gsalphasha2g2r1.crt /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt notatether.crt
C = BE, O = GlobalSign nv-sa, OU = Root CA, CN = GlobalSign Root CA
error 18 at 0 depth lookup: self signed certificate
error /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt: verification failed
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2
error 2 at 1 depth lookup: unable to get issuer certificate
error notatether.crt: verification failed

# Intermediate as the CAfile and root cert untrusted
$ openssl verify -CAfile gsalphasha2g2r1.crt -untrusted /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt notatether.crt
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2
error 2 at 1 depth lookup: unable to get issuer certificate
error notatether.crt: verification failed

# Root as the CAfile and intermediate cert untrusted
$ openssl verify -CAfile /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt -untrusted gsalphasha2g2r1.crt notatether.crt
notatether.crt: OK
After quite a bit of desperation, I managed to get it to work. I had to place the root certificate as the CAfile, and mark my intermediate cert as untrusted.
I had even tried at one point to shove the root cert next to my cert in the verification process. That didn’t work out because then it failed to verify the root certificate and thought it was self-signed.
I did not even have the intermediate cert at first, so I had to find a way to get it. I discovered that the URL is embedded inside my own certificate:
$ openssl x509 -text -in notatether.crt
Certificate:
    Data:
        ...
        Issuer: C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2
        ...
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access:
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsalphasha2g2r1.crt
                OCSP - URI:http://ocsp2.globalsign.com/gsalphasha2g2

# Downloading the "CA Issuers" URI using wget
$ wget http://secure.globalsign.com/cacert/gsalphasha2g2r1.crt

# Convert the certificate to Base64 PEM format because it's in DER format
$ openssl x509 -inform der -in ~/gsalphasha2g2r1.crt -out ~/gsalphasha2g2r1.crt
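As an aside: if you’d rather not scroll through the full -text dump, OpenSSL 1.1.1 and later can print just the extension you care about:

$ openssl x509 -in notatether.crt -noout -ext authorityInfoAccess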
Kubernetes does not make this easy
However, there is no guarantee that Bitnami charts or the Kubernetes pods will do any of this for you. In fact, the only reason I had the root certificate in the first place was that this was on another box with the Mozilla CA bundle installed (/usr/share/ca-certificates/mozilla, courtesy of the ca-certificates package).
Now technically this would mean that I pass the root certificate in the CAfile parameter, but… there’s no place for the intermediate cert. We saw what happened in the experiment above when you pass two CAfiles – only the last one seems to count. And leaving the intermediate out entirely will also cause verification to fail. Bitnami Helm charts simply have no option for untrusted, intermediate certificates, so you are effectively stuck in a grim “verification failed” loop.
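For what it’s worth, the classic fix on traditional servers is to serve the leaf and intermediate together from one file, so clients never have to hunt down the intermediate themselves. I haven’t verified this against the Bitnami chart, so treat it as an untested sketch:

# Bundle my cert and the intermediate into a single PEM file (leaf first)
cat notatether.crt gsalphasha2g2r1.crt > fullchain.crt
# ...and then point tls.certFilename at fullchain.crt in the secret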
Failure Method 2: Make Traffic Pass Through stunnel
Stunnel is a handy program that can encrypt traffic being sent to services that were never designed for TLS. It runs as a systemd service. I have used it in the past to enable TLS for many traditional programs. But microservices are a different beast altogether, and stunnel was no help with this setup.
The way this will work is that we create the pod, but with no TLS. Then, we expose the service’s IP and port on a localhost port. Finally, we listen on the real port and interface we want to receive traffic on, and send that traffic to the localhost port.
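In other words, the traffic was supposed to take this path (a rough sketch of the intended chain):

client ==TLS==> stunnel (0.0.0.0:5432) --plaintext--> relay (127.0.0.1:15432) --plaintext--> service IP:5432 --> PostgreSQL pod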
It sounds complex, doesn’t it? That’s why it was doomed for failure.
How to expose K8s service ports
There are two well-known ways to expose a Kubernetes service IP and port on your localhost interface. Kubernetes recommends the first one, but I find the second one to be more durable.
The recommended way to expose a port is to use kubectl port-forward, like this:
kubectl port-forward --namespace default svc/postgres-notatether-api-postgresql 15432:5432
This only really works for localhost, by the way. It becomes really messy when you want to make it listen on 0.0.0.0 – all interfaces (kubectl port-forward does have an --address flag for that, but it’s still messy). That’s why I prefer the second method of exposing ports:
socat tcp-listen:15432,bind=127.0.0.1,reuseaddr,fork tcp:$(kubectl describe svc/postgres-notatether-api-postgresql | grep IP: | awk '{print $2;}'):5432
This uses socat to listen on localhost, port 15432, and forward all traffic to the service’s cluster IP at port 5432. To make it listen on just localhost, I use bind=127.0.0.1. If I wanted it to listen on all interfaces instead, I would remove that part (and the comma before it), as shown below.
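For completeness, here’s that all-interfaces variant, with the cluster IP looked up via -o jsonpath instead of the grep/awk pipeline – same result, slightly more robust:

socat tcp-listen:15432,reuseaddr,fork tcp:$(kubectl get svc postgres-notatether-api-postgresql -o jsonpath='{.spec.clusterIP}'):5432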
You may have to install socat and stunnel from your package manager first. On Ubuntu, the stunnel package is called stunnel4. Also, kubectl port-forward and socat run in the foreground, so you need to open another terminal and run one of them there.
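On Debian or Ubuntu, that boils down to:

sudo apt install socat stunnel4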
Setting up stunnel
Next, we need to create our stunnel configuration file. The reason we need a second relay sending traffic to the first relay is that kubectl port-forward and socat don’t encrypt traffic at all, and we can’t just place the service IP address inside the stunnel config file, because it keeps changing. That is, unless you are using load balancers. And if you are one of those poor shmucks with just a bunch of servers, then there’s no shortcut – you’ll have to follow along with me too.
# /etc/stunnel/stunnel.conf
pid = /run/stunnel.pid

[k8s-postgres]
cert = /etc/stunnel/notatether.crt
key = /etc/stunnel/notatether.key
accept = 0.0.0.0:5432
connect = 127.0.0.1:15432
It doesn’t take a rocket scientist to figure out that this accepts TLS traffic on port 5432, on all interfaces, and relays it to port 15432. The private key and certificate also need to be copied into the /etc/stunnel folder.
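Copying them over is simple enough – just remember stunnel expects the private key to be readable only by its owner (paths assume you’re in the directory holding the certs):

sudo cp notatether.crt notatether.key /etc/stunnel/
sudo chmod 600 /etc/stunnel/notatether.key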
Now we will start the stunnel service on systemd:
systemctl enable --now stunnel
The good news – the PostgreSQL pod doesn’t crash repeatedly. The bad news – now when you connect to PostgreSQL from a completely different machine, you get this error:
psql: error: connection to server at "my.host" (1.2.3.4), port 5432 failed: server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.
Most likely because PostgreSQL does not understand any of the TLS traffic coming towards it. The PostgreSQL protocol negotiates TLS in-band – psql opens a plaintext connection and sends an SSLRequest message first – so bolting a raw TLS wrapper in front of it never quite lines up.
So, how did I make all this work?
You will not believe what did make Postgres enable TLS, because it seems so counter-intuitive at first. But this is the combination of options that did the trick:
- tls.enabled – Enable TLS traffic support
- tls.autoGenerated – Automatically generate self-signed TLS certificates
Obviously, I had to pass tls.enabled to even use the first method, but the magic is in the tls.autoGenerated option. This encrypts traffic using self-signed certificates made inside the container. And miraculously, it doesn’t complain!
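Putting it together, the working install looks roughly like this. The release name is mine, and depending on your chart version the password parameter may be auth.postgresPassword or postgresqlPassword, so check helm show values bitnami/postgresql first:

helm install postgres-notatether-api bitnami/postgresql \
  --set tls.enabled=true \
  --set tls.autoGenerated=true \
  --set auth.postgresPassword=changeme   # placeholder – set your own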
Usually, alarm bells go off when OpenSSL sees a self-signed certificate, and Bitnami isn’t even trying to hide the fact. There are no port-forwarding relays involved either. I honestly thought this was another option that would fail.
Just make sure you pass a password for postgres, especially if your data folder is not empty. Otherwise you’ll get locked out of your own database.
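To confirm TLS is actually on, connect with psql and look at the banner – it prints the negotiated protocol and cipher. The host and version details here are illustrative:

$ psql "host=my.host port=5432 user=postgres sslmode=require"
psql (14.2)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)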
Enabling TLS – So what did we learn today?
There are quite a few lessons we can take home from this pain.
- Don’t always trust the docs
- Never double-relay ports
- Use the chart’s auto-generated self-signed certs, not your own CA-signed ones
- Don’t load-balance a bunch of servers unless you want trouble