Enable TLS For K8s Apps – It Almost Made Me Surrender

What happens when you’ve got an unsecured database lying around somewhere? Bots will attack it. Everyone’s got to enable TLS to secure their internet-facing services. There’s simply no excuse now that you can buy certificates for cheap and with crypto, or even get them for free (Let’s Encrypt).

But for some reason, you go to these cloud frameworks of the future like Kubernetes, and securing all the infrastructure is like trying to break ice with a screwdriver! I will probably never understand why enabling TLS on services inside pods is as hard as running the marathon. But the best I can do is make sure you don’t have to go through the torment I did trying to make this work.

Our victim, I mean test subject, in this blog will be PostgreSQL, but the same approach should work with most other K8s apps out there. This cluster will be just a bunch of servers, so sorry, it looks like we’re stuck with the local storage class with all its flaws for now.

First, We Must Pass Through Hell

A word of warning before we begin, because this is important to mention. Do not reboot your Kubernetes nodes at any point while your cluster is running.

If you do that, then your pods and cluster storage on that particular node will be wiped out, and then you have to re-create them all over again. Trust me, you do not want to experience that agony of figuring out how to restore your cluster. If you’ve got only one control plane, you can’t.

And if your cluster is just a bunch of separate servers on the interwebs joined together like mine, then it is guaranteed that you will only have one control plane. There is no point in trying to create more, because you need to buy a hardware load balancer for that.

Don’t even try to be smart and attempt to configure HAProxy or some other software load balancer to distribute traffic across your WAN. This includes the public internet. You’re just wasting your time and possibly money on more servers.

Load balancing only works on LANs.

Anywhere else, and it’s just a failure point.

Ali Sherief

Let that sink into your head before reading. And whatever you do, don’t run worker nodes on Windows, OK? That’s literally just asking for trouble.

Our Quest To Enable TLS

Our journey begins with Helm – not to be confused with Help, the despair DevOps people feel when their software is fighting against them. We are clearly not masochists who write long YAML files for every last app we want to deploy, now are we?

I am going to assume your cluster is up and running, maybe has a few nodes joined to it too, and has kubectl installed. And if you’ve been doing funny stuff to your cluster and it’s only half-working, then go back and read the above section.

After installing Helm, we will add the repository of what is, surprisingly, one of the most hassle-free artifacts in this blog: Bitnami. This basically makes it easy to deploy stuff to Kubernetes – on paper – and then enable TLS on them.

Adding repos in Helm is similar to adding them in package managers. You just run this command:

helm repo add bitnami https://charts.bitnami.com/bitnami

And that’s it. Just like that, you can set up a wealth of packages that would’ve otherwise taken ages with a package manager. Pretty cool, huh? If only enabling TLS on all these services was that easy!

How Not To Enable TLS

First, let me show you how not to enable TLS on your pods. I’m going to go through all the ways your deployment can go wrong if you try these approaches.

Failure Method 1: Read The Docs

Of course, nothing could go wrong with following the official instructions, right? If they didn’t work, why would Bitnami post them there, now think about that, will ya?

At least my cluster did not implode after following these instructions (thank goodness). But I did end up with a completely non-functioning PostgreSQL service. PostgreSQL could not verify the certificate I fed it.

This is not some self-signed certificate, mind you. This is a certificate bought from a proper Certificate Authority. If you’ve read the TLS certificate tutorial, you will know that I have an AlphaSSL cert from GlobalSign.

It makes you wonder what good are certificates if most stuff doesn’t work with them, doesn’t it?

For reference, these are the options to enable TLS that Bitnami tells you to add to the helm install command:

  • tls.certificatesSecret – Name of an existing secret that contains the certificates
  • tls.certFilename – Certificate filename
  • tls.certKeyFilename – Certificate key filename
  • tls.certCAFilename – CA Certificate filename

This stuff will not work. Here’s the log output that you will get if you attempt to set these parameters:

$ kubectl logs postgres-notatether-api-postgresql-0
postgresql 13:05:55.03 
postgresql 13:05:55.03 Welcome to the Bitnami postgresql container
postgresql 13:05:55.03 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
postgresql 13:05:55.03 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
postgresql 13:05:55.03 
postgresql 13:05:55.04 DEBUG ==> Configuring libnss_wrapper...
postgresql 13:05:55.05 INFO  ==> ** Starting PostgreSQL setup **
postgresql 13:05:55.07 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 13:05:55.08 INFO  ==> Loading custom pre-init scripts...
postgresql 13:05:55.08 INFO  ==> Initializing PostgreSQL database...
postgresql 13:05:55.08 DEBUG ==> Ensuring expected directories/files exist...
postgresql 13:05:55.09 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 13:05:55.09 INFO  ==> Generating local authentication configuration
postgresql 13:05:55.10 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql 13:05:55.11 INFO  ==> Configuring replication parameters
postgresql 13:05:55.13 INFO  ==> Configuring fsync
postgresql 13:05:55.14 INFO  ==> Configuring TLS
chmod: changing permissions of '/opt/bitnami/postgresql/certs/notatether.key': Read-only file system
postgresql 13:05:55.14 WARN  ==> Could not set compulsory permissions (600) on file /opt/bitnami/postgresql/certs/notatether.key
postgresql 13:05:55.16 INFO  ==> Configuring synchronous_replication
postgresql 13:05:55.17 INFO  ==> Enabling TLS Client authentication
postgresql 13:05:55.21 INFO  ==> Loading custom scripts...
postgresql 13:05:55.21 INFO  ==> Enabling remote connections
postgresql 13:05:55.23 INFO  ==> ** PostgreSQL setup finished! **

postgresql 13:05:55.25 INFO  ==> ** Starting PostgreSQL **
2022-03-26 13:05:55.265 GMT [1] LOG:  pgaudit extension initialized
2022-03-26 13:05:55.271 GMT [1] LOG:  starting PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2022-03-26 13:05:55.271 GMT [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-03-26 13:05:55.271 GMT [1] LOG:  listening on IPv6 address "::", port 5432
2022-03-26 13:05:55.272 GMT [1] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-03-26 13:05:55.274 GMT [108] LOG:  database system was shut down at 2022-03-26 07:47:22 GMT
2022-03-26 13:05:55.277 GMT [1] LOG:  database system is ready to accept connections
2022-03-26 13:06:03.221 GMT [121] LOG:  could not accept SSL connection: certificate verify failed
2022-03-26 13:06:13.197 GMT [128] LOG:  could not accept SSL connection: certificate verify failed
2022-03-26 13:06:23.207 GMT [135] LOG:  could not accept SSL connection: certificate verify failed
2022-03-26 13:06:33.212 GMT [142] LOG:  could not accept SSL connection: certificate verify failed
2022-03-26 13:06:33.267 GMT [149] LOG:  could not accept SSL connection: certificate verify failed
2022-03-26 13:06:43.190 GMT [156] LOG:  could not accept SSL connection: certificate verify failed
2022-03-26 13:06:43.231 GMT [163] LOG:  could not accept SSL connection: certificate verify failed
2022-03-26 13:06:53.207 GMT [170] LOG:  could not accept SSL connection: certificate verify failed
2022-03-26 13:06:53.263 GMT [177] LOG:  could not accept SSL connection: certificate verify failed

And then the pod will restart itself forever in an attempt to get different results without changing anything. It’s the classic definition of insanity (Einstein would be proud).

Fanatic OpenSSL Verification After Enabling TLS

At this point, you must be wondering, why does it do this? Well, that’s because the OpenSSL library verifying this certificate doesn’t trust anything by default. It doesn’t even trust root certificates – it calls them self-signed. Heck, it doesn’t even bundle the root CA certificates with itself.

So the result is even if you buy a fancy shmancy certificate and make a pod use it, it will not be verified. Maybe the situation is different with a traditional install, but those are hard to move around and upgrade. That’s the whole reason why we are using pods in the first place. Scalability is obviously a benefit as well but you don’t get that with the local storage class, remember?

We are going to take a slight detour from enabling TLS to verifying X.509 certificates with OpenSSL. You have to see how picky it is to understand the verification process. In this experiment:

  • notatether.crt – my certificate
  • gsalphasha2g2r1.crt – intermediate certificate
  • GlobalSign_Root_CA.crt – root certificate

# Naive attempt with just my cert
$ openssl verify notatether.crt
CN = *.notatether.com
error 20 at 0 depth lookup: unable to get local issuer certificate
error notatether.crt: verification failed

# Another naive attempt with my intermediate as the CAfile
$ openssl verify -CAfile gsalphasha2g2r1.crt notatether.crt
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2
error 2 at 1 depth lookup: unable to get issuer certificate
error notatether.crt: verification failed

# Root certificate as the CAfile
$ openssl verify -CAfile /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt notatether.crt
CN = *.notatether.com
error 20 at 0 depth lookup: unable to get local issuer certificate
error notatether.crt: verification failed

# Both my intermediate and my root as the CAfile
$ openssl verify -CAfile gsalphasha2g2r1.crt -CAfile /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt notatether.crt   
CN = *.notatether.com
error 20 at 0 depth lookup: unable to get local issuer certificate
error notatether.crt: verification failed

# Intermediate as the CA file and root cert before my cert
# (I saw this on a blog)
$ openssl verify -CAfile gsalphasha2g2r1.crt /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt notatether.crt 
C = BE, O = GlobalSign nv-sa, OU = Root CA, CN = GlobalSign Root CA
error 18 at 0 depth lookup: self signed certificate
error /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt: verification failed
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2
error 2 at 1 depth lookup: unable to get issuer certificate
error notatether.crt: verification failed

# Intermediate as the CAfile and root cert untrusted
$ openssl verify -CAfile gsalphasha2g2r1.crt -untrusted /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt notatether.crt 
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2
error 2 at 1 depth lookup: unable to get issuer certificate
error notatether.crt: verification failed

# Root as the CAfile and intermediate cert untrusted
$ openssl verify -CAfile /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt -untrusted gsalphasha2g2r1.crt notatether.crt 
notatether.crt: OK

After quite a bit of desperation, I managed to get it to work. I had to place the root certificate as the CAfile, and mark my intermediate cert as untrusted.

I had even tried at one point to shove the root cert next to my cert in the verification process. That didn’t work out because then it failed to verify the root certificate and thought it was self-signed.
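The same rules can be reproduced without a purchased certificate. The script below is a minimal sketch that builds a throwaway root, intermediate and leaf certificate (all names in it are made up for the example) and runs the interesting combinations from above. It also tries one combination not shown earlier: a single CAfile holding both the intermediate and the root. Repeating -CAfile appears to keep only the last file given, which would explain why the attempt with two -CAfile options failed exactly like the root-only attempt.

```shell
#!/bin/sh
# Reproduce the verification rules with a throwaway certificate chain.
# Every name here (demo-root, demo-intermediate, demo.example.test) is made up.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Extensions that mark a certificate as a CA.
printf 'basicConstraints=critical,CA:TRUE\nkeyUsage=critical,keyCertSign,cRLSign\n' > ca.ext

# Create a private key ($1.key) and a signing request ($1.csr) for common name $2.
new_request() {
    openssl req -new -newkey rsa:2048 -nodes -subj "/CN=$2" \
        -keyout "$1.key" -out "$1.csr" 2>/dev/null
}

# Root: self-signed. Intermediate: signed by root. Leaf: signed by intermediate.
new_request root demo-root
openssl x509 -req -in root.csr -signkey root.key -days 1 \
    -extfile ca.ext -out root.crt 2>/dev/null
new_request inter demo-intermediate
openssl x509 -req -in inter.csr -CA root.crt -CAkey root.key -CAcreateserial \
    -days 1 -extfile ca.ext -out inter.crt 2>/dev/null
new_request leaf demo.example.test
openssl x509 -req -in leaf.csr -CA inter.crt -CAkey inter.key -CAcreateserial \
    -days 1 -out leaf.crt 2>/dev/null

# Run `openssl verify` with the given options and report OK or FAILED.
check() {
    if openssl verify "$@" >/dev/null 2>&1; then echo OK; else echo FAILED; fi
}

root_only=$(check -CAfile root.crt leaf.crt)
inter_only=$(check -CAfile inter.crt leaf.crt)
root_plus_untrusted=$(check -CAfile root.crt -untrusted inter.crt leaf.crt)

# A single CAfile may hold several PEM certificates back to back.
cat inter.crt root.crt > bundle.crt
bundle=$(check -CAfile bundle.crt leaf.crt)

echo "root as CAfile only:              $root_only"
echo "intermediate as CAfile only:      $inter_only"
echo "root CAfile + untrusted inter:    $root_plus_untrusted"
echo "intermediate + root in one file:  $bundle"
```

On a reasonably recent OpenSSL, the first two lines should report FAILED and the last two OK, matching the results above. Whether a given Helm chart accepts such a combined file for its CA setting is a separate question that this sketch does not answer.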

I did not even have the intermediate cert at first, so I had to find a way to get it. I discovered that the URL is embedded inside my own certificate:

$ openssl x509 -text -in notatether.crt
Certificate:
    Data:
        ...
        Issuer: C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2
        ...
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access: 
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsalphasha2g2r1.crt
                OCSP - URI:http://ocsp2.globalsign.com/gsalphasha2g2

# Downloading the "CA Issuers" URI using wget
$ wget http://secure.globalsign.com/cacert/gsalphasha2g2r1.crt

# Encode the certificate in Base64 PEM format because it's in DER format
$ openssl x509 -inform der -in ~/gsalphasha2g2r1.crt -out ~/gsalphasha2g2r1.crt
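If you are unsure which format a downloaded certificate is in, a quick check is possible before converting. The sketch below uses a throwaway self-signed certificate as a stand-in for the downloaded file (the file names and the CN are made up) and writes the PEM output to a new file, which avoids relying on how the tool orders its reads and writes when input and output share a path.

```shell
#!/bin/sh
# DER -> PEM conversion on a throwaway certificate (all names are made up).
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the downloaded file: a self-signed cert saved in DER form.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo-intermediate" \
    -keyout demo.key -out demo.pem 2>/dev/null
openssl x509 -in demo.pem -outform der -out downloaded.crt

# PEM files are text with a BEGIN CERTIFICATE header; DER files are binary.
if grep -q 'BEGIN CERTIFICATE' downloaded.crt; then
    format_before=PEM
else
    format_before=DER
fi

# Convert into a new file rather than overwriting the input.
openssl x509 -inform der -in downloaded.crt -out converted.pem
first_line=$(head -n 1 converted.pem)

# Both files should describe the same certificate.
fp_before=$(openssl x509 -inform der -in downloaded.crt -noout -fingerprint -sha256)
fp_after=$(openssl x509 -in converted.pem -noout -fingerprint -sha256)

echo "format before conversion: $format_before"
echo "first line after:         $first_line"
```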

Kubernetes does not make this easy

However, there is no guarantee that Bitnami charts or the Kubernetes pods will do any of this for you. In fact, the only reason I had the root certificate in the first place was because this was on another box that had Firefox installed on it (/usr/share/ca-certificates/mozilla).

Now technically this would mean that I pass the root certificate in the CAfile parameter, but… there’s no place for the intermediate cert. We saw what happened in the verification process when you put two CAfiles for verification. And leaving the intermediate out will also cause it to fail. Bitnami Helm charts do not have options for untrusted, intermediate certificates. That means you are effectively stuck in a grim “verification failed” loop.

Failure Method 2: Make Traffic Pass Through stunnel

Stunnel is a handy program that can encrypt traffic sent to any service that was not designed with TLS support. It runs as a systemd service. I have used it in the past to enable TLS for many traditional programs. But microservices are a different beast altogether, and stunnel was of no avail with that setup.

The way this will work is that we create the pod, but with no TLS. Then, we will expose the service’s IP and port on a localhost port. Finally, we listen on the real port and interface we want to get traffic from, and send the traffic to the localhost port.

It sounds complex, doesn’t it? That’s why it was doomed to failure.

How to expose K8s service ports

There are two well-known ways to expose a Kubernetes service IP and port to your localhost interface. Kubernetes recommends the first one, but I find the second one to be more durable.

The recommended way to expose a port is to use kubectl port-forward like this:

kubectl port-forward --namespace default svc/postgres-notatether-api-postgresql 15432:5432

This only really works for localhost, by the way. It becomes really messy when you want to make it listen on 0.0.0.0 – all interfaces. That’s why I prefer the second method of exposing ports:

socat tcp-listen:15432,bind=127.0.0.1,reuseaddr,fork tcp:$(kubectl describe svc/postgres-notatether-api-postgresql | grep IP: | awk '{print $2;}'):5432

This uses socat to listen on localhost, port 15432, and forward all traffic to the service’s IP address (as reported by kubectl describe) at port 5432. To make it listen on just localhost, I use bind=127.0.0.1. If I wanted to make it listen on all interfaces instead, I would remove this part (and the comma before it).
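Scraping kubectl describe with grep and awk depends on how the human-readable output happens to be laid out. As a hedged alternative, kubectl can print the cluster IP directly through a jsonpath expression. The sketch below wraps that in two small helper functions whose names are made up for this example; it only prints the socat command, so nothing touches the cluster until a function is called.

```shell
#!/bin/sh
# Sketch: look up a service's cluster IP and print the matching socat relay.
# The function names are hypothetical helpers, not part of kubectl or socat.

# Print the ClusterIP of service $1 in namespace $2 (default: "default").
service_cluster_ip() {
    kubectl get service "$1" --namespace "${2:-default}" \
        --output jsonpath='{.spec.clusterIP}'
}

# Print (not run) the socat command relaying localhost:$1 to service $2, port $3.
relay_command() {
    printf 'socat tcp-listen:%s,bind=127.0.0.1,reuseaddr,fork tcp:%s:%s\n' \
        "$1" "$(service_cluster_ip "$2")" "$3"
}

# Usage:
#   relay_command 15432 postgres-notatether-api-postgresql 5432
```

Printing the command first makes it easy to eyeball the resolved IP before starting the relay.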

You may have to install socat and stunnel from your package manager first. On Ubuntu the stunnel package is called stunnel4. Also, both of these programs have to run in the background, so you need to open another terminal and run one of them there.

Setting up stunnel

Next we need to create our stunnel configuration file. We need a second relay in front of the first one for two reasons: kubectl port-forward and socat don’t encrypt traffic at all, and we can’t just place the service IP address inside the stunnel config file because it keeps changing. That is, unless you are using load balancers. And if you are one of those poor shmucks with just a bunch of servers, then there’s no shortcut. You’ll have to follow along with me too.

# /etc/stunnel/stunnel.conf
pid = /run/stunnel.pid

[k8s-postgres]
cert = /etc/stunnel/notatether.crt
key = /etc/stunnel/notatether.key
accept = 0.0.0.0:5432
connect = 127.0.0.1:15432

It doesn’t take a rocket scientist to figure out that this accepts traffic on port 5432 on all interfaces and relays it to port 15432 on localhost. Also, the private key and certificate need to be copied into the /etc/stunnel folder.

Now we will start the stunnel service on systemd:

systemctl enable --now stunnel

The good news – the PostgreSQL pod doesn’t crash repeatedly. The bad news – now when you connect to PostgreSQL from a completely different machine, you get this error:

psql: error: connection to server at "my.host" (1.2.3.4), port 5432 failed: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

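Most likely this is because psql does not open the connection with a TLS handshake. It first sends PostgreSQL’s own plaintext SSL request, which a plain stunnel listener does not understand, so the connection gets dropped before PostgreSQL ever sees it.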
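For what it’s worth, stunnel also has a protocol option, and pgsql is one of its documented values. It is meant to handle PostgreSQL’s plaintext SSL negotiation before starting TLS. The fragment below is an untested sketch, not something verified against this relay chain: it is the same service section as before with that single line added.

```ini
# /etc/stunnel/stunnel.conf (sketch, untested in this setup)
pid = /run/stunnel.pid

[k8s-postgres]
; Assumption: negotiate PostgreSQL's SSL request before the TLS handshake
protocol = pgsql
cert = /etc/stunnel/notatether.crt
key = /etc/stunnel/notatether.key
accept = 0.0.0.0:5432
connect = 127.0.0.1:15432
```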

So, how did I make all this work?

You will not believe what did make Postgres enable TLS, because it seems so counter-intuitive at first. But this is the combination of options that did the trick:

  • tls.enabled – Enable TLS traffic support
  • tls.autoGenerated – Generate automatically self-signed TLS certificates

Obviously, I had to pass tls.enabled to even use the first method, but the magic is in the tls.autoGenerated option. This encrypts traffic using self-signed certificates made inside the container. And miraculously, it doesn’t complain!

Usually, alarm bells go off when OpenSSL sees a self-signed certificate. Bitnami isn’t even trying to hide the fact. And there are no relays for port forwarding either. I actually thought this was another option that would fail.

Just make sure you pass a password for postgres though, especially if your data folder is not empty. Otherwise you’ll get locked out of your own database.
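Put together, the working combination can be kept in a small values file. This is a sketch rather than the exact command used here: the file name tls-values.yaml is made up, the release name in the usage comment is inferred from the pod name in the logs above, and auth.postgresPassword is an assumption about the chart’s password key, which may be named differently in other chart versions.

```yaml
# tls-values.yaml (hypothetical file name)
# Usage sketch: helm install postgres-notatether-api bitnami/postgresql -f tls-values.yaml
tls:
  enabled: true        # tls.enabled
  autoGenerated: true  # tls.autoGenerated: self-signed certs made by the chart
auth:
  # Assumption: password key for the postgres user; check your chart version.
  postgresPassword: "change-me"
```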

Enabling TLS – So what did we learn today?

There are quite a few lessons we can take home from this pain.

  • Don’t always trust the docs
  • Never double relay ports
  • Use the chart’s auto-generated self-signed certs and not your own
  • Don’t load-balance a bunch of servers unless you want trouble

By Ali Sherief

Editor-in-chief and serial coder & blogger.