Site Upgrade

My site was down for much of today as I finally took the plunge and rebuilt my server image from the ground up. This was something I had been considering for a while. The old site had been through a couple of Ubuntu upgrades and was chugging along, but I always thought it would be good to do a fresh install; I just never wanted to dump the time into it. Then I saw this article: https://www.anandtech.com/show/14284/amazon-offers-another-amd-epyc-powered-instance-t3a

I have been interested in AMD's latest processors, as they seem to be killing it with their Ryzen line. I checked the pricing and saw that running my site would cost about half of what I was paying for my old t2 instance, and at the same price level I would get more CPU cores. I hadn't even realized that Amazon had rolled out their regular T3s, let alone these new T3a instances. I figured it would be an easy switch: stop my instance, change the instance type, and start it back up.

I attempted that with no luck. It seems that since I set up my site, Amazon had introduced a new network card driver (the Elastic Network Adapter that the T3 generation requires), which my Linux kernel didn't have. I switched back to my T2 and decided I would build my new image on the side, spinning it up and working on the Amazon setup whenever I had time. I was happy to see there is now IPv6 support available; all in all there was just lots of new stuff to play with. Yesterday I finally decided to back up my site and go for it. I hit a stopping point when I brought my nginx config over (which had all my SSL settings) but didn't yet have my DNS pointed at the new machine, so I couldn't pull down new certificates. Without the certificates nginx wouldn't launch, and because of HSTS preloading I can't test an unencrypted version of my site either. This morning I pointed my DNS at the new IP and started restoring the backup of my site while I waited for the old DNS cache entries to expire so I could pull down the new certificates.

Right around that time I had to depart for our Mother's Day plans, so the site had to stay down. When I got back on tonight I finished restoring my WordPress backup, and we are back. Now that the site is up I noticed my SSL Labs score had dropped, so I have been tweaking my configuration to get back to my high A+ score.

Future projects include figuring out IPv6 routing. I was able to assign an IPv6 address to my instance and VPC, but when I try to SSH to that address it times out. Once I get that working I would like to set up DNS so the site can be reached over IPv6 alone. Anyway, that is all for now as I need to go to bed, but more updates are hopefully coming soon.

Site Upgrade

I decided to upgrade my site to the new version of Ubuntu, as I haven't done that for a couple of years. It is a nice thing to work on while on vacation, the sort of task I don't normally get around to when I am busy. What a pain it ended up being.

The OS upgrade itself went very smoothly, as it usually does with Ubuntu. But the jump to a newer version of PHP broke everything on my site. Thinking back, the same thing happened when I went from Ubuntu 14.04 to 16.04 and PHP jumped from 5 to 7. I ended up with about a three-hour outage while I sorted everything out.

The big issue was that the default user PHP-FPM runs as was different from the nginx user, so the two couldn't communicate over the Unix domain socket. I also noticed that the configuration advice for nginx is very different from when I set up this site. Things seem to be laid out better now, with snippets of separate configuration instead of everything going into default.conf. At some point I may want to start over from a blank AMI and import my content into it; then I could set nginx up in a more modern way. It seems there are Let's Encrypt plugins for it too, so I am wondering if I could have it auto-renew my certificates.
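For reference, the socket problem is the kind of thing fixed in the PHP-FPM pool config. A minimal sketch of the relevant settings on Ubuntu 18.04 (PHP version and paths assumed; www-data is the user nginx runs as there by default):

; /etc/php/7.2/fpm/pool.d/www.conf — a sketch, not my exact config
; the pool and its socket need to match the user nginx runs as
user = www-data
group = www-data
listen = /run/php/php7.2-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660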

Another thing I could do if I redid the site would be to switch to MariaDB. I have heard it is supposed to be faster than MySQL, and it might be fun to mess with something different. That said, I probably won't get to it this winter, as I am currently working on content for a talk and I also want to spend a little time on some machine learning classes on Coursera before I get back to work.

I did take advantage of the time in the config files to figure out how to tighten up my SSL Labs score. I found that I was missing just one item to push my key exchange score from a 90 to a 100, so I implemented it. I was hoping to turn on TLSv1.3 as well, but unfortunately Ubuntu 18.04.1 ships with a version of OpenSSL that is too old to support it. I saw on a mailing list that it is supposed to be coming, so hopefully I will be able to enable it soon.
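For anyone chasing the same number: SSL Labs scores key exchange by the weakest link, so the certificate key, the DH group, and the ECDH curve all generally need 4096-bit-equivalent strength to reach 100. A sketch of the nginx side, assuming default paths:

# generate a strong DH group first:
#   openssl dhparam -out /etc/nginx/dhparam.pem 4096
ssl_dhparam /etc/nginx/dhparam.pem;
ssl_ecdh_curve secp384r1;
# the certificate key itself also needs to be 4096-bit, e.g. requesting
# the Let's Encrypt certificate with --rsa-key-size 4096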

Google Kubernetes Engine

Introduction

I have been messing around with Google Kubernetes Engine for the last few weeks (as we are deploying my new app to it), and I have to say I am impressed overall. There has been a lot of talk about Kubernetes for a while, and at first I wondered if it was just the next over-hyped piece of tech. Having used it for a month now, I can say I understand why people are so excited about it. The learning curve is steep, but once you climb it, you really appreciate the power of the platform.

As stated in my previous post, I have been building a microservice architecture in Go, and for deployment we decided to go with GKE. Go is extremely friendly for Docker containers: I have been using Alpine as my base, and each service's container is really tiny (around 10MB or less). This is quite a difference from Java containers, which end up very large when you consider all the jar files that go into a typical Spring Boot app. There are a few things you need to do to build your Go app for Docker. You need to disable CGO and tell the build to use Go's own resolver for DNS rather than glibc's, since Alpine is built on musl libc. The other great thing about Go is that the cross compiler is built in, so you are an environment variable away from compiling your app for Linux even when running on a Mac, as I do. The only other thing I do is add ca-certificates to my Alpine base for SSL connections.
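A minimal sketch of what that build looks like as a multi-stage Dockerfile (image tags and the binary name are assumptions, not my exact setup):

# build stage: CGO off so nothing links glibc; netgo forces Go's DNS resolver
FROM golang:1.12-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -tags netgo -o /app .

# runtime stage: tiny Alpine base with CA certs for outbound TLS
FROM alpine:3.9
RUN apk add --no-cache ca-certificates
COPY --from=build /app /app
ENTRYPOINT ["/app"]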

History

Initially, when I started building my backend, I was using Spring Cloud Gateway and Consul to handle load balancing and service discovery. When I went to bring our services to the cloud, I discovered those would no longer be needed, as Kubernetes has load balancing and service discovery built in. The integration between Spring Cloud and Consul is great: you register your app name in Consul, and Spring Cloud automatically routes to it by name. I have an auth service named auth; I would hit Spring Cloud Gateway at http://localhost:8080/auth, and it would look up the auth service in Consul and route the request to my auth service, which was running on port 9999 at /. I wanted to keep the same sort of approach for my Ingress and service discovery in Kubernetes.
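Kubernetes gives you the same by-name lookup through Services: inside the cluster, the auth pods become reachable at http://auth:9999. A sketch, with labels and ports assumed:

# sketch: a Service plays the role Consul did, exposing the auth pods
# by name inside the cluster (names, labels, and ports assumed)
apiVersion: v1
kind: Service
metadata:
  name: auth
spec:
  selector:
    app: auth          # matches the labels on the auth pods
  ports:
    - port: 9999       # port other services call
      targetPort: 9999 # port the container listens on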

Kubernetes: the beginning…

Initially I started with the default Google Ingress, which behind the scenes provisions a Google Cloud HTTP(S) Load Balancer and routes requests to your services based on your Kubernetes Service and Ingress definitions. I was having issues getting the Google Ingress to work with the URL rewriting I wanted to keep from my previous design, and then I discovered a second issue: it doesn't support WebSockets. We are planning on doing a lot of communication between the backend and front end over a WebSocket, and this was going to be a painful limitation for us. We considered using Server-Sent Events to push events to the client and REST calls for the messages from the client, but this wasn't ideal.

I did some digging and discovered that you can install NGINX as your Ingress, in which case it provisions a Google Cloud TCP load balancer behind the scenes. The advantage is that we can now send WebSockets into our cluster, and instead of SSL terminating at the GCLB, it terminates at our Ingress when we are ready for it. As soon as I switched to NGINX, all the URL rewriting issues I had been seeing went away.

TLS Configuration

Once I had traffic flowing into my cluster from the outside, I decided it was time to provision TLS. I discovered cert-manager, which let me configure my certificates through Let's Encrypt, my preferred certificate issuer. When I ran the SSL Labs test against my cluster, I found SSL was configured really well and I scored an A+. The only issue I am still having is enabling TLSv1.3: I haven't been able to get the Ingress to support it, even after messing with the NGINX ConfigMap. It is supposedly supported in the current version of the NGINX Ingress, but it is disabled by default, and I am still fighting that part of the config.
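For reference, this is the ConfigMap change I have been experimenting with. The ssl-protocols key is documented for ingress-nginx, though whether TLSv1.3 actually comes up depends on the OpenSSL the controller image was built against; the name and namespace here are assumptions that must match your controller's deployment:

# sketch: ingress-nginx ConfigMap enabling TLSv1.3 alongside 1.2
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  ssl-protocols: "TLSv1.2 TLSv1.3"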

Configuration examples

The final challenge I faced was supporting multiple URLs for my cluster and routing the Ingress based on which URL was being requested. I created a fanout Ingress, which worked great, but I struggled to find a config that would let me use multiple TLS secrets depending on the URL. I finally found that config and thought I would share it here (with sensitive details changed).

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/rewrite-target: "/"
    nginx.org/websocket-services: haskovec-api-service
  name: nginx-haskovec-api-ingress
spec:
  rules:
  - host: api.haskovec.com
    http:
      paths:
      - backend:
          serviceName: haskovec-api-service
          servicePort: 9999
        path: /auth
      - backend:
          serviceName: haskovec-api-service
          servicePort: 9998
        path: /service2
  - host: service2.haskovec.com
    http:
      paths:
      - backend:
          serviceName: service2-service
          servicePort: 9999
        path: /auth
      - backend:
          serviceName: service2-service
          servicePort: 9998
        path: /service2
  # This section is only required if TLS is to be enabled for the Ingress
  tls:
  - secretName: api-certificate-secret
    hosts:
    - api.haskovec.com
  - secretName: service2-certificate-secret
    hosts:
    - service2.haskovec.com


The above Ingress shows multiple hosts connecting to multiple pods behind their services, with routing based on both domain name and path. It also shows what multiple TLS certificate secrets look like.

To create your certificates with cert-manager, you need to configure an Issuer as below (again, details changed):

apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address for ACME registration
    email: jeff@haskovec.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-production
    # Enable the http-01 challenge provider
    http01: {}

And a certificate:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: api.haskovec.com
spec:
  secretName: api-certificate-secret
  commonName: api.haskovec.com
  dnsNames:
  - api.haskovec.com
  issuerRef:
    name: letsencrypt-production
    kind: Issuer
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - api.haskovec.com

You configure an additional Certificate like the one above for each domain name you are getting certificates for. And just like that, cert-manager automagically goes out to Let's Encrypt, gets a certificate for that domain, and stores it in the referenced secret.

Conclusion

All things considered, I am blown away by working in a Kubernetes environment. Google takes away the pain of provisioning your cluster so you can focus on your app. Deployments are a breeze: I push my new Docker containers, update my deployment YAML file, apply it, and Kubernetes does a rolling update of all my services. This basically gives us infrastructure as if we were a company with a huge DevOps team, when in reality we have no DevOps engineers. I will definitely be using this on my projects going forward!
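For anyone curious what that workflow looks like, here is a sketch of a Deployment set up for rolling updates (the names and image are made up); bumping the image tag and running kubectl apply -f on the file triggers the rollout:

# sketch: rolling-update settings for one service (names and image assumed)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haskovec-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: haskovec-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep full capacity during the rollout
      maxSurge: 1         # replace pods one at a time
  template:
    metadata:
      labels:
        app: haskovec-api
    spec:
      containers:
      - name: api
        image: gcr.io/my-project/haskovec-api:v2  # bump the tag to deploy
        ports:
        - containerPort: 9999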

SSL Certificates and Google Domains

Recently I moved my domains from GoDaddy to Google Domains. My main reason was to save money: domain names on GoDaddy cost $3 more per year, and they charge for WHOIS privacy, whereas Google includes it for free. Transferring my domain names in was fairly easy, but configuring the DNS was a little weird, as the zone file editing interface is different from GoDaddy's. Still, I thought I had it all working, so I was happy with my setup.

Then last night one of my friends mentioned that he had just renewed his SSL certificate. That got me thinking: I only had about two weeks left on my own certificate and needed to do the same. I mentioned previously that I switched my SSL certificates to Let's Encrypt. The great thing about Let's Encrypt is that it is free and, once set up, less hassle to renew than the other free certificate sites I had used in the past. The drawback is that they only issue certificates good for 90 days. I updated my Let's Encrypt software (it lives in a GitHub repository, so there is usually a new version by the time you need to renew). When I ran the renew command, it failed on www.haskovec.com. That is when I realized I had misconfigured my CNAME record in Google Domains' DNS settings: www was completely broken, and only haskovec.com worked. It took me a little while to figure out. On GoDaddy I believe I pointed the www CNAME at @ or * (I don't recall which they used), but Google doesn't let you point a CNAME at @. After some googling I found that in their DNS setup you have to point at the hostname registered at @, so it ends up being a CNAME for www that points to haskovec.com. Once I got that taken care of, my certificate renewed without any issue.

While I was in there messing around I decided I would disable TLS 1.0. Doing so means dropping support for a ton of browsers including IE10. But it is widely considered as the next protocol to be hacked and at this point pretty much everyone supports 1.2 and the handful of readers I have I expect to be running current browsers (whether on their phones or computers.) I reran the Qualys SSL Test to make sure that I hadn’t broken anything. All looked well with the higher score now on the protocol section and many more test browsers that failed. In the course of running that test I noticed the HSTS preloading test that they are doing now. I didn’t even realize such a thing existed. I did some research and added the preload header to my HSTS header on my server and put my site on the preload list for Chrome. We shall see if that works or if I meet all the requirements, but I think I do. While I was editing my headers I noticed that I was doing the domain name rewriting wrong if the person came in through https://www.haskovec.com/ The code was working if they either came in without the www or they came in on www without the https. So it ended up being a useful night as I found 2 issues in my server config I was able to fix. In the course of writing this post I realized I should add the other cnames I have registered for this domain to my SSL certificate so that will be the next thing on my agenda.