August 30 2024

Adventures with Kubernetes

Here is a little background on this article: several years ago I decided to start self-hosting my own Kubernetes cluster, not knowing the complexities and pitfalls of doing so. This is a compilation of things I've learned along the way, and I wanted to pass them on to anyone facing the same challenges.

If you've never self-hosted Kubernetes then you may not be aware of the critical pieces that keep it running, Etcd being one of them. Etcd is the distributed key-value store that holds the cluster's data; without it the cluster can't function. Here are some of the lessons I've learned running my own cluster over the years.

Etcd has native tools for backing it up and restoring it.

One of the tools I have used for backup and recovery is Etcdctl, a command line tool for managing Etcd. The concept is pretty simple: you take a snapshot of the database and save it to a file. Where it gets complicated is that if you are running a multi-node control plane, you have to use that snapshot to restore every one of the Etcd nodes.
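
Roughly what that looks like with Etcdctl (the endpoints, certificate paths, node names, and IPs below are placeholders for my setup, so adjust them to match where your Etcd certificates actually live):

# take a snapshot of a running etcd member
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# restore that snapshot into a fresh data directory on each etcd node,
# listing every member of the new logical cluster
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --name cp1 \
  --initial-cluster cp1=https://10.0.1.11:2380,cp2=https://10.0.1.12:2380,cp3=https://10.0.1.13:2380 \
  --initial-advertise-peer-urls https://10.0.1.11:2380 \
  --data-dir /var/lib/etcd-restored

Newer Etcd releases are moving the restore command over to etcdutl, so check which binary your version ships with.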

Here is a reference to the documentation on it:
Restoring overwrites some snapshot metadata (specifically, the member ID and cluster ID); the member loses its former identity. This metadata overwrite prevents the new member from inadvertently joining an existing cluster. Therefore in order to start a cluster from a snapshot, the restore must start a new logical cluster.

Most tutorials for self-hosting Kubernetes go with a basic three-node setup: one control plane and two workers. This is fine in most cases, but if you are looking for high availability it won't work; the moment you have to shut down or reboot that control plane, your cluster becomes unavailable. These are my tips for setting up an HA cluster and moving from a single control plane to many.

Use a Load Balancer

An external load balancer is a great way to make not only your control plane highly available but also your workers. In my setup I use the HAProxy package in pfSense to load balance my control planes and workers behind a virtual IP. Below is a diagram of how I have set up my control planes and nodes. Port 6443 load balances the Kubernetes API endpoints for the control planes, and 80 and 443 are for all nodes.

[Diagram: HAProxy in pfSense load balancing the Kubernetes control planes and worker nodes]
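
pfSense builds the HAProxy configuration from the GUI, but the equivalent raw HAProxy config looks roughly like this (the virtual IP and backend addresses are placeholders for my network):

frontend k8s-api
    bind 192.168.1.50:6443
    mode tcp
    default_backend k8s-control-planes

backend k8s-control-planes
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 10.0.1.11:6443 check
    server cp2 10.0.1.12:6443 check
    server cp3 10.0.1.13:6443 check

The 80/443 frontends for the workers follow the same pattern, just pointed at every node instead of only the control planes.
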
Update your control plane components and kubelet

On each control plane there are kubeconfig files for the Kubernetes services that run as Pods in the cluster. These can be found in the /etc/kubernetes directory and need to be updated (example below) to use the load balancer to contact the Kubernetes API endpoint. But the first step before this is updating the API server certificate with the IP of the load balancer as a SAN so the kube-apiservers won't reject the traffic. I found an entire walkthrough of how to do this on Scott Lowe's website.
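
If your cluster was built with kubeadm, one way to get the load balancer address into the certificate is to add it as a SAN in the ClusterConfiguration and regenerate the apiserver cert. A rough sketch (the DNS name and virtual IP are placeholders; Scott Lowe's walkthrough covers the details and the non-kubeadm case):

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "kube.example.com:6443"
apiServer:
  certSANs:
  - "kube.example.com"
  - "192.168.1.50"

Move the old apiserver.crt and apiserver.key out of /etc/kubernetes/pki, then run kubeadm init phase certs apiserver --config with the file above to reissue the certificate with the new SANs.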

kube-controller-manager config example
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: 
    server: https://kube.example.com:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:kube-controller-manager
  name: system:kube-controller-manager@kubernetes
current-context: system:kube-controller-manager@kubernetes
kind: Config
preferences: {}
users:
- name: system:kube-controller-manager
  user:
    client-certificate-data: 
    client-key-data: 

Storage can be a challenge

Unlike a Kubernetes cluster hosted in a public cloud, a self-hosted cluster doesn't have the same storage provider options. Dynamic provisioning of persistent volumes doesn't work out of the box, so if you need storage that isn't tied to a specific node, NFS is probably the best option.
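
The simplest way to use NFS is a statically created PersistentVolume, something like this (the server address and export path are placeholders for my NAS). If you want dynamic provisioning, there are external provisioners such as nfs-subdir-external-provisioner that can hand out volumes from an NFS export automatically.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.1.5
    path: /export/k8s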

Kubernetes networking can sometimes be a little complicated and easy to break. Below are some tips from breaking my clusters many times and having to diagnose and repair them, as well as things that make your life easier when exposing applications.

Keep your CNI updated

CNI stands for Container Network Interface, and it's what allows the pods/containers in your cluster to communicate. You should check your CNI version whenever you update the cluster's Kubernetes version to make sure it's still supported. I personally use Calico, but there are several out there to pick from: CNCF CNIs
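
A quick way to see which Calico version is actually running is to look at the image tag on the calico-node DaemonSet (this assumes a manifest-based install in kube-system; the operator-based install puts it in calico-system instead):

kubectl get daemonset calico-node -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'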

Use an Ingress Controller

If you aren't familiar with Ingress, here is a link to the documentation to get started.
An ingress controller can be kind of a headache to set up to begin with, but it will save you time in the end. You may be asking yourself: what is an ingress controller and what does it do? It acts as a layer 7 load balancer, inspecting HTTP requests and routing traffic to backend services based on the host and path in the URL.
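
For example, an Ingress resource for a site might look something like the following (this assumes the NGINX ingress controller and its "nginx" ingress class; the hostname and Service name are placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blog
spec:
  ingressClassName: nginx
  rules:
  - host: blog.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: blog
            port:
              number: 80

The controller watches resources like this and turns them into its own proxy configuration, so exposing a new application becomes just another Ingress rule instead of another port forward.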


August 30 2018

More Equipment!

Another server joins the cause!

A Dell PowerEdge R710 Gen 11 was graciously given to me and I really appreciate it, as it can handle way more memory (way more than 8GB, haha) than my current XCP-ng server. The only catch is it came with 3GB of memory, so I'll be purchasing 48GB to replace that and get it up and running.

In addition to the R710 I also received a T710. If you don't know the difference between the two, the T710 is a tower version of the R710; it can handle a little less memory than the R710 but has more drive capacity. The T710 will take some more time to get up and running, but eventually it will be another XCP-ng server.

Once the T710 is up I plan on turning the old XCP-ng server into a FreeNAS storage platform for backups and possibly for VM storage.


July 19 2018

Rack Setup

The rack is set up and has a couple of pieces of equipment in it.

So far this is what I have set up.


July 15 2018

First Iteration of Server Lab

I've begun putting together the first iteration of my home server lab. The lab is moving to a different room in the house, and I have a 42U rack that I plan on using for rack-mount servers and network equipment.

I plan on purchasing a rack-mount PowerEdge server and putting XCP-ng on it, which is a spin-off of XenServer. After that I'll start putting a plan together for what VMs I want to run on it. I know for sure I'll be setting up a dev server for my website.