Diving Deeper Into Operator Framework, Part 1

Overview

When managing and operating complex software on Kubernetes, it's very common to think of the operator pattern, which can simplify a lot of day-1 and day-2 tasks from the SRE/operator perspective, at the cost of writing code.

Believe it or not, there is a lot of boilerplate code to deal with if you build a Kubernetes operator from scratch.

Fortunately, there are a couple of great operator frameworks in the OSS Kubernetes community that can really help us out, most notably Kubebuilder and Operator SDK.

Before you start struggling with "oh, which one should I go with?", let me share a little bit of what I understand:

  1. Both are built on top of the controller-runtime library;
  2. Both are scaffolding and code-generation tools that bootstrap a new operator project fast, and the scaffolded folder structures are highly similar;
  3. Contributors of both are active in the Kubernetes Slack channel #kubernetes-operators and contribute heavily to upstream controller-runtime;
  4. And there is probably more in common that I'm not aware of.

In short, they have a lot in common.

So if you simply want to try building an operator for learning purposes, pick either one and you're good to rock.

In this blog series, I'll dive deeper into Operator SDK and hopefully you can figure out the differences along the way.

This blog series has two parts: this post walks through the end-to-end quick start of building an operator with Operator SDK, and the next will cover how to package, distribute, manage and operate the operators we build.

Quick Start

The best place to start with Operator SDK should be here: https://sdk.operatorframework.io/docs/.

And we'll focus on building the operator in Go, on a Mac. So your mileage may vary slightly, but I guarantee you'll still easily get the idea.

Preparation

We need a Kubernetes cluster; after all, we're developing Kubernetes stuff.

The simplest way I can think of is to spin up a kind cluster.

Prerequisites: Docker Desktop is up and running.

# Install kind CLI if you haven't
$ brew install kind

# Prepare a simple config file
$ cat > kind-cluster.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
EOF

# And then create the cluster
$ kind create cluster --config kind-cluster.yaml

Once the cluster is created, the current context will be automatically changed to the newly provisioned kind cluster.

# Let's check the nodes
$ kubectl get nodes
NAME                 STATUS   ROLES                  AGE     VERSION
kind-control-plane   Ready    control-plane,master   3h47m   v1.21.1
kind-worker          Ready    <none>                 3h47m   v1.21.1

# And the pods in kube-system
$ kubectl get pods -n kube-system
NAME                                         READY   STATUS    RESTARTS   AGE
coredns-558bd4d5db-jk8h8                     1/1     Running   0          3h47m
coredns-558bd4d5db-klnjx                     1/1     Running   0          3h47m
etcd-kind-control-plane                      1/1     Running   0          3h47m
kindnet-7sx5m                                1/1     Running   0          3h47m
kindnet-f9pj7                                1/1     Running   0          3h46m
kube-apiserver-kind-control-plane            1/1     Running   0          3h47m
kube-controller-manager-kind-control-plane   1/1     Running   0          3h47m
kube-proxy-d2w44                             1/1     Running   0          3h46m
kube-proxy-lvxhm                             1/1     Running   0          3h47m
kube-scheduler-kind-control-plane            1/1     Running   0          3h47m

# Yes, the "nodes" are actually Docker containers -- that's why it's called "kind", Kubernetes in Docker
$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED       STATUS       PORTS                       NAMES
c18e7a78bec6   kindest/node:v1.21.1   "/usr/local/bin/entr…"   4 hours ago   Up 4 hours   127.0.0.1:50057->6443/tcp   kind-control-plane
b4f4856f7181   kindest/node:v1.21.1   "/usr/local/bin/entr…"   4 hours ago   Up 4 hours                               kind-worker

Installing operator-sdk

# Install operator-sdk by brew, if you haven't
$ brew install operator-sdk

# Check the installed version
$ operator-sdk version
operator-sdk version: "v1.7.2", commit: "6db9787d4e9ff63f344e23bfa387133112bda56b", kubernetes version: "1.19.4", go version: "go1.15.5", GOOS: "darwin", GOARCH: "amd64"

Initializing an operator

Let's initialize an operator named "memcached-operator".

$ mkdir memcached-operator && cd memcached-operator
$ operator-sdk init --domain example.com --repo github.com/example/memcached-operator

# See what we got
$ tree
.
├── Dockerfile
├── Makefile
├── PROJECT
├── config
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── controller_manager_config.yaml
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── manifests
│   │   └── kustomization.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   ├── rbac
│   │   ├── auth_proxy_client_clusterrole.yaml
│   │   ├── auth_proxy_role.yaml
│   │   ├── auth_proxy_role_binding.yaml
│   │   ├── auth_proxy_service.yaml
│   │   ├── kustomization.yaml
│   │   ├── leader_election_role.yaml
│   │   ├── leader_election_role_binding.yaml
│   │   ├── role_binding.yaml
│   │   └── service_account.yaml
│   └── scorecard
│       ├── bases
│       │   └── config.yaml
│       ├── kustomization.yaml
│       └── patches
│           ├── basic.config.yaml
│           └── olm.config.yaml
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
└── main.go

10 directories, 29 files

Note: as of this writing, operator-sdk v1.7.2 is compatible with Go versions from 1.13 up to (but not including) 1.16, so if you're on Go 1.16 or newer, try adding the --skip-go-version-check flag to the operator-sdk init command.

Creating APIs

So far the project is just an empty structure. Let's create a simple API named "Memcached":

# Create an API named "Memcached"
$ operator-sdk create api --group cache --version v1alpha1 --kind Memcached --resource --controller

# See what we got again
$ tree
.
├── Dockerfile
├── Makefile
├── PROJECT
├── api
│   └── v1alpha1
│       ├── groupversion_info.go
│       ├── memcached_types.go
│       └── zz_generated.deepcopy.go
├── bin
│   └── controller-gen
├── config
│   ├── crd
│   │   ├── kustomization.yaml
│   │   ├── kustomizeconfig.yaml
│   │   └── patches
│   │       ├── cainjection_in_memcacheds.yaml
│   │       └── webhook_in_memcacheds.yaml
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── controller_manager_config.yaml
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── manifests
│   │   └── kustomization.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   ├── rbac
│   │   ├── auth_proxy_client_clusterrole.yaml
│   │   ├── auth_proxy_role.yaml
│   │   ├── auth_proxy_role_binding.yaml
│   │   ├── auth_proxy_service.yaml
│   │   ├── kustomization.yaml
│   │   ├── leader_election_role.yaml
│   │   ├── leader_election_role_binding.yaml
│   │   ├── memcached_editor_role.yaml
│   │   ├── memcached_viewer_role.yaml
│   │   ├── role_binding.yaml
│   │   └── service_account.yaml
│   ├── samples
│   │   ├── cache_v1alpha1_memcached.yaml
│   │   └── kustomization.yaml
│   └── scorecard
│       ├── bases
│       │   └── config.yaml
│       ├── kustomization.yaml
│       └── patches
│           ├── basic.config.yaml
│           └── olm.config.yaml
├── controllers
│   ├── memcached_controller.go
│   └── suite_test.go
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
└── main.go

17 directories, 43 files

As you can see, a lot more has been generated for us. That's exactly the value of such a tool: it scaffolds and generates the boilerplate code so we can bootstrap a new operator project fast.
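
If you peek into api/v1alpha1/memcached_types.go, you'll find the skeleton of our new API. Trimmed down, it looks roughly like the sketch below (the real file carries more comments plus kubebuilder markers), and the placeholder foo field is what the sample CR under config/samples refers to later:

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MemcachedSpec defines the desired state of Memcached
type MemcachedSpec struct {
	// Foo is an example field of Memcached. Edit memcached_types.go to remove/update
	Foo string `json:"foo,omitempty"`
}

// MemcachedStatus defines the observed state of Memcached
type MemcachedStatus struct {
}

// Memcached is the Schema for the memcacheds API
type Memcached struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MemcachedSpec   `json:"spec,omitempty"`
	Status MemcachedStatus `json:"status,omitempty"`
}

This is where we'd define our own Spec and Status fields; whenever the types change, make generate and make manifests will regenerate the deepcopy code and the CRD manifests.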

Coding the operator

Well, not really the focus for now.

But in order to judge whether it really works, let's at least log something out:

# Edit controllers/memcached_controller.go
$ vi controllers/memcached_controller.go

Change this:

func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = r.Log.WithValues("memcached", req.NamespacedName)

	// your logic here

	return ctrl.Result{}, nil
}

to:

func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = r.Log.WithValues("memcached", req.NamespacedName)

	// your logic here
	r.Log.Info("great, the Reconcile is really triggered!")

	return ctrl.Result{}, nil
}

You may refer to this doc and this file for how to implement the Controller on a step-by-step basis, with a complete sample implementation.
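
To give a feel for where that leads, a typical Reconcile first fetches the CR that triggered the event and handles the not-found case (for example, right after the CR has been deleted), before comparing the desired state in Spec against the actual cluster state. Here's a minimal sketch, assuming the scaffolded MemcachedReconciler fields and the import aliases shown below:

import (
	"context"

	"k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"

	cachev1alpha1 "github.com/example/memcached-operator/api/v1alpha1"
)

func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := r.Log.WithValues("memcached", req.NamespacedName)

	// Fetch the Memcached instance that triggered this reconciliation
	memcached := &cachev1alpha1.Memcached{}
	if err := r.Get(ctx, req.NamespacedName, memcached); err != nil {
		if errors.IsNotFound(err) {
			// The CR is gone; any owned objects will be cleaned up by garbage collection
			log.Info("Memcached resource not found, ignoring since it must have been deleted")
			return ctrl.Result{}, nil
		}
		// Failed to read the CR; returning the error makes controller-runtime requeue the request
		return ctrl.Result{}, err
	}

	// Compare the desired state in Spec with the actual cluster state here,
	// e.g. create or update a Deployment owned by this CR, then update Status
	log.Info("reconciling", "foo", memcached.Spec.Foo)

	return ctrl.Result{}, nil
}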

(Optionally) Creating webhook

In the Kubernetes world, there are 3 kinds of webhooks: admission webhook, authorization webhook and CRD conversion webhook.

Currently, controller-runtime supports admission webhooks and CRD conversion webhooks, and so does the Operator SDK.
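
Just to give a flavor (we won't wire this up in this post), an admission webhook for our kind can be scaffolded with the operator-sdk create webhook command and is then implemented by satisfying controller-runtime's webhook interfaces on the API type. A rough sketch of a defaulting (mutating) webhook, assuming a scaffolded api/v1alpha1/memcached_webhook.go:

package v1alpha1

import (
	"sigs.k8s.io/controller-runtime/pkg/webhook"
)

// Compile-time check that Memcached implements controller-runtime's Defaulter interface
var _ webhook.Defaulter = &Memcached{}

// Default is invoked by the mutating admission webhook to fill in default values
func (r *Memcached) Default() {
	if r.Spec.Foo == "" {
		r.Spec.Foo = "bar" // hypothetical default, purely for illustration
	}
}

Implementing webhook.Validator in the same way would add validation on create, update and delete.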

Webhooks are optional but very powerful if they suit your needs. But let's ignore webhooks for now.

Running the operator locally

Even though we've done nothing to code the actual reconciliation logic, guess what: the generated code is already workable and deployable.

Since we now focus on the end-to-end process, we can proceed.

Typically, we may want to run it outside of Kubernetes as a "normal" Golang project, to have a better development experience within the so-called "inner loop":

1# Firstly, we have to install the CRDs
2$ make install
3
4# Then run it as a "normal" go project
5$ make run

Note: if you check out the Makefile, you'll realize that make run simply executes go run ./main.go
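
Under the hood, the generated main.go builds a controller-runtime manager and registers our reconciler with it; when run locally, ctrl.GetConfigOrDie() picks up your current kubeconfig, which is why the operator talks to our kind cluster. A trimmed sketch of what the scaffolding puts in main.go:

// Create the manager; the scheme has our cache.example.com/v1alpha1 types registered
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	Scheme: scheme,
})
if err != nil {
	setupLog.Error(err, "unable to start manager")
	os.Exit(1)
}

// Register our reconciler with the manager
if err = (&controllers.MemcachedReconciler{
	Client: mgr.GetClient(),
	Log:    ctrl.Log.WithName("controllers").WithName("Memcached"),
	Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr); err != nil {
	setupLog.Error(err, "unable to create controller", "controller", "Memcached")
	os.Exit(1)
}

// Start the manager, which starts all registered controllers
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
	setupLog.Error(err, "problem running manager")
	os.Exit(1)
}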

Now the operator is actually working, watching "Memcached" CRs in the cluster our current context points to, which is the kind cluster we created earlier.
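
The watch itself is wired up by the generated SetupWithManager in controllers/memcached_controller.go, which registers our reconciler with the manager for the Memcached kind, roughly like this:

func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cachev1alpha1.Memcached{}). // reconcile whenever a Memcached CR is created/updated/deleted
		Complete(r)
}

Later on, chaining something like Owns(&appsv1.Deployment{}) would also trigger reconciliation when objects owned by the CR change.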

And we can open another console to create a CR to see what will happen:

# Create a CR, which can refer to the example under config/samples/cache_v1alpha1_memcached.yaml
$ kubectl apply -f - <<EOF
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: memcached-sample
spec:
  # Add fields here
  foo: bar
EOF

We should be able to see a log message like:

2021-06-04T15:06:22.493Z	INFO	controllers.Memcached	great, the Reconcile is really triggered!

As you can imagine, once you've implemented the reconciliation logic properly, this is where the operator will do its real work.

Now let's delete the CR:

$ kubectl delete Memcached/memcached-sample

We should be able to see the same log message again, indicating that the reconciliation logic is also triggered for the delete event, so we can handle it there.

Building and running the operator

Let's pretend that we're happy with what we've developed. It's common to build and publish the operator as a container image to a desired container registry like Docker Hub, Quay.io, or something on-prem. The choice is yours!

Here I'm going to push it to my Docker Hub account; you may change it to yours. In this case, it's recommended to update the default Makefile with the changes below:

-IMG ?= controller:latest
+IMG ?= $(IMAGE_TAG_BASE):v$(VERSION)

-IMAGE_TAG_BASE ?= example.com/memcached-operator
+IMAGE_TAG_BASE ?= brightzheng100/memcached-operator

That way we can avoid setting IMG in the commands all the time.

# Docker build and push
# Instead of: make docker-build docker-push IMG="brightzheng100/memcached-operator:0.0.1"
$ make docker-build docker-push

# Then deploy it
# Instead of: make deploy IMG="brightzheng100/memcached-operator:0.0.1"
$ make deploy

# Check it out
$ kubectl get pods -n memcached-operator-system
NAME                                                     READY   STATUS    RESTARTS   AGE
memcached-operator-controller-manager-7c759576bb-fwzfl   2/2     Running   0          49s

# Tail the logs
$ kubectl logs memcached-operator-controller-manager-7c759576bb-fwzfl --all-containers -f -n memcached-operator-system

Similarly, we can open a new console and create the CR to test it out:

# Create a CR, which can refer to the example under /config/samples/cache_v1alpha1_memcached.yaml
$ kubectl apply -f - <<EOF
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: memcached-sample
spec:
  # Add fields here
  foo: bar
EOF

# Check it out
$ kubectl get memcached
NAME               AGE
memcached-sample   40s

# Then delete it
$ kubectl delete Memcached/memcached-sample

And yes, we should be able to see exactly the same log message twice, indicating that the Reconcile method has been properly triggered for both CR events.

Clean Up

$ make undeploy

Conclusion

As you've seen, building Kubernetes operators can be significantly simplified because of the operator frameworks like Operator SDK.

The project, the APIs and the webhooks can each be generated with a single command, so we can focus on implementing the operator's business logic.

Our software can be packaged and managed by our operators, which is super cool... but if we think about "who will monitor the monitors": are there any further practices to package, distribute, manage and operate our operators themselves?

That's exactly what my next blog post is going to cover. Stay tuned!