Kubernetes Init Containers

Kubernetes Init containers. Alright, I’m just going to tell the truth here. When I first started reading about them, I didn’t get it. I thought to myself, “with all the other stuff they could be doing right now at this early stage of Kubernetes, what the hell were they thinking? Seriously?” But that’s because I just didn’t get it. I didn’t see the value. I mean, don’t get me wrong, Init containers are good for many reasons: transferring state between Pets, detecting that a database is up prior to starting an app, configuring PVCs with information the primary app needs, and so on. These are all important things, but there are already workarounds for this stuff. Entrypoint anyone?

And then I read one line in the PetSet documentation (of all places) and had an Aha! moment.

“…allows you to run docker images from third-party vendors without modification.”

That is a HUGE reason for Init containers and, in my mind, should be the biggest validation of their need as a broader Kubernetes use case.

At Pearson we have to modify existing Docker images all the time to fit our needs, whether it’s clustering Consul, modding Fluentd, seeding Cassandra or setting up discovery for Elasticsearch clustering. These are all things we have done and had to create our own custom images to manage, in some cases requiring a private Docker repository to do so. Hell, half the stuff I’ve written about has required me to publish our Dockerfiles just so you could take advantage of them. If we had had Init containers in the first place, it would have been a lot less code and a lot more “hey, go pull this init container and use it” in my blog posts.

Alright, with that, I’m actually just going to point you to the documentation on this one. It’s pretty good and gives you exactly what you need to get started.

Kubernetes Init Containers

One key thing to remember: Init containers for a given app run in serial, each one completing before the next starts.
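
To make that concrete, here’s a minimal sketch: an init container blocks startup until a database answers DNS, so the vendor image itself never needs a custom entrypoint. All the names here are invented, and note that on the current beta releases this is expressed through the pod.beta.kubernetes.io/init-containers annotation rather than the first-class field shown below.

kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:                  # run serially, in the order listed
  - name: wait-for-db
    image: busybox
    command: ['sh', '-c', 'until nslookup mydb.default.svc.cluster.local; do sleep 2; done']
  containers:
  - name: app
    image: thirdparty/app:1.0      # the unmodified vendor image
EOF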

Now my team has to go back to work rewriting all our old shit.

Kubernetes/PaaS: Automated Test Framework

First off, mad props go out to Ben Somogyi and Martin Devlin. They have been digging deep on this and have made great progress. I wanted to make sure I called them out; all the honors go to them. I just have the honor of telling you about it.

You might be thinking right about now, “why an automated test framework? Doesn’t the Kubernetes team test their own stuff already?” Of course they do, but we have a fair number of apps/integrations and need to make sure our platform components all work together with Kubernetes. Take, for example, when we upgrade Kubernetes, deploy a new StackStorm integration or add some authentication capability. All of these things need to be tested to ensure our platform works every time.

At what point did we decide we needed an automated test framework? Right about the time we realized we were committing so much back to our project that we couldn’t keep up with the testing. Prior to this, we tested each PR manually, requiring two +1s (minus the author) before a PR could get merged. What we found was we were spending so much time testing (thoroughly?) that we were losing valuable development time. We are a pretty small dev shop, literally 5 (+3 Ops) guys developing new features into our PaaS, so naturally there is a balancing act here. Do we spend more time writing test cases or actually testing ourselves? There comes a tipping point when it makes more sense to write the test cases, automate them and free people up for other things. We felt we had hit that point.

Here is what our current test workflow looks like. It’s subject to change, but this is our most recent iteration.

[Diagram: QA Automation Workflow]

Notice we are running TravisCI to kick everything off. If you have read our other blog posts, you know we also have a Jenkins plugin, so you are probably thinking, ‘why Travis when you already have written your own Jenkins plugin?’ It’s rather simple, really. We use TravisCI to kick off tests through GitHub. It deploys a completely new AWS VPC / Kubernetes cluster from scratch, runs a series of tests to make sure the cluster came up properly and all the endpoints are available, and then deploys Jenkins into a namespace, which kicks off a series of internal tests on the cluster.

Basically, TravisCI handles the external/infrastructure testing, making sure Terraform/Ansible run correctly and all the external dependencies come up, while Jenkins deploys and tests at the container level for the internal components.

If you haven’t already read it, consider reading Kubernetes A/B Cluster Deploys, because we are capable of deploying two completely separate clusters inside the same AWS VPC for the purpose of A/B migrations.

Travis watches for any pull requests (PRs) made against our dev branch. For each PR, TravisCI will run through the complete QA automation process. Below are the highlights; the image above has the details.

1. Create a branch from the PR and merge in the dev branch

2. Linting/Unit tests

3. Cluster deploy

  • If anything fails during deploy of the VPC, paasA or paasB, the process fails and tears down the environment, leaving the logs in the TravisCI build logs.

Here is an example of one of our builds failing in TravisCI.

[Screenshot: failing TravisCI build]

4. Test paasA with paasB

  • Smoke Test
  • Deploy ‘Testing’ containers into paasB
  • Retrieve tests
  • Execute tests against paasA
  • Capture results
  • Publish back to Travis

5. Destroy environment
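
Condensed into a script, the deploy/test/teardown loop looks something like the sketch below. To be clear, this is illustrative rather than our actual pipeline code; the test directories and manifests are invented.

#!/bin/bash
# Hypothetical sketch of the deploy/test/teardown loop Travis drives.
set -euo pipefail

# Always tear the environment down, pass or fail; Travis keeps the logs.
trap 'terraform destroy -force' EXIT

# Stand up the VPC and clusters from scratch
terraform apply

# Verify the infrastructure and platform services came up
bats tests/infrastructure
bats tests/services

# Hand off to Jenkins in its own namespace for the container-level tests
kubectl create namespace qa
kubectl create -f jenkins/    # hypothetical manifests for the test Jenkins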

 

One massive advantage of having A and B clusters is that we can use one to test the other. This enables a large portion of our testing automation to exist in containers, making our test automation parallel, fast and, to a large extent, scalable.

The entire process takes about 25 minutes. Not too shabby for literally building an entire environment from the ground up and running tests against it, and we don’t expect that length of time to change much, in large part because of the parallel testing. This is a from-scratch, completely automated QA framework for a PaaS. I’m thinking 25-30 minutes is pretty damn good. You?

[Screenshot: TravisCI build timing]

 

Alright get to the testing already.

First is our helper script, which sets a few params like timeouts and the number of servers of each type. Anything in ‘${}’ is a Terraform variable that we inject on Terraform deploy.

helper.bash

#!/bin/bash

## Statics

#Long Timeout (For bootstrap waits)
LONG_TIMEOUT=<integer_seconds>

#Normal Timeout (For kubectl waits)
TIMEOUT=<integer_seconds>

# Should match minion_count in terraform.tfvars
MINION_COUNT=${MINION_COUNT}

LOADBALANCER_COUNT=${LOADBALANCER_COUNT}

ENVIRONMENT=${ENVIRONMENT}

## Functions

# retry_timeout takes 2 args: command [timeout (secs)]
# Re-runs the command once a second until it produces output on stdout,
# returning 1 if the timeout is exceeded.
retry_timeout () {
  count=0
  while [[ ! `eval $1` ]]; do
    sleep 1
    count=$((count+1))
    if [[ "$count" -gt $2 ]]; then
      return 1
    fi
  done
}

# values_equal takes 2 values; both must be non-null and equal
# (note the && on the null checks; with || an empty value can slip through)
values_equal () {
  if [[ "X$1" != "X" ]] && [[ "X$2" != "X" ]] && [[ $1 == $2 ]]; then
    return 0
  else
    return 1
  fi
}

# min_value_met takes 2 values; both must be non-null and value 2 must be
# greater than or equal to value 1
min_value_met () {
  if [[ "X$1" != "X" ]] && [[ "X$2" != "X" ]] && [[ $2 -ge $1 ]]; then
    return 0
  else
    return 1
  fi
}
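
If you want to kick the tires on the helpers outside of bats, something like this works (purely illustrative, and it assumes the ${} Terraform variables have been rendered):

source helper.bash
retry_timeout "kubectl get ns default --no-headers" $TIMEOUT && echo "API is up"
values_equal 3 3 && echo "values match"
min_value_met 2 5 && echo "minimum met"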

 

You will notice we have divided our high-level tests by Kubernetes resource type: Services, Ingresses, Pods and so on.

First, we test a few things to make sure our minions and loadbalancer minions came up. Notice we are using kubectl for much of this. May as well; it’s there and it’s easy.

If you want to know more about what we mean by loadbalancer minions, see our earlier posts.

instance_counts.bats

#!/usr/bin/env bats

set -o pipefail

load ../helpers

# Infrastructure

@test "minion count" {
  MINIONS=`kubectl get nodes --selector=role=minion --no-headers | wc -l`
  min_value_met $MINION_COUNT $MINIONS
}

@test "loadbalancer count" {
  LOADBALANCERS=`kubectl get nodes --selector=role=loadbalancer --no-headers | wc -l`
  values_equal $LOADBALANCER_COUNT $LOADBALANCERS
}

 

pod_counts.bats

#!/usr/bin/env bats

set -o pipefail

load ../helpers

@test "bitesize-registry pods" {
  BITESIZE_REGISTRY_DESIRED=`kubectl get rc bitesize-registry --namespace=default -o jsonpath='{.spec.replicas}'`
  BITESIZE_REGISTRY_CURRENT=`kubectl get rc bitesize-registry --namespace=default -o jsonpath='{.status.replicas}'`
  values_equal $BITESIZE_REGISTRY_DESIRED $BITESIZE_REGISTRY_CURRENT
}

@test "kube-dns pods" {
  KUBE_DNS_DESIRED=`kubectl get rc kube-dns-v18 --namespace=kube-system -o jsonpath='{.spec.replicas}'`
  KUBE_DNS_CURRENT=`kubectl get rc kube-dns-v18 --namespace=kube-system -o jsonpath='{.status.replicas}'`
  values_equal $KUBE_DNS_DESIRED $KUBE_DNS_CURRENT
}

@test "consul pods" {
  CONSUL_DESIRED=`kubectl get rc consul --namespace=kube-system -o jsonpath='{.spec.replicas}'`
  CONSUL_CURRENT=`kubectl get rc consul --namespace=kube-system -o jsonpath='{.status.replicas}'`
  values_equal $CONSUL_DESIRED $CONSUL_CURRENT
}

@test "vault pods" {
  VAULT_DESIRED=`kubectl get rc vault --namespace=kube-system -o jsonpath='{.spec.replicas}'`
  VAULT_CURRENT=`kubectl get rc vault --namespace=kube-system -o jsonpath='{.status.replicas}'`
  values_equal $VAULT_DESIRED $VAULT_CURRENT
}

@test "es-master pods" {
  ES_MASTER_DESIRED=`kubectl get rc es-master --namespace=default -o jsonpath='{.spec.replicas}'`
  ES_MASTER_CURRENT=`kubectl get rc es-master --namespace=default -o jsonpath='{.status.replicas}'`
  values_equal $ES_MASTER_DESIRED $ES_MASTER_CURRENT
}

@test "es-data pods" {
  ES_DATA_DESIRED=`kubectl get rc es-data --namespace=default -o jsonpath='{.spec.replicas}'`
  ES_DATA_CURRENT=`kubectl get rc es-data --namespace=default -o jsonpath='{.status.replicas}'`
  values_equal $ES_DATA_DESIRED $ES_DATA_CURRENT
}

@test "es-client pods" {
  ES_CLIENT_DESIRED=`kubectl get rc es-client --namespace=default -o jsonpath='{.spec.replicas}'`
  ES_CLIENT_CURRENT=`kubectl get rc es-client --namespace=default -o jsonpath='{.status.replicas}'`
  values_equal $ES_CLIENT_DESIRED $ES_CLIENT_CURRENT
}

@test "monitoring-heapster-v6 pods" {
  HEAPSTER_DESIRED=`kubectl get rc monitoring-heapster-v6 --namespace=kube-system -o jsonpath='{.spec.replicas}'`
  HEAPSTER_CURRENT=`kubectl get rc monitoring-heapster-v6 --namespace=kube-system -o jsonpath='{.status.replicas}'`
  values_equal $HEAPSTER_DESIRED $HEAPSTER_CURRENT
}

 

service.bats

#!/usr/bin/env bats

set -o pipefail

load ../helpers

# Services

@test "kubernetes service" {
  retry_timeout "kubectl get svc kubernetes --namespace=default --no-headers" $TIMEOUT
}

@test "bitesize-registry service" {
  retry_timeout "kubectl get svc bitesize-registry --namespace=default --no-headers" $TIMEOUT
}

@test "fabric8 service" {
  retry_timeout "kubectl get svc fabric8 --namespace=default --no-headers" $TIMEOUT
}

@test "kube-dns service" {
  retry_timeout "kubectl get svc kube-dns --namespace=kube-system --no-headers" $TIMEOUT
}

@test "kube-ui service" {
  retry_timeout "kubectl get svc kube-ui --namespace=kube-system --no-headers" $TIMEOUT
}

@test "consul service" {
  retry_timeout "kubectl get svc consul --namespace=kube-system --no-headers" $TIMEOUT
}

@test "vault service" {
  retry_timeout "kubectl get svc vault --namespace=kube-system --no-headers" $TIMEOUT
}

@test "elasticsearch service" {
  retry_timeout "kubectl get svc elasticsearch --namespace=default --no-headers" $TIMEOUT
}

@test "elasticsearch-discovery service" {
  retry_timeout "kubectl get svc elasticsearch-discovery --namespace=default --no-headers" $TIMEOUT
}

@test "monitoring-heapster service" {
  retry_timeout "kubectl get svc monitoring-heapster --namespace=kube-system --no-headers" $TIMEOUT
}

 

ingress.bats

#!/usr/bin/env bats

set -o pipefail

load ../helpers

# Ingress

@test "consul ingress" {
  retry_timeout "kubectl get ing consul --namespace=kube-system --no-headers" $TIMEOUT
}

@test "vault ingress" {
  retry_timeout "kubectl get ing vault --namespace=kube-system --no-headers" $TIMEOUT
}

Now that we have a pretty good level of certainty that the cluster stood up as expected, we can begin deeper testing of the various components and integrations within our platform: StackStorm, Kafka, Elasticsearch, Grafana, Keycloak, Vault and Consul. AWS endpoints, internal endpoints, port mappings, security... the list goes on. All core components that our team provides our customers.
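
To give you a taste of what those deeper tests look like, here is a hypothetical bats check against Vault’s health endpoint. The URL is invented and the real tests run inside the cluster, but the shape is the same:

#!/usr/bin/env bats

load ../helpers

@test "vault api health" {
  # /v1/sys/health is Vault's standard health endpoint
  retry_timeout "curl -s -f https://vault.example.com/v1/sys/health" $TIMEOUT
}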

Stay tuned for more as it all begins to fall into place.

Kubernetes: A/B Cluster Deploys

Everything mentioned here has been POCed and proven to work so far in testing. We run an application called Pulse, which we demoed staying up throughout this A/B migration process.

Recently the team went through an exercise on how to deploy/manage a complete cluster upgrade. There were a couple of options discussed, along with what it would take to accomplish each:

  • In-situ upgrade – the usual
  • A/B upgrade – a challenge

In the end, we chose to move forward with A/B upgrades, keeping in mind that the vast majority of our containers are stateless and thus quite easy to move around. Stateful containers are a bit bigger beast, but we are working on that as well.

We fully understand A/B migrations will be difficult and that in-situ will be required at some point, but what the hell. Why not stretch ourselves a bit, right?

So here is the gist:

Build a Terraform/Ansible code base that can deploy an AWS VPC with all the core components. Minus the databases in this picture, this is basically our shell: security groups, two different ELBs for live and pre-live, a bastion box, subnets and routes, our DNS hosted zone and a gateway.

[Diagram: VPC shell with security groups, live/pre-live ELBs, bastion, subnets, hosted zone and gateway]

This would be its own Terraform apply, allowing our Operations folks to manage security groups, some global DNS entries, any VPN connections, bastion access and so on without touching the actual Kubernetes clusters.

We would then have a separate Terraform apply that stands up what we call paasA. That includes an Auth server, our Kubernetes nodes for load balancing (running ingress controllers), master-a, and all of our minions, with the Kubernetes ingress controllers receiving traffic through the frontend-live ELB.

[Diagram: paasA running inside the VPC]
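
In practice, that separation just means each layer is its own Terraform state, applied independently. A rough sketch, with an invented directory layout and variable name rather than our actual repo:

# Each layer is applied independently against its own state.
(cd vpc   && terraform apply)                          # shared shell: SGs, ELBs, bastion, DNS
(cd paasA && terraform apply -var cluster_name=paasA)  # cluster A
(cd paasB && terraform apply -var cluster_name=paasB)  # cluster B, stood up only for an upgrade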

Once we decide to upgrade, we would spin up paasB, which is essentially a duplicate of paasA running within the same VPC.

[Diagram: paasA and paasB running side by side]

When paasB comes up, it gets added to the frontend pre-live ELB for smoke testing, end-to-end testing and the like.

Once paasB is tested to our satisfaction, we make the switch to the live ELB while preserving the ability to switch back if we find something major preventing a complete cut-over.

[Diagram: the live ELB cut over to paasB]

We then bring down paasA and wwwaaaahhhllllllaaaaaa, PaaS upgrade complete.

[Diagram: paasA decommissioned, leaving paasB live]

Now, I think it’s obvious I’m way oversimplifying this, so let’s get into some details.

ELBs – They stay up all the time. Our Kubernetes minions running nginx controllers get labelled in AWS, so we can quickly update the ELBs, whether live or pre-live, to point at the correct servers.
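
The flip itself can be as simple as swapping the instances registered with the classic ELB, found by tag. A hedged sketch; the tag keys and the ELB name are invented:

#!/bin/bash
# Find each cluster's loadbalancer minions by tag, then point the live ELB at paasB.
NEW=$(aws ec2 describe-instances \
  --filters "Name=tag:role,Values=loadbalancer" "Name=tag:paas,Values=paasB" \
  --query 'Reservations[].Instances[].InstanceId' --output text)

OLD=$(aws ec2 describe-instances \
  --filters "Name=tag:role,Values=loadbalancer" "Name=tag:paas,Values=paasA" \
  --query 'Reservations[].Instances[].InstanceId' --output text)

aws elb register-instances-with-load-balancer --load-balancer-name frontend-live --instances $NEW
aws elb deregister-instances-from-load-balancer --load-balancer-name frontend-live --instances $OLD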

S3 buckets – We store our config files and various execution scripts in S3 for configuring our minions and the master. In this A/B scenario, each cluster (paasA and paasB) has its own S3 bucket where its config files are stored.

Auth servers – Our PaaS deploys include our Keycloak auth servers. We still need to work through how we transfer all the configurations, or IF we choose to no longer deploy auth servers as a part of the cluster deploy but instead as a part of the VPC.

Masters – New as a part of the cluster deploy, in keeping with true A/B.

It’s pretty cool to run two clusters side by side AND be able to bring them up and down individually. But where this gets really awesome is when we can basically take all applications running in paasA and deploy them into paasB. I’m talking about a complete migration of assets: Secrets, Jenkins, Namespaces, ACLs, Resource Quotas and ALL applications running on paasA, minus any self-generated items.

To be clear, we are not simply copying everything over. We are recreating objects using one cluster as the source and the other cluster as the destination. We are reading JSON objects from the Kubernetes API and using those objects, along with their respective configuration, to create the same objects in another cluster. If you read up on Ubernetes, you will find their objectives are very much in line with this concept. We also have ZERO intent of duplicating efforts long term. The reality is, we needed this functionality before the Kubernetes project could get there. As Kubernetes federation continues to mature, we will continue to adopt and change, even replacing our code with theirs. With this in mind, we have specifically written our code to perform these actions in a way that can very easily be removed.
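
Conceptually, the recreation of each object is no more complicated than the sketch below (context names invented, and cluster-generated metadata stripped along the way). Our actual implementation is the StackStorm code described further down.

# Read an object from paasA, strip the fields the cluster generates,
# and recreate it in paasB.
kubectl --context=paasA get secret my-secret --namespace=my-namespace -o json \
  | jq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.selfLink, .metadata.creationTimestamp)' \
  | kubectl --context=paasB create -f -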

Now you are thinking, why didn’t we just contribute back to the core project? We are, in several ways. Just not this one, because we love the approach the Kubernetes team is already taking with it. We just needed something to get us by until they can make theirs production-ready.

Now, with that, I will say we have some very large advantages that enable us to accomplish something like this. Let’s take Jenkins for example. We run Jenkins in every namespace in our clusters. Our Jenkins machines are self-configuring and, for the most part, stateless. So while we have to copy infrastructure-level items like Kubernetes Secrets to paasB, we don’t have to copy each application. All we have to do is spin up the Jenkins container in each namespace and let it deploy all the applications necessary for its namespace. All the code and configuration to do so exists in Git repos. Thus PaaS admins don’t need to know how each application stack in our PaaS is configured. A really, really nice advantage.

Our other advantage is that our databases currently reside outside of Kubernetes (except some Mongo and Cassandra containers in dev) on virtual machines. So we aren’t yet worried about migrating stateful data sets, which has made our work on A/B cluster migrations a much smaller stepping stone. We are, however, placing significant effort into this area. We are getting help from the guys at Jetstack.io around storage, and we are working diligently with people like @chrislovecnm to understand how we can bring database containers into production. Some of this is reliant upon new features like PetSets, and some of it requires changes in how various databases work. Take, for example, Cassandra snitches, where Chris has managed to create a Kubernetes-native snitch. Awesome work, Chris.

So what about Consul? It’s stateful, right? And it’s in your cluster, yes?

Well, that’s a little different animal. Consul is a stateful application in that it runs as a cluster. So we are considering two different ways to accomplish this:

  1. Externalize our flannel overlay network using the aws-vpc backend and allow the /16s to route to one another (see the sketch below). Then we could essentially create one Consul cluster across the two Kubernetes clusters, allow the data to sync and then decommission the Consul containers in paasA.
  2. Use some type of small application to keep the two Consul clusters in sync for a period of time during the PaaS upgrade.
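
For reference, option 1 hinges on flannel’s aws-vpc backend, which is set in the network config flannel reads from etcd. A hedged sketch; the CIDR and etcd key are the illustrative defaults:

# Route the pod network through the VPC route table instead of encapsulating,
# making pods in both clusters directly routable to one another.
etcdctl set /coreos.com/network/config \
  '{ "Network": "172.16.0.0/16", "Backend": { "Type": "aws-vpc" } }'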

Both of the options above have benefits and limitations.

Option 1:

  • Benefits:
    • could use a similar method for other clustered applications like Cassandra.
    • would do a better job ensuring the data is synced.
    • could push data replication to the cluster level where it should be.
  • Limitations:
    • we could essentially bring down the whole Consul cluster with a wrong move. Thus some of the integrity imagined in a full A/B cluster migration would be negated.

Option 2:

  • Benefits:
    • keep a degree of separation between the Kubernetes clusters during the upgrade so one cannot impact the other.
    • pretty easy to implement
  • Limitations:
    • a one-off, Consul-specific implementation
    • much more to keep track of
    • won’t scale for other stateful applications

I’m positive the gents on my team will call me out on several more, but this is what I could think of off the top of my head.

We have already implemented Option #2 in a POC of our A/B migration.

But we haven’t chosen a firm direction on this yet, so if you have additional insight, please comment back.

Barring stateful applications, what are we using to migrate all this stuff between clusters? StackStorm. We already have it performing other automation tasks outside the cluster, we have Python libraries (k8sv1 and k8sv1beta) for the Kubernetes API endpoints, and it’s quite easy to extract the data and push it into another cluster. Once we are done with the POC, we’ll push this to our StackStorm repo here. @peteridah, you rock.

In our current POC, we migrate everything. In our next POC, we’ll enable the ability to migrate specific application stacks from one cluster to another. This will also give us the ability to deploy an application stack from one cluster into another for things like performance testing or deep breach-management testing.

Lastly, we are working through how to go about stateful container migrations. There are many ideas floating around, but we would really enjoy hearing yours.

For future generations:

  • We will need some sort of metadata framework for those application stacks that span multiple namespaces to ensure we duplicate an environment properly.

 

To my team-

My hat is off to you: Martin, Simas, Eric, Yiwei, Peter, John, Ben and Andy, for all your work on this.

Migrate Docker Registry to GCloud Container Registry

Recently we chose to migrate our container registry to GCloud for the following reasons:

  1. We didn’t want to host it ourselves anymore.
  2. We wanted to distribute our Docker images worldwide for consumption in our Multi-Region scenario.
  3. We run Google Apps/Email, so we could hook into that for permissions to the registry.
  4. It’s as close as we could find to a native docker push/pull scenario without spending stupid amounts of money.
  5. An endless number of repositories, which was important considering we already have 30 and we are just getting started.
  6. We only get charged for storage consumption and egress requests (some caveats apply).
  7. Our old registry was only accessible from within our platform, and developers requested access so they could run images locally.

As written in an earlier post, we also evaluated AWS ECR and took a high-level look at several other Docker image storage options.

What we found is that Google Cloud is doing some great things, like providing the excellent search capabilities we were missing with our own registry.

It’s quite easy to search repositories, images and tags. Even though several of the search capabilities are in an alpha state, I’ve found they work quite well.

List images in a repo:

gcloud alpha container images list --repository=<repository_name>

List version tags for a given image:

gcloud alpha container images list-tags gcr.io/<repository_name>/<image_name>
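
And since it behaves like a standard Docker registry fronted by Google auth, developers can now pull images locally (reason 7 above). The gcloud wrapper injects the credentials; the image name here is invented:

gcloud docker -- pull gcr.io/<project-name>/myapp:1.0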

 

We then decided to take it to the next phase: ship all our images to GCloud from our private repository and begin testing it in earnest. Stay tuned for more.

So we wrote a migration script to move everything from our private repo:

https://github.com/pearsontechnology/migr8-registry-gcloud

It’s completely open source under Apache 2.0, so feel free to use it if you find it valuable.

The README is pretty good, and the script is quite easy to run. It will transfer all repositories, images and version tags.

All you have to do is supply four environment variables and make sure the gcloud SDK is installed and authenticated.

export GCLOUD_URL="gcr.io/<project-name>"
export REG_URL="docker-registry.example.com:5000"
export GCLOUDPATH="/usr/bin/gcloud"
export DOCKERPATH="/usr/bin/docker"
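
Under the hood, the per-image transfer boils down to pull, retag and push using those same variables. A hedged sketch with an invented image name and tag:

docker pull "${REG_URL}/myapp:1.0"
docker tag "${REG_URL}/myapp:1.0" "${GCLOUD_URL}/myapp:1.0"
gcloud docker -- push "${GCLOUD_URL}/myapp:1.0"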