Kubernetes: A/B Cluster Deploys

Everything mentioned here has been POCed and proven to work so far in testing. We run an application called Pulse, and we demoed it staying up throughout this A/B migration process.

Recently the team went through an exercise on how to deploy/manage a complete cluster upgrade. We discussed a couple of options, along with what it would take to accomplish each.

  • In-situ upgrade – the usual
  • A/B upgrade –  a challenge

In the end, we chose to move forward with A/B upgrades, keeping in mind that the vast majority of our containers are stateless and thus quite easy to move around. Stateful containers are a bigger beast, but we are working on that as well.

We fully understand A/B migrations will be difficult and in-situ upgrades will be required at some point, but what the hell. Why not stretch ourselves a bit, right?

So here is the gist:

Build a Terraform/Ansible code base that can deploy an AWS VPC with all the core components. Minus the databases in this picture, this is basically our shell: security groups, two different ELBs for live and pre-live, a bastion box, subnets and routes, our DNS hosted zone and a gateway.

Screen Shot 2016-08-05 at 1.05.59 PM

This would be its own Terraform apply, allowing our operations folks to manage security groups, some global DNS entries, any VPN connections, bastion access and so on without touching the actual Kubernetes clusters.

We would then have a separate Terraform apply that stands up what we call paasA. This includes an auth server, our Kubernetes nodes for load balancing (running ingress controllers), master-a, and all of our minions, with the Kubernetes ingress controllers receiving traffic through the frontend-live ELB.

Screen Shot 2016-08-05 at 1.06.31 PM

Once we decide to upgrade, we would spin up paasB, which is essentially a duplicate of paasA running within the same VPC.

Screen Shot 2016-08-05 at 1.06.41 PM

When paasB comes up, it gets added to the frontend pre-live ELB for smoke testing, end-to-end testing and the like.

Once paasB is tested to our satisfaction, we make the switch to the live ELB while preserving the ability to switch back if we find something major preventing a complete cut-over.

Screen Shot 2016-08-05 at 1.06.54 PM

We then bring down paasA and wwwaaaahhhllllllaaaaaa, PaaS upgrade complete.

Screen Shot 2016-08-05 at 1.07.25 PM

Now I think it's obvious I'm way oversimplifying this, so let's get into some details.

ELBs – They stay up all the time. Our Kubernetes minions running nginx ingress controllers get labelled in AWS, so we can quickly update the ELBs, whether live or pre-live, to point at the correct servers.
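As a rough sketch of what that relabelling flow can look like (the `KubernetesCluster`/`role` tag names and the `select_cluster_instances` helper are illustrative, not our actual implementation), using the classic ELB API via boto3:

```python
# Sketch: re-point a classic ELB at the labelled ingress minions of one cluster.
# Tag names here are assumptions for illustration only.

def select_cluster_instances(instances, cluster):
    """Return instance ids tagged as ingress nodes of `cluster`."""
    selected = []
    for inst in instances:
        tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        if tags.get("KubernetesCluster") == cluster and tags.get("role") == "ingress":
            selected.append(inst["InstanceId"])
    return selected

def repoint_elb(elb_name, instances, cluster):
    """Register `cluster`'s ingress nodes with the ELB and deregister the rest."""
    import boto3  # needs AWS credentials; imported lazily for the sketch
    elb = boto3.client("elb")
    new_ids = select_cluster_instances(instances, cluster)
    old_ids = [i["InstanceId"] for i in instances if i["InstanceId"] not in new_ids]
    elb.register_instances_with_load_balancer(
        LoadBalancerName=elb_name,
        Instances=[{"InstanceId": i} for i in new_ids])
    if old_ids:
        elb.deregister_instances_from_load_balancer(
            LoadBalancerName=elb_name,
            Instances=[{"InstanceId": i} for i in old_ids])
```

The same function works against either the live or the pre-live ELB, which is what makes the cut-over (and the roll-back) a quick operation.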

S3 buckets – We store our config files and various execution scripts in S3 for configuring our minions and the master. In this A/B scenario, each cluster (paasA and paasB) has its own S3 bucket where its config files are stored.

Auth servers – Our PaaS deploys include our Keycloak auth servers. We still need to work through how we transfer all the configurations, or IF we choose to no longer deploy auth servers as part of the cluster deploy but instead as part of the VPC.

Masters – New masters come up as part of the cluster deploy, in keeping with true A/B.

It's pretty cool to run two clusters side by side AND be able to bring them up and down individually. But where this gets really awesome is when we can basically take all applications running in paasA and deploy them into paasB. I'm talking about a complete migration of assets: Secrets, Jenkins, Namespaces, ACLs, Resource Quotas and ALL applications running on paasA, minus any self-generated items.

To be clear, we are not simply copying everything over. We are recreating objects using one cluster as the source and the other cluster as the destination. We are reading JSON objects from the Kubernetes API and using those objects, along with their respective configuration, to create the same objects in the other cluster. If you read up on Ubernetes, you will find their objectives are very much in line with this concept. We also have ZERO intent of duplicating efforts long term. The reality is, we needed this functionality before the Kubernetes project could get there. As Kubernetes federation continues to mature, we will continue to adopt and change, even replacing our code with theirs. With this in mind, we have specifically written our code to perform some of these actions in a way that can very easily be removed.
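The core of "recreate, don't copy" is that an object read from the source API carries server-generated fields that the destination cluster must assign itself. A minimal sketch of the cleanup step (the field list is illustrative; `portable_copy` is a hypothetical helper, not our actual code):

```python
# Strip server-generated fields from a Kubernetes API object so it can be
# POSTed to a different cluster. The destination API will generate its own
# uid, resourceVersion, etc.

import copy

SERVER_GENERATED = ("resourceVersion", "uid", "selfLink", "creationTimestamp")

def portable_copy(obj):
    """Return a copy of a Kubernetes API object safe to create elsewhere."""
    new = copy.deepcopy(obj)
    new.pop("status", None)  # status describes state in the source cluster only
    meta = new.get("metadata", {})
    for field in SERVER_GENERATED:
        meta.pop(field, None)
    return new
```

In practice you would GET each object (secrets, namespaces, resource quotas and so on) from the source API, run it through something like this, and POST it to the destination.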

Now you are thinking: why didn't you just contribute back to the core project? We are, in several ways. Just not this one, because we love the approach the Kubernetes team is already taking with this. We just needed something to get us by until they can make theirs production ready.

Now with that said, we have some very large advantages that enable us to accomplish something like this. Let's take Jenkins for example. We run Jenkins in every namespace in our clusters. Our Jenkins machines are self-configuring and for the most part stateless. So while we have to copy infrastructure-level items like Kubernetes Secrets to paasB, we don't have to copy each application. All we have to do is spin up the Jenkins container in each namespace and let it deploy all the applications necessary for its namespace. All the code and configuration to do so exists in Git repos, so PaaS admins don't need to know how each application stack in our PaaS is configured. A really, really nice advantage.

Our other advantage is that our databases currently reside outside of Kubernetes (except some Mongo and Cassandra containers in dev) on virtual machines. So we aren't yet worried about migrating stateful data sets, which has made our work around A/B cluster migrations a much smaller stepping stone. We are, however, placing significant effort into this area. We are getting help from the guys at Jetstack.io around storage, and we are working diligently with people like @chrislovecnm to understand how we can bring database containers into production. Some of this is reliant upon new features like PetSets and some of it requires changes in how various databases work. Take for example Cassandra snitches, where Chris has managed to create a Kubernetes-native snitch. Awesome work Chris.

So what about Consul? It's stateful, right? And it's in your cluster, yes?

Well, that's a little different animal. Consul is a stateful application in that it runs as a cluster. So we are considering two different ways to accomplish this.

  1. Externalize our flannel overlay network using aws-vpc and allow the /16s to route to one another. Then we could essentially create one Consul cluster across two Kubernetes clusters, allow data to sync and then decommission the Consul containers on paasA.
  2. Use some type of small application to keep two consul clusters in sync for a period of time during paas upgrade.

Both of the options above have benefits and limitations.

Option 1:

  • Benefits:
    • could use a similar method for other clustered applications like Cassandra.
    • would do a better job ensuring the data is synced.
    • could push data replication to the cluster level where it should be.
  • Limitations:
    • we could essentially bring down the whole Consul cluster with a wrong move. Thus some of the integrity imagined in a full A/B cluster migration would be negated.

Option 2:

  • Benefits:
    • keep a degree of separation between each Kubernetes cluster during upgrade so one can not impact the other.
    • pretty easy to implement
  • Limitations:
    • specific implementation
    • much more to keep track of
    • won’t scale for other stateful applications

I'm positive the gents on my team will call me out on several more, but this is what I could think of off the top of my head.

We have already implemented Option #2 in a POC of our A/B migration.

But we haven’t chosen a firm direction with this yet. So if you have additional insight, please comment back.
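For what it's worth, the syncing application in option 2 doesn't have to be much more than a periodic KV copy. A toy sketch, assuming only the standard Consul HTTP API (`/v1/kv` with `?recurse`); the fetch/write functions are injected so the diff logic stays testable:

```python
# Toy one-way sync of a Consul KV prefix between two clusters during an upgrade.
# Endpoints and error handling are deliberately simplified.

def kv_diff(source, dest):
    """Return the key/value pairs that must be written to bring dest up to date."""
    return {k: v for k, v in source.items() if dest.get(k) != v}

def sync_once(src_url, dst_url, prefix, http_get, http_put):
    """One sync pass. http_get returns a {key: value} dict for a recurse query."""
    src = http_get("%s/v1/kv/%s?recurse" % (src_url, prefix))
    dst = http_get("%s/v1/kv/%s?recurse" % (dst_url, prefix))
    for key, value in kv_diff(src, dst).items():
        http_put("%s/v1/kv/%s" % (dst_url, key), value)
```

Run `sync_once` on a timer during the migration window, then retire it once paasA is gone. This is exactly the "specific implementation, more to keep track of" trade-off called out above.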

Barring stateful applications, what are we using to migrate all this stuff between clusters? StackStorm. We already have it performing other automation tasks outside the cluster, we have Python libraries k8sv1 and k8sv1beta for the Kubernetes API endpoints, and it's quite easy to extract the data and push it into another cluster. Once we are done with the POC we'll be pushing this to our StackStorm repo here. @peteridah, you rock.

In our current POC, we migrate everything. In our next POC, we’ll enable the ability to deploy specific application stacks from one cluster to another. This will also provide us the ability to deploy an application stack from one cluster into another for things like performance testing or deep breach management testing.

Lastly we are working through how to go about stateful container migrations. There are many ideas floating around but we would really enjoy hearing yours.

For future generations:

  • We will need some sort of metadata framework for those application stacks that span multiple namespaces to ensure we duplicate an environment properly.

 

To my team-

My hat off to you. Martin, Simas, Eric, Yiwei, Peter, John, Ben and Andy for all your work on this.

Logging – Kafka topic by Kubernetes namespace

In the beginning, there was logging ……… AND there were single-homed, single-server applications, engineers to rummage through server logs, CDs for installing OSes and backup tape drives. Fortunately, most everything else has gone the way of the dodo. Unfortunately, logging in large part has not.

When we started our PaaS project, we recognized logging was going to be of interest in a globally distributed, containerized, volatile, ever-changing environment. CISO, QA and various business units all have data requirements that can be gathered from logs, all with different use cases and all wanting log data they can't seem to aggregate together due to the distributed nature of our organization. Now some might think: we've done that. We use Splunk or ELK and pump all the logs into it and KA-CHOW!!! we're done. Buuuttt it's not quite that simple. We have a crap ton of applications, tens of thousands of servers, tons of appliances, gear and stuff all over the globe. We have one application that literally uses an entire ELK stack by itself because the amount of data it's pumping out is so ridiculous.

So with project Bitesize coming along nicely, we decided to take our first baby step into this realm. This is a work in progress, but here is the gist: dynamically configured topics through fluentd containers running in Kubernetes on each server host; a scalable Kafka cluster that holds data for a limited amount of time and saves it off to permanent storage for long-term/bulk analytics; a REST API or HTTP interface; and a management tool for securing the endpoint.

Where we're at today is dynamically pushing data into Kafka via fluentd based on Kubernetes namespace. So what does that mean exactly? EACH of our application stacks (by namespace) can get its own logs for its own applications without seeing everything else.

I’d like to give mad props to Yiwei Chen for making this happen. Great work mate. His image can be found on Docker hub at ywchenbu/fluentd:0.8.

This image contains just a few key fluentd plugins:

  • fluent-plugin-kafka
  • fluent-plugin-kubernetes_metadata_filter
  • record_transformer – built into fluentd, no install required

We are still experimenting with this so expect it to change but it works quite nicely and could be modified for use cases other than topics by namespace.

You should have the following directory in place on each server in your cluster.

Directory – /var/log/pos    # So fluentd can keep track of its log position

 

Here is td-agent.yaml.

apiVersion: v1
kind: Pod
metadata:
  name: td-agent
  namespace: kube-system
spec:
  volumes:
  - name: log
    hostPath:
      path: /var/log/containers
  - name: dlog
    hostPath:
      path: /var/lib/docker/containers
  - name: mntlog
    hostPath:
      path: /mnt/docker/containers
  - name: config
    hostPath:
      path: /etc/td-agent
  - name: varrun
    hostPath:
      path: /var/run/docker.sock
  - name: pos
    hostPath:
      path: /var/log/pos
  containers:
  - name: td-agent
    image: ywchenbu/fluentd:0.8
    imagePullPolicy: Always
    securityContext:
      privileged: true
    volumeMounts:
      - mountPath: /var/log/containers
        name: log
        readOnly: true
      - mountPath: /var/lib/docker/containers
        name: dlog
        readOnly: true
      - mountPath: /mnt/docker/containers
        name: mntlog
        readOnly: true
      - mountPath: /etc/td-agent
        name: config
        readOnly: true
      - mountPath: /var/run/docker.sock
        name: varrun
        readOnly: true
      - mountPath: /var/log/pos
        name: pos

You will probably notice something about this config that we don't like: the fact that it's running in privileged mode. We intend to change this in the near future, but currently fluentd can't read the log files without it. Not a difficult change, just haven't made it yet.

This yaml gets placed in

/etc/kubernetes/manifests/td-agent.yaml

Kubernetes should automatically pick this up and deploy td-agent.

 

And here is where the magic happens. Below is td-agent.conf, which according to our yaml should be located at

/etc/td-agent/td-agent.conf
<source>
  type tail
  path /var/log/containers/*.log
  pos_file /var/log/pos/es-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag kubernetes.*
  format json
  read_from_head true
</source>

<filter kubernetes.**>
  type kubernetes_metadata
</filter>

<filter **>
  @type record_transformer
  enable_ruby
  <record>
    topic ${kubernetes["namespace_name"]}
  </record>
</filter>

<match **>
  @type kafka
  zookeeper SOME_IP1:2181,SOME_IP2:2181 # Set brokers via Zookeeper
  default_topic default
  output_data_type json
  output_include_tag  false
  output_include_time false
  max_send_retries  3
  required_acks 0
  ack_timeout_ms  1500
</match>

What’s happening here?

  1. Fluentd is looking for all log files in /var/log/containers/*.log
  2. Our kubernetes_metadata filter is enriching each record with pod_id, pod_name, namespace, container_name and labels.
  3. We are transforming the record to use the namespace as the Kafka topic
  4. And finally pushing the log entry to Kafka.
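Step 3 is the clever bit, and it's tiny. The same transform the record_transformer filter performs, written out in plain Python (`topic_for` is just for illustration):

```python
# The topic-routing rule from the fluentd config: the Kafka topic is the
# record's Kubernetes namespace, falling back to the default topic when the
# metadata filter added nothing.

def topic_for(record, default="default"):
    """Derive the Kafka topic for an enriched fluentd record."""
    kubernetes = record.get("kubernetes") or {}
    return kubernetes.get("namespace_name", default)
```

Because the metadata filter runs first, every container log line arrives at the kafka output already carrying its namespace, so routing is one dictionary lookup.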

 

Here is an example of a log entry you can expect to get from Kafka, all in JSON.

kafkaoutput

 

Alright so now that we have data being pushed to Kafka topic by namespace what can we do with it?

Next we’ll work on getting data out of Kafka.

Securing the Kafka endpoint so it can be consumed from anywhere.

And generally rounding out the implementation.

 

Eventually we hope Kafka will become an endpoint by which logs from across the organization can be consumed. But naturally, we are starting bite-sized.

 

Please follow me and retweet if you like what you see. Much appreciated.

 

@devoperandi

 

Kubernetes Python Clients – 1.2.2

I just created the Python Kubernetes Client for v1.2.2.

I’ve also added some additional information on how to gen your own client if you need/want to.

https://github.com/mward29/python-k8sclient-1-2-2

 

**Update

Created AutoScaling and new beta extensions client

https://github.com/mward29/python-k8sclient-autoscaling-v1

https://github.com/mward29/python-k8sclient-v1beta1-v1.2.2

Enjoy!

Vault in Kubernetes – Take 2

A while back I wrote about how we use Vault in Kubernetes, and recently a good samaritan brought it to my attention that so much has changed with our implementation that I should update/rewrite the post about our current setup.

Again congrats to Martin Devlin for all the effort he has put in. Amazing engineer.

So here goes. Please keep in mind, I’ve intentionally abstracted various things out of these files. You won’t be able to copy and paste to stand up your own. This is meant to provide insight into how you could go about it.

If it has ###SOMETHING###, it's been abstracted.

If it has %%something%%, we use another script that replaces those with real values. This will be far less necessary in Kubernetes 1.3, when we can begin using variables in config files. NICE!
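A toy illustration of the %%placeholder%% convention (this is not our actual replacement script, just the idea): every %%NAME%% marker in a template gets swapped for a real value before the file is used.

```python
# Minimal %%NAME%% template substitution, mirroring what our replacement
# script does to config.json before Vault starts.

import re

def render(template, values):
    """Replace %%NAME%% markers with entries from `values`."""
    return re.sub(r"%%(\w+)%%", lambda m: values[m.group(1)], template)
```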

Also understand, I am not providing all of the components we use to populate policies, create tokens, initialize Vault, load secrets, etc. Those are things I'm not comfortable providing at this time.

Here is our most recent Dockerfile for Vault:

FROM alpine:3.2
MAINTAINER 	Martin Devlin <martin.devlin@pearson.com>

ENV VAULT_VERSION    0.5.2
ENV VAULT_HTTP_PORT  ###SOME_HIGH_PORT_HTTP###
ENV VAULT_HTTPS_PORT ###SOME_HIGH_PORT_HTTPS###

COPY config.json /etc/vault/config.json

RUN apk --update add openssl zip \
&& mkdir -p /etc/vault/ssl \
&& wget http://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip \
&& unzip vault_${VAULT_VERSION}_linux_amd64.zip \
&& mv vault /usr/local/bin/ \
&& rm -f vault_${VAULT_VERSION}_linux_amd64.zip

EXPOSE ${VAULT_HTTP_PORT}
EXPOSE ${VAULT_HTTPS_PORT}

COPY /run.sh /usr/bin/run.sh
RUN chmod +x /usr/bin/run.sh

ENTRYPOINT ["/usr/bin/run.sh"]
CMD []

Same basic docker image build on Alpine. Not too much has changed here other than some ports, version of Vault and we have added a config.json so we can dynamically create the consul backend and set our listeners.

Lets have a look at config.json

### Vault config

backend "consul" {
  address = "%%CONSUL_HOST%%:%%CONSUL_PORT%%"
  path = "vault"
  advertise_addr = "https://%%VAULT_IP%%:%%VAULT_HTTPS_PORT%%"
  scheme = "%%CONSUL_SCHEME%%"
  token = "%%CONSUL_TOKEN%%"
  tls_skip_verify = 1
}

listener "tcp" {
  address = "%%VAULT_IP%%:%%VAULT_HTTPS_PORT%%"
  tls_key_file = "/###path_to_key##/some_vault.key"
  tls_cert_file = "/###path_to_crt###/some_vault.crt"
}

listener "tcp" {
  address = "%%VAULT_IP%%:%%VAULT_HTTP_PORT%%"
  tls_disable = 1
}

disable_mlock = true

We dynamically configure config.json with:

  • CONSUL_HOST – Kubernetes Consul service IP
  • CONSUL_PORT – Kubernetes Consul service port
  • CONSUL_SCHEME – HTTPS or HTTP for the connection to Consul
  • CONSUL_TOKEN – ACL token to access Consul
  • VAULT_IP – Vault IP address
  • VAULT_HTTPS_PORT – Vault HTTPS port
  • VAULT_HTTP_PORT – Vault HTTP port

 

run.sh has changed significantly, however. We've added SSL support and cleaned things up a bit. We are working on another project to transport the keys outside the cluster, but for now this is a manual process after everything is stood up. Our intent moving forward is to store this information in what we call 'the brain' and provide access to each key to different people. Maybe sometime in the next few months I can talk more about that.

#!/bin/sh
if [ -z ${VAULT_HTTP_PORT} ]; then
  export VAULT_HTTP_PORT=###SOME_HIGH_PORT_HTTP###
fi
if [ -z ${VAULT_HTTPS_PORT} ]; then
  export VAULT_HTTPS_PORT=###SOME_HIGH_PORT_HTTPS###
fi

if [ -z ${CONSUL_SERVICE_HOST} ]; then
  export CONSUL_SERVICE_HOST="127.0.0.1"
fi

if [ -z ${CONSUL_SERVICE_PORT_HTTPS} ]; then
  export CONSUL_HTTP_PORT=SOME_CONSUL_PORT
else
  export CONSUL_HTTP_PORT=${CONSUL_SERVICE_PORT_HTTPS}
fi

if [ -z ${CONSUL_SCHEME} ]; then
  export CONSUL_SCHEME="https"
fi

if [ -z ${CONSUL_TOKEN} ]; then
  export CONSUL_TOKEN=""
else
  CONSUL_TOKEN=`echo ${CONSUL_TOKEN} | base64 -d`
fi

if [ ! -z "${VAULT_SSL_KEY}" ] &&  [ ! -z "${VAULT_SSL_CRT}" ]; then
  echo "${VAULT_SSL_KEY}" | sed -e 's/\"//g' | sed -e 's/^[ \t]*//g' | sed -e 's/[ \t]$//g' > /etc/vault/ssl/vault.key
  echo "${VAULT_SSL_CRT}" | sed -e 's/\"//g' | sed -e 's/^[ \t]*//g' | sed -e 's/[ \t]$//g' > /etc/vault/ssl/vault.crt
else
  openssl req -x509 -newkey rsa:2048 -nodes -keyout /etc/vault/ssl/vault.key -out /etc/vault/ssl/vault.crt -days 365 -subj "/CN=vault.kube-system.svc.cluster.local" 
fi

export VAULT_IP=`hostname -i`

sed -i "s,%%CONSUL_HOST%%,$CONSUL_SERVICE_HOST,"   /etc/vault/config.json
sed -i "s,%%CONSUL_PORT%%,$CONSUL_HTTP_PORT,"      /etc/vault/config.json
sed -i "s,%%CONSUL_SCHEME%%,$CONSUL_SCHEME,"       /etc/vault/config.json
sed -i "s,%%CONSUL_TOKEN%%,$CONSUL_TOKEN,"         /etc/vault/config.json
sed -i "s,%%VAULT_IP%%,$VAULT_IP,"                 /etc/vault/config.json
sed -i "s,%%VAULT_HTTP_PORT%%,$VAULT_HTTP_PORT,"   /etc/vault/config.json
sed -i "s,%%VAULT_HTTPS_PORT%%,$VAULT_HTTPS_PORT," /etc/vault/config.json

cmd="vault server -config=/etc/vault/config.json $@;"

if [ ! -z ${VAULT_DEBUG} ]; then
  ls -lR /etc/vault
  cat /###path_to_/vault.crt###
  cat /etc/vault/config.json
  echo "${cmd}"
  sed -i "s,INFO,DEBUG," /etc/vault/config.json
fi

## Master stuff

master() {

  vault server -config=/etc/vault/config.json $@ &

  if [ ! -f ###/path_to/something.txt### ]; then

    export VAULT_SKIP_VERIFY=true
    
    export VAULT_ADDR="https://${VAULT_IP}:${VAULT_HTTPS_PORT}"

    vault init -address=${VAULT_ADDR} > ###/path_to/something.txt####

    export VAULT_TOKEN=`grep 'Initial Root Token:' ###/path_to/something.txt### | awk '{print $NF}'`
    
    vault unseal `grep 'Key 1:' ###/path_to/something.txt### | awk '{print $NF}'`
    vault unseal `grep 'Key 2:' ###/path_to/something.txt### | awk '{print $NF}'`
    vault unseal `grep 'Key 3:' ###/path_to/something.txt### | awk '{print $NF}'`

  fi

}

case "$1" in
  master)           master $@;;
  *)                exec vault server -config=/etc/vault/config.json $@;;
esac
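The grep/awk pipeline in master() pulls the unseal keys and the root token out of the `vault init` output. The same parsing in Python, assuming the Vault 0.5.x output format (sketch only, not part of the image):

```python
# Parse `vault init` output into its unseal keys and initial root token,
# equivalent to the grep/awk calls in run.sh's master() function.

import re

def parse_init_output(text):
    """Return (unseal_keys, root_token) from `vault init` output."""
    keys = re.findall(r"^Key \d+: (\S+)", text, re.MULTILINE)
    token_match = re.search(r"^Initial Root Token: (\S+)", text, re.MULTILINE)
    return keys, token_match.group(1) if token_match else None
```

Whatever does this parsing ends up holding the keys to the kingdom, which is exactly why we want to move that material out of the cluster and into 'the brain'.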

Alright, now that we have our image, let's have a look at how we deploy it. Now that we have SSL in place and we've got some good ACLs, we expose Vault external to the cluster but still internal to our environment. This allows us to automatically populate Vault with secrets, keys and certs from various sources while still providing a high level of security.

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: vault
  namespace: kube-system
  labels:
    name: vault
spec:
  ports:
    - name: vaultport
      port: ###SOME_VAULT_PORT_HERE###
      protocol: TCP
      targetPort: ###SOME_VAULT_PORT_HERE###
    - name: vaultporthttp
      port: 8200
      protocol: TCP
      targetPort: 8200
  selector:
    app: vault

Ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vault
  namespace: kube-system
  labels:
    ssl: "true"
spec:
  rules:
  - host: ###vault%%ENVIRONMENT%%.somedomain.com###
    http:
      paths:
      - backend:
          serviceName: vault
          servicePort: ###SOME_HIGH_PORT_HTTPS###
        path: /

 

replicationcontroller.yaml

apiVersion: v1
kind: ReplicationController
metadata:
  name: vault
  namespace: kube-system
spec:
  replicas: 3
  selector:
    app: vault
  template:
    metadata:
      labels:
        pool: vaultpool
        app: vault
    spec:
      containers:
        - name: vault
          image: '###BUILD_YOUR_IMAGE_AND_PUT_IT_HERE###'
          imagePullPolicy: Always
          env:
            - name: CONSUL_TOKEN
              valueFrom:
                secretKeyRef:
                  name: vault-mgmt
                  key: vault-mgmt
            - name: "VAULT_DEBUG"
              value: "false"
            - name: "VAULT_SSL_KEY"
              valueFrom:
                secretKeyRef:
                  name: ###MY_SSL_KEY###
                  key: ###key###
            - name: "VAULT_SSL_CRT"
              valueFrom:
                secretKeyRef:
                  name: ###MY_SSL_CRT###
                  key: ###CRT###
          readinessProbe:
            httpGet:
              path: /v1/sys/health
              port: 8200
            initialDelaySeconds: 10
            timeoutSeconds: 1
          ports:
            - containerPort: ###SOME_VAULT_HTTPS_PORT###
              name: vaultport
            - containerPort: 8200
              name: vaulthttpport
      nodeSelector:
        role: minion

WARNING: Add your volume mounts and such for the Kubernetes Secrets associated with the vault ssl crt and key.

 

As you can see, we've made significant improvements to how we build Vault in Kubernetes. I hope this helps in your own endeavors.

Feel free to reach out on Twitter or through the comments.

 

 

Registry Migration (ECR)

Today I'm going to provide a registry migration script using Python that will allow you to migrate from a private Docker registry to ECR. Keep in mind, it's a script, people. It got the job done. It's not fancy. It's not meant to cover all the possible ways in which you could do this. It doesn't have a bunch of error handling. It's not meant to be run all the time. But it should give you a start if you need/want to do something similar. Please read the comments in the script. There are some environment vars and such to set prior to running.

Make sure AWS CLI is configured and run:

aws ecr get-login --region us-east-1

then run the command it gives back to you to log in.
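Before getting into the errors, here is the overall shape of the script for orientation (the registry URLs and the `migration_commands` helper are placeholders, not the real code; see the Github link below for the actual script):

```python
# The core loop of a registry migration: for each tag of each repo in the
# source registry's catalog, docker pull, retag for ECR, push. Serial on
# purpose; parallel pushes overloaded the registry.

import subprocess

def migration_commands(image, tags, src_registry, ecr_registry):
    """Build the docker commands that move one repo's tags to ECR."""
    cmds = []
    for tag in tags:
        src = "%s/%s:%s" % (src_registry, image, tag)
        dst = "%s/%s:%s" % (ecr_registry, image, tag)
        cmds += [
            ["docker", "pull", src],
            ["docker", "tag", src, dst],
            ["docker", "push", dst],
        ]
    return cmds

def migrate(image, tags, src_registry, ecr_registry):
    for cmd in migration_commands(image, tags, src_registry, ecr_registry):
        subprocess.check_call(cmd)
```

The real script also lists the source catalog via the registry's v2 API and creates each ECR repo (`aws ecr create-repository`) before pushing.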

If you see the following error when running the script, you just managed to overload your repo. As a result I made the script more serial (instead of parallel) to help out, but I still managed to overload it in serial mode once.

Received unexpected HTTP status: 500 Internal Server Error
Traceback (most recent call last):
  File "migrate.py", line 101, in <module>

  File "migrate.py", line 29, in __init__
    self._get_catalog()
  File "migrate.py", line 39, in _get_catalog
    self._run(mylist)
  File "migrate.py", line 55, in _run
    else:
  File "migrate.py", line 98, in _upload_image

  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)

 

If you get something like below you probably aren’t logged into ECR with the user you are running the script with.

Traceback (most recent call last):
  File "migrate.py", line 98, in <module>
    MigrateToEcr()
  File "migrate.py", line 29, in __init__
    self._get_catalog()
  File "migrate.py", line 39, in _get_catalog
    self._run(mylist)
  File "migrate.py", line 43, in _run
    self._ensure_new_repo_exists(line)
  File "migrate.py", line 74, in _ensure_new_repo_exists
    checkrepo = subprocess.check_output(command, shell=True)
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '/usr/local/bin/aws ecr describe-repositories' returned non-zero exit status 255

Link to the script on Github.

 

I'll cover why we aren't using ECR in a follow-on post.

Kubernetes, StackStorm and third party resources – Part 2

Alright, finally ready to talk about some StackStorm in depth. In the first part of this post I discussed some depth around the Kubernetes ThirdPartyResource and how excited we are to use them. If you haven't read it, I would go back and start there. I did, however, breeze over the event driven automation piece (IFTTT) involving StackStorm. I did this for two reasons: 1) I was terribly embarrassed by the code I had written, and 2) it wasn't anywhere near where it should be in order for people to play with it.

Now that we have a submittal to StackStorm's st2contrib, I'm going to open this up to others. Granted, it's not in its final form. In fact there is a TON left to do, but it's working and we decided to get the community involved should you be interested.

But first let's answer the question that is probably weighing on many people's minds. Why StackStorm? There are many other event driven automation systems. The quick answer is they quite simply won us over. But because I like bullet points, here are a few:

  1. None of the competition was in a position to work with, support or develop a community around IFTTT integration with Kubernetes.
  2. StackStorm is an open framework that you and I can contribute back to.
  3. It's built on OpenStack's Mistral workflow engine, so while StackStorm is a startup-like company, the foundation of what they are doing has been around for quite some time.
  4. Stable.
  5. Open source code base. (Caveat: there are some enterprise add-ons that are not.)
  6. Damn, their support is good. Let me just say, we are NOT enterprise customers of StackStorm and I personally haven't had better support in my entire career. Their community Slack channel is awesome. Their people are awesome. Major props on this. At risk of being accused of getting a kick-back: I'm a groupie, a fanboy. If leadership changes this (I'm looking at you, Mr. Powell), I'm leaving. This is by far and away their single greatest asset. Don't get me wrong, the tech is amazing, but the people got us hooked.

For the record, I have zero affiliation with StackStorm. I just think they have something great going on.

 

As I mentioned in the first post, our first goal was to automate deployment of AWS RDS databases from the Kubernetes framework. We wanted to accomplish this because then we could provide a seamless way for our dev teams to deploy their own database with a Kubernetes config based on a thirdpartyresource (currently in beta).

Here is a diagram of the magic:

Screen Shot 2016-02-13 at 5.48.07 PM

Alright here is what’s happening.

  1. We have a StackStorm sensor watching the Kubernetes API endpoint at /apis/extensions/v1beta1/watch/thirdpartyresources for events. thirdpartyresource.py
  2. When a new event happens, the sensor picks it up and kicks off a trigger. Think of a trigger as a broadcast message within StackStorm.
  3. Rules listen to trigger types, which I think of as channels. Kind of like a channel on the telly. A rule, based on some criteria, decides whether or not to act on any given event. It either drops the event or takes action on it. rds_create_db.yaml
  4. An action chain then performs a series of actions. Each action can either fail or succeed, and additional actions happen based on the result of the last. db_create_chain.yaml
    1. db_rds_spec munges the data of the event and turns it into usable information.
    2. From there, rds_create_database reaches out to AWS and creates an RDS database.
    3. Finally, configuration information and secrets are passed back to Consul and Vault for use by the application(s) within Kubernetes. Notice how the actions for Vault and Consul are grey: that's because they're not done yet. We are working on it this sprint.
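The heart of step 1 is just parsing the Kubernetes watch stream. A simplified, hypothetical version of what a sensor like thirdpartyresource.py does with each event line (in the real sensor, the payload would be handed to StackStorm's sensor_service to dispatch the trigger):

```python
# Turn one line of the /watch/thirdpartyresources stream into a trigger
# payload. Each line is a JSON object with a "type" (ADDED, MODIFIED,
# DELETED) and the full "object" that changed.

import json

def event_to_payload(line):
    """Parse a watch-stream line into the fields a rule would match on."""
    event = json.loads(line)
    obj = event.get("object", {})
    return {
        "type": event.get("type"),
        "kind": obj.get("kind"),
        "name": obj.get("metadata", {}).get("name"),
        "spec": obj,
    }
```

A rule like rds_create_db.yaml then matches on fields of this payload (for instance, type ADDED and a database-flavored resource name) to decide whether to fire the action chain.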

Link to the StackStorm Kubernetes Pack. Take a look at the Readme for information on how to get started.

Obviously this is just a start. We’ve literally just done one thing in creating a database but the possibilities are endless for integrations with Kubernetes.

I mentioned earlier the StackStorm guys are great. And I want to call a couple of them out. Manas Kelshikar, Lakshmi Kannan and Patrick Hoolboom. There are several others that have helped along the way but these three helped get this initial pack together.

The initial pack has been submitted as a pull request to StackStorm for acceptance into st2contrib. Once the pull request has been accepted to st2contrib, I would love it if more people in the Kubernetes community got involved and started contributing as well.

 

@devoperandi

 

Kubernetes, StackStorm and third party resources

WARNING!!! Beta – Not yet for production.

You might be thinking right now, third party resources? Seriously? With all the amazing stuff going on right now around Kubernetes and you want to talk about that thing at the bottom of the list. Well keep reading, hopefully by the end of this, you too will see the light.

Remember last week when I talked about future projects in my Python Client For Kubernetes blog? Well here it is. One key piece of our infrastructure is quickly becoming StackStorm.

What's StackStorm, you might ask? StackStorm is an open source event driven automation system which hailed originally from OpenStack's Mistral workflow project. In fact, some of its libraries are from Mistral, but it's no longer directly tied to OpenStack. It's a standalone setup that rocks. As of the time of this writing, StackStorm isn't really container friendly, but they are working to remediate this and I expect a beta to be out in the near future. Come on guys, hook a brother up.

For more information on StackStorm – go here.

I'll be the first to admit, their documentation took me a little while to grok. Too many big words and not enough pictures to describe what's going on. But once I got it, nothing short of meeting Einstein could have stopped my brain from looping through all the possibilities.

Let's say we want to manage an RDS database from Kubernetes. We should be able to create, destroy and configure it in conjunction with the application we are running, and even more importantly, it must be a fully automated process.

So what does it take to accomplish something like this? Well, in our minds we needed a way to represent external objects, i.e. third party resources, and we needed some type of automation that can watch those events and act on them, a la StackStorm.

Here is a diagram of our intentions. We have a couple of loose ends to tie up, but soon we'll be capable of performing this workflow for any custom resource. A database just happens to be the first requirement we had that fit the bill.

Screen Shot 2016-02-05 at 8.24.51 PM

In the diagram above we perform six basic actions:

– Input the thirdpartyresource to Kubernetes

– StackStorm watches for resources being created, deleted or modified

– If triggered, StackStorm makes a call to the AWS API to execute an event

– It receives back information from AWS

– On creation or deletion, it adds or removes the necessary information from Vault and Consul

 

Alright, from the top: what is a third party resource exactly? Well, it's our very own custom resource. Kind of like how a pod, endpoint or replication controller are API resources, but now we get our own.

Third Party Resources immediately stood out to us because we now have the opportunity to take advantage of all the built-in things Kubernetes provides, like metadata, labels, annotations, versioning, API watches, etc., while having the flexibility to define what we want in a resource. What's more, third party resources can be grouped or nested.

Here is an example of a third party resource:

metadata:
  name: mysql-db.prsn.io
  labels:
    resource: database
    object: mysql
apiVersion: extensions/v1beta1
kind: ThirdPartyResource
description: "A specification of database for mysql"
versions:
  - name: stable/v1

This looks relatively normal with one major exception: metadata.name = mysql-db.prsn.io. You must have a fully qualified domain in the name in order for everything to work properly; the domain portion becomes the API group. The other oddity is the "-". It must be there, and you must have one, because the dashed portion before the first dot maps to the <CamelCaseKind> of the resource (mysql-db becomes MysqlDb).
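To make the naming convention concrete, here is a tiny illustrative helper (purely hypothetical, not part of Kubernetes or our code) showing how a third party resource name splits into a kind and an API group:

```python
def tpr_name_to_kind_and_group(name):
    """Illustrate the ThirdPartyResource naming rule: the dashed part
    before the first "." becomes the CamelCase kind, and the rest of
    the fully qualified name becomes the API group."""
    kind_part, _, group = name.partition(".")
    kind = "".join(word.capitalize() for word in kind_part.split("-"))
    return kind, group
```

So tpr_name_to_kind_and_group("mysql-db.prsn.io") yields ("MysqlDb", "prsn.io").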

Doing this creates

/apis/prsn.io/stable/v1/namespaces/<namespace>/mysqldbs/...

By creating the resource above, we have essentially created our very own API endpoint from which to get all resources of this type. This is awesome because now we can create mysql resources and watch them under one API endpoint for consumption by StackStorm.
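As a rough sketch of consuming that endpoint from Python (the base URL and bearer token are placeholders, and TLS and error handling are omitted), polling for MysqlDb resources might look like this:

```python
import json
import urllib.request

def list_mysql_dbs(base_url, token, namespace="default"):
    """Fetch all MysqlDb resources from the custom API endpoint.

    base_url and token stand in for your API server address and a
    bearer token with read access; both are assumptions here.
    """
    url = f"{base_url}/apis/prsn.io/stable/v1/namespaces/{namespace}/mysqldbs"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return resource_names(json.load(resp))

def resource_names(listing):
    # Pull just the resource names out of a list response.
    return [item["metadata"]["name"] for item in listing.get("items", [])]
```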

Now imagine applying a workflow like this to ANYTHING you can wrap your head around. Cool huh?

Remember, this is beta, and creating resources under the thirdpartyresource (in this case mysqldbs) requires a little curl at this time.

{
   "metadata": {
     "name": "my-new-mysql-db"
   },
   "apiVersion": "prsn.io/stable/v1",
   "kind": "MysqlDb",
   "engine_version": "5.6.23",
   "instance_size": "huge"
}

There are three important pieces here: 1) it's JSON; 2) apiVersion is the domain portion of the name plus versions.name from the thirdpartyresource; 3) kind = MysqlDb, the <CamelCaseKind>.

Now we can curl the Kubernetes api and post this resource.

curl -H "Content-Type: application/json" -d '{"metadata":{"name":"my-new-mysql-db"},"apiVersion":"prsn.io/stable/v1","kind":"MysqlDb","engine_version":"5.6.23","instance_size":"huge"}' https://kube_api_url/apis/prsn.io/stable/v1/namespaces/default/mysqldbs
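The same POST can be sketched in Python with just the standard library (kube_api_url is a placeholder for your API server, and auth/TLS handling is omitted):

```python
import json
import urllib.request

def mysql_db_payload(name, engine_version="5.6.23", instance_size="huge"):
    # The same MysqlDb resource body as the curl example above.
    return {
        "metadata": {"name": name},
        "apiVersion": "prsn.io/stable/v1",
        "kind": "MysqlDb",
        "engine_version": engine_version,
        "instance_size": instance_size,
    }

def create_mysql_db(base_url, name, namespace="default"):
    # POST the resource to the custom endpoint; auth and TLS are omitted.
    url = f"{base_url}/apis/prsn.io/stable/v1/namespaces/{namespace}/mysqldbs"
    req = urllib.request.Request(
        url,
        data=json.dumps(mysql_db_payload(name)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```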

 

Now if you hit your Kubernetes API endpoint, you should see something like this:

{
  "paths": [
    "/api",
    "/api/v1",
    "/apis",
    "/apis/extensions",
    "/apis/extensions/v1beta1",
    "/apis/prsn.io",
    "/apis/prsn.io/stable/v1",
    "/healthz",
    "/healthz/ping",
    "/logs/",
    "/metrics",
    "/resetMetrics",
    "/swagger-ui/",
    "/swaggerapi/",
    "/ui/",
    "/version"
  ]
}

Our very own Kubernetes endpoint now lives at /apis/prsn.io/stable/v1.

And here is a resource under the mysql thirdpartyresource located at:

/apis/prsn.io/stable/v1/mysqldbs
{
  "kind": "MysqlDb",
  "items": [
    {
      "apiVersion": "prsn.io/stable/v1",
      "kind": "MysqlDb",
      "metadata": {
        "name": "my-new-mysql-db",
        "namespace": "default",
        "selfLink": "/apis/prsn.io/stable/v1/namespaces/default/mysqldbs/my-new-mysql-db"
        ...
      }
    }
  ]
}

If your mind isn’t blown by this point, move along, I’ve got nothin for ya.

 

Ok on to StackStorm.

Within StackStorm we have a Sensor that watches the Kubernetes API for a given third party resource. In this example, it's looking for MysqlDb resources. From there it compares the list of MysqlDb resources against the list of mysql databases (RDS in this case) that actually exist, and determines what actions, if any, it needs to perform. The great thing about this is StackStorm already has quite a number of what they call packs, notably an AWS pack, so we didn't have to do any of the heavy lifting on that end. All we had to do was hook in our Python client for Kubernetes, write a little Python to compare the two sets of data, and trigger actions based off the result.
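The comparison itself boils down to simple set logic. Here is a minimal sketch of the idea (names and shapes are illustrative assumptions, not our actual sensor code):

```python
def reconcile(desired, existing):
    """Compare the MysqlDb resource names declared in Kubernetes
    (desired) against the RDS database names that actually exist
    (existing) and return the actions to trigger."""
    desired, existing = set(desired), set(existing)
    return {
        "create": sorted(desired - existing),  # declared but missing in RDS
        "delete": sorted(existing - desired),  # in RDS but no longer declared
    }
```

StackStorm can then fire its AWS pack actions for each name in "create" and "delete".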

AWS/StackStorm Pack

It also has a local datastore, so if you need to store key/value pairs for any length of time, that's quite easy as well.

Take a look at the bottom of this page for operations against the StackStorm datastore.

We’ll post our python code as soon as it makes sense. And we’ll definitely create a pull request back to the StackStorm project.

Right now we are building the workflow to evaluate what actions to take. We'll update this page as soon as it's complete.

 

If you have questions or ideas on how else to use StackStorm and ThirdPartyResources, I would love to hear about them. We can all learn from each other.

 

 

@devoperandi

 

Other beta stuff:

deployments – https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/deployment.md

horizontalpodautoscaler – https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/design/horizontal-pod-autoscaler.md

ingress – http://kubernetes.io/v1.1/docs/user-guide/ingress.html

To be fair, I have already talked about ingress in the blog post about Load Balancing.

jobs – https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/user-guide/jobs.md

 

 

No part of this blog is sponsored or paid for by anyone other than the author. 

 

Python Client for Kubernetes

For reasons I'll divulge in a future post, we needed a Python client to interact with Kubernetes. Our latest and greatest work is going to rely pretty heavily on it, and we've had difficulty finding one that is fully functional.

SPOILER: Go to the bottom of the article if you just want the code. 😉

We explored options like libcloud and pykube, and even went back to some of the original python-kubernetes clients like you would see on PyPI. What we found was they were all either a) out of date, b) still very much in their infancy or c) no longer maintained. And we realized sitting around waiting on someone else to build and maintain one just wasn't going to work.

So with a lot of exploring and a ton of learning (primarily due to my lack of python skillz), we came to realize we could simply generate our own with codegen. You see, Kubernetes uses Swagger for its API, and codegen allows us to create our own Python client from the Swagger spec.

# on mac install swagger-codegen

brew install swagger-codegen

Acquire v1.json from the Kubernetes website

and run something like:

swagger-codegen generate -l python -o k8sclient -i v1.json

And this was fantastic… until it didn't work and the build failed.

You see, Kubernetes runs Swagger spec 1.2 and uses "type": "any", which is an undefined custom type that codegen doesn't know how to handle.

See the github issues referenced here and here for a more detailed explanation.

The end result is that while custom types are allowed in Swagger spec 1.2, there was no way to document them for codegen to consume. This is fixed in Swagger spec 2.0 with "additionalProperties", which allows this mapping to occur.

But we still had a problem. We couldn’t easily create a python client from codegen.

So what we have done, right or wrong, is replace every occurrence in v1.json of

"type": "any"

with

"type": "string"

and it works.
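The swap can be scripted rather than done by hand. A minimal sketch follows; it assumes the spec uses exactly the key "type" with value "any", and it works on the parsed JSON rather than the raw text so whitespace differences don't matter:

```python
import json

def relax_any_types(node):
    """Recursively rewrite "type": "any" to "type": "string" in a
    parsed Swagger 1.2 spec so swagger-codegen can consume it."""
    if isinstance(node, dict):
        if node.get("type") == "any":
            node["type"] = "string"
        for value in node.values():
            relax_any_types(value)
    elif isinstance(node, list):
        for item in node:
            relax_any_types(item)
    return node

# Usage sketch:
# with open("v1.json") as f:
#     spec = relax_any_types(json.load(f))
# with open("v1.json", "w") as f:
#     json.dump(spec, f, indent=2)
```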

With that here is a link to the v1.json file with the change.

But we also did the same thing for extensions/v1beta because we are working on some future endeavors so here is a link to that as well.

With these v1.json and v1beta1.json files you should be able to create your own Python client for Kubernetes.

Or if you choose, you could just use the clients we created. We intend to keep these clients updated, but if you find we haven't, feel free to create your own. It's dead simple.

https://github.com/mward29/python-k8sclient

https://github.com/mward29/python-k8sclient-v1beta1

 

As a final departing note, these Python clients have NOT been fully vetted. We have not run across any issues as of this moment, but if you find an issue before we do, PLEASE be a good samaritan and let us know.

The beta version, because it's running against the beta API extensions, may not have everything you would expect in it.