Kubernetes – Jobs

Ever want to run a recurring cronjob in Kubernetes? Maybe you want to recursively pull an AWS S3 bucket or gather data by inspecting your cluster. How about running some analytics in parallel or even running a series of tests to make sure the new deploy of your cluster was successful?

A Kubernetes Job might just be the answer.

So what exactly is a Job anyway? Basically it's a short-lived replication controller. A Job ensures that a task runs to successful completion even when faults in the infrastructure would otherwise cause it to fail. Consider it the fault-tolerant way of executing a one-time pod/request. Or better yet, cron with some brains. Oh, and speaking of which, you'll actually be able to run Jobs at specific times and dates here pretty soon in Kubernetes 1.3.
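
For the curious, that scheduled flavor is what became the CronJob resource. Here's a rough sketch of what one looks like, reusing the nodetool image from the example below — the name and schedule are made up for illustration:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-nodetool
spec:
  # made-up schedule: every night at 2am
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: nodetool
            image: some_private_repo:8500/nodetool
            command: ["/usr/bin/nodetool", "repair", "-pr"]
          restartPolicy: OnFailure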

For example:

I have a Cassandra cluster in Kubernetes and I want to run:

nodetool repair -pr -h <host_ip>

on every node in my 10 node Cassandra cluster. And because I'm smart, I'm going to run 10 different jobs, one at a time, so I don't overload my cluster during the repair.

Here be a yaml for you:

apiVersion: batch/v1
kind: Job
metadata:
  name: nodetool
spec:
  template:
    metadata:
      name: nodetool
    spec:
      containers:
      - name: nodetool
        image: some_private_repo:8500/nodetool
        command: ["/usr/bin/nodetool", "repair", "-pr", "-h", "$(cassandra_host_ip)"]
        env:
        # $(VAR) in command only expands if the variable is defined on
        # the container, so define it here; the value is a placeholder
        - name: cassandra_host_ip
          value: "<host_ip>"
      restartPolicy: Never
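
Fire it off with kubectl — the file name here is just whatever you saved the spec as:

kubectl create -f nodetool-job.yaml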

A Kubernetes Job will ensure that each job runs through to successful completion. Pretty cool, huh? Now mind you, it's not smart. It's not checking to see if nodetool repair was actually successful; it's simply looking to see if the pod exited successfully.
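
If you want to see what the Job itself thinks happened, the count of successfully completed pods lives right on the Job object (job name from the spec above):

kubectl get job nodetool -o jsonpath='{.status.succeeded}'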

Another key point about Jobs is that they don't just go away after they run, because you may want to check on the logs or status of the job or something. (Not that anyone would ever be smart and push that information to a log aggregation service.) Thus it's important to remember to clean them up. Run a Job to clean up your jobs? Yep. Do it. Just set up a Job to keep things tidy. Odd, I know, but it works.

kubectl delete jobs/nodetool
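
And if you'd rather take my advice and have a Job do the tidying, a rough sketch might look like this. The kubectl image is a placeholder, the pod needs permission to delete jobs, and the cleanup=true label is something you'd have to slap on your jobs yourself:

apiVersion: batch/v1
kind: Job
metadata:
  name: job-janitor
spec:
  template:
    spec:
      containers:
      - name: janitor
        # placeholder image; anything that ships kubectl will do
        image: some_private_repo:8500/kubectl
        # deletes every job carrying a (hypothetical) cleanup=true label
        command: ["/bin/sh", "-c", "kubectl delete jobs -l cleanup=true"]
      restartPolicy: Never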

Now let's imagine I'm a bit sadistic and I want to run all my 'nodetool repair' jobs in parallel. Well, that can be done too. Aaaannnnd let's imagine that I have a list of all the Cassandra nodes I want to repair sitting in a queue somewhere.

I could execute the nodetool repair job and simply scale up the number of replicas. As long as each pod can pull the next Cassandra host from the queue, I could literally run multiple repairs in parallel. Now my Cassandra cluster might not like that much, and I may or may not have done something like this before, but…..well…we'll just leave that alone.

kubectl scale --replicas=10 jobs/nodetoolrepair
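
For the record, you can also bake this into the spec with .spec.parallelism instead of scaling after the fact. A rough sketch, assuming a hypothetical wrapper script in the image that pops hosts off the queue:

apiVersion: batch/v1
kind: Job
metadata:
  name: nodetoolrepair
spec:
  parallelism: 10    # run up to 10 repair pods at once
  # .spec.completions is left unset: each pod keeps pulling hosts off
  # the queue and exits successfully once the queue is empty
  template:
    spec:
      containers:
      - name: nodetool
        image: some_private_repo:8500/nodetool
        # hypothetical script that pops a host from the queue and runs
        # "nodetool repair -pr -h <host>" against it
        command: ["/usr/local/bin/repair-from-queue.sh"]
      restartPolicy: Never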

There is a lot more to Jobs than just this, but it should give you an idea of what can be done. If you find yourself in a mire of complexity trying to figure out how to run some complex job, head back to the source: the Kubernetes Jobs docs. I think I reread them 5 times before I grokked all of it. Ok, maybe it was 10. Or so. Oh fine, I still don't get it all.

To see jobs that are hanging around:

kubectl get pods -a
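
The Job objects themselves stick around too:

kubectl get jobs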


@devoperandi