Kubernetes: Pod Disruption Budget

PodDisruptionBudget is a relatively new paradigm in Kubernetes.

At its core, it ensures that a certain number or percentage of pods with a given label will not be voluntarily evicted at any one point in time.

As an example, let's imagine we are draining a server in order to restart it. There are 5 pods of the same application (with the same label) running on the Kubernetes cluster, two of which are running on the server we intend to restart. If our PodDisruptionBudget requires a minimum of 80% of pods to be available, the budget will only allow one pod to be down at a time.

Example:

apiVersion: policy/v1alpha1
kind: PodDisruptionBudget
metadata:
  name: disruptme
spec:
  selector:
    matchLabels:
      name: myapp5pods
  minAvailable: 80%

 

kubectl drain respects PodDisruptionBudgets. Thus, when we drain the node we intend to restart, the cluster will bring down only one pod at a time and will ensure that pod has been rescheduled and is running on another server before bringing down the second pod that was running on this node.
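
For reference, a minimal sketch of the drain and the follow-up uncordon (the node name is a placeholder and exact flags vary slightly by kubectl version):

# Cordon the node and evict its pods, honoring any PodDisruptionBudgets.
# --ignore-daemonsets is usually needed if DaemonSet pods (logging, networking) run on the node.
kubectl drain node-01 --ignore-daemonsets

# After the restart, allow the node to accept pods again.
kubectl uncordon node-01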

disruption.json

{
  "apiVersion": "policy/v1beta1",
  "kind": "Eviction",
  "metadata": {
    "name": "myapp5pods-4050136386-d6ao9",
    "namespace": "default"
  }
}

 

At the time of this writing, I was unable to use kubectl to evict pods but curl is an option.

curl -v -H 'Content-type: application/json' https://10.253.92.16:8080/api/v1/namespaces/default/pods/myapp5pods-4050136386-d6ao9/eviction -d @disruption.json

 

PodDisruptionBudget is going to become very valuable as companies begin managing larger and larger Kubernetes clusters.

 

Use cases:

Quorum-based applications (assuming good shutdown procedures; a sketch follows below)

Applications requiring X number of pods to be available under load
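
For a quorum-based application, an absolute count is often clearer than a percentage. A minimal sketch using the same API version as the example above (the name and labels are made up):

apiVersion: policy/v1alpha1
kind: PodDisruptionBudget
metadata:
  name: consul-quorum
spec:
  selector:
    matchLabels:
      app: consul
  # With 5 members, quorum is 3. Keeping 4 available means only one
  # member can be voluntarily evicted at any one time.
  minAvailable: 4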

 

k8s training videos – looking for feedback

As I’ve never done this before, I’m looking for feedback around some training videos I’m starting to create.

Granted, this is a fairly generic topic, but I'd love your feedback on format, look, feel, whether it's something you would consume, how long the videos should be, etc.

Please help me out here. I could really use your insight.

The topic here is Pod Scheduling. It's not really finished, but it should give a good idea of the direction I'm thinking. I'm trying to keep these topics within 3-5 minutes without compromising the value of the content. My intent is to go deep on topics but divide them up by subtopic. If this is the wrong direction, I NEED to know.

Please please please give me constructive feedback. I’m doing this for all of us after all.

 

 

Thanks

deployment pipeline options for Kubernetes

In the last several months, various deployment (CI/CD) pipelines have cropped up within the Kubernetes community, and our team released one of our own at KubeCon Seattle 2016. As a result, I've been asked on a couple of occasions why we built our own. So here is my take.

We began this endeavor sometime in late 2015. You can see our initial commit to GitHub is on Feb 27th, 2016. But if you look closer, you will notice it's a very large commit, some 806 changes. This is because we began the project quite a bit before that time. So what does this mean? Nothing, other than that we weren't aware of any other CD pipeline projects at the time and we needed one, so we took it upon ourselves to create one. You can see my slides from KubeCon London in March 2016, where I talk about it a fair amount.

My goal with this blog is not to persuade you toward one particular CD pipeline or another. I simply don't care, beyond the fact that contributions to Pearson's CD pipeline would mean we can make it better, faster. Beyond that, we get nothing out of it.

 

 

Releasing it as Open Source

We chose to release our project as open source for the community at KubeCon Seattle 2016: to share our thoughts and experience on the topic, offer another option for the community to consume, and provide insight into how we prefer to build/deploy/test/manage our environments.

Is it perfect? No.

Is it great? Yes, in my opinion it's pretty great.

Does it have its own pros AND cons? Hell yes it does.

 

Let's dig in, shall we?

There are two projects I'm aware of that claim many of the same capabilities. It is NOT my duty to explain all of them in detail, but rather to point out what I see as the pros/cons/differences (which often correlate to different modes of thought) between them.

I will be happy to modify this blog post if/when I agree with a particular argument so feel free to add comments.

 

 

Fabric8 CD Pipeline – (Fabric8)

https://fabric8.io/guide/cdelivery.html

The Fabric8 CD pipeline was purpose-built for Kubernetes/OpenShift. It has some deeper integrations with external components that are primarily Red Hat affiliated. Much of the documentation focuses on Java-based platforms, even though they remark that they can integrate with many other languages.

 

Kubernetes/OpenShift

Capable of working out of the box with Kubernetes and OpenShift. My understanding is that you can set a FABRIC8_PROFILE to enable one or the other.

 

Java/JBoss/RedHat as a Focus point

While they mention being able to work with multiple languages, their focus is very much Java. They have some deep integrations with Apache Camel and other tools around Java, including JBoss Fuse.

 

Artifact Repository

The CD pipeline for Fabric8 requires Nexus, an artifact repository.

 

Gogs or Github

Gogs is required for on-prem git repository hosting. I'm not entirely sure why Gogs would matter if you are simply accessing a git repo, but apparently it does. Alternatively, there is integration with GitHub.

 

Code Quality

Based on the documentation, Fabric8 appears to require SonarQube for code quality. This is especially important if you are running a Java project, as Fabric8 automatically recognizes Java projects and attempts to integrate them. SonarQube can support a variety of languages depending on your use case.

 

Pipeline Libraries

Fabric8 has a library of reusable bits of code to build your pipeline from: https://github.com/fabric8io/fabric8-jenkinsfile-library. Unfortunately these tend to have some requirements around SonarQube and the like.

 

Multi-Tenant

I’m not entirely sure as of yet.

 

Documentation

Fabric8's documentation is great but very focused on Java-based applications. Very few examples include any other languages.

 

 

Pearson CD Pipeline – (Pearson)

https://github.com/pearsontechnology/deployment-pipeline-jenkins-plugin

 

Kubernetes Only

Pure Kubernetes integration. No OpenShift.

 

Repository Integration

Git. Any git repo, anywhere, via SSH key.

 

Language agnostic

The pipeline is entirely agnostic to language and build tools. This CI/CD platform does not bake in deep integrations with other services. If you want a tool, specify the desired package through the YAML config files and use it. Pearson's CD pipeline isn't specific to any particular language because we have 400+ completely separate development teams who need to work with it.

 

Artifact Repository

The Pearson CD pipeline does not require any particular artifact repo. It does, however, use a local aptly repo for caching deb packages, and nothing prevents artifacts from being shipped off anywhere you like during the build process.

 

Ubuntu centric (currently)

Currently the CD pipeline is very much Ubuntu-centric. We would very much like to integrate with Alpine and other base images, but we simply aren't there yet. This would be an excellent time to ask our community for help. Please?

 

Opinionated

Pearson's CD pipeline is opinionated about how, and in what order, build/test/deploy happens. The tools used to perform build and test, however, are up to you. This gives greater flexibility but places the onus on the team for their choice of build/test tools.

 

Code Quality

The Pearson CD pipeline treats everything as code. What this means is that all of your tests should exist as code in a repository. Then simply point Jenkins at that repo and let it rip. The pipeline will handle the rest, including spinning up the necessary number of slaves to do the job.

 

Ease of Use

Pearson’s CD pipeline is simple once the components are understood. Configuration code is reduced to a minimum.

 

Scalability

The CD pipeline will automatically spin up Jenkins slaves for various work requirements. It doesn't matter whether there is 1 microservice or 50; build/test/deploy is relatively fast.

 

Tenancy

Pearson's CD pipeline is intended to be used as a pipeline per project, or better put, a pipeline per development team, as each Jenkins pipeline can manage multiple Kubernetes namespaces. Pearson divides dev, stg, and prd environments by namespace.

 

Documentation

Well, let's just say Pearson's documentation on this subject is currently lacking. There are plenty of items we need to add, and they will be coming soon.

 

 

Final Thoughts:

Fabric8 deployment pipeline

The plugin the Fabric8 team has built for integrating with Kubernetes/OpenShift is awesome. In fact, the Pearson deployment pipeline intends to take advantage of some of their work; hence the greatness of the open source community. If you have used a Jenkinsfile, this will feel familiar to you. The Fabric8 plugin is focused on which tools should be used (i.e. SonarQube, Nexus, Gogs, Apache Camel, JBoss Fuse). This could be explained away as deeper integration allowing for a more seamless experience, but I would argue that most of these tools have APIs and it's not difficult to make a call out to them, which would allow for a tool-agnostic approach. They also have a very high degree of focus on Java applications, which doesn't lend itself to the rest of the dev ecosystem. As I mentioned above, they do state they can integrate with other languages, but I've been unable to find good examples of this in the documentation.

Note: I was unable to find documentation on how the Fabric8 deployment pipeline scales. If someone has this information readily available, I would love to read/hear about it. It's quite possible I just missed it.

Provided Jenkinsfiles are a known entity for your team, Java-centric is the norm, and you already integrate with many of the tools Fabric8 provides, this is probably a great fit for your team. If you need to have control over the CI/CD process, Fabric8 could be a good fit for you.

 

Pearson deployment pipeline

This is an early open source project. There are limitations around Ubuntu which we intend to alleviate; we simply haven't had the demand from our customers to prioritize it yet (this is where the community could get involved). Pearson's deployment pipeline is very flexible in the sense of which tools it can integrate with, yet more deterministic as to how the CI/CD process should work. There is no limitation on language. The Pearson deployment pipeline is easy to get started with and highly scalable; Jenkins will simply scale the number of slaves it needs to perform. Because the deployment pipeline abstracts away much of the CI/CD process, the YAML configuration will not be familiar at first.

If you don't know Jenkins and you really don't want to know the depths of Jenkins, Pearson's pipeline tool might be a good place to start. Its three simple YAML config files reduce the amount of configuration you need to get started. I would posit it will take half as many lines of config to create your pipeline.

The Pearson Deployment Pipeline project needs better examples/templates for working with various languages.

 

Note:

Please remember, this blog is at a single point in time. Both projects are moving, evolving and hopefully shaping the way we think about pipelines in a container world.

 

Key Considerations:

Fabric8

Integrates well with OpenShift and Kubernetes

Tight integrations with other tools like SonarQube, Camel, ActiveMQ, Gogs, etc.

Less focus on how the CI/CD pipeline should work

Java centric

Requires other Fabric8 projects to get full utility from it

Tenancy – I’m not entirely sure. I probably just missed this in the documentation.

 

Pearson

Kubernetes Only

All purpose CD pipeline

Language Agnostic

More opinionated about the build/test/deploy process

Highly Scalable

Tenant per dev team/project

Easy transition for developers to move between dev teams

More onus on teams to create their build artifacts

StackStorm for Kubernetes just took a giant leap forward (beta)

 

came up with it one morning around 4am while trying to get the baby to sleep.

i’m pretty proud. mostly because it works 😉

 – Andy Moore

 

As many of you know, my team began integrating StackStorm with Kubernetes via ThirdPartyResources (TPRs), which we showed off at KubeCon London in March 2016. This was a great start to our integrations with Kubernetes and allowed us to expand our capabilities around managing datastores simply by posting a TPR to the Kubernetes API, allowing StackStorm to build/deploy/manage our database clusters automatically.

This, however, only worked with ThirdPartyResources. In fact, it only worked with the 'beta' TPRs, which were significantly revamped before making it into GA.

With that, Andy Moore figured out how to automatically generate a StackStorm pack crammed full of exciting new capabilities for both StackStorm Sensors and Actions.

Link:

https://github.com/pearsontechnology/st2contrib/tree/bite-1162/packs/kubernetes

You will notice this has not been committed back upstream to StackStorm yet. Our latest version diverges significantly from the original pack we pushed, and we need to work with the StackStorm team on the best approach to move forward.

@stackstorm if you want to help us out with this, we would be very appreciative.


The list of new capabilities for Kubernetes is simply astounding. Here are just a few:

Authentication
RBAC
HorizontalPodAutoscalers
Batch Jobs
CertificateSigningRequests
ConfigMaps
PersistentVolumes
Daemonsets
Deployments/DeploymentRollBack
Ingress
NetworkPolicy
ThirdPartyResources
StorageClasses
Endpoints
Secrets

Imagine being able to configure network policies through an automated StackStorm workflow based on a particular project's needs.
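
To give a sense of the kind of object such a workflow would end up applying, here is a minimal NetworkPolicy sketch (the name, namespace and labels are made up; on clusters of that era the resource lived under extensions/v1beta1):

apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: myapp-allow-frontend
  namespace: myproject
spec:
  # Applies to the project's backend pods.
  podSelector:
    matchLabels:
      app: myapp
  ingress:
    # Only pods labeled tier=frontend may reach them, and only on port 8080.
    - from:
        - podSelector:
            matchLabels:
              tier: frontend
      ports:
        - protocol: TCP
          port: 8080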

Think about how RBAC could be managed using our Kubernetes Authz Webhook through StackStorm.

Or how about kicking off Kubernetes Jobs to administer some cluster-level cleanup activity, and handing that off to your NOC.

Or allowing your Operations team to patch a HorizontalPodAutoscaler through a UI.

We could build a metadata framework derived from the Kubernetes API annotations/labels for governance.

The possibilities are now literally endless. Mad props go out to Andy Moore for all his work in this endeavor.

 

Ok so why am I listing this as beta?

There is a freak ton of capability in our new st2 pack that we haven't finished testing. So if you are adventurous, want to play with something new, and can help us out, we would love your feedback.

Thus far our testing has included the following:

Secrets

Services

Deployments

Ingresses

Persistent Volumes

Replication Controllers

Quotas

Service Accounts

Namespaces

Volumes

 

Hope you get as excited about this as we are. We now have a way to rapidly integrate Kubernetes with ….. well …… everything else.

@devoperandi

 

Note: As soon as we have cleaned up a few things with the generator for this pack, we’ll open source it to the community.

 

Past Blogs around this topic:

KubeCon 2016 Europe (Slides)

Kubernetes, StackStorm and third party resources

Kubernetes, StackStorm and third party resources – Part 2

KubeCon Seattle Video

Finally posting this after my speaking engagement at KubeCon Seattle in November 2016. Thanks to all who came. My hope is that releasing our Deployment Pipeline will help the Kubernetes community build an ecosystem of open source CI/CD pipelines to support an awesome platform.

Below the video are links to the various open source projects we have created, which are also listed on the last slide of the conference deck.

Link to the Deployment Pipeline:

https://github.com/pearsontechnology/deployment-pipeline-jenkins-plugin

Vault SSL Integration:

https://github.com/devlinmr/contrib/tree/master/ingress/controllers/nginx-alpha-ssl

 

Kubernetes Tests:

https://github.com/pearsontechnology/kubernetes-tests

 

StackStorm Integrations:

https://github.com/pearsontechnology/st2contrib

 

Authz Webhook:

https://github.com/pearsontechnology/bitesize-authz-webhook

Kube-DNS – a little tuning

We recently upgraded Kube-dns.

gcr.io/google_containers/kubedns-amd64:1.6
gcr.io/google_containers/kube-dnsmasq-amd64:1.3

Having used SkyDNS up to this point, we ran into some unexpected performance issues. In particular, we were seeing pretty exaggerated response times from kube-dns on requests it is not authoritative for (i.e. anything other than cluster.local).

Fortunately this was on a cluster not yet serving any production customers.

It took several hours of troubleshooting and getting a lot more familiar with our new DNS setup and dnsmasq, in particular the various knobs we could turn, but what tipped us off to the solution was the following issue:

https://github.com/kubernetes/kubernetes/issues/27679

** Update

Adding the following lines to the "- args" config of the gcr.io/google_containers/kube-dnsmasq-amd64:1.3 container did the trick and significantly improved DNS performance.

- --server=/cluster.local/127.0.0.1#10053
- --resolv-file=/etc/resolv.conf.pods

By adding the second entry we ensure requests only go upstream from kube-dns instead of back to the host-level resolver.

/etc/resolv.conf.pods points only to external DNS; in our case, the AWS DNS server for our VPC, which is always %.%.%.2 for whatever your VPC IP range is.
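
For context, here is a rough sketch of where those flags sit in the dnsmasq container of the kube-dns spec, plus an example of the resolv file they point at (the upstream nameserver address is illustrative, and the file itself has to be made available inside the container, e.g. via a volume):

# dnsmasq container within the kube-dns pod spec (abbreviated sketch)
- name: dnsmasq
  image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
  args:
    - --cache-size=1000
    # Only cluster.local queries go to the kubedns container on port 10053.
    - --server=/cluster.local/127.0.0.1#10053
    # Everything else resolves via the upstream servers in this file,
    # rather than falling back through the node's resolver.
    - --resolv-file=/etc/resolv.conf.pods

# /etc/resolv.conf.pods - upstream resolvers only (example for a 10.0.0.0/16 VPC)
nameserver 10.0.0.2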

** End Update

In either case, we have significantly improved performance on DNS lookups and are excited to see how our new DNS performs under load.

 

Final thoughts:

Whether you are tuning for performance or simply not realizing your cluster requires a bit more than 200Mi of RAM and 1/10 of a CPU, it's quite easy to overlook kube-dns as a potential performance bottleneck.

We have a saying on the team: if it's slow, check DNS. If it looks good, check it again. And if it still looks good, have someone else check it. Then move on to other things.

Kube-dns has bitten us so many times that we have a dashboard to monitor and alert just on it. These are old screen caps from SkyDNS, but you get the point.

[Screenshots: DNS monitoring dashboards]

 

 

Kubernetes Init Containers

Kubernetes init containers. Alright, I'm just going to tell the truth here. When I first started reading about them, I didn't get it. I thought to myself, "with all the other stuff they could be doing right now at this early stage of Kubernetes, what the hell were they thinking? Seriously?" But that's because I just didn't get it. I didn't see the value. I mean, don't get me wrong, init containers are good for many reasons: transferring state between Pets, detecting that databases are up prior to starting an app, configuring PVCs with information the primary app needs, etc. These are all important things, but there are already workarounds for this stuff. Entrypoint anyone?

And then I read one line in the PetSet documentation (of all places) and I had an Aha! moment.

“…allows you to run docker images from third-party vendors without modification.”

That is a HUGE reason for init containers and, in my mind, should be the biggest validation of their need as a broader Kubernetes use case.

At Pearson we have to modify existing Docker images all the time to fit our needs, whether it's clustering Consul, modding Fluentd, seeding Cassandra or setting up discovery for Elasticsearch clustering. These are all things we have done and had to create our own custom images to manage, in some cases requiring a private Docker repository to do so. Hell, half the stuff I've written about has caused me to put out our Dockerfiles just so you could take advantage of them. If we had init containers in the first place, it would have been a lot less code and a lot more "hey, go pull this init container and use it" in my blog posts.

Alright, with that, I'm actually just going to point you to the documentation on this one. It's pretty good and gives you exactly what you need to get started.

Kubernetes Init Containers

One key thing to remember: init containers for a given app run serially.
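
As a minimal sketch of the idea (the image, service and app names are made up, and on the 1.3-1.5 releases current at the time of writing the same content went into the pod.beta.kubernetes.io/init-containers annotation rather than a first-class spec field):

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
    # Runs to completion before the app container starts; blocks until the
    # database service resolves, so the app image needs no retry logic.
    - name: wait-for-db
      image: busybox
      command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done']
  containers:
    - name: app
      image: myregistry/myapp:1.0
      ports:
        - containerPort: 8080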

Now my team has to go back to work rewriting all our old shit.

Kubernetes container guarantees (and oversubscription)

Reading through the release notes of Kubernetes 1.4, I came across some fantastical news. News so good, I should have expected it. News that could not have come at a better time. I'm talking about container guarantees, or what Kubernetes calls Resource Quality of Service. Let me be frank here, it's like the Kubernetes team was just trying to confuse me. I'm sure the rest of you immediately knew what they were talking about, but I'm a simpleton. So after reading it 5 times, I think I finally got ahold of it.

In a nutshell, when resource min and max values are set, quality of service dictates container priority when a server is oversubscribed.

Let me say this another way, we can oversubscribe server resources AND decide which containers stay alive and which ones get killed off.

Think of it like the Linux OOM killer but with more fine-grained control. With the Linux OOM killer, the only thing you can do to influence what does or does not get killed off is adjust oom_score_adj per process, which, as it turns out, is exactly what Kubernetes is doing under the hood.
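
If you want to see this for yourself, you can inspect the score the kubelet assigned to a container's process directly on the node (a rough sketch; the process name is a placeholder and the exact values vary by QoS class, request size and Kubernetes version):

# On the node, find the container's main process and read its OOM score adjustment.
# Roughly: BestEffort pods sit near 1000, Guaranteed pods near -998, Burstable in between.
PID=$(pgrep -f mywebapp | head -n 1)
cat /proc/$PID/oom_score_adj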

Here are the details:

There are 3 levels of priority.

BestEffort – These are the containers Kubernetes will kill off first when under memory pressure.

Guaranteed – These take top priority over everything else. Kubernetes will try everything to keep them alive.

Burstable – Likely to be killed off once no BestEffort pods remain and they have exceeded their requested amount.

 

And there are two parameters you need to consider.

request – the base amount of resources (CPU and RAM) a container asks for at runtime.

limit – the upper limit the container can consume, if those resources are not already in use elsewhere.

Notice how I mentioned memory pressure above. Under CPU pressure, nothing is killed off; containers simply get throttled instead.

 

So how do we determine which priority level a container will have?

Guaranteed if request == limit, OR only limits are set

which looks like:

containers:
  - name: mywebapp
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 1Gi

OR

containers:
  - name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi

### Setting requests is optional

 

 

Burstable if request is less than limit, OR one of the containers in the Pod has nothing set

containers:
  - name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 1Gi

  - name: bar

Now recognize there are two containers above, one with nothing specified. That container gets BestEffort, which makes the Pod as a whole Burstable.

OR

containers:
  - name: foo
    resources:
      limits:
        memory: 1Gi

  - name: bar
    resources:
      limits:
        cpu: 100m

The config above has two different resources set; one has memory set and the other CPU. Thus, once again, Burstable.

 

BestEffort if no resources are defined at all.

containers:
  - name: foo
    resources: {}
  - name: bar
    resources: {}
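
To verify which class a running Pod actually landed in, you can read it back from the Pod status (a quick sketch; the Pod name is a placeholder, and the qosClass status field only appears on reasonably recent Kubernetes versions):

# Prints Guaranteed, Burstable or BestEffort.
kubectl get pod mywebapp -o jsonpath='{.status.qosClass}'

# Or pick it out of the full description.
kubectl describe pod mywebapp | grep -i qos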

 

This is just the tip of the iceberg on container guarantees.

There is a lot more there around cgroups, swap and compressible vs incompressible resources.

Head over to the GitHub page to read more.