My exit from a great gig

Today is my last day as a Principal Cloud Platform Architect at Pearson. Over the last few weeks I’ve had a bit of a retrospective on my time at Pearson: our accomplishments, our failures, our team, and all the opportunities it has created. I’m amazed how far we have come. I’m thankful for the opportunity and the freedom to reach beyond what many thought possible.

I left nothing on the table. I gave it all I had.

But one question still stands.

What IS Bitesize?

Bitesize is a platform purpose built for the future of Pearson.

It’s a combination of interwoven technologies and application development methodologies that has resulted in advancements far beyond what any cloud provider can currently attain.

It’s a team of engineers who believe in, support and challenge themselves and others, both inside and outside the team.

It’s a group of leaders who believe in their team and are willing to stick their necks out for what’s right.

It’s a philosophy of change.

It’s an evolving set of standards that increases the fidelity of these interwoven pieces.

Bitesize is the convergence of disparate teams to make something greater than any of them could make individually.

It’s a treadmill of technology.

 

As I began thinking about what has been accomplished, I decided to write a few of those things down. I know I will leave many unsaid, but hopefully this will give a view into just how much has been done. Many may see this list and think one of two things: “That’s total bullshit, no way they did all that” or better yet “Ok I buy it, but have you reached nirvana?”

My answer to the former – “test your bullshit meter, it’s a bit off”

My answer to the latter – “as soon as one reaches a star, they realize how many other stars they want to reach”

 

A few achievements to date:

Fully automated platform and application upgrades

Completely scalable CI/CD pipeline treated as cattle (i.e. if a pipeline goes away, it self-configures when it comes back up)

Built-in log and data aggregation per Kubernetes namespace

Deep cloud provider integrations without lock-in

Fully automated database provisioning (containers and virtual machines)

Dynamic Certificate Management using Hashicorp Vault fully integrated with Load Balancers

100% availability for the platform and applications through critical business periods (to my knowledge this had not been achieved until now at Pearson)

Dynamic Application Configuration

Immutable Application architecture

OAuth into the platform and various infrastructure components

Universal API for single point of use

Audit Control and Compliance throughout the stack

Baked in Enterprise Governance

Highly secure, full BGP mesh across geographic regions, capable of standing up new endpoints in under 10 seconds

8.5 Million concurrent (and I do mean concurrent) user performance test, 150-250ms avg response

Enterprise Chargeback model

Dynamic CIDR provisioning (NSOT) for AWS and Kubernetes

Open Sourced Authz webhook resulting in its adoption by CoreOS

Automated generation of StackStorm AWS packs

Contributed StackStorm Kubernetes Pack to StackStorm Exchange

Contributing next generation (over 106 new packs) of StackStorm AWS Packs to StackStorm Exchange (currently in incubator)

Open Sourced many new technologies including Environment Operator, StackStorm Packs, Kong plugins, Kubernetes Test Harness, Nginx Controller, Jenkins plugin for Environment Operator, and CI/CD pipeline

On-Demand Locust (perf testing suite) on Kubernetes using Iron Functions, deployed in under 10 seconds

Integrated Monitoring/Alerting throughout the stack

Self-onboarding of applications through to production with little or no assistance from the Bitesize team

Congrats team. You’ve got this.
@devoperandi

Kubernetes – Device Plugins (alpha)

Brief History

In March of 2017, I wrote about Opaque Integer Resources, whereby specific hardware capabilities could be used in Kubernetes. Alpha in 1.5, it opened up the potential to enable resources like Last Level Cache, GPUs and Many Integrated Core devices.

In Kubernetes 1.8, Opaque Integer Resources were replaced with Extended Resources. This was a great move, as it migrated away from the kubernetes.io/uri model, allowing resources to be assigned to any domain outside kubernetes.io and thus simply extending the API with API aggregation.
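
As a quick illustration (the node name and the example.com/dongle resource are hypothetical, and this assumes kubectl proxy is running locally), an extended resource can be advertised on a node by patching the node status:

# in one shell: kubectl proxy
curl --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data '[{"op": "add", "path": "/status/capacity/example.com~1dongle", "value": "4"}]' \
  http://localhost:8001/api/v1/nodes/<your-node-name>/status

(The ~1 is just the JSON-Pointer escape for the / in example.com/dongle.)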

Extended Resources are a phenomenal start toward vastly expanding the opportunities around Kubernetes workloads, but they still had the potential to require modifications to Kubernetes core in order to actually use a new resource. And this is where Device Plugins come in.

 

Requirements:

Kubernetes 1.8

DevicePlugins enabled in Kubelet
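
In 1.8 the framework sits behind an alpha feature gate on the kubelet. A minimal sketch of enabling it (how kubelet flags get set depends on your provisioning tooling):

--feature-gates=DevicePlugins=true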

 

Device Plugins

Device Plugins are a common framework through which vendor-specific hardware devices can be plugged into Kubernetes.

Think of it this way:

Extended Resources = how to use a new resource
Device Plugins = how vendors can advertise to and hook into Kubernetes without modifying Core

One of the first examples of Device Plugins in use is Nvidia’s k8s-device-plugin, which makes complete sense because Nvidia is leading entire industries in various hardware arenas, GPUs being just one of them.

 

How Device Plugins work

Device Plugins are/should be containers running in Kubernetes that provide access to a vendor (or enterprise) specific resource. The container advertises said resource to Kubelet via gRPC. Because this is hardware specific, it must be done on a per-node basis. However, a DaemonSet can be deployed to advertise the same resource across a multitude of nodes.
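
A rough sketch of such a DaemonSet follows (the plugin name and image are hypothetical; the hostPath is the directory where kubelet expects plugins to register their gRPC sockets):

apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: my-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: my-device-plugin
  template:
    metadata:
      labels:
        name: my-device-plugin
    spec:
      containers:
      - name: my-device-plugin
        image: example.com/my-device-plugin:latest  # hypothetical image
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins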

The Device Plugin has three parts:

Registration – the plugin advertises itself to Kubelet over gRPC

ListAndWatch – provides the list of devices and/or modifies the existing state of a device on change, including failures

Allocate – device-specific instructions for Kubelet to make available to a container

At first glance this may seem rather simple, but it should be noted that prior to Device Plugins, Kubelet specifically handled each device, which is why hardware vendors had to contribute back to Kubernetes core to provide net-new hardware resources. With the device plugin manager, this is abstracted out and the responsibility lies with the vendor. Kubelet then keeps a socket open to ListAndWatch for any changes in device state or the list of devices.

 

Use of new devices through Extended Resources

Once a new Device Plugin is advertised to the cluster, it is quite simple to use.

Now let’s imagine we are using Nvidia’s GPU device plugin at nvidia.com/gpu.

Here is how we allocate a gpu resource to a container.

apiVersion: v1
kind: Pod
metadata:
  name: need-some-gpu-pod
spec:
  containers:
  - name: my-container-needing-gpu
    image: myimage
    resources:
      requests:
        cpu: 2
        nvidia.com/gpu: 1

 

Gotchas

(At the time of this post)

Integers Only – this is common in Kubernetes but worth noting: the 1 for nvidia.com/gpu above cannot be 0.5.

No Overallocation – unlike memory and CPU, devices cannot be overallocated. So if both requests and limits are specified, they must equal each other.

Resource Naming – I can’t confirm this, but playing around with the Nvidia GPU plugin I was unable to create multiples of the same device across multiple nodes.

Example:

I had difficulty advertising nvidia.com/gpu on node two once it was advertised on node one.

If correct, this would mean I would need to add nvidia.com/gpu-<node_name> or something of that nature to add the GPU device for multiple servers in a cluster, and also call out that specific device when assigning it to the container requiring the resource. Keep in mind this is alpha, so I would expect it to change rapidly, but it is currently a limitation.

 

More info on Device Plugins

For a deeper review of the Device Plugin Manager

More on Extended Resources and Opaque Integer Resources

 

@devoperandi

Open Source – Environment Operator

The day has finally come. Today we are announcing our open source project Environment Operator (EO).

Environment Operator is used throughout our project and has rapidly gained a name for itself as being well written and well thought out. Props go out to Simas Cepaitis, Cristian Radu and Ben Somogyi who have all contributed.

At its core, EO enables a seamless application deployment capability for a given environment/namespace within Kubernetes.

Benefits/Features:

  • multi-cluster deployments
  • audit trail
  • status
  • consistent definition of customer environments
  • separate build from deploy
  • minimizes risk and scope of impact
  • simple abstraction from Kubernetes
  • BYO CI/CD
  • empowers our customers (dev teams)
  • API interface
  • multiple forms of authentication
  • deploy through yaml config and API
  • written in Go
  • Docker Registries

 

Multi-Cluster Deployments – With EO running in each namespace and exposed via an API, CI/CD pipelines can simply call the API endpoint, regardless of Kubernetes cluster, and deploy new services.

 

API Interface – EO has its own API endpoint for deployments, status, logs and the like. This, combined with a yaml config for its environment, is a very powerful combination.

 

Audit Trail – EO provides an audit trail of all changes to the environment through its logging to stdout.

 

Status – EO provides a /status endpoint by which to understand the status of an environment, or of individual services within the environment with /status/${service}.
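
For example (the hostname and bearer-token header here are illustrative; how your EO instance is exposed and authenticated will vary):

# whole-environment status
curl -H "Authorization: Bearer $TOKEN" https://eo.somenamespace.example.com/status

# status of a single service
curl -H "Authorization: Bearer $TOKEN" https://eo.somenamespace.example.com/status/myservice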

 

Separate Build from Deploy – What we found was that, while our CI/CD pipeline is quite robust, it lacked the real-time feedback and audit capabilities needed by our dev teams. Separating build from deploy allowed us to add these additional features, simplify its use, and enable our dev teams to bring their own familiar pipelines to our project.

 

Minimize Risk and Scope of Impact – Because EO runs inside the Kubernetes cluster, we can limit its capabilities through Kubernetes service accounts to only its namespace. This limits risk and impact to other dev teams running in the same cluster; a dev would have to call an entirely wrong API endpoint in order to affect another environment. Furthermore, authentication is set up for each EO, so separation of concerns between environments can easily be maintained.

 

Simple Abstraction – Because EO is so simple to use, it has enabled our teams to get up and running much faster in Kubernetes. Very little prior knowledge is required; they can use their same pipelines via a common DSL in our Jenkins plugin and get all the real-time information from one place per environment.

 

BYO CI/CD – I think this is pretty self-explanatory but we have many dev teams at Pearson that already have their own CI/CD pipelines. They can continue using their pipeline or choose to use ours.

 

Empower our Dev teams – Ultimately EO is about empowering Dev teams to manage their own environments without requiring tons of prior knowledge to get started. Simply deploy EO and go.

 

Authentication – EO currently supports two kinds of authentication: token based, with the token pulled from a Kubernetes secret, or OAuth. We currently tie directly into Keycloak for auth.
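
A hedged sketch of creating such a token secret (the secret name, key and namespace are purely illustrative; see the EO docs for the names it actually expects):

kubectl create secret generic eo-auth-token \
  --from-literal=token=$(openssl rand -hex 32) \
  --namespace=somenamespace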

 

Plugin (DSL) for Jenkins – Because most of our Dev teams run Jenkins, we wrote a plugin to hook directly into it. Other plugins could very easily be written.

 

Docker Registries – EO can connect to private, public, gcloud and docker hub registries.

 

As you can see, Environment Operator has a fair amount of capability built in, but we aren’t stopping there.

Near term objectives:

  • Stateful sets
  • Kubernetes Jobs
  • Prometheus

 

Github:

https://github.com/pearsontechnology/environment-operator

https://github.com/pearsontechnology/environment-operator-jenkins-plugin

 

Let us know what you think!

@devoperandi


OpenID Connect – Enabling Your Team

Hello! My name is Matt Halder and I’ve had some interesting experiences working in a variety of IT fields. I started out at a government contractor in Washington, D.C. as a Network Controller, moved my way up to Network Engineer and finished as a Lead Technologist. From there, I headed westward to Denver, CO for an opportunity to work at Ping Identity as a Security Operations Engineer. Currently, I work at FullContact as a DevOps Engineer. The FullContact team has been using kubernetes in production for the last seven months as a way to reduce our overall cloud hosting costs and move away from IaaS vendor lock-in. Both the development and staging clusters were bootstrapped using kops. The largest barrier to adoption echoed throughout the development team was needing the ability to tail logs. When role-based access control was introduced in kubernetes 1.6, the ability to provide access to the cluster without shared tokens, certs, or credentials became a reality. Here are the steps we used to enable openid-connect on kubernetes.

When setting up an OpenID Connect provider, there are a few terms to be aware of. First is the “IdP”, the identity provider; many technologies can be used as an identity provider, such as Active Directory, FreeIPA, Okta, Dex or PingOne. Second is the “SP”, the service provider; in this case the service provider is the kubernetes API. The basic overview of an OpenID Connect workflow is this: the user authenticates to the IdP, the IdP returns a token to the user, and this token is now valid for any SP that is configured to use the IdP that produced the token.

  1. Set up your IdP with an openid-connect endpoint and acquire the credentials.
  2. Configure the SP [aka configure the API server] to accept openid-connect tokens and include a super-admin flag so that existing setup will continue to work throughout the change.
  3. Generate kubeconfig file including oidc user config.
  4. Create role bindings for users on the cluster.
  5. Ensure all currently deployed services have role bindings associated with them.

Step 1: Set up the IdP

Since G Suite is already in place, we had an IdP that could be used for the organization.  The added benefit is this IdP is pretty well documented and supported right out of the box, the caveat being that there is no support for groups so each user will need their own role binding on the cluster.

  • Navigate to https://console.developers.google.com/projectselector/apis/library.
  • From the drop-down create a new project.
  • In the sidebar, under APIs & services, select Credentials.
  • Select OAuth consent screen (the middle tab in the main view). Select an email, choose a product name, and press Save.
  • This will take you back to the Credentials tab. Select OAuth client ID from the drop-down.
  • From application type, select Other and give it a unique name.
  • Copy the client ID and client secret, or download the json. The download is under OAuth 2.0 client IDs on the rightmost side.

Step 2: Configure the SP [aka configure API Server] to accept OIDC tokens

Kops now has the ability to add pre-install and post-install hooks for openid-connect. If we were starting from scratch, this is the route we would explore. However, adding these hooks didn’t trigger any updates, and forcing a rolling update on a system running production traffic was too risky and untested, since staging had been tested/updated prior to this functionality being introduced.

Kubelet loads core manifests from a local path; a kops cluster’s kubelet loads from /etc/kubernetes/manifests. This directory stores the kube-apiserver manifest file that tells kubelet how to deploy the API server as a pod. Editing this file will trigger kubelet to re-deploy the API server with the new configuration. Note, this operation is much riskier on a single-master cluster than on a multi-master cluster.

  • Copy the original kube-apiserver.manifest.
  • Edit kube-apiserver.manifest adding these lines:
--authorization-mode=RBAC

--authorization-rbac-super-user=admin

--oidc-client-id=XXXXXX-XXXXXXXXXXX.apps.googleusercontent.com

--oidc-issuer-url=https://accounts.google.com

--oidc-username-claim=email
  • Kubelet should re-deploy the API server within a couple of minutes of the manifest being edited.
  • Ensure that network overlays/CNI are functioning properly before proceeding; not all overlays shipped with service accounts and role bindings, which caused some issues for early adopters of kubernetes 1.6. (Personally, I had to generate a blank configmap for calico since it would fail if one wasn’t found.)

Step 3: Generating a kubeconfig file

This process is broken into two steps: the first is to generate the cluster and context portion of the config, while the second is having the user acquire their openid-connect tokens and add them to the kubeconfig.

  • While opinions will vary, I’ve opted to skip TLS verification in the kubeconfig. The reasoning is that this would require a CA infrastructure to generate certs per user, which isn’t in place.
  • There’s a bit of a chicken-and-egg thing going on here, where kubectl needs to be installed so that a kubeconfig can be generated for kubectl (although that’s how ethereumwallet is installed, so maybe it’s just me). Either way, this script can be edited with the correct context and endpoints to generate the first half of the kubeconfig:
#!/usr/bin/env bash

set -e

USER=$1

if [ -z "$USER" ]; then
  echo "usage: $0 <email-address>"
  exit 1
fi

echo "setting up cluster for user '$USER'"

# Install kubectl dependency
source $(dirname $(readlink -f "$0"))/install_kubectl.sh 1.6.8

# Set kubeconfig location the current users home directory
export KUBECONFIG=~/.kube/config

# Set cluster configs
kubectl config set-cluster cluster.justfortesting.org \
  --server=https://api.cluster.justfortesting.org \
  --insecure-skip-tls-verify=true

#Set kubeconfig context
kubectl config set-context cluster.justfortesting.org \
  --cluster=cluster.justfortesting.org \
  --user=$USER

kubectl config use-context cluster.justfortesting.org
  • To generate the second part of the kubeconfig, use k8s-oidc-helper from here to generate the user portion and append the output to the bottom of the config file (see the sketch below). Now, with a functioning kubeconfig, the user needs a role binding present in the cluster to have access. The IdP client-id and client-secret will need to be made available to users so they can generate the openid-connect tokens. I’ve had good success with LastPass for this purpose.
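
A sketch of the helper invocation (flags per the k8s-oidc-helper README; the client id and secret come from Step 1):

k8s-oidc-helper --client-id=<client-id> \
                --client-secret=<client-secret>

The helper walks you through a Google login and prints the user block to append to your kubeconfig.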

Step 4: User Role Bindings

  • Now, create a default role that users can bind to. The example gives the ability to list pods and their logs from the default namespace.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: default
  name: developer-default-role
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
  • Now, bind users to this role (notice the very last line has to be identical to the G Suite email address used in Step 3).
  • At our organization, these files are generated by our team members and then approved via a github pull request. Once the PR has been merged into master, the role bindings become active on the clusters via a jenkins job.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: ${USER}@organization.tld-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: developer-default-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: $USER@organization.tld

Step 5: Existing tooling needs an account and binding

The last step is necessary for any existing tooling in the cluster to ensure continued functionality. The “--authorization-rbac-super-user=admin” flag from step 2 was added to ensure continuity throughout the process. We use helm to deploy foundational charts into the cluster; helm uses a pod called “tiller” on the cluster to receive all specs from the helm SDK and communicate them to the API server, scheduler, and controller-manager. For foundational tooling such as this, use service accounts and cluster role bindings.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller-cluster-rolebinding
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

 

 

Kubernetes: FaaS Options (part 1)

Over the last few months I’ve been diving into various Serverless/FaaS architectures that can run on Kubernetes. To say this space has exploded would be a severe understatement. The number of amazing developers working in this space is remarkable, to say nothing of the number of them integrating with Kubernetes.

I’m not going to talk about wrappers around Lambda (of which there are a TON). I’m talking about true FaaS capabilities that can at least demonstrate they run on Kubernetes.

As it turns out there are a fair number of them.

I’ve worked with several of these now, but I’ll point out the ones I haven’t as we go along. In most circumstances I was able to reduce the number of candidates to explore simply by reviewing their architectures to understand their pros and cons.

I’ve also come across what I believe are some key indicators to be aware of before taking on one of these capabilities:

  • Language support
  • Performance and Scalability (how quickly can a basic function execute)
  • Asynchronous/Synchronous support
  • Monitoring
  • Architecture

 

 

OpenWhisk

OpenWhisk was built and designed by IBM. It seems to be gaining a fair amount of traction in this space and has a good reputation. An excellent overview of the OpenWhisk architecture was written by Markus Thömmes, an OpenWhisk contributor, at medium.com.

Language Support – OpenWhisk has full language support for just about anything. It even has integrations with Swift, Cloudant, Slack and YouTube.

Performance and Scalability – I found performance of OpenWhisk to be somewhat sluggish out of the box. But Mark provides some pretty good ways to increase performance here. Scalability is quite good as all components including the controllers can be scaled out.

Asynchronous/Synchronous – Asynchronous Only. Sounds like there are some plans for (semi?) synchronous support.

Monitoring – IBM does have a dashboard that can be used with IBM Bluemix, and the CLI can be used for gaining insights as well, but built-in integration with open source monitoring platforms is non-existent.

Architecture – 

CouchDB and Kafka are in the direct execution path for any function: CouchDB for both authentication and action retrieval, and Kafka because all requests are asynchronous (at this time). Personally, I couldn’t see us requiring authentication through OpenWhisk, and I would imagine most others have their own auth capabilities that support far more than what is offered here.

The primary problem with the above is availability. The more stateful (semi or otherwise) services required to be available, the more opportunity for failure. However, you’ll find this is fairly common in FaaS: some sort of message queue, and some sort of storage for holding code. This tends to limit the number of languages (and/or versions) supported, but IBM has done a good job here. Basically any container can be an invoker as long as it conforms to a few specifics.

Notes for OpenWhisk on Kubernetes:

  • No use of Kubernetes scheduling.
  • The OpenWhisk controller talks directly with the Docker API on the host, thus limiting scalability to what that host can handle. That is also not going to work for availability.

Recap: Overall, OpenWhisk is a platform that’s been around for a little while now. It largely resembles Lambda in its capability, but is open to the masses. I could see OpenWhisk being used in very large FaaS implementations, but its number of dependencies in the critical path scares me, and its performance could use some enhancing out of the box. For a FaaS that relies on injecting functions into containers, though, its language support is stellar and it has some pretty cool direct integrations.

Bottom Line: I can’t recommend this platform if running Kubernetes at this time.

 

Kubeless

Kubeless is almost a brand new project. As of the time of this writing, Kubeless has only been committed to in earnest for the last 5 months. It is truly Kubernetes native and plugs right in to the Serverless project. I did not get the chance to really test out this platform, but I’m aiming to get a handle on it in the next few weeks.

Language Support – Python and NodeJS

Performance and Scalability – I just don’t know yet

Asynchronous/Synchronous – Both

Monitoring – Baked in monitoring with Prometheus.

Architecture – 

Kubeless relies heavily on built-in Kubernetes capabilities such as ThirdPartyResources (or Custom Resource Definitions, depending on the version of Kubernetes) and takes advantage of the built-in API server. Everything needed to run a function exists in the ThirdPartyResource. As a result, however, the Kubeless team has to provide support per language/version for functions to run. My hope is they will make this a bit more generic to allow custom runtimes; otherwise I fear they won’t be able to keep up.

Correction: Executions with Kubeless are through http or triggered events. Thank You @sebgoa for pointing this out.
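
For a feel of the workflow, deploying a simple function looked roughly like this at the time (command per the kubeless README of that era; flags may well have changed since):

kubeless function deploy get-python --runtime python2.7 \
  --handler test.foobar --from-file test.py --trigger-http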

Notes: Kubeless has an easily consumable UI and directly plugs in to the Serverless Framework.

Kubeless runs on vanilla Kubernetes and OpenShift, and hooks seamlessly into Kubernetes RBAC for security.

Recap: A really cool new up-and-coming project integrating deeply with Kubernetes. It could be a heavy contender in the future.

Bottom Line: Not yet unless you are solely Python and NodeJS based.

 

 

IronFunctions

IronFunctions is the current unsung hero in my mind. Under heavy development since July of 2016, it is an easily consumable open source project that integrates well with Kubernetes while having the unique ability to run Lambda-style functions as well. So for all you Lambda junkies wanting to break your addiction, this might be a pretty damn good option.

Language Support – Only limited by the docker containers you can dream up.

Performance and Scalability – Only limited by the infrastructure it’s running on. I quite easily executed functions in several languages, both locally and on a full cluster, in the 200-250ms range for sync requests and 300ms-ish for async. I don’t see any scalability issues at this time. If a ceiling were hit, it would be quite easy to simply spin up a new IronFunctions capability in a different namespace in Kubernetes.

Asynchronous/Synchronous – Both

Monitoring – Logs

Architecture – 

IronFunctions is a truly well-built platform that, I daresay, could serve many different use cases. There are a few basic components to running IronFunctions.

  • IronFunctions – essentially the controller/API that manages incoming requests and spins up resources/containers to fulfill said requests.
  • Database – for configuration only. Not in the critical request path.
  • Message Queue – For Asynchronous requests.

Notes: It has a usable UI for managing functions. HotFunctions are pretty awesome, and the CLI is very easy to use.
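
From memory, the basic flow with the fn CLI looked something like the sketch below; treat the exact commands as an assumption and consult the project README:

# create an app, wire a route to a function image, then call it
fn apps create myapp
fn routes create myapp /hello iron/hello
fn call myapp /hello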

All in all, IronFunctions was the dark horse that surprised me by a long shot. I would love to see Prometheus monitoring make it in, as I’m not terribly excited about logging being the metrics collection point. Overall, a minor gripe.

Recap: I was genuinely surprised by the scalability, performance and maturity of a project I just happened to run across. It has all the makings of a truly scalable, production-capable FaaS offering. With synchronous, asynchronous AND HotFunction capabilities, I was very impressed. Combine that with ease of use and just enough integration with Kubernetes, and I’m pretty much sold. Keep up the good work.

Bottom Line: Of the ones I’ve reviewed so far, a definite Yes. Just get me some metrics into Prometheus. 😉

In a future post I’ll have a look at Fission, Funktion and maybe an up and comer by alexellis called faas-netes.

@devoperandi

Kubernetes – PodPresets

PodPresets in Kubernetes are a cool new addition to container orchestration, alpha in v1.7. At first they seem relatively simple, but when I began to realize their current AND potential value, I came up with all kinds of potential use cases.

Basically, PodPresets inject configuration into any pod carrying a specific Kubernetes label. So what does this mean? Have a damn good labeling strategy. This configuration can come in the form of:

  • Environment variables
  • Config Maps
  • Secrets
  • Volumes/Volumes Mounts

Everything in a PodPreset configuration will be appended to the pod spec unless there is a conflict, in which case the pod spec wins.

Benefits:

  • Reusable config across anything with the same service type (datastores as an example)
  • Simplify Pod Spec
  • Pod author can simply include PodPreset through labels

 

Example Use Case: What if data stores could be configured with environment variables? I know, wishful thinking… but we can work around this. Then we could set up a PodPreset for MySQL/MariaDB to expose port 3306, configure the InnoDB storage engine, and apply other generic config for all MySQL servers that get provisioned on the cluster.

Generic MySQL Pod Spec:

apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql-server
    preset: mysql-db-preset
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      command: ["mysqld"]
  initContainers:
  - name: init-mysql
    image: initmysql
    command: ['script.sh']

Now notice there is an init container in the pod spec. Thus no modification of the official MySQL image should be required.

The script executed in the init container could be written to templatize the MySQL my.ini file prior to starting mysqld. It may look something like this.

#!/bin/bash

cat >/etc/mysql/my.ini <<EOF

[mysqld]

# Connection and Thread variables

port                           = $MYSQL_DB_PORT
socket                         = $SOCKET_FILE         # Use mysqld.sock on Ubuntu, conflicts with AppArmor otherwise
basedir                        = $MYSQL_BASE_DIR
datadir                        = $MYSQL_DATA_DIR
tmpdir                         = /tmp

max_allowed_packet             = 16M
default_storage_engine         = $MYSQL_ENGINE
...

EOF

 

Corresponding PodPreset:

kind: PodPreset
apiVersion: settings.k8s.io/v1alpha1
metadata:
  name: mysql-db-preset
  namespace: somenamespace
spec:
  selector:
    matchLabels:
      preset: mysql-db-preset
  env:
    - name: MYSQL_DB_PORT
      value: "3306"
    - name: SOCKET_FILE
      value: "/var/run/mysql.sock"
    - name: MYSQL_DATA_DIR
      value: "/data"
    - name: MYSQL_ENGINE
      value: "innodb"

 

This was a fairly simple example of how MySQL servers might be implemented using PodPresets but hopefully you can begin to see how PodPresets can abstract away much of the complex configuration.

 

More ideas –

Standardized Log Configuration – Many large enterprises would like a logging standard; say, something simple like all logs in JSON, formatted as key:value pairs. So what if we simply included that configuration via PodPresets?

Default Metrics – A default set of metrics per language, depending on the monitoring platform used? Example: exposing a default set of metrics for Prometheus and just baking it in through config.

 

I see PodPresets being expanded rapidly in the future. Some possibilities might include:

  • Integration with alternative Key/Value stores
    • Our team runs Consul (Hashicorp) to share/coordinate config, DNS and service discovery between container and virtual machine resources. It would be awesome not to have to bake envconsul or a consul agent into our docker images.
  • Configuration injection from Cloud Providers
  • Secrets injection from alternate secrets management stores
    • A very similar pattern for us with Vault as with Consul: one single secrets/cert management store for container and virtual machine resources.
  • Cert injection
  • Init containers
    • What if Init containers could be defined in PodPresets?

I’m sure there are a ton more ways PodPresets could be used. I look forward to seeing this progress as it matures.

 

@devoperandi

The perils of a Kube-DNS issue with Nginx Ingress

Ok, so this is going to be a tough one to write, but I’m going to do it anyway. This is a story of data overload, a shit ton of rabbit holes, some kick-ass engineers and a few hours of my life I hope not to repeat. I never cease being amazed by how much trouble one thing can cause.

Requirements:

  • Using Kube-DNS for internal DNS resolution. I assume this to be most of my audience.
  • Running Nginx Ingress Controllers for Reverse Proxy

If your environment doesn’t fit the bill on either of the above, you can probably ignore this terribly written yet informative post.

Our team recently took what I would call a partial outage as a result of this problem. I can’t, nor would I want to, go into the details around how long. 🙂 But needless to say we went through a lot of troubleshooting, and I can only hope this will help someone else.

It all started out on a beautiful sunny Colorado day… Nevermind. You probably don’t want to hear the ramblings of a terrible doesn’t-wanna-be-a-writer who can’t write.

So let’s get to the symptoms.

Symptoms included:

Higher than normal Network Response Times for applications.


Some domains worked and some didn’t.

In the course of this troubleshooting we noticed that some domains worked and some didn’t. And I mean they worked 100% of the time. No errors. No problems. Take for example all our CI/CD applications and our documentation site: they all worked without fail. All these things are on the same platform. The only difference is their endpoint urls.

….or so it seemed.

To make matters worse, we had my-app-blue.prsn.io, my-app-green.prsn.io (blue/green deploys) AND my-app.prsn.io.

We could hit the blue and green endpoints just fine but my-app.prsn.io would err a portion of the time.

Here is the kicker: my-app-blue.prsn.io and my-app.prsn.io literally route to the exact same set of pods. The only difference is the endpoint url.

 

Tons of NXDOMAIN requests (more than normal):

What is an NXDOMAIN?

NXDOMAIN is a DNS message type received by the DNS resolver (i.e. the client) when a request to resolve a domain is sent to DNS and cannot be resolved to an IP address. An NXDOMAIN error message means that the domain does not exist.

dnsmasq[1]: 179843 192.168.154.103/52278 reply my-app.my-base-domain.com.some-namespace.svc.cluster.local is NXDOMAIN

Now notice the event above shows

my-app.my-base-domain.com.some-namespace.svc.cluster.local

This is because the resolver could not find

my-app.my-base-domain.com

so it attempted to append its default search domain of “some-namespace.svc.cluster.local”,

resulting in a string of NXDOMAINs like so:

dnsmasq[1]: 179861 192.168.125.227/43154 reply my-app.some-external-endpoint.com.svc.cluster.local is NXDOMAIN
dnsmasq[1]: 179863 192.168.125.227/43154 reply my-app.some-external-endpoint.com.svc.cluster.local is NXDOMAIN
dnsmasq[1]: 179866 192.168.71.97/55495 cached my-app.some-external-endpoint.com.cluster.kube is NXDOMAIN
dnsmasq[1]: 179867 192.168.120.91/35011 reply my-app.some-external-endpoint.com.cluster.local is NXDOMAIN
dnsmasq[1]: 179869 192.168.104.71/40891 reply my-app.some-external-endpoint.com.cluster.local is NXDOMAIN
dnsmasq[1]: 179870 192.168.104.71/57224 reply my-app.some-external-endpoint.com.cluster.local is NXDOMAIN

This is because Kubernetes sets ndots:5 by default. More on that here.
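
For illustration, a pod’s /etc/resolv.conf typically looks something like this (the nameserver IP and namespace vary by cluster):

nameserver 100.64.0.10
search some-namespace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Any name with fewer than five dots gets tried against each search domain first, which is exactly what produces the NXDOMAIN chains above.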

 

 

Next we found Domain Resolution Errors

A comment from a teammate:

I suspect it's DNS. Possibly ours, but I can't be sure where or how. Things are resolving fine internally presently and even externally...
But ingresses clearly log that they can't resolve certain things.

Now, we could gather this from the increase in NXDOMAIN events, but it provided more clarity as to what we were looking at.

Another teammate:

we know that kubedns pods resolve *external* dns entries fine
you can make sure of that running
    nslookup www.whatevs 127.0.0.1
on kube-dns pods

 

SO how in the hell do we have the Nginx controller throwing domain resolution errors when we can resolve anything we like just fine from the fucking DNS server itself?

In the meantime we got some more data.

 

We also saw Throttling at the API Server:

Throttling request took 247.960344ms, request: GET:http://127.0.0.1:8080/api/v1/namespaces/app-stg/pods
Throttling request took 242.299039ms, request: GET:http://127.0.0.1:8080/api/v1/namespaces/docs-prd/pods?labelSelector=pod-template-hash%3D2440138838%2Cservice%3Dkong-dashboard
Throttling request took 247.059299ms, request: GET:http://127.0.0.1:8080/api/v1/namespaces/otherapp-dev/configmaps

which caused us to have a look at ETCD.

ETCD likely Overloaded:

 W | etcdserver: server is likely overloaded
 W | etcdserver: failed to send out heartbeat on time (deadline exceeded for 129.262195ms)
 W | etcdserver: server is likely overloaded
 W | etcdserver: failed to send out heartbeat on time (deadline exceeded for 129.299835ms)

 

At this point here is what we’ve got:

  • Network response times have increased
  • Some domains are working just fine but others aren’t
  • NXDOMAIN requests have increased
  • Domain resolution errors from Nginx
  • DNS resolution from ALL Kube-DNS pods work just fine
  • API Server is throttling requests to save ETCD

 

Data points we missed:

  • Domain endpoints that were under load failed more often

 

So what was the problem?

Next stop, Nginx.

Here is what bit us. For each backend in Nginx, a socket will be opened to resolve DNS, identified by the line with “resolver” in it, like below. This socket has a TTL of 30 seconds by default, meaning if something happens to the DNS resolver (a kube-dns pod), Nginx will fail away from it in 30 seconds UNLESS retries are configured. IF retries are configured, the 30-second TTL is reset every time a retry takes place. As you can imagine, under high load Nginx ends up keeping the socket open almost indefinitely and thus never creates a new socket to a kube-dns pod that is up and available.

Ours was set to:

resolver kube-dns.kube-system.svc.cluster.local;

Big mistake.

You see, resolving the resolver is, well, bad form. Not to mention likely to cause a shit ton of headaches. It did for us.

What about adding

valid=10s

at the end of the resolver line, since we are setting a domain variable? That only applies to names resolved for proxy_pass, not to resolution of the resolver’s own hostname.

Ok fine, what options do we have? We thought of three.

  1. Add a Kube-DNS/dnsmasq pod as a daemonset and have it added to every server. Not a bad idea overall. The IP could be set up to listen over the Docker socket and thus be static across all hosts. But this does present challenges. For example, we’d still end up in a chicken-and-egg scenario, especially if running things like a private Docker registry as a Kubernetes pod in the cluster, as we do.
  2. Run dnsmasq on every server with systemd and still have it available over the docker socket, allowing for a statically assigned IP that can be set in Nginx. This also has the advantage of significantly reducing the number of DNS requests that make it to kube-dns, distributing the load, and making DNS significantly less hassle. It does, however, mean we wouldn’t be running it as a container. It has the added benefit of being able to place dnsmasq on any server outside the Kubernetes world, allowing for a little more consistency across the platform.
  3. Running dnsmasq as a sidecar to all Nginx pods could be a valuable option as well. It lacks the availability benefits of option #2, but it means Nginx can simply use the local loopback address for resolving DNS. It also has the added benefit that Kubernetes automatically reschedules the container should it fail.

Alright, what did we pick?

 

**NOTE** we have since changed and moved to running dnsmasq as a sidecar alongside the Nginx container.

 

At this time we are using Option #2. Easy to set up, provides continuity across our platform, reduces network traffic due to caching of DNS requests, and did I mention it was easy to set up?
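
With dnsmasq listening on a static address on every host, the resolver line becomes a plain IP that never needs resolving itself. A sketch (the docker bridge IP here is illustrative; valid= caps how long resolved answers are cached):

resolver 172.17.0.1 valid=30s;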

 

Note: There is work in progress by the Kubernetes folks around this. Although I’m not sure there has been a definitive solution just yet.

 

Hope this helps you and yours avoid the same pitfall we hit. GL

 

@devoperandi

 

Kubernetes: Pod Disruption Budget

PodDisruptionBudget is a relatively new paradigm in Kubernetes.

At its core, it ensures a certain number or percentage of pods with an assigned label will not be voluntarily evicted at any one point in time.

As an example, let’s imagine we are draining a server for the purpose of a restart. There are 5 pods of the same application (with the same label) running on the Kubernetes cluster, two of which are on the server we intend to restart. If our PodDisruptionBudget requires a minimum of 80% of pods to be available, the budget allows only one pod to be down at a time.

Example:

apiVersion: policy/v1alpha1
kind: PodDisruptionBudget
metadata:
  name: disruptme
spec:
  selector:
    matchLabels:
      name: myapp5pods
  minAvailable: 80%

 

kubectl drain respects the PodDisruptionBudget. Thus when we drain the node we intend to restart, the cluster will bring down only one pod at a time, ensuring that pod has been rescheduled and is running on another server before bringing down the second pod that was running on this server.
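
For example (node name hypothetical):

kubectl drain ip-10-253-92-16.ec2.internal --ignore-daemonsets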

disruption.json

{
  "apiVersion": "policy/v1beta1",
  "kind": "Eviction",
  "metadata": {
    "name": "myapp5pods-4050136386-d6ao9",
    "namespace": "default"
  }
}

 

At the time of this writing, I was unable to use kubectl to evict pods, but curl is an option.

curl -v -H 'Content-type: application/json' https://10.253.92.16:8080/api/v1/namespaces/default/pods/myapp5pods-4050136386-d6ao9/eviction -d @disruption.json

 

PodDisruptionBudget is going to become very valuable as companies begin managing larger and larger Kubernetes clusters.

 

Use cases:

Quorum based applications (assuming good shutdown procedures)

Applications requiring X number of pods to be available under load

 

k8s training videos – looking for feedback

As I’ve never done this before, I’m looking for feedback around some training videos I’m starting to create.

Granted this is a fairly generic topic, but I’d love your feedback on format, look, feel, whether it’s something you would consume, how long the videos should be, and so on.

Please help me out here. I could really use your insight.

The topic here is Pod Scheduling. It’s not really finished, but it should give a good idea of the direction I’m thinking. I’m trying to keep these topics within 3-5 minutes without compromising the value of the content. My intent is to go deep on topics but divide them up by subtopic. If this is the wrong direction, I NEED to know.

Please please please give me constructive feedback. I’m doing this for all of us after all.

Thanks

deployment pipeline options for Kubernetes

In the last several months, various deployment (CI/CD) pipelines have cropped up within the Kubernetes community, and our team also released one at KubeCon Seattle 2016. As a result, I’ve been asked on a couple of different occasions why we built our own. So here is my take.

We began this endeavor sometime in late 2015. You can see our initial commit on Feb 27th, 2016 on Github. But if you look closer, you will notice it’s a very large commit, some 806 changes. This is because we began the project quite a bit before that. So what does this mean? Nothing, other than that we weren’t aware of any other CD pipeline projects at the time and we needed one. So we took it upon ourselves to create one. You can see my slides from KubeCon London, March 2016, where I talk about it a fair amount.

My goal with this blog is not to persuade you toward one particular CD pipeline or another. I simply don’t care, beyond the fact that contributions to Pearson’s CD pipeline would mean we can make it better, faster. Beyond this we get nothing out of it.

 

 

Release it as Open Source –

We chose to release our project as open source for the community to consume at KubeCon Seattle 2016: to share our thoughts and experience on the topic, offer another option for the community, and provide insight into how we prefer to build/deploy/test/manage our environments.

Is it perfect? No.

Is it great? Yes, in my opinion it’s pretty great.

Does it have its own pros AND cons? hell yes it does.

 

Let’s dig in, shall we?

There are two projects I’m aware of that claim many of the same capabilities. It is NOT my duty to explain all of them in detail, but rather to point out what I see as the pros/cons/differences (which often correlate to different modes of thought) between them.

I will be happy to modify this blog post if/when I agree with a particular argument so feel free to add comments.

 

 

Fabric8 CD Pipeline – (Fabric8)

https://fabric8.io/guide/cdelivery.html

The Fabric8 CD pipeline was purpose built for Kubernetes/OpenShift. It has some deeper integrations with external components that are primarily RedHat affiliated. Much of the documentation focuses on Java based platforms even though they remark they can integrate with many other languages.

 

Kubernetes/OpenShift

Capable of working out of the box with Kubernetes and OpenShift. My understanding is you can set this as a FABRIC8_PROFILE to enable one or the other.

 

Java/JBoss/RedHat as a Focus point

While they mention being able to work with multiple languages, their focus is very much Java. They have some deep integrations with Apache Camel and other tools around Java including JBoss Fuse.

 

Artifact Repository

The CD pipeline for Fabric8 requires Nexus, an artifact repository.

 

Gogs or Github

Gogs is required for on-prem git repository hosting. I’m not entirely sure why Gogs would matter if simply accessing a git repo, but apparently it does. Alternatively, there is integration with github.

 

Code Quality

Based on the documentation, Fabric8 appears to require SonarQube for code quality. This is especially important if you are running a Java project, as Fabric8 automatically recognizes and attempts to integrate them. SonarQube can support a variety of languages based on your use-case.

 

Pipeline Libraries

Fabric8 has a library of reusable bits of code to build your pipeline from: https://github.com/fabric8io/fabric8-jenkinsfile-library. Unfortunately these libraries tend to have some requirements around SonarQube and the like.

 

Multi-Tenant

I’m not entirely sure as of yet.

 

Documentation

Fabric8’s documentation is great but very focused on Java based applications. Very few examples include any other languages.

 

 

Pearson CD Pipeline – (Pearson)

https://github.com/pearsontechnology/deployment-pipeline-jenkins-plugin

 

Kubernetes Only

Pure Kubernetes integration. No OpenShift.

 

Repository Integration

git. Any git, anywhere, via ssh key.

 

Language agnostic

The pipeline is entirely agnostic to language and build tools. This CI/CD platform does not bake in deep integrations with other services; if you want something, specify the desired package through the yaml config files and use it. Pearson’s CD pipeline isn’t specific to any particular language because we have 400+ completely separate development teams who need to work with it.

 

Artifact Repository

The Pearson CD pipeline does not specify any particular artifact repo. It does, however, use a local aptly repo for caching deb packages, and nothing prevents artifacts from being shipped off anywhere you like through the build process.

 

Ubuntu centric (currently)

Currently the CD pipeline is very Ubuntu centric. The project has a large desire to integrate with Alpine and other base images, but we simply aren’t there yet. This would be an excellent time to ask our community for help. Please?

 

Opinionated

Pearson’s CD pipeline is opinionated about how, and the order in which, build/test/deploy happens. The tools used to perform build and test, however, are up to you. This gives greater flexibility but places the onus on the team around their choice of build/test tools.

 

Code Quality

The Pearson CD pipeline performs everything as code. What this means is that all of your tests should exist as code in a repository. Then simply point Jenkins to that repo and let it rip. The pipeline will handle the rest, including spinning off the necessary number of slaves to do the job.

 

Ease of Use

Pearson’s CD pipeline is simple once the components are understood. Configuration code is reduced to a minimum.

 

Scalability

The CD pipeline will automatically spin up Jenkins slaves for various work requirements. It doesn’t matter if there are 1 or 50 microservices, build/test/deploy is relatively fast.

 

Tenancy

Pearson’s CD pipeline is intended to be used as a pipeline per project, or better put, a pipeline per development team, as each Jenkins pipeline can manage multiple Kubernetes namespaces. Pearson divides dev, stg and prd environments by namespace.

 

Documentation

Well, let’s just say Pearson’s documentation on this subject is currently lacking. There are plenty of items we need to add, and they will be coming soon.

 

 

Final Thoughts:

Fabric8 deployment pipeline

The plugin the fabric8 team built for integrating with Kubernetes/OpenShift is awesome. In fact, the Pearson deployment pipeline intends to take advantage of some of their work. Hence the greatness of the open source community. If you have used Jenkinsfile, this will feel familiar to you. The Fabric8 plugin is focused on what tools should be used (i.e. SonarQube, Nexus, Gogs, Apache Camel, JBoss Fuse). This could be explained away as deeper integration allowing for a seamless experience, but I would argue that most of these have APIs and it’s not difficult to make a call out, which would allow for a tool-agnostic approach. They also have a very high degree of focus on Java applications, which doesn’t lend itself to the rest of the dev ecosystem. As I mentioned above, they do state they can integrate with other languages, but I’ve been unable to find good examples of this in the documentation.

Note: I was unable to find documentation on how the Fabric8 deployment pipeline scales. If someone has this information readily available, I would love to read/hear about it. It’s quite possible I just missed it.

Provided Jenkinsfile is a known entity, Java-centric is the norm, and you already integrate with many of the tools Fabric8 provides, this is probably a great fit for your team. If you need to have control over the CI/CD process, Fabric8 could be a good fit for you.

 

Pearson deployment pipeline

This is an early open source project. There are limitations around Ubuntu which we intend to alleviate; we simply haven’t had the demand from our customers to prioritize it yet. **Plug for the community getting involved here.** Pearson’s deployment pipeline is very flexible in terms of which tools it can integrate with, yet more deterministic as to how the CI/CD process should work. There is no limitation on language. The Pearson deployment pipeline is easy to get started with and highly scalable; Jenkins will simply scale the number of slaves it needs to perform. Because the deployment pipeline abstracts away much of the CI/CD process, the yaml configuration will not be familiar at first.

If you don’t know Jenkins, and you really don’t want to know the depths of Jenkins, Pearson’s pipeline tool might be a good place to start. Its simple three yaml config files will reduce the amount of configuration you need to get started. I would posit it will take half as many lines of config to create your pipeline.

The Pearson Deployment Pipeline project needs better examples/templates on how to work with various languages.

 

Note:

Please remember, this blog is at a single point in time. Both projects are moving, evolving and hopefully shaping the way we think about pipelines in a container world.

 

Key Considerations:

Fabric8

Integrates well with OpenShift and Kubernetes

Tight integrations with other tools like SonarQube, Camel, ActiveMQ, Gogs, etc.

Less focus on how the CI/CD pipeline should work

Java centric

Requires other Fabric8 projects to get full utility from it

Tenancy – I’m not entirely sure. I probably just missed this in the documentation.

 

Pearson

Kubernetes Only

All purpose CD pipeline

Language Agnostic

More opinionated about the build/test/deploy process

Highly Scalable

Tenant per dev team/project

Easy transition for developers to move between dev teams

More onus on teams to create their build artifacts