My exit from a great gig

Today is my last day as a Principal Cloud Platform Architect at Pearson. Over the last few weeks I’ve been reflecting on my time at Pearson: our accomplishments, our failures, our team, and all the opportunity it has created. I’m amazed at how far we have come. I’m thankful for the opportunity and the freedom to reach beyond what many thought possible.

I left nothing on the table. I gave it all I had.

But one question still stands.

What IS Bitesize?

Bitesize is a platform purpose built for the future of Pearson.

It’s a combination of interwoven technologies and application development methodologies that has resulted in advancements far beyond what any cloud provider can currently attain.

It’s a team of engineers who believe in, support, and challenge themselves and others both inside and outside the team.

It’s a group of leaders who believe in their team and are willing to stick their necks out for what’s right.

It’s a philosophy of change.

It’s an evolving set of standards that increases the fidelity of these interwoven pieces.

Bitesize is the convergence of disparate teams to make something greater than any of them could be individually.

It’s a treadmill of technology.

 

As I began thinking about what has been accomplished, I decided to write a few of those things down. I know I will leave many unsaid, but hopefully this will give a view into just how much has been accomplished. Many may see this list and think one of two things: “That’s total bullshit, no way they did all that” or, better yet, “OK, I buy it, but have you reached nirvana?”

My answer to the former – “test your bullshit meter, it’s a bit off”

My answer to the latter – “as soon as one reaches a star, they realize how many other stars they want to reach”

 

A few achievements to date:

Fully automated platform and application upgrades

Completely scalable CI/CD pipelines treated as cattle (i.e. if they go away, they self-configure when they come back up)

Built-in log and data aggregation per Kubernetes namespace

Deep cloud provider integrations without lock-in

Fully automated database provisioning (containers and virtual machines)

Dynamic Certificate Management using Hashicorp Vault fully integrated with Load Balancers

100% availability for the platform and applications through critical business periods (to my knowledge this had not been achieved until now at Pearson)

Dynamic Application Configuration

Immutable Application architecture

OAuth into the platform and various infrastructure components

Universal API for single point of use

Audit Control and Compliance throughout the stack

Baked in Enterprise Governance

Highly secure, full BGP mesh across geographic regions, capable of standing up new endpoints in < 10 seconds

8.5 million concurrent (and I do mean concurrent) user performance test, with 150-250ms average response times

Enterprise Chargeback model

Dynamic CIDR provisioning (NSOT) for AWS and Kubernetes

Open Sourced Authz webhook resulting in its adoption by CoreOS

Automated generation of StackStorm AWS packs

Contributed StackStorm Kubernetes Pack to StackStorm Exchange

Contributing next generation (over 106 new packs) of StackStorm AWS Packs to StackStorm Exchange (currently in incubator)

Open Sourced many new technologies including Environment Operator, StackStorm Packs, Kong plugins, Kubernetes Test Harness, Nginx Controller, Jenkins plugin for Environment Operator, and CI/CD pipeline

On-Demand Locust (performance testing suite) on Kubernetes using Iron Functions, deployed in < 10 seconds

Integrated Monitoring/Alerting throughout the stack

Self-onboarding of applications through to production with little or no assistance from the Bitesize team

Congrats team. You’ve got this.
@devoperandi

Kubernetes – Device Plugins (alpha)

Brief History

In March of 2017, I wrote about Opaque Integer Resources, whereby specific hardware capabilities could be used in Kubernetes. Alpha in 1.5, it opened up the potential to enable resources like Last Level Cache, GPUs, Many Integrated Core devices and so on.

In Kubernetes 1.8, Opaque Integer Resources were replaced with Extended Resources. This was a great move: it migrated away from the kubernetes.io/<resource> naming model, allowing resources to be assigned to any domain outside kubernetes.io and the API to simply be extended via API aggregation.

Extended Resources are a phenomenal start toward vastly expanding the opportunities around Kubernetes workloads, but they still had the potential to require modifications to Kubernetes core in order to actually use a new resource. And this is where Device Plugins come in.
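For reference, an Extended Resource can be advertised on a node by patching the node’s status through the API server. The following is a sketch run through kubectl proxy; the resource name example.com/dongle and the node name are placeholders (note the JSON-Patch escape ~1 for the / in the resource name):

```
# Open a local proxy to the API server
kubectl proxy &

# Advertise 4 units of a custom resource "example.com/dongle"
# on the node's status.capacity
curl --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data '[{"op": "add", "path": "/status/capacity/example.com~1dongle", "value": "4"}]' \
  http://localhost:8001/api/v1/nodes/<your-node-name>/status
```

Once the patch lands, the resource shows up in the node’s capacity and can be requested by pods just like cpu or memory.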

 

Requirements:

Kubernetes 1.8

DevicePlugins enabled in Kubelet
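In 1.8 the DevicePlugins feature gate is alpha and off by default, so it has to be enabled explicitly on the Kubelet:

```
# Kubelet flag (alpha gate in 1.8, disabled by default)
kubelet --feature-gates=DevicePlugins=true ...
```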

 

Device Plugins

Device Plugins are a common framework by which vendor-specific hardware devices can be plugged into Kubernetes.

Think of it this way:

Extended Resources = how to use a new resource
Device Plugins = how vendors can advertise to and hook into Kubernetes without modifying Core

One of the first examples of Device Plugins in use is Nvidia’s k8s-device-plugin. This makes complete sense, because Nvidia is leading entire industries in various hardware arenas, GPUs being just one of them.

 

How Device Plugins work

Device Plugins are (or should be) containers running in Kubernetes that provide access to a vendor- (or enterprise-) specific resource. The container advertises said resource to the Kubelet via gRPC. Because this is hardware specific, it must be done on a per-node basis. However, a DaemonSet can be deployed to cover the same resource across a multitude of nodes.
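As a sketch, deploying a plugin as a DaemonSet might look like the following. The image tag and labels are assumptions modeled on Nvidia’s k8s-device-plugin; the important part is mounting the Kubelet’s device-plugin socket directory:

```
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin
  template:
    metadata:
      labels:
        name: nvidia-device-plugin
    spec:
      containers:
      - name: nvidia-device-plugin
        image: nvidia/k8s-device-plugin:1.8   # assumed tag
        volumeMounts:
        # The plugin registers with the Kubelet through a Unix
        # socket in this host directory
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
```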

The Device Plugin has three parts:

Registration – the plugin advertises itself to the Kubelet over gRPC

ListAndWatch – provides the list of devices and/or modifies the existing state of a device on change, including failures

Allocate – device-specific instructions for the Kubelet to make available to a container

At first glance this may seem rather simple, but it should be noted that prior to Device Plugins, the Kubelet handled each device specifically, which is why hardware vendors had to contribute back to Kubernetes core to provide net-new hardware resources. With the device plugin manager, this is abstracted out and responsibility lies with the vendor. The Kubelet then keeps a socket open to ListAndWatch for any changes in device state or in the list of devices.
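The three parts above map onto the v1alpha gRPC interface, which roughly looks like this (a sketch of the protobuf service definitions with message fields omitted):

```
// Registration: the plugin calls this on the Kubelet's registration
// socket (under /var/lib/kubelet/device-plugins/) to advertise itself
service Registration {
  rpc Register(RegisterRequest) returns (Empty) {}
}

// DevicePlugin: served by the plugin on its own socket
service DevicePlugin {
  // Streams the device list and any device state changes to the Kubelet
  rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}

  // Returns device-specific instructions (env vars, mounts, device
  // nodes) for a container that was allocated the resource
  rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
}
```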

 

Use of new devices through Extended Resources

Once a new Device Plugin is advertised to the cluster, it is quite simple to use.

Now let’s imagine we are using Nvidia’s GPU device plugin, which advertises nvidia.com/gpu.

Here is how we allocate a GPU resource to a container:

apiVersion: v1
kind: Pod
metadata:
  name: need-some-gpu-pod
spec:
  containers:
  - name: my-container-needing-gpu
    image: myimage
    resources:
      requests:
        cpu: 2
        nvidia.com/gpu: 1

 

Gotchas

(At the time of this post)

Integers Only – this is common in Kubernetes but worth noting: the 1 for gpu above cannot be 0.5.

No Overallocation – unlike memory and CPU, devices cannot be overallocated. So if both Requests and Limits are specified, they must equal each other.
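In practice that means a spec that sets both must keep them identical (sketch, using the same hypothetical GPU container as above):

```
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1   # must exactly match the request
```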

Resource Naming – I can’t confirm this, but while playing around with the Nvidia GPU plugin I was unable to advertise the same device across multiple nodes.

Example:

I had difficulty advertising nvidia.com/gpu on node 2 once it was advertised on node 1.

If correct, this would mean I would need to add nvidia.com/gpu-<node_name>, or something along those lines, to add the GPU device for multiple servers in a cluster, and also call out that specific device when assigning it to the container requiring the resource. Keep in mind, this is alpha, so I would expect it to change rapidly, but it is currently a limitation.

 

More info on Device Plugins

For a deeper review of the Device Plugin Manager

More on Extended Resources and Opaque Integer Resources

 

@devoperandi