State management in Tekton pipelines with Slack interactive messages

bitsofinfo · Published in Level Up Coding · Aug 13, 2020


Earlier this year I re-entered the rabbit hole that is the dizzying world of CI/CD platforms and solutions. Today’s marketplace presents so many choices that I can only imagine how daunting it is for a newcomer to the space to decide on what solution to go with.

Thankfully, the industry is starting to invest in defining some standardization and conventions for CI/CD systems and the concepts around “pipelines”, a space so ripe with patterns repeated across vendors that some baseline level of standardization is clearly needed. One of the organizations involved in this effort is the Continuous Delivery Foundation (CDF), which is loosely related to the CNCF. While looking into this space, I came across one of the CDF’s core projects, Tekton, which was originally donated to the CDF by Google.

(If you are just looking for a Tekton-compatible tool for managing CI/CD state with Slack interactivity, you can skip the rest and check out cicdstatemgr on GitHub.)

Tekton

The Tekton project is made up of several components; three key ones are pipelines, triggers, and dashboard. But what makes Tekton different? If you look at CI/CD systems, what do they all have in common? One big component they all seem to end up re-writing is the means by which “pipelines” made up of N “tasks” get created, distributed/scheduled, executed, and managed by some kind of “master” across N “workers”. This is generally custom code that gets written again and again by each vendor.

What Tekton does is define standard ways to declare “Pipelines” and “Tasks” (as well as various other key patterns) as Kubernetes resources (via custom CRDs), and then let Kubernetes itself natively manage the creation, scheduling, execution, and management of those things, in the form of “PipelineRuns” that contain N “Tasks” (k8s Pods), each made up of N steps (containers). This in effect gets you a Kubernetes-native way of doing CI/CD and provides the foundational components on which you can build higher level CI/CD systems, without having to re-write all that scheduling/master/worker code that is so often repeated by each CI/CD vendor in some form or fashion. Awesome! Now we are a bit further along the way to some standardization for CI/CD systems! It’s important to note that Tekton itself is NOT an out-of-the-box CI/CD solution; instead it’s a framework on which you can build higher level CI/CD functionality. Several projects have already been built on top of Tekton, one of which is Jenkins-X. If you want to learn more about Tekton itself you can read more about it here.
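To make that concrete, here is a minimal sketch of a Task and a Pipeline as plain Kubernetes resources (all names here are illustrative, targeting the tekton.dev/v1beta1 API that was current at the time of writing):

```yaml
# A Task: an ordered list of steps (containers) that run in a single Pod
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: echo-build
spec:
  params:
    - name: appName
      type: string
  steps:
    - name: greet
      image: alpine:3.12
      script: |
        #!/bin/sh
        echo "building $(params.appName)..."
---
# A Pipeline: a graph of Tasks, itself just another Kubernetes resource
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-app
spec:
  params:
    - name: appName
      type: string
  tasks:
    - name: build
      taskRef:
        name: echo-build
      params:
        - name: appName
          value: $(params.appName)
```

Once these are applied, Kubernetes (via Tekton’s controllers) takes care of scheduling and running everything; there is no bespoke master/worker layer to operate.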

The project

As part of my evaluation of the current state of the CI/CD ecosystem, I decided to prototype a solution built on top of Tekton, as its Kubernetes-centric focus and cloud-agnostic footprint really intrigued me. I initially spent some time trying to get rolling with Jenkins-X, but quickly ran into too many bugs and/or a lack of support trying to get it to work with a specific git provider and a particular cloud k8s vendor, so I decided to let it go and circle back to that project once Jenkins-X matures a bit more.

The concept was fairly straightforward and fit a typical CI/CD use-case pattern. I wanted developers to be able to push a tag to Git, trigger a build, and then let them control when to validate/test, deploy to a development environment, and finally “promote” a tag on to higher level environments such as production. Tekton doesn’t really have any “user interface” outside of the dashboard project, which is great for seeing what’s going on inside of Tekton but not really designed as a choice-driven control point whereby end users can interact with an executing set of pipelines and respond to decision points. Given that, I decided I’d want the end user experience (i.e. for developers or devops) to be mediated through Slack via interactive message controls (i.e. buttons). The thought being that as various pipelines start/complete/fail or require user interaction, this would be conveyed via Slack, which would serve both as a notification system for status and as an interaction point for “choices” to be made.

Prototyping

Starting out with Tekton was not too bad; after some trial and error learning the basics of Pipelines, Tasks, and PipelineRuns, I got a basic “build” pipeline functioning that could be instantiated by manually applying a PipelineRun k8s manifest, which kicked off the process. The basic pipeline was made up of tasks that pulled from a Git repo, did an image build/push with Kaniko, and then finally “deployed” the application image artifact by invoking a custom deployment tool that leverages helmfile-deploy under the covers.
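For illustration, “manually applying a PipelineRun” is nothing more than creating one more manifest; the Tekton controller sees it and starts scheduling the pipeline’s Tasks as Pods (a sketch against the hypothetical build-app pipeline above; the real prototype also wired in workspaces, credentials, and the Kaniko and deploy tasks):

```yaml
# Created with `kubectl create -f` (generateName requires create, not apply)
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: build-app-run-
spec:
  pipelineRef:
    name: build-app
  params:
    - name: appName
      value: my-service   # hypothetical application name
```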

This was a decent start, but I next had to look into the Tekton Triggers project in order to have a PipelineRun created automatically in response to a Git tag being pushed, rather than a human manually crafting a PipelineRun object, which is not sustainable. In a nutshell, the Tekton Triggers project lets you expose your own EventListeners (think HTTP endpoints) which can accept payloads sent by another system (typically JSON), use CEL expressions to extract the particulars you care about, and then map those parameters to the Tekton Pipelines resources you want dynamically created (i.e. PipelineRuns).
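The moving parts look roughly like this (a sketch against the triggers.tekton.dev/v1alpha1 API of the time; the payload field paths vary by git provider and are illustrative):

```yaml
# TriggerBinding: extract the particulars you care about from the JSON payload
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerBinding
metadata:
  name: tag-push-binding
spec:
  params:
    - name: gitRef
      value: $(body.ref)   # e.g. refs/tags/1.0.0; exact path depends on the git provider
---
# TriggerTemplate: the PipelineRun to stamp out using those extracted params
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
  name: tag-push-template
spec:
  params:
    - name: gitRef
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: build-app-run-
      spec:
        pipelineRef:
          name: build-app
        params:
          - name: appName
            value: my-service   # hypothetical; $(tt.params.gitRef) could be mapped here too
---
# EventListener: the HTTP endpoint that ties bindings to templates
apiVersion: triggers.tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: cicd-listener
spec:
  serviceAccountName: tekton-triggers-sa   # assumed to already exist
  triggers:
    - name: on-tag-push
      bindings:
        - ref: tag-push-binding
      template:
        name: tag-push-template
```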

Now that I had the rudimentary capability to automatically trigger the pipeline based on a Git event, next was creating the ability to both send Slack notifications about events within the pipeline and let users in Slack make choices via interactive components (buttons) in response to those notifications, via a custom Slack “app”. It wasn’t too complicated to craft some simple notifications to POST to Slack, and only a bit more involved to create interactive buttons for the messages. When a Slack message button is pressed, Slack sends an HTTP POST to an endpoint of your choice; in this case my Tekton Triggers EventListener endpoint. With Slack however, their POSTs add an ultra annoying wrinkle of not just posting JSON, but JSON embedded in a traditional url-encoded form POST variable named “payload”. For this, I needed to have the EventListener call out to a custom Tekton WebhookInterceptor that could extract this and return it as plain JSON so that it could then be manipulated with CEL. I wrote one in Go; it’s on GitHub at slack-payload-handler.
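Wiring such an interceptor in is just more YAML on the EventListener’s trigger; here is a sketch (the Service name mirrors the slack-payload-handler project, while the binding, template, and the CEL filter over Slack’s action payload are illustrative):

```yaml
apiVersion: triggers.tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: cicd-listener
spec:
  serviceAccountName: tekton-triggers-sa      # assumed to already exist
  triggers:
    - name: on-slack-action
      interceptors:
        # First, the webhook interceptor unwraps Slack's url-encoded
        # "payload" form field into plain JSON...
        - webhook:
            objectRef:
              kind: Service
              name: slack-payload-handler     # the Go interceptor, deployed as a k8s Service
              apiVersion: v1
        # ...then CEL can operate on the now-plain JSON body
        - cel:
            filter: "body.actions[0].action_id.startsWith('deploy-')"   # illustrative
      bindings:
        - ref: slack-action-binding           # hypothetical TriggerBinding
      template:
        name: slack-action-template           # hypothetical TriggerTemplate
```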

Once this was in place, a notification was sent after a build completed; then, after the deployment task finished, a set of buttons was sent letting the user iterate and “re-run” the entire pipeline.

For a prototype this worked pretty well, and to be honest the majority of the challenges were not in getting Tekton itself to function, but in all the boilerplate logistics of wrangling the things your pipeline needs to function (think secrets, configs, keys etc), migrating Dockerfiles that build fine in Docker but behave differently when built by Kaniko, and just a lot of YAML crafting, parameter mapping, debugging CEL expressions, and all the typical things you end up doing when writing any piece of software. And yes, CI/CD is custom software, regardless of whether you are writing it yourself or just configuring things on top of another platform.

At the end of the day this was just a proof of concept for what was possible and it had a lot of statically defined, hardwired assumptions and copy-pasted scripts built in. To take it to a more usable state, it needed some work.

Requirements identified

Developing the prototype on Tekton exposed a few patterns that I found myself repeating, and a need for a tool to simplify some of them. The areas I identified as future needs were as follows:

Task results, params and shared contextual state data

Tasks within a pipeline (and potentially across N pipelines) often need contextual state data that is shared among them via inputs or outputs: things like “environment”, “projectName”, “version” and other common primitive attribute/value type variables, including sometimes more complicated structured values such as JSON/YAML snippets. Tekton provides a mechanism to address this via its “task parameters” and “task results” features (i.e. named results point to files on disk that can be shared across N tasks in a pipeline). This worked ok for me, but task “results” have a value size limit of 4096 bytes. This limitation can be worked around by writing values (files) to a larger shared “workspace” volume. Task results worked fine most of the time within a single pipeline, but didn’t easily address storing results or common “state” data across multiple pipelines separated by process (and time) boundaries. Results/data shared across N pipelines could be achieved with workspaces, but I found it cumbersome to manage, think about, and track across all the N YAML files. More and more the need arose to dynamically add and remove inputs/outputs without having to constantly re-declare results and params at the Task/Pipeline level.
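For reference, here is what the results mechanism (and the workspace workaround) looks like in a Task; the names and the oversized-output command are hypothetical:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: compute-version
spec:
  results:
    - name: version            # small value: fits within the 4096-byte results limit
      description: the computed release version
  workspaces:
    - name: shared             # large values: write files to the shared volume instead
  steps:
    - name: compute
      image: alpine:3.12
      script: |
        #!/bin/sh
        # a small result, written to the well-known results path
        printf "1.0.0" > $(results.version.path)
        # anything bigger than 4096 bytes has to go to the workspace
        generate-big-manifest > $(workspaces.shared.path)/manifest.yaml   # hypothetical command
```

A downstream task then consumes the result as $(tasks.compute-version.results.version); each such hop must be declared and mapped by hand.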

If you have a small number of inputs or result data the mechanism is great, but if you go beyond a handful, you can spend a lot of time declaring and mapping parameters to/from results, tasks, and pipelines back and forth. In short, I saw this becoming a bigger issue to manage as the pipelines became more complex. Task results and params were good for some use-cases (like moving tracking identifiers and simple result statuses around), but I really wanted to limit my use of them and just hold pointers to a more robust set of contextual state data, relevant to a logical thread of execution (i.e. an application release/version), that was easy to consume and mutate.

Scoping of contextual state data

Piggybacking off the previous section, it soon became clear that the pipelines would ultimately become generic. For example, a pipeline to “deploy” an application image to a cluster should clearly be re-usable across different “contexts” of execution (i.e. deploy to production vs qa vs dev etc). With this would come the need for a way, within each individual application’s git project, to define and customize the behaviors of each pipeline execution, easily editable by developers and ultimately scoped by these “contexts” of execution. Each “context” of execution would also have a set of associated data that was more mutable and runtime specific. The behaviors, data, and configuration for any CI/CD “context” may differ from one to the next.

Execution patterns & reacting to success/failure

Inside the prototype a simple pattern emerged: “can this be executed?”, “execute it”, “check exit code”, “invoke endpoint”. Where “invoke endpoint” might be to send something to Slack, or to recursively invoke the main Tekton EventListener endpoint to trigger something else within Tekton (i.e. the same or a different pipeline). This pattern had the potential of becoming even more prevalent with a more complicated pipeline. I wasn’t really concerned with generating the Tekton resources to produce the sets of Conditions/Tasks to fulfill this, as there are many ways to do that, but really just wanted a generic way to handle the “invoke endpoint” action in a consistent and customizable way for each application utilizing these pipelines. The curation of the Tekton resources that define the pipelines and tasks would be done separately, but this tool could be leveraged from within them to make the invocations simpler.
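In Task form, the pattern boiled down to something like this step fragment (a hand-wavy sketch; the image, command, and endpoint script are all made up):

```yaml
steps:
  - name: execute-and-report
    image: my-build-tools:latest      # hypothetical image
    script: |
      #!/bin/sh
      # "execute it"
      run-the-actual-work             # hypothetical command
      rc=$?
      # "check exit code" -> "invoke endpoint"
      if [ $rc -eq 0 ]; then
        notify-endpoint success       # e.g. POST to Slack, or re-invoke the EventListener
      else
        notify-endpoint failure
      fi
      exit $rc
```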

Invoking endpoints, customize messages & options

Since a key feature of each pipeline would be the ability to invoke endpoints based on some outcome or arbitrarily defined “event” (i.e. Tekton and/or Slack and more), this would need to be exposed for customization. Since Slack was the primary means by which both notices and end-user interaction would take place, there would need to be a way to easily change all options related to it (i.e. templates for messages, endpoints etc). Likewise, the ability to tweak arguments sent to the EventListener itself for re-invoking Tekton based on outcomes in an automated way. In fact, this pipeline config data could just be arbitrary and totally custom. Task X needs configurability for option Z? Add it in the pipeline config file and let the developer tweak it.

Taking responses and storing them

After an endpoint is invoked as the result of some prior outcome (i.e. from Tekton or Slack), you might often want to capture some part of the HTTP response from that call and store it back into the shared “contextual state data” mentioned previously, to be shared throughout the flow of execution. There was a repeated need to generically parse an HTTP response, extract parameters, and then store them for future use; very similar to the capabilities that CEL expressions in Tekton Triggers provide. Here I was thinking of things such as Slack thread IDs and audit tracing items; or even analyzing a response and generating other content to be stored in the contextual state data.

Consuming and mutating contextual state data

As noted previously, I really wanted to store the bulk of the contextual “state” data for a thread of “execution” outside of Tekton, but also make it extremely easy to access and manipulate from within Tasks; i.e. it should be locally accessible on the filesystem and consumable in some common formats (YAML, JSON, sourced shell ENV vars etc). The combination of these should permit just about any “task”, whether a Bash script or a custom program, to read/consume the data, while a dedicated interface would be needed to mutate it. This would also alleviate having to transport some of these values around and re-declare them over and over between different pipelines/tasks via the “results/params” mechanisms.
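Roughly, the consumption side should be as simple as this step fragment (paths, variable names, and keys are hypothetical; jq is assumed present in the image):

```yaml
steps:
  - name: consume-state
    image: my-tools:latest            # hypothetical image
    script: |
      #!/bin/sh
      # the contextual state data has been materialized onto the local
      # filesystem in multiple formats; pick whichever suits the task
      . /cicd-data/state.sh           # sourceable shell vars, e.g. $APP_NAME
      echo "working on ${APP_NAME}..."
      # structured access to the same data for anything JSON-aware
      version=$(jq -r '.release.version' /cicd-data/state.json)
```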

Letting developers customize behaviors for each app

Finally, and key to any CI/CD system: if the “pipelines” and “tasks” are designed fairly generically, there should be a mechanism by which end developers can tweak the behaviors of the pipelines via custom Task-consumable configurations, as well as (in this case) customize the messages and interactivity that occur at each step along the way. The majority of this behavior should be tucked away in some sort of “baseline” configuration that can be extended, but at the end of the day, at a minimum some inputs at execution time should be consumed from files managed by the end developers who are pushing code to the system.

From prototyping to iterating

Ok great, now that I had identified some of the use-cases and patterns required for a more robust implementation, it was time to get started. I wasn’t setting out to build an “out of the box CI/CD system”, just a tool to make some things easier when crafting one on your own. This is where I went down the road of writing code for a utility that could be leveraged from within pipelines to address the items listed above. Over several iterations this code became refined into what is cicdstatemgr: both a CLI and a Python library that can be used within CI/CD pipelines (like Tekton) to help address the needs and use-cases I laid out in the previous section.

cicdstatemgr

So what kinds of things can cicdstatemgr help with exactly? It’s probably best shown with an illustration of the general execution pattern it helps address (below):

Note that the pattern of Tasks in the diagram above is not imposed nor implemented by cicdstatemgr; it is just there to illustrate a common pattern that I implemented within Tekton, utilizing cicdstatemgr’s capabilities within Tasks. You are totally free to design your pipelines, flows, and tasks however you want, and utilize cicdstatemgr’s capabilities when, if, and where they make the most sense for your particular use-case.

Below is an example workflow implemented with cicdstatemgr and Tekton that shows the concept of “contexts” and how they can be logically used in CI/CD. Each “context” has its own instance of “cicdContextData” that can be shared across all pipelines executed within it. When a new “context” starts, a new instance is created that may or may not be seeded with data from the prior context.

cicdstatemgr general functionality overview

Developers declare an “app-pipeline-config.yaml” file in each app’s SCM repository (name it whatever you want).

Each app pipeline config file declares one or more optional “bases” that it can inherit from; the format/layout of “bases” YAML files is the same as app pipeline config files. App pipeline config files are intended to be merged with their “bases”, overriding them.

Each app pipeline config file defines the configuration for one or more named “cicd-contexts”, each containing one or more named “pipeline” configs that can contain completely arbitrary YAML structures that you define to hold whatever configuration you want. That configuration can subsequently be consumed, referenced, or mutated by cicdstatemgr’s other operations, or simply loaded onto a Task’s filesystem; from there the data can be consumed by any tooling you want within Tasks, or cross-referenced from within -handle-event operations.
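To make that tangible, here is a purely illustrative shape for such a file; the top-level concepts (bases, cicd-contexts, named pipelines) come from the description above, but this is not the tool’s exact schema (see the cicdstatemgr docs for that):

```yaml
# app-pipeline-config.yaml (illustrative structure only)
bases:
  - my-org-defaults            # inherited; same layout as this file
cicd-contexts:
  develop:
    pipelines:
      build:
        # completely arbitrary YAML that your own Tasks consume
        docker:
          dockerfile: Dockerfile.dev
      deploy:
        target-cluster: dev-cluster-1
  production:
    pipelines:
      deploy:
        target-cluster: prod-cluster-1
        require-approval: true
```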

You create a new instance of a “cicdContextData” by giving cicdstatemgr an app pipeline config file, which it merges with the file’s declared “bases” to yield the contents of the “cicdContextData” object; that object is then retrievable via its “id” throughout its life and across process boundaries. This data is stored in a “primary store” (currently Redis is supported) and can be loaded onto the local filesystem on demand in YAML, JSON, or sourceable “shell” formats.

Within each “pipeline” config section there is also an “event-handlers” section that is referenced when -handle-event <pipelineName>=<eventName> is invoked. Within it you can define N named event handler configurations of several types, such as “notify”, “manual-choice”, “set-values”, “trigger-pipeline” and “respond”. Each of these has a specific set of options that you can read about here, but at the end of the day they leverage jinja2 to render out data that will be sent to endpoints, or consume data from the “cicdContextData” to set new values.
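Conceptually, the event-handlers section pairs named events with one of those handler types; again, the keys below are illustrative rather than the exact schema:

```yaml
cicd-contexts:
  develop:
    pipelines:
      build:
        event-handlers:
          build-success:             # hypothetical event name
            notify:
              # jinja2-rendered content, ultimately POSTed to an endpoint (e.g. Slack)
              message: "build of {{ appName }} succeeded"
          build-failure:
            manual-choice:
              title: "build failed; what next?"
```

A Task would then fire one of these with something like cicdstatemgr -handle-event build=build-success (the flag is spelled as in the docs; the surrounding connection/config arguments are omitted here).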

There is a separate config and secrets file for cicdstatemgr itself, which is intended to contain low level configurations/templates that are common regardless of the underlying app pipeline config being executed.

It’s important to note that there is NO “schema” for what a “cicdContextData” must look like, other than the general skeleton structure of the app pipeline config file that seeds a new “cicdContextData” instance when it is first created. You are free to -get/-set any k/v structure on this object, nested or not; it’s totally up to you to utilize and design its structure as you see fit. cicdstatemgr makes heavy use of getting/setting data with JSONPath expressions, as well as leveraging jinja2 for evaluating or setting new data within it. The structure is arbitrary and you define the “contracts”.
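For instance, a Task step might do something like the following (a sketch: only the -get/-set operation names come from the description above; the image, key paths, and exact argument syntax are illustrative, so consult the project docs for the real invocation):

```yaml
steps:
  - name: mutate-state
    image: my-tools:latest                      # hypothetical image with cicdstatemgr installed
    script: |
      #!/bin/sh
      # read a nested value out of the cicdContextData via a JSONPath-style expression
      cicdstatemgr -get "release.version"       # illustrative expression
      # write a brand new nested value; no schema has to be declared beforehand
      cicdstatemgr -set "slack.threadId=12345"  # illustrative key/value
```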

Since cicdstatemgr is also a Python module, you can utilize it directly from your own scripts if you wish to build higher level functionality. Don’t want to do that? No problem, just utilize the CLI as-is.

Note that none of the above builds/creates a CI/CD system for you. You are still responsible for crafting your Tekton pipelines and triggers, consuming input data from EventListeners, bridging inputs to Pipelines, etc. cicdstatemgr is simply a tool in your toolbox that you can utilize within your Tasks whenever you want.

Here is another illustration of how it could be utilized:

TODOs

At the time of this writing, cicdstatemgr is still pretty new and I consider it beta software. It’s being used in a production Tekton CI/CD implementation that is responsible for the delivery of over 30 different applications across numerous “contexts” from development, to QA and on through production. That said, there are several short-term TODOs, primarily:

  • Although initially developed to interact heavily with Slack, the functionality is quite generic and the code is only a few modifications away from being Slack agnostic. In fact, the only real coupling to Slack is the name of a few configuration variables in the configs and a few Slack-specific headers. 99% of the Slack “awareness” is abstracted away in jinja2 templates that are completely controllable by the operator. Want to interact with MS Teams? It’s not too big a leap. At the end of the day, the goal is to make it completely interoperable with anything that can accept HTTP invocations.
  • Several of the “event handler” configs (other than “set-values”) only support a single action per named event. This needs to be improved to permit multiple actions.
  • Implement a “locking” abstraction for the cicdContextData, which would be handy for concurrent Task executions within certain pipeline scenarios.
  • Document how additional primary datasources can be added (beyond Redis).

Getting started

If you want to play with cicdstatemgr independent of Tekton, you can start with the CLI basics by walking through these examples.

If you are interested in playing with an example of cicdstatemgr with Tekton that you can run yourself, click here (and check out the screencast video below).

Hopefully this was a decent introduction to Tekton, an ultra cool, bleeding edge way of implementing CI/CD that can scale right on top of Kubernetes. Tekton gives YOU the power to create any CI/CD system you can dream up, without having to worry about all the low level server/execution aspects of it. Let k8s handle it!

As noted previously, I’m currently using this in production and so far it’s working great. Thanks to the Tekton folks for making a great platform we can all build upon.

Anyways, thanks for reading, and hopefully someone might find it of use.

Originally published at http://bitsofinfo.wordpress.com on August 13, 2020.
