Container Internals Series Part 1: cgroups

Experimenting with Control Groups (cgroups) in Go

TJ. Podobnik, @dorkamotorka
Level Up Coding

--

I’ve always been a big fan of containers, but for the first two years I mostly just used them, and their internals remained somewhat mysterious to me. Eventually, I grasped the theory and understood how the components combine to form isolated running units. Still, it bothered me that I had never dug into the code and run experiments myself. Who knows what I might find? In this series, I’ll guide you through code snippets I’ve written to experiment with Linux namespaces, cgroups, and more. While I enjoy using CLI commands, I prefer to work in Go: it keeps the experiments repeatable and prepares me for future projects.

In Part 1, we’ll delve into cgroups. We’ll begin with an overview of what they are and how they’re used, but the focus will primarily be on observing them in action and exploring the raw features they offer.

Source: Julia Evans Blog

Control Groups (cgroups)

As per Wikipedia:

cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, etc.[1]) of a collection of processes.

This definition should suffice for an SRE or anyone versed in container technology. To add my two cents: a cgroup is a collection of processes whose resource usage you can precisely shape, which makes cgroups fit naturally into the picture of containers. Why? Because containers, beyond isolation and other features, are essentially a group of processes. By grouping those processes into cgroups and configuring resource limits, we effectively control container resource utilization.
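
To make that concrete, here is a minimal sketch of what this looks like at the kernel interface level, before any library gets involved. Treat it as an illustration under assumptions rather than production code: it assumes cgroup v2 is mounted at /sys/fs/cgroup, that the cpu controller is enabled in the root cgroup.subtree_control, that it runs as root, and the cgroup name demo is made up for the example.

package main

import (
    "log"
    "os"
    "path/filepath"
    "strconv"
)

func main() {
    // In cgroup v2, a cgroup is just a directory in the unified hierarchy:
    // creating one is a mkdir, and limits and membership are plain files inside it.
    cg := "/sys/fs/cgroup/demo" // hypothetical name, for illustration only
    if err := os.Mkdir(cg, 0755); err != nil {
        log.Fatalln(err)
    }

    // Cap CPU time at 20%: 200000us of runtime per 1000000us period
    // (requires the cpu controller to be enabled in the parent's cgroup.subtree_control).
    if err := os.WriteFile(filepath.Join(cg, "cpu.max"), []byte("200000 1000000"), 0644); err != nil {
        log.Fatalln(err)
    }

    // Move the current process into the cgroup by writing its PID.
    pid := []byte(strconv.Itoa(os.Getpid()))
    if err := os.WriteFile(filepath.Join(cg, "cgroup.procs"), pid, 0644); err != nil {
        log.Fatalln(err)
    }
    log.Printf("process %s is now CPU-limited by %s", pid, cg)

    // Cleanup: a cgroup can only be removed once no processes are left in it,
    // so move ourselves back to the root cgroup first.
    if err := os.WriteFile("/sys/fs/cgroup/cgroup.procs", pid, 0644); err != nil {
        log.Fatalln(err)
    }
    if err := os.Remove(cg); err != nil {
        log.Fatalln(err)
    }
}

Everything the library in the next snippet does ultimately boils down to directory and file operations like these.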

It’s worth noting that this capability isn’t exclusive to containers; it’s a standalone kernel feature. Writing to the filesystem directly gets tedious, though, so the snippet below leverages containerd’s cgroups library instead.

⚠️ Note: The following snippet doesn’t cover every use case, but I found it a fun way to tinker around and run some tests. Make sure to check the comments for more details.

package main

import (
    "log"
    "os/exec"
    "time"

    "github.com/containerd/cgroups/v3/cgroup2"
)

// Library used is containerd/cgroups (v2): https://github.com/containerd/cgroups

// Utility function that returns a pointer to an int64 (check below for usage)
func pointerInt64(i int64) *int64 {
    return &i
}

func main() {
    var (
        quota  int64  = 200000
        period uint64 = 1000000
        //weight uint64 = 100
        maj  int64  = 8
        min  int64  = 0
        rate uint64 = 120
        max  int64  = 1000
    )
    res := cgroup2.Resources{
        // NOTE: Under the CPU section we limit the CPU time the processes inside this cgroup can use to 20% (quota/period)
        CPU: &cgroup2.CPU{
            //Weight: &weight, // (weight of this child cgroup) / (sum of CPU weights of all control groups) => percentage of CPU for this child cgroup's processes
            Max: cgroup2.NewCPUMax(&quota, &period), // e.g. "200000 1000000", meaning the processes inside this cgroup can (together) run on the CPU for only 0.2s every 1 second
            //Cpus: "0", // Limits which CPU cores the processes inside this cgroup can run on (NOTE: "Mems" also needs to be set: https://github.com/containerd/cgroups/blob/fa6f6841ed3d57355acadbc06f1d7ed4d91ac4f7/cgroup2/manager.go#L97)
            //Mems: "0", // "Memory Node" refers to an on-line node that contains memory
        },
        Memory: &cgroup2.Memory{
            Max:  pointerInt64(629145600), // ~629MB - if the cgroup's memory usage reaches this limit and can't be reduced, the system OOM killer is invoked on the cgroup
            Swap: pointerInt64(314572800), // Swap usage limit in bytes
            High: pointerInt64(524288000), // Memory usage throttle limit - if the cgroup's memory use goes over this boundary, its processes are throttled and put under heavy reclaim pressure; the default is "max", meaning there is no limit
        },
        IO: &cgroup2.IO{
            Max: []cgroup2.Entry{{
                Major: maj,
                Minor: min,
                Type:  cgroup2.ReadIOPS, // Limit read I/O operations per second for the block device identified by (major, minor) - e.g. "ls -l /dev/sda*"
                Rate:  rate, // Number of (read) operations per second
            }},
        },
        Pids: &cgroup2.Pids{
            Max: max, // Number of processes allowed - the PID controller stops any new tasks from being fork()'d or clone()'d in the cgroup hierarchy after this limit is reached
        },
    }

    // A dummy PID of -1 is used for creating a "general slice" to be used as a parent cgroup.
    // See https://github.com/containerd/cgroups/blob/1df78138f1e1e6ee593db155c6b369466f577651/v2/manager.go#L732-L735
    // Each '-' inside the cgroup name creates a child branch => my-cgroup-abc.slice becomes my.slice/my-cgroup.slice/my-cgroup-abc.slice/<processes>
    m, err := cgroup2.NewSystemd("/", "my-cgroup-abc.slice", -1, &res)
    if err != nil {
        log.Fatalln(err)
    }
    cgType, err := m.GetType()
    if err != nil {
        log.Fatalln(err)
    }
    // Print the cgroup type - Ref: https://www.kernel.org/doc/html/v4.18/admin-guide/cgroup-v2.html#Threads
    log.Println(cgType)

    // Run the stress command on 1 CPU for 30 seconds (NOTE: apt install stress)
    cmd := exec.Command("stress", "-c", "1", "--timeout", "30")
    // Start the command
    if err := cmd.Start(); err != nil {
        log.Printf("Error starting the command: %v\n", err)
        return
    }

    // Retrieve the PID of the started process (+1 because stress internally spawns one child worker process per stressed CPU)
    pid := cmd.Process.Pid + 1
    log.Printf("PID of the spawned process: %d\n", pid)

    // Add the stress process to the cgroup
    if err := m.AddProc(uint64(pid)); err != nil {
        log.Fatalln(err)
    }

    procs, err := m.Procs(false)
    if err != nil {
        log.Fatalln(err)
    }
    log.Printf("List of processes inside this cgroup: %v", procs)

    // cgroup freezer - freezes the processes inside the cgroup (they retain their memory, but stop executing on the CPU)
    log.Println("Freezing Process")
    if err := m.Freeze(); err != nil {
        log.Fatalln(err)
    }
    time.Sleep(time.Second * 15)

    // cgroup freezer - resume the previously frozen processes
    log.Println("Thawing Process")
    if err := m.Thaw(); err != nil {
        log.Fatalln(err)
    }

    // Wait for the stress command to finish
    if err := cmd.Wait(); err != nil {
        log.Printf("Error waiting for the command to finish: %v\n", err)
        return
    }

    // Delete the cgroup so we don't leave dangling resources behind
    if err := m.DeleteSystemd(); err != nil {
        log.Fatalln(err)
    }
}

While the program is running, you can watch it in htop: you’d expect the stress worker to peg one CPU core at 100%, but it only reaches about 20%, because that’s the limit we set through the cgroup. Similar limits are enforced for memory, IOPS, the number of PIDs, and so on. Pretty neat, right?
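
If you want harder evidence than htop, you can read the throttling statistics straight out of the cgroup while the program above is still running. A minimal sketch, assuming the slice resolves to the nested path described in the code comments (my.slice/my-cgroup.slice/my-cgroup-abc.slice):

package main

import (
    "fmt"
    "log"
    "os"
)

func main() {
    // cpu.stat is part of the cgroup v2 interface; nr_throttled and
    // throttled_usec keep growing while the cpu.max limit is being enforced.
    data, err := os.ReadFile("/sys/fs/cgroup/my.slice/my-cgroup.slice/my-cgroup-abc.slice/cpu.stat")
    if err != nil {
        log.Fatalln(err)
    }
    fmt.Print(string(data))
}

The same directory also exposes memory.current, io.stat, and pids.current, so each of the other limits can be verified the same way.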

Conclusion

In this first part of the series, we delved into cgroups and their effects on CPU, memory, I/O, and process counts. Through simple experimentation, we uncovered how cgroups shape container behavior, without the theatrics.

To stay current with the latest cloud technologies, make sure to subscribe to my weekly newsletter, Cloud Chirp. 🚀
