Cloud Design Patterns — Explained Simply

Rahat Shaikh
Published in Level Up Coding
10 min read · Jan 17, 2020


“There are only patterns, patterns on top of patterns, patterns that affect other patterns. Patterns hidden by patterns. Patterns within patterns…”

- Chuck Palahniuk

Perhaps the simplest way to describe a pattern is as a regularity that predictably repeats itself. The beauty of patterns lies in their ubiquity — from the way organelles are arranged in the smallest of micro-organisms to the way stars line up in galaxies, patterns exist everywhere. You simply can’t escape these patterns, which govern most (if not all) laws of our physical world.

Unsurprisingly, design patterns also exist in the universe of software design. A software design pattern is a reusable solution (a template) for commonly occurring problems. The software world first took note of design patterns in the 1970s, and much like the technologies they now help build, patterns have been evolving ever since.

Building efficient and scalable software is a lot like building a brick wall — a bricklayer takes great care to ensure the pattern and symmetry of the wall is maintained. This allows the foundation to be strong enough to support the weight of everything stacked on top of it. It allows the building of a taller, stronger, more “scalable” wall (pardon the pun). Like the bricklayer, a software engineer needs to pay attention to the design patterns that shape the underlying software if they are to create a solution that is both efficient and scalable.

With the rising popularity of cloud computing, the focus of engineers everywhere has now shifted to harnessing the patterns that govern cloud-based systems.

Here’s a compilation of cloud design patterns that I’ve repeatedly found useful. I admit this list is biased by my own experience, but I will try to add to it as I learn more. I hope it helps you understand design patterns that much better.

1. Asynchronous Request and Reply

Problem Statement

When we talk about cloud applications, microservices often come to mind. This is an architectural style in which multiple microservices — remote APIs or third-party services, each performing a dedicated function — are composed to provide a particular piece of functionality to a client application.

In such a setup, synchronously processing the client request before all the back end work is completed may not always be feasible. This is especially true for long-running processes. Furthermore, latency becomes a consideration when the response to the client needs to be returned within a few milliseconds.

Solution: Asynchronous Processing (HTTP Polling or Event Notification)

  • The client makes an API request
  • The application now offloads the work to another back end service or message queue
  • The client can poll for the resource/process status using an HTTP GET (HTTP Polling) at regular intervals as appropriate for the client application
  • The status API returns “In-Progress” indicating the back end process is still running
  • Once the process completes, the status will return the required output or another reference to the resource
  • The application can also push event notifications once the process is complete without the need for the client to poll repeatedly
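The flow above can be sketched in a few lines of Python. This is a minimal, illustrative sketch only: an in-memory dict stands in for the status API, and a thread stands in for the back end worker (all names here are made up for the example).

```python
import threading
import time
import uuid

# In-memory stand-in for a status data store, keyed by a reference identifier.
jobs = {}

def submit_order(work, duration):
    """Accept the request and return a reference ID immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "In-Progress", "result": None}

    def run():
        time.sleep(duration)  # simulate a long-running back end process
        jobs[job_id] = {"status": "Complete", "result": work.upper()}

    threading.Thread(target=run).start()
    return job_id

def poll_status(job_id):
    """HTTP-GET-style status check."""
    return jobs[job_id]

job = submit_order("pad thai", duration=0.2)
print(poll_status(job)["status"])  # most likely still "In-Progress"
time.sleep(0.5)                    # polling interval
print(poll_status(job)["status"])  # "Complete"
```

In a real service, `submit_order` would return HTTP 202 with a status URL, and the final status response would carry the result or a link to it.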

Let us simplify with an example

  • Imagine you’re too tired to make dinner and decide to place a takeaway order over the phone at your favorite restaurant (API request)
  • The restaurant provides you with an order number (Acknowledgement and reference identifier)
  • You arrive at the restaurant to pick up your order, give them the order number, and the restaurant staff informs you that your order is not yet complete
  • If you’re particularly hungry, you might return to the reception to inquire about your order every 5–10 mins (HTTP Polling)
  • Or, the restaurant can provide you with a pager that buzzes once your order is complete (Event Notification)

2. Command and Query Responsibility Segregation (CQRS)

Problem Statement

Traditionally, people interact with information systems through a CRUD data store, where the read and write data models are often the same. As the complexity of the application increases, multiple representations of the information are created, all referring to one common conceptual data model. This can lead to data/resource contention, performance slowdowns and, in some cases, security issues.

Solution: Separate the Read and Write models

  • This can be done by creating separate schemas or different databases for the Read and Write operations: all reads go to one data store and all writes/updates go to a separate data store
  • Creating separate databases provides additional isolation that helps with scalability and performance
  • The Write database can be relational while the Read database can be a NoSQL document store
  • This approach not only provides separation of concerns but also allows each data store to independently scale based on its workload
  • The important consideration in this model is keeping the Read and Write stores in sync. This is typically achieved by publishing events from the Write store that are consumed by the Read store
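The separation above can be sketched as follows. This is a deliberately simplified, single-process sketch: two dicts stand in for the separate data stores, and a list stands in for the event channel that keeps them in sync (all names are illustrative).

```python
# Write model: normalized, optimized for updates.
write_store = {}   # order_id -> {"item": ..., "qty": ...}
# Read model: denormalized, optimized for queries.
read_store = {}    # order_id -> display string
events = []        # events published by the write side

def handle_command(order_id, item, qty):
    """Command side: mutate the write store, then publish an event."""
    write_store[order_id] = {"item": item, "qty": qty}
    events.append(("OrderUpserted", order_id, item, qty))

def project_events():
    """Read side: consume published events to keep its store in sync."""
    while events:
        _, order_id, item, qty = events.pop(0)
        read_store[order_id] = f"{qty} x {item}"

def handle_query(order_id):
    """Query side: reads only ever touch the read store."""
    return read_store.get(order_id)

handle_command(42, "pad thai", 2)
project_events()
print(handle_query(42))  # "2 x pad thai"
```

In a real deployment the two stores would be separate databases and `project_events` would be an asynchronous subscriber, which is why reads can be briefly stale (eventual consistency).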

Let us simplify with an example

  • Imagine you’re still waiting at the same restaurant for your order… Your stomach’s growling in hunger as you wait for your takeaway.
  • You notice the delay is due to there being only a single server who is taking care of all diners and takeaway orders
  • While Roger, our extremely busy server, is taking orders from one of the diners, the other diners are all waiting to place their orders
  • Some (like you) are waiting for a status update on their takeaway orders, while others have placed their orders and are now waiting to be served
  • Sadly, Roger has become the bottleneck in this operation and the overall performance of the restaurant has degraded
  • Thankfully, Charlie is back from his break and gets to work on accepting takeaway and dine-in orders while Roger proceeds to serve food to the tables and takeaway diners

3. Event Sourcing

Problem Statement

When interacting with data, most applications store only the current state of the data. Updates to data elements typically overwrite the previous values. If one wanted a history of all the updates, the application would need to maintain history tables: every time an update request is made, the current data would be moved to the history table before being overwritten. This adds the overhead of maintaining a robust historical backup and ensuring the application’s scaling plan covers the historical stores as well.

Solution — Event Store

  • Define application changes as a sequence of events and record them in an event store in the sequence they were applied (an audit trail)
  • The events are immutable and stored as append-only but can be published to consumers who may process them as required
  • This model not only stores the history of the application’s state but also allows a replay of those events to derive the state of the application in one of the following ways

- A complete rebuild
- A point in time rebuild
- Reverse events

  • The trick here is dealing with external systems in such a way that they cannot tell the difference between real transactions and replays
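A complete rebuild and a point-in-time rebuild can be sketched in a few lines. This is a toy sketch only, with a list standing in for the append-only event store and a bank-balance style of event chosen purely for illustration:

```python
event_log = []  # append-only event store (the audit trail)

def record(event_type, amount):
    """Events are only ever appended, never updated in place."""
    event_log.append({"type": event_type, "amount": amount})

def replay(upto=None):
    """Rebuild state by replaying events, optionally up to a point in time."""
    balance = 0
    for event in event_log[:upto]:
        if event["type"] == "deposit":
            balance += event["amount"]
        elif event["type"] == "withdrawal":
            balance -= event["amount"]
    return balance

record("deposit", 100)
record("withdrawal", 30)
record("deposit", 50)

print(replay())        # complete rebuild: 120
print(replay(upto=2))  # point-in-time rebuild after two events: 70
```

Because the log is immutable, replaying the same events always yields the same state — exactly the accountant-and-ledger property described below.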

Let us simplify with an example

  • Event sourcing works a lot like bookkeeping: a bookkeeper logs the day’s financial transactions (payments, receipts, purchases etc.) made by an organization and documents them in either a supplier or general ledger
  • All the financial events, money-in and money-out are now available in these ledgers (..the audit trail)
  • An accountant can use this information to create reports for any custom time-period
  • By going through the transactions (a replay), the accountant could recreate the same report multiple times and still get the same result for that specified period

4. Retry

Problem Statement

A distributed environment is prone to transient errors caused by slow networks, timeouts etc. These issues typically self-correct, and if the action is re-triggered, it’s likely to succeed. In such situations, applications need to handle transient failures without impacting the end-user experience.

Solution — Try and try again

  • There are three ways to handle transient failures

Stop and Report exception: If a fault isn’t transient, or is unlikely to succeed when repeated, the application should log the exception and raise an alert

Retry immediately: If a fault is rare, the application could retry the failing request immediately and the request may be successful

Retry with a delay: If a fault is caused by connectivity issues, or by issues that may need a short period to resolve, the application could retry the failing request after a reasonable amount of time has passed

  • The time delay and number of retries can be configured to suit the application needs
  • If the request still fails even after the desired retry count, the application could report it as a fault and raise an alert
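A retry-with-delay loop can be sketched like this. The helper and the flaky operation are both illustrative; `ConnectionError` stands in for whatever transient fault your client library raises:

```python
import time

def retry(operation, attempts=3, delay=0.1):
    """Retry with a delay; re-raise once the retry budget is spent."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise          # report the fault after the final attempt
            time.sleep(delay)  # wait before trying again

calls = {"n": 0}

def flaky_call():
    """Simulated transient fault: fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network blip")
    return "ok"

result = retry(flaky_call)
print(result)  # "ok", on the third attempt
```

Production implementations usually add exponential backoff and jitter to the delay so that many retrying clients don’t hammer a recovering service in lockstep.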

Let us simplify with an example

  • It’s your dear friend’s birthday! You want to be the first one to wish them so you call them exactly as the clock strikes 12
  • The phone’s busy… You figure someone beat you to it. You hang up (kind of disappointed)
  • But you also know the phone won’t be busy for too long. So you redial, and this time you get through and wish them. #Hurray

5. Circuit Breaker

Problem Statement

There are situations where failures are caused by unanticipated events and take relatively longer to fix. Retrying, or waiting for the request to time out, may not be the best option, as it may cause further cascading issues such as resource contention and/or blocked threads.

Solution — Fail fast

Prevent the application from retrying an operation that is likely to fail

  • A circuit breaker acts as a proxy that monitors recent failures of an operation
  • The proxy maintains a count of failures and, if the count crosses the set threshold, the breaker is placed in an Open state
  • In the Open state, the request fails immediately and is handled appropriately by the application
  • However, a limited number of requests are still allowed to pass through to check whether the operation is still failing or has been fixed (often called the Half-Open state)
  • If the operation continues to return failures, the Open state continues
  • If a trial request succeeds, it is assumed the issue has been fixed and the circuit breaker switches back to the Closed state
  • Error and failure handling in this pattern requires careful consideration to create an acceptable end-user experience
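The state machine above can be sketched as a small class. This is a stripped-down sketch, not a production breaker — the class, thresholds and the `ConnectionError` fault type are all illustrative:

```python
import time

class CircuitBreaker:
    """Minimal breaker: trips to Open after `threshold` failures,
    then allows a trial call through after `reset_after` seconds."""

    def __init__(self, threshold=3, reset_after=0.2):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is Closed

    def call(self, operation):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # otherwise allow a limited trial request through (Half-Open)
        try:
            result = operation()
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()  # trip to Open
            raise
        # success: assume the issue is fixed, close the circuit
        self.failures, self.opened_at = 0, None
        return result

breaker = CircuitBreaker(threshold=2, reset_after=0.1)

def broken():
    raise ConnectionError("service down")

for _ in range(2):
    try:
        breaker.call(broken)
    except ConnectionError:
        pass  # two failures: the breaker trips to Open

print(breaker.opened_at is not None)  # True: further calls now fail fast
```

Note how a caller in the Open state gets an immediate, distinct error rather than a slow timeout — that immediate failure is what the application must handle gracefully for the end user.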

Let us simplify with an example

  • This pattern is inspired by the actual circuit breaker present in the electrical lines at home or office
  • Electrical Circuit breakers are a protective measure against damage to a circuit in the event of an electrical current overload
  • It connects to your circuit board and interrupts the flow of electrical current if it detects a fault in the flow
  • In the event of a fault, the breaker switch automatically goes off and stops the electricity from flowing through the circuit (Open)
  • Without circuit breakers, in the event of a power surge, you’d be left with blown fuses
  • With a circuit breaker, all you do is unplug some of the appliances that caused the power surge, and flip the circuit breaker switch back to the “on” position (Closed)

6. Sidecar

Problem Statement

Peripheral tasks such as monitoring, logging etc. are critical to most applications and are often integrated within them. However, these tasks then run within the same process as the application, which is potentially inefficient and points to improper separation of concerns. Worse, outages in these monitoring and logging components could severely impact the entire application.

Solution: Co-locate as Sidecar

  • Co-locate the set of tasks with the application, but place them in their own process or container as a Sidecar
  • Sidecars are typically small/pluggable components and can be written in different languages
  • Both application and sidecar are deployed as a single unit, therefore latency is low
  • The sidecar can be used to modify how the application container works without making any changes to the application’s code
  • The sidecar is bolted to the application and its lifecycle is dependent on the application
  • If the sidecar’s logic becomes complex or tightly coupled with the main application, it may be better to integrate it into the main application’s code instead
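In Kubernetes terms, a sidecar is simply a second container in the same pod, sharing the pod’s lifecycle and (here) a volume. The sketch below is a hedged illustration — the image names, container names and paths are all hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging-sidecar   # illustrative name
spec:
  containers:
    - name: main-app               # the primary application container
      image: example/main-app:1.0  # hypothetical image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app  # the app writes its logs here
    - name: log-shipper            # the sidecar: ships logs, no app code changes
      image: example/log-shipper:1.0
      volumeMounts:
        - name: logs
          mountPath: /var/log/app  # the sidecar reads what the app writes
  volumes:
    - name: logs                   # shared volume co-locating the two tasks
      emptyDir: {}
```

Both containers are deployed, scheduled and terminated together — exactly the “single unit” property the bullets above describe.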

Let us simplify with an example

  • A common example of the sidecar is the literal sidecar attached to a motorcycle — working as a single unit
  • The sidecar enhances the usability of the motorcycle by increasing passenger capacity
  • The sidecar could be made by another company and need not be bought from the motorcycle manufacturer — providing modularity and flexibility of choice
  • It does not have an engine of its own but relies on the engine of the actual motorcycle. If the motorcycle stops, the sidecar stops

While we’ve only barely scratched the surface here, I do hope this list was helpful to you.

Which design patterns do you find yourself employing constantly? I’d love to hear more about them.

Please feel free to post your recommendations in the comments below and I’ll do my best to try and update this list to include them.
