DynamoDB: 5 specific ways to supercharge your database design

Rawad Haber · Published in Level Up Coding · Sep 27, 2022 · 12 min read

Image credit: Digital Cloud Training

So you’ve started working on your first serverless application. You know a thing or two about designing back-end applications, but you’ve worked with things like Express, or some other backend framework, alongside Postgres behind an ORM like Prisma, Sequelize, or TypeORM. For whatever reason, the decision was made to build a new service using the AWS serverless stack. You get to the database part of the stack, where DynamoDB is almost always and exclusively the database of choice in the AWS serverless world.

You start pondering: Wait, this “database” does not support joins? It’s only one table? This must be some form of sick joke, surely 😳? Oh my, you can only query data based on the primary key? And for that matter, what the heck is a hash key, and what is a sort key sorting, exactly? If I need to query based on anything other than the primary key, do I need a Global Secondary Index? What does that even mean 🤦🏻‍♂️?

For many of us just starting out with DynamoDB, these kinds of questions are valid concerns. After all, this is not what we were taught in university, and not what we have been doing for the last few decades.

But here is the good news: once you really understand how DynamoDB is intended to be used and grasp its fundamentals, coupled with a good understanding of the application you’re trying to build, it will start to feel a bit like magic! Infinitely scalable, with very low latency, and no infrastructure to maintain whatsoever. Even better, I would argue that if you design your database well, you won’t even need an in-memory cache such as Redis!

Alright, so here goes, 5 specific ways to level up your DynamoDB designs.

I. Get comfortable with DynamoDB-specific concepts

There are some Database concepts that DynamoDB shares with industry-standard SQL and NoSQL databases, but also a lot of notions that only apply to it. Let’s take a look at some of the DynamoDB-specific concepts, and discuss their importance.

Partition

I believe this is the most critical piece to really understand. On the surface, a DynamoDB database is only a single table. But the reason such a Database works for big companies such as Amazon, Snap, Disney, and Zoom is largely linked to partitioning. When you store an item in DynamoDB, it gets assigned a partition.

A partition is a logical space, backed by SSD drives, that holds a piece of data you store. You don’t create partitions; rather, they are determined based on the partition key you assign to a given item. This is one of the many properties that allow DynamoDB to scale so well: ideally, although it is one table, data should be evenly distributed among partitions that only exist behind the scenes.

Your mental model here should change from tables and rows to partitions and items on the same partition.

Partition Key (Hash Key)

If you come from a relational background, or have taken a database design course in university where you most likely learned things like SQL, normalization, and relationships, you might be tempted to think of the partition key as the equivalent of a primary key in SQL. Not quite!

Although a partition key can be used to uniquely identify an item in a given DynamoDB database, its main function is to determine where DynamoDB stores your item behind the scenes across the key space. So be very deliberate and purposeful with how you choose your partition keys; they are absolutely essential to a good DynamoDB database design.

Sort Key (Range Key)

Once DynamoDB has determined which partition to use behind the scenes to store your item, it will then use the range key to sort multiple items on the same partition.

By default, when you run a query operation in Dynamo, you get back an array of items that are sorted in ascending order based on the sort key.

Now here’s the kicker, and I see people miss this quite a bit:

An item is uniquely identified by the combination of the partition key AND the sort key, i.e. the combination of the two is the primary key!

Global Secondary Index (GSI)

A very particular property of Dynamo is that a table can only be queried by its key attributes. It is strongly coupled to its keys, which can be both a blessing and a curse. I will admit, sometimes this can be a bit frustrating!

A GSI allows you to query on attributes that are not the hash or range keys of the original table. The way this works is you choose new hash and range keys for that index from any of the existing attributes, and Dynamo asynchronously replicates data from the main table to the index behind the scenes. From then onwards, you can query that index based on the new attributes you have selected.

Use as many indexes as you need to fulfill your requirements, but if those start getting out of hand, you might have a monolith that’s trying to do too much on your hands!
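
To make that concrete, here's a hedged sketch of querying a GSI with the AWS SDK v3 DocumentClient; the table, index, and attribute names are all illustrative. It's the same Query call you'd run against the main table, plus an IndexName:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Query the index instead of the main table by passing IndexName
export const getUserByEmail = async (email: string) => {
  const { Items } = await client.send(
    new QueryCommand({
      TableName: "access-management-table",
      IndexName: "GSI1",
      KeyConditionExpression: "GSI1PK = :email",
      ExpressionAttributeValues: { ":email": `EMAIL#${email}` },
    })
  );
  return Items;
};
```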

GetItem operation

This is not too different from the findOne method in most ORMs. It always takes the full primary key (hash key & range key) as a parameter and returns a single item from a given table.

You’ll want to use this in your application when you have enough context to know the exact primary key of your item, say when a user selects a row in a products table on the UI, which would fire off an API call. That would in turn use the GetItem Dynamo method behind the scenes to fetch that particular product by its primary key.
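
As a quick sketch, here's what that could look like with the AWS SDK v3 DocumentClient; the table and key names are made up for illustration:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const getProduct = async (productId: string) => {
  const { Item } = await client.send(
    new GetCommand({
      TableName: "products-table",
      // GetItem always needs the full primary key: hash key AND range key
      Key: { PK: `PRODUCT#${productId}`, SK: `PRODUCT#${productId}` },
    })
  );
  return Item;
};
```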

Query operation

We’ve already talked about the concept of a partition and how it’s used. The Dynamo Query operation takes the partition key as a parameter, and optionally the sort key, and returns all the items stored on that partition, sorted by the range key.

This is why being deliberate with your partition key is crucial: if you’ve designed for items to be stored on a given partition on purpose, you will get back all those items ready to go, usually with very low latency!

II. Invest some time in the design phase

Don’t get me wrong, you should be doing this regardless of whether it’s DynamoDB, Postgres, or MongoDB. Investing time in database design always pays off, but it’s even more important when working with DynamoDB.

I have picked up a few tricks from the man himself, Alex DeBrie, in his book on DynamoDB, which was an absolute pleasure to read! However, I have also added my own twist. Let’s go over the essential building blocks for designing your DynamoDB database.

Domain Entities

Start out by clearly defining your domain entities:

  • What is this entity from a functional perspective?
  • What are its attributes?
  • What uniquely identifies it?
  • Does it have any relationships with other domain entities? If so, what is the cardinality?

Again, this can be another eye-opener: if it feels like there are too many domain entities and too many relationships, it might be that you’re trying to do too much within the context of one service!

Say we are building an access management service. You’ll have three domain entities:

  • User: The user that will access your application. A user can have many roles.
  • Role: A given role a user can have. It determines what data they can access from the back-end, as well as what they see on the UI.
  • Organization: The organization that a user is linked to. An organization can have multiple users, but a user can only belong to a single organization.

Access Patterns

List out how you want to access these entities once stored in your database. These will normally map to your API responses, consumed either on the client side or by another back-end service.

In our access management service, that might look something like this:

  • For a given user, fetch all the roles they have
  • For a given organization, fetch all the users that are linked to it
  • For a given role, fetch the number of users that have it (A difficult one admittedly, DynamoDB doesn’t really support aggregations, well not directly at least.)

Here, you should also specify whether an entity will be fetched from the main table or a GSI.

Entity Schema

You’re probably aware that DynamoDB is schemaless, meaning it won’t enforce any schema for you. The schema you put together here will have to be enforced by your application code.

For me, the way I like documenting schemas is through TypeScript interface notation. I’m biased, I know; it is my favorite language after all!

On a side note, TypeScript will not enforce your schema either, as it doesn’t validate data at runtime. For that, you’ll need runtime validation, best done through my favorite TS library, Zod!
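
A tiny sketch of what that runtime validation could look like; the schema shape here is purely illustrative:

```typescript
import { z } from "zod";

// Runtime schema: parse() throws if the data doesn't match the shape
const UserSchema = z.object({
  userId: z.string(),
  email: z.string().email(),
  roles: z.array(z.string()),
});

// Static type derived from the runtime schema, so the two never drift apart
type User = z.infer<typeof UserSchema>;

export const toUser = (item: unknown): User => UserSchema.parse(item);
```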

A schema might look something like this for a user:
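
```typescript
// Illustrative user schema; the attribute names are assumptions
interface User {
  userId: string;
  email: string;
  firstName: string;
  lastName: string;
  orgId: string; // a user belongs to exactly one organization
  roles: string[]; // a user can have many roles
  kind: "User"; // static attribute identifying the entity type
}
```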

An organization:
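
```typescript
// Illustrative organization schema
interface Organization {
  orgId: string;
  name: string;
  userCount?: number; // pre-computed aggregate, maintained via streams (see section V)
  kind: "Organization";
}
```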

Putting it together

Finally, this whole document should live as close as possible to developers on your team. To that end, I have found the best place to write it is in a markdown file that lives in the root directory of your project!

III. Know how and when to use single table design

In the product I work on myself, we use this database design quite a bit. Remember when I mentioned that items with the same partition key will be stored on the same partition behind the scenes? Well, this is the little secret to this way of designing a DynamoDB database.

Essentially, a single table design is where you make your partition key and sort key generic, and overload them. Normally, they would be denoted as PK and SK respectively. You would do the same with GSIs as well, adding GSIPKs and GSISKs as and when needed.

Here’s a little serverless framework snippet of how you would create such a table:
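
```yaml
# A sketch of a single table with generic, overloaded keys plus one GSI.
# The table and index names are illustrative.
resources:
  Resources:
    AccessManagementTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: access-management-table
        BillingMode: PAY_PER_REQUEST
        AttributeDefinitions:
          - AttributeName: PK
            AttributeType: S
          - AttributeName: SK
            AttributeType: S
          - AttributeName: GSI1PK
            AttributeType: S
          - AttributeName: GSI1SK
            AttributeType: S
        KeySchema:
          - AttributeName: PK
            KeyType: HASH
          - AttributeName: SK
            KeyType: RANGE
        GlobalSecondaryIndexes:
          - IndexName: GSI1
            KeySchema:
              - AttributeName: GSI1PK
                KeyType: HASH
              - AttributeName: GSI1SK
                KeyType: RANGE
            Projection:
              ProjectionType: ALL
        StreamSpecification:
          StreamViewType: NEW_AND_OLD_IMAGES
```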

So in our access management example, one of our access patterns is: for a given organization, fetch all the users linked to it. In this case, it would make sense for an organization entity and a user entity to share a partition key, i.e. live on the same partition. That would then allow us to run a Query operation against that partition key, and Dynamo would return everything on that partition!

So you would store an organization entity in DynamoDB that roughly looks like this:
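
```json
{
  "PK": "ORG#LFCLTD",
  "SK": "ORG#LFCLTD",
  "kind": "Organization",
  "orgId": "LFCLTD",
  "name": "LFC Ltd"
}
```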

And a couple of users that might look something like this:
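
```json
[
  {
    "PK": "ORG#LFCLTD",
    "SK": "USER#jane.doe",
    "kind": "User",
    "userId": "jane.doe",
    "roles": ["admin"]
  },
  {
    "PK": "ORG#LFCLTD",
    "SK": "USER#john.smith",
    "kind": "User",
    "userId": "john.smith",
    "roles": ["viewer"]
  }
]
```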

So now when you query against the ‘ORG#LFCLTD’ PK, you get back all three items. Effectively, what you just did is mimic a join in SQL by being deliberate with your choice of PK!

A Query operation that returns all three items would look like the following; I’m using TypeScript again here:
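
```typescript
// A sketch with the AWS SDK v3; the table name is illustrative
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const getOrgWithUsers = async (orgId: string) => {
  const { Items } = await client.send(
    new QueryCommand({
      TableName: "access-management-table",
      // Note we only supply the partition key
      KeyConditionExpression: "PK = :pk",
      ExpressionAttributeValues: { ":pk": `ORG#${orgId}` },
    })
  );
  // Items contains the organization item and both user items, sorted by SK
  return Items;
};
```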

This type of design is really powerful: it results in very low latency responses and scales amazingly well. But it will only work if:

  • Your service is lean enough and isn’t trying to do too much. If it tries to cover tens of domain entities, complexity will rapidly grow
  • You know the access patterns you are trying to fulfill well, and have a solid grasp of the functionality of your service
  • There are no ad-hoc query requirements, i.e. the end user won’t be querying your database however they wish. That kind of flexibility works with SQL; it just doesn’t work with single table design at all

IV. Favour multi-table design when working with GraphQL

There are times when multi-table design using DynamoDB will make more sense than single-table design. One of those times will be if you’re using GraphQL, or AppSync in the AWS world.

I absolutely love GraphQL, and AppSync too, although I’m not a fan at all of VTL resolvers and am still patiently waiting for TS/JS resolvers. The general premise in AppSync is that you define a schema, which contains the domain objects of your application, the fields within them, and two very important top-level types:

  • Query: defines all the operations that will be used to fetch data
  • Mutation: defines all the operations that will be used to write, update, or delete data

A lot of the normalization concepts we are familiar with apply here. In our access management example, we will now store a user object and an organization object in two separate tables. The graph, AKA our schema, might look something like this:
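
```graphql
# A rough sketch of the schema; the field names are illustrative
type Organization {
  orgId: ID!
  name: String!
}

type User {
  userId: ID!
  email: String!
  roles: [String!]!
  # Nested resolver: only resolved (and the organization table only hit)
  # if the consumer asks for this field
  organization: Organization
}

type Query {
  getUserByOrgId(orgId: ID!): User
}

type Mutation {
  createUser(orgId: ID!, email: String!): User
}
```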

Now the consumer of this GraphQL API will have a choice to make when they fetch a user using the getUserByOrgId query:

  • They don’t need organization info, and so they choose to omit the Organization nested resolver you see within the user object. What this means is that we only hit one table, the user table
  • They do need detailed organization info, say for the UI they are building, and as such will then use our nested resolver. This is pretty cool because our API now will respond with a user object and a nested organization object. In this case, we will be hitting two tables. But GraphQL takes care of the ‘joining’ for us!

This is great because it gives us a lot of flexibility to use however many tables we need, giving our consumers the choice to pick and choose what data they want to fetch. Finally, the GraphQL schema is now the ultimate source of truth, AKA our documentation!

Do bear in mind, however, that this does come at the cost of adding a bit of latency to your API. At the end of the day, although GraphQL is taking care of the joining, you are still making two sequential requests to two separate DynamoDB tables, as opposed to the single request we made in the earlier section.

V. Use CDC (Change Data Capture) to add support for aggregations and caching

DynamoDB has DynamoDB Streams (MongoDB has a similar feature), a nifty feature that publishes an event to a stream whenever an item in your table has a change of state. The change can be one of the following:

  • Create (an INSERT event)
  • Modify (a MODIFY event)
  • Delete (a REMOVE event)

And just recently, AWS added support for event filtering on DynamoDB stream event source mappings. Meaning, a subscriber to that stream, say a Lambda function, will only be triggered if the event matches a filter pattern. Very similar to EventBridge in that way.
In Serverless Framework, this might look something like this:
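
```yaml
# A sketch of subscribing a Lambda to the table's stream with a filter pattern.
# The function and handler names are illustrative.
functions:
  AddUserCount:
    handler: src/add-user-count.handler
    events:
      - stream:
          type: dynamodb
          arn:
            Fn::GetAtt: [AccessManagementTable, StreamArn]
          filterPatterns:
            # Only invoke the function for items whose kind attribute
            # is a string with the value "User"
            - dynamodb:
                NewImage:
                  kind:
                    S: [User]
```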

The way this works is it will trigger the AddUserCount Lambda function only if the item concerned has an attribute kind of type string whose value is User. This is another trick I like to use, especially in single table design: have a static kind attribute that defines which domain entity an item is.

Going back to our slightly complex access pattern: fetching the number of users linked to a given organization. The way you would fulfill that using DynamoDB Streams is by checking the PK of the new user, i.e. which organization they belong to, and then running an Update operation on the relevant organization entity. This results in an attribute of type Number, userCount, pre-computed and ready to go, with nothing left to compute at read time!

This would look something like this:
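
```typescript
// A sketch of the AddUserCount handler; the table and key names are illustrative
import { DynamoDBStreamEvent } from "aws-lambda";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    // Only count newly created users; the kind filter alone would
    // also let MODIFY events through
    if (record.eventName !== "INSERT") continue;

    // The user's PK tells us which organization they belong to, e.g. ORG#LFCLTD
    const orgPk = record.dynamodb?.Keys?.PK?.S;
    if (!orgPk) continue;

    // Atomically increment the pre-computed userCount on the organization item
    await client.send(
      new UpdateCommand({
        TableName: "access-management-table",
        Key: { PK: orgPk, SK: orgPk }, // the org item's SK mirrors its PK
        UpdateExpression: "ADD userCount :inc",
        ExpressionAttributeValues: { ":inc": 1 },
      })
    );
  }
};
```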

The thing is, most use cases I’ve seen people use DynamoDB Streams for have been along the lines of moving data from Dynamo to another kind of OLAP or search store, say Redshift or Elasticsearch. That is a very valid use case indeed, but don’t sleep on doing aggregations with DynamoDB Streams: not only counts, but also sums and totals!
