System Design Interview: Design WhatsApp

Published in Level Up Coding, Jun 18, 2024

In this system design interview scenario, we’re asked to design a messaging app similar to WhatsApp.

A real interview might focus on one or two functionalities of the app. In this article, we’ll take a high-level view of the system’s architecture, which you can then use as a starting point for exploring specific areas in more depth.

Clarifying Functional Requirements

Let’s narrow down the scope with some questions to the interviewer, as designing the entire WhatsApp platform in an hour is unrealistic:

Primary Use Case: The app’s primary purpose is to send and receive messages, check for new ones, and mark messages as read.
Groups: We will not cover group messaging, only one-on-one messaging.
Content Types: We will only support text messages and will not support images or videos.

Clarifying Non-Functional Requirements

Scale: First, let’s talk about scale, meaning how many messages the system must handle. Let’s assume that 10 billion messages are sent daily, and that we aim to support double that volume within a year.

Availability: We want the system to be highly available, so it should stay operational even when individual servers fail.

Latency: Messaging should feel almost instant, so the majority of API requests should complete within 100ms.

Estimation: Data Math

With 10 billion daily messages, we have roughly 10B messages / 86,400 seconds per day = 115,740 messages per second (MPS). Doubling within a year means we should plan for 115,740 * 2 = 231,480 MPS.

Assuming 200 bytes per message, daily storage is 10B messages * 200 bytes = 2 terabytes (TB). Yearly storage, budgeting for the doubled volume across the whole year, is approximately 2TB * 365 days * 2 = 1,460TB, or roughly 1.5 petabytes (PB).

It’s important to note that we calculated averages, but systems need to handle peak traffic, which could be significantly higher than the average MPS. We might need to scale up based on peak times.
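The arithmetic above can be reproduced with a short script. This is only a sketch of the estimate: the function name and the `peak_factor` of 3 are assumptions made for illustration, not figures from the requirements. Note that rounding gives values one message per second higher than the truncated numbers quoted above.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_messages: int, bytes_per_message: int,
             growth_factor: int = 2, peak_factor: float = 3.0) -> dict:
    """Back-of-the-envelope capacity numbers for a messaging system."""
    avg_mps = daily_messages / SECONDS_PER_DAY
    daily_bytes = daily_messages * bytes_per_message
    return {
        # Average messages per second today.
        "avg_mps": round(avg_mps),
        # Average messages per second after the planned growth.
        "target_mps": round(avg_mps * growth_factor),
        # Hypothetical peak load: peak_factor is an assumed multiplier.
        "peak_mps": round(avg_mps * growth_factor * peak_factor),
        # Storage in decimal units (1 TB = 1e12 bytes, 1 PB = 1e15 bytes).
        "daily_tb": daily_bytes / 1e12,
        "yearly_pb": daily_bytes * 365 * growth_factor / 1e15,
    }


result = estimate(daily_messages=10_000_000_000, bytes_per_message=200)
for name, value in result.items():
    print(f"{name}: {value}")
```

Changing `peak_factor` is a quick way to see how much headroom a given peak-to-average ratio would require.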

High-Level API Design

We’ll likely use a RESTful API style for broader compatibility. Here’s a breakdown of possible endpoints:

Send a Message (POST /messages): The request body includes the recipient’s ID and message content. A successful response (200) returns a unique message identifier. Error codes (400, 500) handle missing parameters or server issues.
Check for New Messages (GET /messages): The response is either a 200 with an array of unread messages or a 204 if there are none.
Get a Specific Message (GET /messages/:messageId): Returns a specific message (200) or a 404 if not found.
Mark Message as Read (PUT or PATCH /messages/:messageId): A successful response (200) confirms the change, while a 404 indicates the message wasn’t found.

Additional Considerations: We’d integrate WebSockets for real-time updates, with the API handling authentication and initial connection establishment. Pagination might be needed for the ‘Check for New Messages’ endpoint, and security measures such as input validation are also necessary.
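To make the four endpoints concrete, here is a minimal in-memory sketch of their behavior. It is not a real HTTP server: the class name `MessageApi`, the field names `recipientId` and `content`, and the `(status, body)` return shape are all assumptions chosen for illustration.

```python
import uuid
from dataclasses import asdict, dataclass


@dataclass
class Message:
    message_id: str
    sender_id: str
    recipient_id: str
    content: str
    read: bool = False


class MessageApi:
    """In-memory stand-in for the endpoints; each method returns (status, body)."""

    def __init__(self):
        self._messages = {}  # message_id -> Message

    def post_message(self, sender_id, body):
        """POST /messages"""
        recipient_id = body.get("recipientId")
        content = body.get("content")
        if not recipient_id or not content:
            return 400, {"error": "recipientId and content are required"}
        message = Message(str(uuid.uuid4()), sender_id, recipient_id, content)
        self._messages[message.message_id] = message
        return 200, {"messageId": message.message_id}

    def get_unread(self, user_id):
        """GET /messages"""
        unread = [asdict(m) for m in self._messages.values()
                  if m.recipient_id == user_id and not m.read]
        return (200, unread) if unread else (204, None)

    def get_message(self, message_id):
        """GET /messages/:messageId"""
        message = self._messages.get(message_id)
        if message is None:
            return 404, {"error": "message not found"}
        return 200, asdict(message)

    def mark_read(self, message_id):
        """PATCH /messages/:messageId"""
        message = self._messages.get(message_id)
        if message is None:
            return 404, {"error": "message not found"}
        message.read = True
        return 200, {"messageId": message_id, "read": True}
```

For example, calling `post_message("alice", {"recipientId": "bob", "content": "hi"})` returns a 200 with a new `messageId`, after which `get_unread("bob")` returns that message until `mark_read` is called on it.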

High-Level System Design

Mobile App: The primary interface for users will be the mobile app (iOS, Android). This app handles sending and receiving messages, contact management, and conversations.

Load Balancer: To handle incoming requests efficiently, we’ll use a load balancer to distribute traffic across multiple servers. This improves reliability, since no single server is overloaded and traffic can be routed away from servers that fail.
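One simple distribution strategy is round-robin with health tracking. The sketch below assumes that strategy purely for illustration (real load balancers offer several); the class name and the server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Cycles through servers in order, skipping any marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._cycle = itertools.cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["api-1", "api-2", "api-3"])
first_round = [balancer.next_server() for _ in range(3)]
print(first_round)
```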

API Servers: All requests will go to the API servers, which handle the RESTful APIs we outlined earlier, managing messaging logic. API servers themselves could be stateless; this way, we can scale out horizontally (adding more servers) as traffic grows.

WebSocket Connections: WhatsApp-like apps rely heavily on WebSockets for real-time communication. Our servers will maintain persistent WebSocket connections with the mobile apps, so when a message arrives, it can be pushed to the recipient’s device instantly.
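The push path can be sketched as a registry that maps each connected user to a way of sending them data. No real WebSocket library is used here: the `send` callable stands in for a live connection, and buffering messages for offline users is an assumption added for the example rather than part of the design above.

```python
from collections import defaultdict, deque


class ConnectionRegistry:
    """Tracks which users are connected and pushes messages to them."""

    def __init__(self):
        self._connections = {}               # user_id -> send callable
        self._pending = defaultdict(deque)   # user_id -> undelivered messages

    def connect(self, user_id, send):
        """Register a connection and flush anything queued while offline."""
        self._connections[user_id] = send
        pending = self._pending[user_id]
        while pending:
            send(pending.popleft())

    def disconnect(self, user_id):
        self._connections.pop(user_id, None)

    def deliver(self, recipient_id, text):
        """Push immediately if the recipient is online; otherwise buffer.

        Returns True when the message was pushed right away.
        """
        send = self._connections.get(recipient_id)
        if send is None:
            self._pending[recipient_id].append(text)
            return False
        send(text)
        return True
```

In a multi-server deployment this mapping would need to be shared or looked up across servers, which is one of the areas worth exploring in more depth.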

Message Distributor: Next, we will have a Message Distributor service. Its main purpose is to decouple the API servers from direct database writes, which is especially important given the high write volume.

A message queue, such as Kafka or RabbitMQ, is a great fit here. Here’s how it will work:

The API server receives a “Send Message” POST request.
It places the message on the queue and promptly returns a success/acknowledgment to the client.
Separate worker processes asynchronously read from the queue and write messages into the database.
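The three steps above can be sketched with Python’s standard `queue` and `threading` modules standing in for Kafka or RabbitMQ, and a plain dictionary standing in for the database. The function names and message fields are hypothetical.

```python
import queue
import threading

message_queue = queue.Queue()   # stand-in for Kafka/RabbitMQ
database = {}                   # stand-in for the message store
db_lock = threading.Lock()


def handle_send_message(message_id, recipient_id, content):
    """API server side: enqueue the message and acknowledge immediately."""
    message_queue.put({"messageId": message_id,
                       "recipientId": recipient_id,
                       "content": content})
    return {"status": "accepted", "messageId": message_id}


def worker():
    """Worker side: drain the queue and persist each message."""
    while True:
        message = message_queue.get()
        if message is None:          # sentinel: shut this worker down
            message_queue.task_done()
            break
        with db_lock:
            database[message["messageId"]] = message
        message_queue.task_done()


def run_demo(message_count=5, worker_count=2):
    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in workers:
        thread.start()
    for i in range(message_count):
        handle_send_message(f"m{i}", "bob", f"hello {i}")
    message_queue.join()             # wait until every message is written
    for _ in workers:
        message_queue.put(None)
    for thread in workers:
        thread.join()
    return len(database)
```

The trade-off is that the acknowledgment only means the message was queued, not yet stored, so a durable queue matters in a real deployment.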

Database (NoSQL): If we assume that eventual consistency is acceptable for chat messages, NoSQL becomes a scalable choice for the high message volume.

Let’s consider two strong options:

Cassandra: Wide-column store known for scalability, high availability, and write performance. Especially good if we anticipate high write volume with simpler read patterns (message fetch by ID mainly).
DynamoDB: Fully managed key-value and document database offered by AWS. It is advantageous if we want a minimal-maintenance database solution that scales easily.

Sharding and Partitioning: It’s crucial to shard (horizontally partition) the data, since no single database node can handle our 1.5 petabyte storage needs.

But how would we shard and partition this data, and how would the API servers know where to request it from?

We could partition based on userId, so that all of a user’s messages reside in the same shard/partition. Since a one-on-one message involves two users, this means either keying each message by one of them (for example, the recipient) or storing a copy under both. Our API servers then have two potential ways to locate data:

Consistent Hashing Ring: Data location can be determined based on the partition key, allowing API servers to route requests to the correct database shard directly.
Metadata Service: A separate service keeps a mapping of partition keys to shard locations. API servers query this service first, then make the database call.
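The first option can be sketched as a small consistent hashing ring. This is a minimal illustration, assuming SHA-256 as the hash function and 100 virtual nodes per shard; the class name and shard names are hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps partition keys (here, user IDs) to shards on a hash ring."""

    def __init__(self, shards, virtual_nodes=100):
        self._hashes = []   # sorted positions on the ring
        self._owner = {}    # ring position -> shard name
        for shard in shards:
            self.add_shard(shard, virtual_nodes)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_shard(self, shard, virtual_nodes=100):
        # Each shard gets many ring positions to even out the distribution.
        for i in range(virtual_nodes):
            position = self._hash(f"{shard}#{i}")
            if position in self._owner:
                continue  # extremely unlikely collision; keep the first owner
            self._owner[position] = shard
            bisect.insort(self._hashes, position)

    def shard_for(self, user_id):
        if not self._hashes:
            raise ValueError("ring has no shards")
        # First ring position clockwise from the key, wrapping around.
        index = bisect.bisect_right(self._hashes, self._hash(user_id))
        return self._owner[self._hashes[index % len(self._hashes)]]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("user-42"))
```

The useful property is that adding a shard only moves the keys that land on the new shard; every other key keeps its previous location, so the API servers’ routing stays mostly stable as the cluster grows.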

Conclusion and Current System Bottlenecks

This outlines the primary architecture for a WhatsApp-like application. Now, let’s examine the potential bottlenecks in our current system and areas for improvement:

Database Writes: High write volume is a potential bottleneck. Sharding, message queues, and optimized database choices are crucial.
End-to-End Encryption: The WhatsApp model heavily emphasizes security. Implementing end-to-end encryption would be a crucial discussion.
Group Chats: This feature brings additional complexity to message routing and storage.
Media Handling: We could add a system for handling image and video uploads, compressing files and storing multiple sizes of each (such as thumbnails).
