Messaging Queue Comparison: NSQ and Apache Kafka

Published in Level Up Coding, Jun 28, 2020

As a system grows, we often move from a monolithic architecture to microservices to avoid a single point of failure. Dividing the application into smaller independent units also means the number of interactions between those units grows significantly.

A messaging queue gives these decoupled units a way to communicate and coordinate asynchronously while improving performance, reliability, and scalability. The component that adds a message to the queue is called the producer, and the component that retrieves and processes the message is called the consumer. Producers and consumers do not interact directly; they go through a broker, which usually manages the queue.

Apache Kafka and NSQ are two messaging queues that have caught my interest lately.
So let’s get into it.

1. NSQ

NSQ is a realtime distributed messaging platform and the successor to simplequeue.

NSQ consists of three core components:

nsqd is the daemon that receives, queues, and delivers messages to clients.
nsqlookupd is the daemon that manages topology information. Clients query nsqlookupd to discover the nsqd producers for a specific topic, and nsqd nodes broadcast their topic and channel information to it.
nsqadmin is a web UI to view aggregated cluster stats in realtime and perform various administrative tasks.
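The discovery flow between nsqd, nsqlookupd, and clients can be sketched as a toy in-memory model. This is only an illustration of the idea, not NSQ's actual protocol or API: the class name `LookupdSketch`, its methods, and the host names are all made up for this example.

```python
from collections import defaultdict


class LookupdSketch:
    """Toy stand-in for nsqlookupd: maps a topic name to nsqd addresses."""

    def __init__(self):
        self._producers = defaultdict(set)

    def register(self, nsqd_address, topic):
        # An nsqd node announces that it holds the given topic.
        self._producers[topic].add(nsqd_address)

    def lookup(self, topic):
        # A consumer asks which nsqd nodes currently produce the topic.
        return sorted(self._producers.get(topic, set()))


lookupd = LookupdSketch()
lookupd.register("nsqd-a:4150", "orders")    # hypothetical hosts
lookupd.register("nsqd-b:4150", "orders")
lookupd.register("nsqd-b:4150", "payments")

print(lookupd.lookup("orders"))    # ['nsqd-a:4150', 'nsqd-b:4150']
print(lookupd.lookup("payments"))  # ['nsqd-b:4150']
```

A consumer would then connect to every address returned for its topic, which is why producers and consumers never need to know about each other directly.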

NSQ offers:

A high-availability topology that minimizes SPOFs. Availability increases by running multiple instances of nsqd and nsqlookupd.
A guarantee that each message is delivered at least once.
A certain degree of persistence. A message is stored until the consumer sends the finish signal.
Easy configuration.

A single nsqd instance is designed to handle multiple streams of data at once. Streams are called “topics” and a topic has 1 or more “channels”. Each channel receives a copy of all the messages for a topic.

Neither topics nor channels are pre-configured. A topic is created on the first publish to the named topic or when a channel subscribes to it. A channel is created the first time a client subscribes to the named channel. Topics and channels buffer data independently of each other.

A channel generally has multiple clients connected, and each message is delivered to one of them at random.
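The topic and channel behaviour described above can be sketched with a small simulation. The `Topic` and `Channel` classes here are hypothetical and exist only for this example; they are not part of any NSQ client library, and the buffering that a real nsqd does is left out.

```python
import random


class Channel:
    def __init__(self, name, rng):
        self.name = name
        self.clients = {}  # client name -> messages that client received
        self._rng = rng

    def subscribe(self, client):
        self.clients.setdefault(client, [])

    def deliver(self, message):
        # Exactly one client of the channel, picked at random, gets the message.
        # This toy drops the message when no client is connected; a real nsqd
        # would buffer it instead.
        if self.clients:
            chosen = self._rng.choice(sorted(self.clients))
            self.clients[chosen].append(message)


class Topic:
    def __init__(self, name, seed=0):
        self.name = name
        self.channels = {}
        self._rng = random.Random(seed)

    def channel(self, name):
        # Channels are created on first use, mirroring the text above.
        if name not in self.channels:
            self.channels[name] = Channel(name, self._rng)
        return self.channels[name]

    def publish(self, message):
        # Every channel receives its own copy of the message.
        for channel in self.channels.values():
            channel.deliver(message)


topic = Topic("clicks")
metrics = topic.channel("metrics")
metrics.subscribe("metrics-worker-1")
metrics.subscribe("metrics-worker-2")
archive = topic.channel("archive")
archive.subscribe("archive-worker-1")

for i in range(6):
    topic.publish(f"click-{i}")

print(len(archive.clients["archive-worker-1"]))                 # 6
print(sum(len(msgs) for msgs in metrics.clients.values()))      # 6
```

Both channels see all six messages, but inside the "metrics" channel the six messages are split between the two workers.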

There are some things to consider with NSQ:

The topology you use always affects reliability. Running a single instance makes it prone to a single point of failure; NSQ is designed to run as multiple instances.
Possible data loss. There is no built-in replication, so messages can be lost when an nsqd server crashes ungracefully.
Unordered messages. Because nsqd instances do not communicate with each other, messages can arrive out of order.
Duplicated messages. When a consumer times out, NSQ re-queues the message, so the same message can be delivered more than once.
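Because delivery is at least once, a consumer usually has to tolerate duplicates on its own. One common approach, sketched here under the assumption that every message carries a stable ID, is to remember which IDs were already handled. The function `process_once` and the sample messages are hypothetical, and a real system would keep the seen IDs in durable storage rather than in memory.

```python
def process_once(message_id, payload, seen_ids, handler):
    """Run handler only the first time a given message ID is seen."""
    if message_id in seen_ids:
        return False  # duplicate delivery: skip the side effect
    handler(payload)
    seen_ids.add(message_id)
    return True


seen = set()
results = []

# The second "m1" entry simulates a re-queued delivery after a consumer timeout.
deliveries = [
    ("m1", "charge order 1"),
    ("m2", "charge order 2"),
    ("m1", "charge order 1"),
]
for message_id, payload in deliveries:
    process_once(message_id, payload, seen, results.append)

print(results)  # ['charge order 1', 'charge order 2']
```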

2. Apache Kafka

Comparing the Kafka streaming platform to a messaging system such as NSQ is not an apples-to-apples comparison, so we will only look at Kafka as a messaging system.

Kafka claims better throughput than other messaging queues. Built-in partitioning, replication, and fault tolerance make it one of the more reliable messaging systems.

A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.

For each topic, the Kafka cluster maintains a partitioned log.

Each partition is an ordered, immutable sequence of records that is continually appended to a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
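A single partition can be pictured as an append-only list in which a record's offset is simply its position. The `PartitionLog` class below is a hypothetical sketch of that idea, not Kafka's storage format.

```python
class PartitionLog:
    """Toy append-only log: the offset of a record is its position."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # New records always go to the end and receive the next offset.
        offset = len(self._records)
        self._records.append(record)
        return offset

    def read(self, offset):
        # An offset uniquely identifies one record within this partition.
        return self._records[offset]


log = PartitionLog()
offsets = [log.append(value) for value in ("created", "paid", "shipped")]

print(offsets)      # [0, 1, 2]
print(log.read(1))  # paid
```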

The Kafka cluster persists all records, whether or not they have been consumed, for a configurable retention period. For example, if we set the retention period to five days, a message remains available for consumption for five days after it is published.
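The five-day example can be worked through in a few lines. This is a simplified sketch: the function `apply_retention` and the record layout are assumptions made for illustration, and a real Kafka broker removes whole log segments rather than individual records.

```python
DAY = 24 * 60 * 60  # seconds in a day


def apply_retention(records, now, retention_seconds):
    """Keep records younger than the retention period, consumed or not."""
    return [r for r in records if now - r["published_at"] < retention_seconds]


records = [
    {"value": "a", "published_at": 0 * DAY},
    {"value": "b", "published_at": 2 * DAY},
    {"value": "c", "published_at": 4 * DAY},
]

# Five and a half days after "a" was published, with five days of retention.
kept = apply_retention(records, now=5.5 * DAY, retention_seconds=5 * DAY)
print([r["value"] for r in kept])  # ['b', 'c']
```

Whether a consumer has read record "a" plays no role here; only its age decides if it is still available.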

Like other messaging systems, Kafka has producers and consumers, but each works in its own way:

Producer

Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the record).
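The two partitioning strategies can be sketched as follows. Both functions are hypothetical helpers for illustration. CRC32 is used only because it ships with the Python standard library and gives the same result on every run; it is not the hash that Kafka's own clients use.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Spread records evenly across partitions, ignoring their content."""
    counter = itertools.count()

    def choose(_key=None):
        return next(counter) % num_partitions

    return choose


def key_partitioner(num_partitions):
    """Send every record with the same key to the same partition."""

    def choose(key):
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


balanced = round_robin_partitioner(3)
print([balanced() for _ in range(5)])  # [0, 1, 2, 0, 1]

by_key = key_partitioner(3)
print(by_key("user-42") == by_key("user-42"))  # True
```

Round-robin balances load, while key-based partitioning keeps all records for one key in a single partition, which matters for ordering.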

Consumer

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.

If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.

If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
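The two cases above, load balancing within one group and broadcasting across groups, can be sketched in one function. This is a simplification: the function `deliver` and all group and consumer names are made up, and real Kafka assigns whole partitions to the members of a group rather than rotating individual records.

```python
from collections import defaultdict


def deliver(records, group_members):
    """Return which consumer instance receives which records.

    group_members maps a consumer group name to its list of instance names.
    Each record goes to exactly one instance within each group.
    """
    received = defaultdict(list)
    for index, record in enumerate(records):
        for members in group_members.values():
            chosen = members[index % len(members)]
            received[chosen].append(record)
    return dict(received)


# Same group: the records are load balanced over the instances.
print(deliver(["r0", "r1", "r2", "r3"], {"billing": ["c1", "c2"]}))
# {'c1': ['r0', 'r2'], 'c2': ['r1', 'r3']}

# Different groups: every instance sees every record.
print(deliver(["r0", "r1"], {"billing": ["c1"], "audit": ["c2"]}))
# {'c1': ['r0', 'r1'], 'c2': ['r0', 'r1']}
```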

3. Comparison

Both messaging queues define their architecture and their broker in their own way, but several points may help you decide which platform to choose:

Availability

If an nsqd server crashes ungracefully, data may be lost. To mitigate this, we could run redundant nsqd pairs on separate hosts that each receive a copy of the same messages.

Kafka has built-in replication and partitioning, which give it higher availability and reliability. With a replication factor of N, Kafka can tolerate N-1 server failures without losing any records.
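The N-1 claim for Kafka's replication can be checked with a very small model: a record survives as long as at least one broker holding a copy is still alive. The helper names and broker names are hypothetical, and the model ignores details such as leader election and in-sync replica tracking.

```python
def surviving_copies(replica_brokers, failed_brokers):
    """Brokers that still hold a copy of the record after the failures."""
    return sorted(set(replica_brokers) - set(failed_brokers))


def record_is_safe(replica_brokers, failed_brokers):
    return len(surviving_copies(replica_brokers, failed_brokers)) > 0


# Replication factor N = 3.
replicas = ["broker-1", "broker-2", "broker-3"]

print(record_is_safe(replicas, ["broker-1", "broker-2"]))              # True  (N-1 failures)
print(record_is_safe(replicas, ["broker-1", "broker-2", "broker-3"]))  # False (N failures)
```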

Persistence

NSQ deletes a message once the consumer has sent the finish signal for it.

Kafka instead follows a retention policy, either time based or size based: messages persist after they are published until the configured time or size limit is reached.

Replay-able messages

Since Kafka persists records in its storage system, messages can be replayed over and over again for as long as they are still stored.
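Replaying amounts to moving the consumer's read position back to an earlier offset. The class below is a hypothetical sketch over a plain Python list; it is loosely modelled on the idea of seeking to an offset and is not a real Kafka client.

```python
class ReplayableConsumer:
    """Toy consumer that tracks its own position in a stored log."""

    def __init__(self, log):
        self._log = log
        self.position = 0

    def poll(self):
        # Return everything from the current position and advance past it.
        records = self._log[self.position:]
        self.position = len(self._log)
        return records

    def seek(self, offset):
        # Rewind (or skip ahead) to an arbitrary stored offset.
        self.position = offset


consumer = ReplayableConsumer(["created", "paid", "shipped"])

print(consumer.poll())  # ['created', 'paid', 'shipped']
print(consumer.poll())  # []

consumer.seek(1)        # replay from offset 1
print(consumer.poll())  # ['paid', 'shipped']
```

With NSQ this is not possible, because a message is gone once the consumer has finished it.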

Order of the message

Since multiple nsqd instances don’t communicate with each other, there is always a possibility of unordered messages. Kafka, on the other hand, maintains each partition as an ordered sequence of records, so it always preserves the exact order of messages within a partition.

4. Similarity

Both NSQ and Kafka are quite a feat compared to traditional message brokers. Both use the publish/subscribe pattern, and Kafka’s consumer groups resemble NSQ’s channels.

Both also provide reliability, scalability, and persistence, each to its own degree.

5. In My Humble Opinion

For an open source messaging queue, NSQ offers quite a magnificent architecture and set of use cases, while Kafka offers a sturdier platform in terms of persistence, reliability, and availability.

In this case, which platform you choose comes down to one question.

Is data loss acceptable to you?

If the answer is no, then Kafka is the answer.

Another opinion of mine: if your system is already in Java and you are interested in implementing Kafka, then Kafka’s streaming platform might be worth considering.

Yeah, the streaming platform is much pricier than the messaging system. But seeing how it works makes me think about the possibilities of treating everything as a stream, not just as individual messages.


https://kafka.apache.org/
https://nsq.io/