Streams and how they fit into the async nature of Node.js

An introduction to streams in Node.js, with performance testing.

Nazarii Romankiv
Level Up Coding


As you already know from a previous article, the fundamental idea of an asynchronous server is to process tiny pieces of work one by one. Streams embody this concept perfectly, so let’s demystify streams in Node.js.

What are streams in Node.js?

Stripped of the details, a stream is just a buffer of data that is used to consume or produce data in small pieces.

A diagram that represents a stream at a high level.

So what are the benefits of streams? To answer this question, let’s talk about the different types of streams we have, and the first one is…

Readable stream

Readable streams are created to “read” the data from a source. We can also say that a Readable stream is a data producer.

A readable stream has two modes of operation: paused and flowing. In paused mode, the stream returns data only when you explicitly ask for it, while in flowing mode it emits data events with chunks of data as soon as they are available. Below you can see a state machine diagram of a readable stream.

State machine diagram for a readable stream.

As you can see, a stream starts in the paused state and switches to the flowing state as soon as you subscribe to the data event. You can switch it back with the pause() method.
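Here is a minimal sketch of switching between the two modes; the file name and the timing are just placeholders for illustration:

```javascript
const fs = require('fs');

// Assumption: 'big-file.bin' is any large file on disk.
const readable = fs.createReadStream('big-file.bin');

// Subscribing to 'data' switches the stream into flowing mode.
readable.on('data', (chunk) => {
  console.log(`Received ${chunk.length} bytes`);

  // Switch back to paused mode; nothing is emitted until resume() is called.
  readable.pause();
  setTimeout(() => readable.resume(), 100);
});

readable.on('end', () => console.log('No more data'));
```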

You may wonder why we need two modes. We will return to this topic later in this article.

Let’s jump into performance testing. We will compare two servers: one uses an async file read to serve a big file (220 MB) to the client…
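The exact code lives in the GitHub repository linked at the end of this article; a minimal sketch of such a server (the port and file name are placeholders) might look like this:

```javascript
const http = require('http');
const fs = require('fs/promises');

// Reads the whole 220 MB file into memory before sending the response.
http.createServer(async (req, res) => {
  const file = await fs.readFile('big-file.bin'); // entire file buffered in RAM
  res.writeHead(200, { 'Content-Type': 'application/octet-stream' });
  res.end(file);
}).listen(3000);
```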

…and another one will be using readable streams to serve the file to the client.
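And a corresponding sketch of the streaming variant (same placeholder port and file name):

```javascript
const http = require('http');
const fs = require('fs');

// Streams the file chunk by chunk; only a small buffer is held in memory.
http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/octet-stream' });
  fs.createReadStream('big-file.bin').pipe(res);
}).listen(3000);
```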

As always, I will use wrk for benchmarking. We will start with the server that reads the full file, loading it with 100 concurrent connections for 30 seconds.
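The exact invocation used for the screenshots is not shown here; a typical wrk run with these parameters (the thread count and port are assumptions) looks like this:

```
wrk -t4 -c100 -d30s http://localhost:3000/
```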

As you can see, the server handles only 0.17 requests per second, and, more importantly, it consumed up to 20 GB of RAM at its peak.

Memory consumption of the server at its peak.

Now let’s test the server that uses streams. As you can see in the image below, the server can serve 13 requests per second on average.

The results of testing the streams server.

The server that uses streams used only 200 MB of RAM on average. We were able to reduce the RAM consumption by almost 100 times! This means you can put more instances of your application on the same machine, and handle more requests.

The memory consumption of the streams server.

It is worth remembering that one of the ways to avoid blocking an async server is to partition the work, as you know from the previous article. And streams are a great implementation of that partitioning approach.

Since there is a producer, there should be something that consumes that data, and this is why we need the…

Writable stream

If a readable stream is a producer, a writable stream is a consumer. As with any other stream, it has an internal buffer, which holds the chunks that are “pending” to be written to the target destination.

The writable stream also has different states, and you can see them represented in the diagram below.

The stream is created with an empty buffer, and a producer can write data to it. But when the buffer is full, the write() method returns false, and you need to wait until the buffer is flushed and the stream emits the drain event.
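A hedged sketch of reacting to write() returning false; the file name and chunk size are placeholders:

```javascript
const fs = require('fs');

const writable = fs.createWriteStream('output.bin');

function writeChunk(chunk) {
  const canWriteMore = writable.write(chunk);
  if (!canWriteMore) {
    // The internal buffer is full; wait until it is flushed.
    writable.once('drain', () => {
      console.log('Buffer drained, safe to write again');
    });
  }
}

writeChunk(Buffer.alloc(64 * 1024)); // a 64 KB chunk, just for illustration
```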

Backpressure

Have you ever wondered what happens if a readable stream produces more data than a writable stream can process? As we already know, the stream has an internal buffer, so the data will be stored there, but what happens when the buffer is full?

JavaScript is a very dynamic language, so the stream will keep allocating more memory for its internal buffer to absorb the extra data. Eventually, this can cause an out-of-memory error.

This is where backpressure comes in handy. It is a mechanism created to handle exactly such cases. Remember the state machines of the different streams? The readable stream pushes data into the writable stream until the writable buffer is full, then it enters paused mode. After that, it waits for the drain event to enter flowing mode once again.

As you can see, backpressure is the very reason a readable stream needs two modes: this is how we handle cases where a readable stream produces more data than a writable stream can consume.
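To make the mechanics concrete, here is a minimal sketch of handling backpressure by hand; the file names are just placeholders:

```javascript
const fs = require('fs');

const source = fs.createReadStream('big-file.bin');
const destination = fs.createWriteStream('copy.bin');

source.on('data', (chunk) => {
  // write() returns false when the writable buffer is full.
  if (!destination.write(chunk)) {
    source.pause(); // stop producing data
    destination.once('drain', () => source.resume()); // resume once flushed
  }
});

source.on('end', () => destination.end());
```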

You may wonder whether you need to handle this yourself every time you use streams, and the answer is that you don’t. There is a function called pipeline, from the stream core module, that handles it for you.
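A minimal sketch of the same copy using pipeline, which wires up backpressure, error handling, and cleanup for you (file names are placeholders):

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

pipeline(
  fs.createReadStream('big-file.bin'),
  fs.createWriteStream('copy.bin'),
  (err) => {
    if (err) console.error('Pipeline failed:', err);
    else console.log('Pipeline succeeded');
  }
);
```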

Now that we know about readable and writable streams it is time to talk about the third type of stream that is called…

Duplex stream

A duplex stream is both a readable and a writable stream at the same time. Since it acts as both, it internally maintains two buffers: one for reading data and another one for writing data.

Diagram of a duplex stream.

Such streams are useful for bidirectional communication; for example, Socket from the net core module is a duplex stream.
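A minimal sketch of a TCP echo server, where each Socket is read from and written to as a duplex stream (the port is an assumption):

```javascript
const net = require('net');

// Each connection hands us a Socket, which is a Duplex stream:
// we read incoming data from it and write the response back to it.
net.createServer((socket) => {
  socket.on('data', (chunk) => socket.write(chunk)); // echo the data back
}).listen(4000);
```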

A Transform stream is a special case of a duplex stream. As the name implies, a Transform stream modifies your data in flight while transferring it from one place to another. An example of a Transform stream is any stream from the zlib core module, such as the Gzip class.
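For example, here is a hedged sketch of compressing a file on the fly, with Gzip as the transform step between a readable source and a writable destination (file names are placeholders):

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

pipeline(
  fs.createReadStream('big-file.bin'),      // readable source
  zlib.createGzip(),                        // Transform stream: compresses in flight
  fs.createWriteStream('big-file.bin.gz'),  // writable destination
  (err) => {
    if (err) console.error('Compression failed:', err);
  }
);
```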

Summary

Node.js is great at processing small pieces of work, and streams are the embodiment of that idea. Using streams can boost the performance of your Node.js server and significantly reduce its RAM consumption.

Though you rarely implement custom streams in day-to-day work, streams are involved in almost every application we build, in one way or another.

You can find the source code used for this article in this GitHub repository.

What next?

Love Node.js? You can read my other articles about it.
