Build a Scalable Database with TypeScript and IPFS

Published in Level Up Coding

10 min read · Jan 10, 2023

Introduction

In this tutorial, we will use TypeScript and Node.js to create a distributed filesystem database. A distributed filesystem database stores and retrieves data across multiple machines, which can improve reliability (no single machine holds the only copy of the data) and read performance (data can be served from more than one place).

Prerequisites

Make sure you have Node.js and Yarn installed.

Setting up the Project

Create a new project directory and run yarn init -y to initialize a new Node.js project. Install TypeScript and the type definitions as development dependencies with yarn add -D typescript @types/node @types/express.

Configuring TypeScript

Run yarn tsc --init to create a tsconfig.json file with the default TypeScript configuration (TypeScript is installed locally, so it is invoked through Yarn). Then update the file with the following options:

{
  "compilerOptions": {
    "target": "es2020",
    "outDir": "dist",
    "rootDir": "src",
    "strict": true,
    "esModuleInterop": true
  }
}

Defining the Data Model

First, define a string literal union type (playing the role of an enum) for the different kinds of nodes in our filesystem:

type NodeType = 'FILE' | 'DIRECTORY';

Next, define the Node type:

type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

This type defines the following fields for a node:

id: a unique identifier for the node
type: the type of the node (either FILE or DIRECTORY)
name: the name of the node
parent: the id of the parent node (optional)
children: an array of the ids of the children nodes (optional)
content: the content of the node, if it is a file (optional)
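To make the parent and children fields concrete, here is a small example: a root directory containing one file. The helper linkChild is hypothetical and not part of the tutorial's API; it is a sketch, assuming both sides of the relationship should always be kept in sync and that nodes are copied rather than mutated.

```typescript
export type NodeType = 'FILE' | 'DIRECTORY';

export type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

// Hypothetical helper: returns copies of a directory and a child node with
// both sides of the parent/children relationship filled in.
const linkChild = (dir: Node, child: Node): [Node, Node] => {
  if (dir.type !== 'DIRECTORY') {
    throw new Error(`Node ${dir.id} is not a directory`);
  }
  const updatedDir: Node = { ...dir, children: [...(dir.children ?? []), child.id] };
  const updatedChild: Node = { ...child, parent: dir.id };
  return [updatedDir, updatedChild];
};

const root: Node = { id: 'root', type: 'DIRECTORY', name: '/' };
const readme: Node = { id: 'f1', type: 'FILE', name: 'README.md', content: '# Hello' };

// The originals are left untouched; the linked copies are returned.
const [rootWithChild, linkedReadme] = linkChild(root, readme);
```

Keeping both directions in sync means getChildren can follow children ids downward, while a parent index (introduced later) can be built from the parent field.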

Implementing the Database

To implement our database, we will define a series of pure functions that take the current state of the database (a Map from node id to Node) and return a new state instead of modifying the existing one. Immutable state and pure functions make the database more predictable and easier to maintain and test.

First, let’s define a function for adding a node to the database:

const addNode = (state: Map<string, Node>, node: Node) => {
  return new Map(state).set(node.id, node);
};

Next, let’s define a function for removing a node from the database:

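const removeNode = (state: Map<string, Node>, id: string) => {
  const next = new Map(state);
  next.delete(id); // Map.delete returns a boolean, not the map, so return the copy explicitly
  return next;
};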

We can also define a function for updating a node in the database:

const updateNode = (state: Map<string, Node>, node: Node) => {
  return new Map(state).set(node.id, node);
};

It can be useful to have a function for retrieving a single node from the database:

const getNode = (state: Map<string, Node>, id: string) => {
  return state.get(id);
};

We can also define a function for retrieving the children of a given node:

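const getChildren = (state: Map<string, Node>, id: string): Node[] => {
  const node = state.get(id);
  if (!node || !node.children) return [];
  // The type guard keeps the result typed as Node[] (filter(Boolean) would not)
  return node.children
    .map((childId) => state.get(childId))
    .filter((child): child is Node => child !== undefined);
};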
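Because each write returns a new Map, older versions of the state stay intact. This self-contained sketch repeats the types and a removeNode that returns the copied Map, and shows that each version is an independent snapshot:

```typescript
export type NodeType = 'FILE' | 'DIRECTORY';

export type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

type State = Map<string, Node>;

// Map.prototype.set returns the map, so the copy can be returned directly.
const addNode = (state: State, node: Node): State => new Map(state).set(node.id, node);

const removeNode = (state: State, id: string): State => {
  const next = new Map(state);
  next.delete(id);
  return next;
};

const v0: State = new Map();
const v1 = addNode(v0, { id: '1', type: 'FILE', name: 'a.txt', content: 'A' });
const v2 = removeNode(v1, '1');

// Each version is an independent snapshot.
console.log(v0.size, v1.size, v2.size); // 0 1 0
```

Copying the whole Map on every write is simple but costs time proportional to the number of nodes; that trade-off is acceptable for a tutorial-sized database.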

Choosing the Storage

Now that we have implemented the functions for manipulating the nodes in our database, we need a way to store the database itself.

There are several options to consider when choosing the storage for our database:

Local filesystem
Cloud storage bucket (e.g. Amazon S3)
Distributed file storage system (e.g. IPFS)

For this tutorial, we will choose IPFS as our storage solution. IPFS (InterPlanetary File System) is a peer-to-peer network for sharing and storing files and can be used to create a distributed filesystem.

Integrating with IPFS

To use IPFS, we will install the ipfs-http-client library and use its API to store and retrieve files from the IPFS network.

First, install the library with yarn add ipfs-http-client. (The package has since been deprecated in favor of kubo-rpc-client, which exposes a similar API.)

Next, create a new file src/ipfs.ts and import the library:

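import { create } from 'ipfs-http-client';

Then, create a client connected to the HTTP API of an IPFS node. Hosted APIs such as Infura's require project credentials, so this example assumes a local IPFS (Kubo) node listening on its default API port:

const ipfs = create({ host: '127.0.0.1', port: 5001, protocol: 'http' });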

Now we can use the ipfs instance to interact with the IPFS network.

Storing the Database

To store the database in IPFS, we will serialize it to a JSON string and store that string as a file in IPFS.

First, define a function to convert the database to a JSON string:

const serializeDatabase = (state: Map<string, Node>) => {
  const nodes = Array.from(state.values());
  const json = JSON.stringify(nodes);
  return json;
};

Next, define a function to store the database in IPFS:

const storeDatabase = async (state: Map<string, Node>) => {
  const json = serializeDatabase(state);
  // add() resolves to a single result whose cid identifies the stored content
  const { cid } = await ipfs.add(json);
  return cid.toString();
};

This function converts the database to a JSON string and stores it in IPFS using the add method of the ipfs instance. The returned content identifier (CID) is derived from the stored bytes and uniquely identifies the file in IPFS.
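The key property here is content addressing: the identifier is derived from the bytes themselves, so storing the same JSON twice yields the same identifier, and any change yields a new one. As a rough offline illustration (not how IPFS actually computes CIDs, which wrap a multihash and may chunk the data), this sketch uses a SHA-256 hex digest from Node's built-in crypto module and an in-memory Map as a stand-in for the IPFS network; LocalContentStore is a hypothetical name.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical in-memory stand-in for IPFS: data is keyed by a hash of its contents.
class LocalContentStore {
  private blocks = new Map<string, string>();

  add(data: string): string {
    const hash = createHash('sha256').update(data, 'utf8').digest('hex');
    this.blocks.set(hash, data); // identical data always maps to the same key
    return hash;
  }

  cat(hash: string): string {
    const data = this.blocks.get(hash);
    if (data === undefined) throw new Error(`No content for ${hash}`);
    return data;
  }
}

const store = new LocalContentStore();
const h1 = store.add(JSON.stringify([{ id: '1', type: 'FILE', name: 'a.txt' }]));
const h2 = store.add(JSON.stringify([{ id: '1', type: 'FILE', name: 'a.txt' }]));
const h3 = store.add(JSON.stringify([{ id: '1', type: 'FILE', name: 'b.txt' }]));
// h1 and h2 are equal; h3 differs because the content differs.
```

One consequence for the database: every call to storeDatabase after a change produces a new identifier, so the application has to keep track of the latest one.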

Retrieving the Database

To retrieve the database from IPFS, we will retrieve the file using its hash and convert it back to a Map.

First, define a function to retrieve the file from IPFS:

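const retrieveDatabase = async (hash: string) => {
  // cat() returns an async iterable of Uint8Array chunks
  const chunks: Uint8Array[] = [];
  for await (const chunk of ipfs.cat(hash)) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks).toString('utf8');
};

This function streams the file from IPFS using the cat method of the ipfs instance, joins the chunks, and decodes them as a UTF-8 string.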

Next, define a function to convert the JSON string to a Map:

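const deserializeDatabase = (json: string) => {
  const nodes = JSON.parse(json) as Node[];
  // Annotate each entry as a [key, value] tuple so TypeScript infers Map<string, Node>
  const state = new Map(nodes.map((node): [string, Node] => [node.id, node]));
  return state;
};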

These functions allow us to store and retrieve the database from IPFS.
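A Map does not survive JSON.stringify directly (it serializes to "{}"), which is why serializeDatabase converts the nodes to an array first. A quick round trip, as a self-contained sketch with the types repeated:

```typescript
export type NodeType = 'FILE' | 'DIRECTORY';

export type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

const serializeDatabase = (state: Map<string, Node>): string =>
  JSON.stringify(Array.from(state.values()));

const deserializeDatabase = (json: string): Map<string, Node> => {
  const nodes = JSON.parse(json) as Node[]; // unchecked cast: trusts the stored data
  return new Map(nodes.map((node): [string, Node] => [node.id, node]));
};

const original = new Map<string, Node>([
  ['root', { id: 'root', type: 'DIRECTORY', name: '/', children: ['f1'] }],
  ['f1', { id: 'f1', type: 'FILE', name: 'notes.txt', parent: 'root', content: 'hi' }],
]);

const restored = deserializeDatabase(serializeDatabase(original));
console.log(restored.get('f1')?.content); // prints: hi
```

Because the id is stored inside each node, the Map keys can be rebuilt on load without serializing them separately.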

Adding Query Support

To enable querying the database, we will define a function that takes a predicate function and returns an array of nodes that match the predicate.

First, define the Query type:

type Query = (node: Node) => boolean;

This type represents a function that takes a Node and returns a boolean indicating whether the node matches the query.

Next, define the query function:

const query = (state: Map<string, Node>, predicate: Query) => {
  const nodes = Array.from(state.values());
  const matches = nodes.filter(predicate);
  return matches;
};

This function takes the current state of the database and a predicate function and returns an array of nodes that match the predicate.

For example, we can use the query function to retrieve all nodes with a given name:

const nameQuery = (name: string) => (node: Node) => node.name === name;
const nodesWithName = query(state, nameQuery('my-node'));

We can also use the query function to retrieve all nodes of a given type:

const typeQuery = (type: NodeType) => (node: Node) => node.type === type;
const fileNodes = query(state, typeQuery('FILE'));
const directoryNodes = query(state, typeQuery('DIRECTORY'));

The query function lets us flexibly retrieve nodes based on arbitrary criteria. Note that it checks every node, so its cost grows linearly with the size of the database.
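Because queries are plain predicates, small ones can be combined into larger ones. The and, or, and not combinators below are hypothetical helpers, not part of the tutorial's module; the sketch is self-contained:

```typescript
export type NodeType = 'FILE' | 'DIRECTORY';

export type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

type Query = (node: Node) => boolean;

const query = (state: Map<string, Node>, predicate: Query): Node[] =>
  Array.from(state.values()).filter(predicate);

// Hypothetical combinators for building compound queries from small predicates.
const and = (...qs: Query[]): Query => (node) => qs.every((q) => q(node));
const or = (...qs: Query[]): Query => (node) => qs.some((q) => q(node));
const not = (q: Query): Query => (node) => !q(node);

const typeQuery = (type: NodeType): Query => (node) => node.type === type;
const nameEndsWith = (suffix: string): Query => (node) => node.name.endsWith(suffix);

const state = new Map<string, Node>([
  ['d1', { id: 'd1', type: 'DIRECTORY', name: 'docs', children: ['f1', 'f2'] }],
  ['f1', { id: 'f1', type: 'FILE', name: 'README.md', parent: 'd1' }],
  ['f2', { id: 'f2', type: 'FILE', name: 'notes.txt', parent: 'd1' }],
  ['d2', { id: 'd2', type: 'DIRECTORY', name: 'drafts.md' }],
]);

// Markdown files only: the directory named drafts.md is excluded by the type check.
const markdownFiles = query(state, and(typeQuery('FILE'), nameEndsWith('.md')));
const directories = query(state, not(typeQuery('FILE')));
const mdOrDocs = query(state, or(nameEndsWith('.md'), (node) => node.name === 'docs'));
```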

Usage Guide

Now that we have implemented the functions for manipulating and querying the database, let’s see how to use them in a Node.js application.

First, export the types and functions from the database module (src/database.ts), then import them in the application:

import { Node, NodeType, addNode, removeNode, updateNode, getNode, getChildren, query } from './database';

Next, create a Map to hold the current state of the database. Because every function returns a new state instead of modifying its argument, declare it with let so it can be reassigned:

let state = new Map<string, Node>();

Now we can use the functions to manipulate and query the database:

// Add a node to the database (reassign state with the returned copy)
const node: Node = { id: '1', type: 'FILE', name: 'my-file', content: 'Hello, world!' };
state = addNode(state, node);

// Update a node in the database
state = updateNode(state, { ...node, content: 'Updated content' });

// Retrieve a single node from the database
const retrievedNode = getNode(state, '1');

// Retrieve the children of a node (empty here, since node '1' is a file)
const children = getChildren(state, '1');

// Query the database
const nameQuery = (name: string) => (node: Node) => node.name === name;
const nodesWithName = query(state, nameQuery('my-file'));

// Remove a node from the database
state = removeNode(state, '1');

These are the basic operations for manipulating and querying the database.

Read Optimization and Indexes

It can be useful to optimize read performance by creating indexes on the database. An index is a data structure that allows for faster lookup of specific nodes in the database.

For example, we can create an index on the name field of the Node type:

const nameIndex = new Map<string, string[]>();

const updateNameIndex = (state: Map<string, Node>) => {
  nameIndex.clear();
  state.forEach((node, id) => {
    const name = node.name;
    if (!nameIndex.has(name)) {
      nameIndex.set(name, []);
    }
    nameIndex.get(name)!.push(id);
  });
};

updateNameIndex(state);

This index allows us to quickly retrieve all nodes with a given name by looking up the name in the nameIndex map.
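Looking up a name then costs one Map.get on the index plus one per matching id, rather than a check of every node as in query. A self-contained sketch; unlike the module-level nameIndex above, the hypothetical buildNameIndex returns a fresh index, in keeping with the immutable style of the database functions, and findByName is likewise a hypothetical name:

```typescript
export type NodeType = 'FILE' | 'DIRECTORY';

export type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

// Builds a name -> ids index from a snapshot of the state.
const buildNameIndex = (state: Map<string, Node>): Map<string, string[]> => {
  const index = new Map<string, string[]>();
  state.forEach((node, id) => {
    const ids = index.get(node.name) ?? [];
    ids.push(id);
    index.set(node.name, ids);
  });
  return index;
};

// Resolves ids from the index back to nodes, skipping ids no longer in the state.
const findByName = (
  state: Map<string, Node>,
  index: Map<string, string[]>,
  name: string,
): Node[] =>
  (index.get(name) ?? [])
    .map((id) => state.get(id))
    .filter((node): node is Node => node !== undefined);

const state = new Map<string, Node>([
  ['a', { id: 'a', type: 'FILE', name: 'README.md', parent: 'd1' }],
  ['b', { id: 'b', type: 'FILE', name: 'README.md', parent: 'd2' }],
  ['c', { id: 'c', type: 'FILE', name: 'notes.txt', parent: 'd1' }],
]);

const nameIndex = buildNameIndex(state);
const readmes = findByName(state, nameIndex, 'README.md');
```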

We can also create an index on the parent field to allow for faster lookup of the children of a given node:

const parentIndex = new Map<string, string[]>();

const updateParentIndex = (state: Map<string, Node>) => {
  parentIndex.clear();
  state.forEach((node, id) => {
    if (!node.parent) return;
    if (!parentIndex.has(node.parent)) {
      parentIndex.set(node.parent, []);
    }
    parentIndex.get(node.parent)!.push(id);
  });
};

updateParentIndex(state);

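These indexes let us find nodes by name, or the children of a node, with a single Map lookup instead of checking every node. Because they are built from a snapshot of the state, they must be rebuilt (or updated) after every write.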
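Rebuilding an index with clear() and a full pass costs time proportional to the whole database on every write. One alternative, sketched here under the assumption that every write goes through a single add or remove step, is to update only the affected entry; indexChild and unindexChild are hypothetical names.

```typescript
export type NodeType = 'FILE' | 'DIRECTORY';

export type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

type Index = Map<string, string[]>; // parent id -> child ids

// Returns a new index with the node listed under its parent.
const indexChild = (index: Index, node: Node): Index => {
  if (!node.parent) return index;
  const next = new Map(index);
  next.set(node.parent, [...(index.get(node.parent) ?? []), node.id]);
  return next;
};

// Returns a new index without the node; drops the parent entry once it is empty.
const unindexChild = (index: Index, node: Node): Index => {
  if (!node.parent) return index;
  const remaining = (index.get(node.parent) ?? []).filter((id) => id !== node.id);
  const next = new Map(index);
  if (remaining.length > 0) next.set(node.parent, remaining);
  else next.delete(node.parent);
  return next;
};

const a: Node = { id: 'a', type: 'FILE', name: 'a.txt', parent: 'dir' };
const b: Node = { id: 'b', type: 'FILE', name: 'b.txt', parent: 'dir' };

let idx: Index = new Map();
idx = indexChild(idx, a);
idx = indexChild(idx, b);
const afterAdd = idx;
idx = unindexChild(idx, a);
```

Each update touches one entry, so its cost depends on the number of siblings rather than on the size of the database.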

Creating a DB Server and Adding RPC Endpoints

To make it easier to use the database in a distributed environment, we can create a server that exposes Remote Procedure Call (RPC) endpoints for manipulating and querying the database. JSON-RPC is a lightweight protocol for this, and we can use a library that implements it to handle the protocol details.

First, install a JSON-RPC library, for example with yarn add @ersinfotech/json-rpc.

Then, create a new instance of the Server class and start listening for connections:

const server = new Server();
server.listen(3000);

Now we can define RPC endpoints for manipulating and querying the database. For example, here is an endpoint for adding a node to the database:

server.method('addNode', (node: Node) => {
  // addNode returns a new Map, so store it back in the module-level `let state`
  state = addNode(state, node);
});

This endpoint takes a Node as an argument and adds it to the database using the addNode function from the database module.

We can define similar endpoints for the other database functions, such as removeNode, updateNode, getNode, getChildren, and query.

With these endpoints, clients can easily manipulate and query the database over a network using the JSON-RPC protocol.
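Independently of which library is used, the JSON-RPC 2.0 wire format is simple: a request carries jsonrpc: "2.0", a method, optional params, and an id, and the response echoes the id with either a result or an error (code -32601 means "Method not found"). This library-free sketch dispatches such requests to handler functions; handleRpc and the handler table are hypothetical, and it omits notifications and batch requests.

```typescript
type JsonRpcId = string | number | null;

type JsonRpcRequest = {
  jsonrpc: '2.0';
  method: string;
  params?: unknown;
  id: JsonRpcId;
};

type JsonRpcResponse =
  | { jsonrpc: '2.0'; result: unknown; id: JsonRpcId }
  | { jsonrpc: '2.0'; error: { code: number; message: string }; id: JsonRpcId };

type Handler = (params: unknown) => unknown;

// A Map avoids accidentally matching inherited names such as "toString".
const handleRpc = (handlers: Map<string, Handler>, request: JsonRpcRequest): JsonRpcResponse => {
  const handler = handlers.get(request.method);
  if (handler === undefined) {
    // -32601 is the JSON-RPC 2.0 "Method not found" error code
    return { jsonrpc: '2.0', error: { code: -32601, message: 'Method not found' }, id: request.id };
  }
  try {
    return { jsonrpc: '2.0', result: handler(request.params), id: request.id };
  } catch (err) {
    // -32603 is the JSON-RPC 2.0 "Internal error" code
    return { jsonrpc: '2.0', error: { code: -32603, message: String(err) }, id: request.id };
  }
};

// Hypothetical handler table: a counter stands in for the database state.
let count = 0;
const handlers = new Map<string, Handler>([
  ['increment', () => ++count],
  ['echo', (params) => params],
]);

const ok = handleRpc(handlers, { jsonrpc: '2.0', method: 'increment', id: 1 });
const missing = handleRpc(handlers, { jsonrpc: '2.0', method: 'dropTable', id: 2 });
```

In the database server, the handler table would map names such as addNode or getNode to wrappers around the database functions.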

Adding WebSocket and REST API Endpoints

In addition to RPC endpoints, we can expose the database functions through a REST-style HTTP API and over WebSockets. This is useful for connecting to the database from web browsers or mobile applications.

For the REST API, we can use a library like Express to handle HTTP requests.

First, install the Express library with yarn add express.

Next, create a new file src/api.ts and import the library:

import express, { Router, Request, Response } from 'express';
import { Node, addNode, removeNode } from './database';

// In-memory state owned by this module; each write replaces it with a new Map.
let state = new Map<string, Node>();

Then, create a router and define the endpoints:

const router = Router();

router.post('/addNode', (req: Request, res: Response) => {
  const node = req.body as Node; // unchecked cast: the body is not validated
  state = addNode(state, node);
  res.sendStatus(204);
});

router.post('/removeNode', (req: Request, res: Response) => {
  const id = req.body.id as string;
  state = removeNode(state, id);
  res.sendStatus(204);
});

// define similar endpoints for updateNode, getNode, getChildren, and query

To attach the endpoints, create an Express application with the express() function, enable JSON body parsing, and mount the router under /api:

const app = express();
app.use(express.json());
app.use('/api', router);
app.listen(3001);
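Casting req.body as Node only tells the compiler to trust the payload; nothing checks it at runtime, so a malformed request could put an invalid node into the database. A minimal hand-written type guard is one option (a schema validation library would be another); isNode and isStringArray are hypothetical names in this sketch.

```typescript
export type NodeType = 'FILE' | 'DIRECTORY';

export type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

const isStringArray = (value: unknown): value is string[] =>
  Array.isArray(value) && value.every((item) => typeof item === 'string');

// Runtime check that an untrusted value has the shape of a Node.
const isNode = (value: unknown): value is Node => {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === 'string' &&
    (v.type === 'FILE' || v.type === 'DIRECTORY') &&
    typeof v.name === 'string' &&
    (v.parent === undefined || typeof v.parent === 'string') &&
    (v.children === undefined || isStringArray(v.children)) &&
    (v.content === undefined || typeof v.content === 'string')
  );
};
```

In the addNode route, the handler could start with if (!isNode(req.body)) { res.status(400).send('Invalid node'); return; } so that only well-formed nodes reach the database.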

Adding WebSocket Endpoints

To add WebSocket endpoints, we can use a library like ws to handle the WebSocket protocol.

First, install the ws library and its type definitions with yarn add ws and yarn add -D @types/ws.

Next, create a new file src/websocket.ts and import the library:

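import WebSocket from 'ws';
import { Node, addNode, removeNode } from './database';

// In-memory state owned by this module; each write replaces it with a new Map.
let state = new Map<string, Node>();

const wss = new WebSocket.Server({ port: 3002 });

wss.on('connection', (ws) => {
  ws.on('message', (message) => {
    // message arrives as raw binary data, so convert it to a string before parsing
    const { method, params } = JSON.parse(message.toString());
    switch (method) {
      case 'addNode':
        state = addNode(state, params as Node);
        ws.send(JSON.stringify({ ok: true }));
        break;
      case 'removeNode':
        state = removeNode(state, params.id as string);
        ws.send(JSON.stringify({ ok: true }));
        break;
      // define similar cases for updateNode, getNode, getChildren, and query
      default:
        ws.send(JSON.stringify({ error: `Unknown method: ${method}` }));
    }
  });
});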

These endpoints listen for the message event, parse each message as JSON, and send a JSON reply.

With these endpoints, clients can easily manipulate and query the database using WebSocket connections.
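JSON.parse throws on malformed input, and an exception thrown inside the message listener is not handled by the switch; it propagates out of the listener and can bring down the server process. One way to guard against this, sketched with a hypothetical parseMessage helper, is to parse and validate the message envelope first and reply with an error instead of throwing:

```typescript
type RpcMessage = { method: string; params?: unknown };

type ParseResult = { ok: true; message: RpcMessage } | { ok: false; error: string };

// Parses an incoming WebSocket payload without letting JSON.parse throw.
const parseMessage = (raw: string): ParseResult => {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'Invalid JSON' };
  }
  if (
    typeof data !== 'object' ||
    data === null ||
    typeof (data as { method?: unknown }).method !== 'string'
  ) {
    return { ok: false, error: 'Missing "method" field' };
  }
  return { ok: true, message: data as RpcMessage };
};
```

Inside the listener, the handler could call parseMessage(message.toString()) and, when the result is not ok, send JSON.stringify({ error: result.error }) back and return before reaching the switch.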

End-to-End Encryption

To protect the privacy of the data in our database, we can add end-to-end encryption: the data is encrypted on the client before it is stored in IPFS, and only someone holding the key can decrypt it. In this design only each node's content is encrypted; ids, names, and the tree structure remain readable.

We can use a library like libsodium-wrappers to handle the encryption.

First, install the library and its type definitions with yarn add libsodium-wrappers and yarn add -D @types/libsodium-wrappers.

Next, update the serializeDatabase function to encrypt each node's content before converting the nodes to JSON:

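import sodium from 'libsodium-wrappers';

// Stored form of a node: content holds base64 ciphertext, nonce holds the base64 nonce.
type StoredNode = Node & { encrypted?: boolean; nonce?: string };

const serializeDatabase = (state: Map<string, Node>, key: Uint8Array) => {
  const nodes: StoredNode[] = Array.from(state.values()).map((node) => {
    if (node.content === undefined) return node;
    // A fresh random nonce per message; it is not secret, but it must be stored
    const nonce = sodium.randombytes_buf(sodium.crypto_secretbox_NONCEBYTES);
    const ciphertext = sodium.crypto_secretbox_easy(node.content, nonce, key);
    return {
      ...node,
      content: sodium.to_base64(ciphertext),
      nonce: sodium.to_base64(nonce),
      encrypted: true,
    };
  });
  return JSON.stringify(nodes);
};

This function encrypts the content field of each node with the given key and a fresh random nonce, then base64-encodes both: JSON.stringify would turn a raw Uint8Array into a plain object, and the nonce is needed again for decryption. libsodium-wrappers loads asynchronously, so await sodium.ready once before calling any of these functions.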

We can also update the deserializeDatabase function to decrypt each node's content after parsing the JSON:

const deserializeDatabase = (json: string, key: Uint8Array) => {
  const storedNodes = JSON.parse(json) as StoredNode[];
  const state = new Map<string, Node>();
  for (const { encrypted, nonce, ...node } of storedNodes) {
    if (encrypted && node.content !== undefined && nonce !== undefined) {
      // Throws if the key is wrong or the ciphertext was modified
      node.content = sodium.crypto_secretbox_open_easy(
        sodium.from_base64(node.content),
        sodium.from_base64(nonce),
        key,
        'text',
      );
    }
    state.set(node.id, node);
  }
  return state;
};

We can also update the storeDatabase and retrieveDatabase functions to use the serializeDatabase and deserializeDatabase functions with the encryption key:

const storeDatabase = async (state: Map<string, Node>, key: Uint8Array) => {
  const json = serializeDatabase(state, key);
  const { cid } = await ipfs.add(json);
  return cid.toString();
};

const retrieveDatabase = async (hash: string, key: Uint8Array) => {
  const chunks: Uint8Array[] = [];
  for await (const chunk of ipfs.cat(hash)) {
    chunks.push(chunk);
  }
  const json = Buffer.concat(chunks).toString('utf8');
  return deserializeDatabase(json, key);
};

Now the client can use the storeDatabase and retrieveDatabase functions to store and retrieve the database with end-to-end encryption. The same 32-byte key must be passed to both functions; it can be generated with sodium.crypto_secretbox_keygen() and must be kept by the client, since without it the stored content cannot be decrypted.
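The essential rule for this kind of authenticated encryption is that each message gets a fresh random nonce, and the nonce is stored next to the ciphertext because decryption needs it. The sketch below illustrates that rule with Node's built-in crypto module and AES-256-GCM, swapped in for libsodium's secretbox (XSalsa20-Poly1305) only so that it runs without extra dependencies; encryptContent, decryptContent, and the base64 field layout are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Everything needed to decrypt later, except the key.
type Sealed = { nonce: string; ciphertext: string; tag: string };

const encryptContent = (plaintext: string, key: Buffer): Sealed => {
  const nonce = randomBytes(12); // 96-bit nonce, fresh for every message
  const cipher = createCipheriv('aes-256-gcm', key, nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return {
    nonce: nonce.toString('base64'),
    ciphertext: ciphertext.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'), // authentication tag
  };
};

const decryptContent = (sealed: Sealed, key: Buffer): string => {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(sealed.nonce, 'base64'));
  decipher.setAuthTag(Buffer.from(sealed.tag, 'base64'));
  // final() throws if the key is wrong or the data was tampered with
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, 'base64')),
    decipher.final(),
  ]);
  return plaintext.toString('utf8');
};

const key = randomBytes(32); // 256-bit key, kept by the client
const sealed = encryptContent('Hello, world!', key);
const again = encryptContent('Hello, world!', key);
```

Encrypting the same text twice gives different ciphertexts because the nonces differ, and decryption with any other key fails instead of returning garbage.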

Conclusion

In this tutorial, we have seen how to create a distributed filesystem database using TypeScript. We used IPFS as the storage solution and implemented functions for manipulating and querying the database. We have also seen how to create indexes to optimize read performance, how to expose the database through JSON-RPC, REST, and WebSocket endpoints, and how to encrypt node contents before storing them in IPFS.

Written by Radovan Stevanovic