Build a Scalable Database with TypeScript and IPFS
Introduction
This tutorial will use TypeScript to create a distributed filesystem database using Node.js. A distributed filesystem database allows us to store and retrieve data across multiple machines, providing reliability and increased performance.
Prerequisites
Make sure you have Node.js and Yarn installed.
Setting up the Project
Create a new project directory and run yarn init -y to initialize a new Node.js project. Install the required dependencies with yarn add typescript @types/node @types/express.
Configuring TypeScript
Run tsc --init to create a tsconfig.json file with the default TypeScript configuration. Update the file with the following options:
{
  "compilerOptions": {
    "target": "es2017",
    "outDir": "dist",
    "rootDir": "src",
    "strict": true,
    "esModuleInterop": true
  }
}
Defining the Data Model
First, define a string literal union type for the different kinds of nodes in our filesystem:
type NodeType = 'FILE' | 'DIRECTORY';
Next, define the Node type:
type Node = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};
This type defines the following fields for a node:
- id: a unique identifier for the node
- type: the type of the node (either FILE or DIRECTORY)
- name: the name of the node
- parent: the id of the parent node (optional)
- children: an array of the ids of the child nodes (optional)
- content: the content of the node, if it is a file (optional)
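As an illustration of how these fields fit together, here is a minimal sketch of a root directory that contains one file. The ids and names are made up for the example, and the type is called FsNode here (a hypothetical alias mirroring the tutorial's Node type) only so the snippet does not clash with the DOM's global Node when compiled on its own.

```typescript
type NodeType = 'FILE' | 'DIRECTORY';

// Mirrors the tutorial's `Node` type; renamed only to avoid clashing with
// the DOM's global `Node` when this snippet is compiled standalone.
type FsNode = {
  id: string;
  type: NodeType;
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

// A root directory containing a single file. The ids are arbitrary examples.
const root: FsNode = { id: 'root', type: 'DIRECTORY', name: '/', children: ['readme'] };
const readme: FsNode = {
  id: 'readme',
  type: 'FILE',
  name: 'README.md',
  parent: 'root',
  content: '# Hello',
};

// Parent and child refer to each other by id, not by object reference.
console.log(root.children?.includes(readme.id)); // true
console.log(readme.parent === root.id); // true
```

Linking nodes by id rather than by reference keeps every node a plain value, which is what makes the JSON serialization later in the tutorial straightforward.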
Implementing the Database
To implement our database, we will define a series of functions that take the current state of the database and return a new state. This allows us to use the principles of immutability and pure functions to create a more predictable and maintainable database.
First, let’s define a function for adding a node to the database:
const addNode = (state: Map<string, Node>, node: Node) => {
  return new Map(state).set(node.id, node);
};
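To make the "new state" idea concrete, the following sketch calls addNode and checks that the original map is left untouched. FsNode is a hypothetical, trimmed-down stand-in for the tutorial's Node type, used so the snippet compiles on its own.

```typescript
// Trimmed-down stand-in for the tutorial's `Node` type (an assumption for this sketch).
type FsNode = { id: string; type: 'FILE' | 'DIRECTORY'; name: string; content?: string };

const addNode = (state: Map<string, FsNode>, node: FsNode) => {
  // Copy the map first, then add the node to the copy.
  return new Map(state).set(node.id, node);
};

const before = new Map<string, FsNode>();
const after = addNode(before, { id: '1', type: 'FILE', name: 'a.txt' });

console.log(before.size); // 0, the original map is untouched
console.log(after.size); // 1, the returned map contains the node
```

Because the caller only sees the change through the returned map, the result of every call has to be kept; discarding it silently loses the update.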
Next, let’s define a function for removing a node from the database:
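const removeNode = (state: Map<string, Node>, id: string) => {
  // Map.prototype.delete returns a boolean, so delete from a copy and return the copy.
  const next = new Map(state);
  next.delete(id);
  return next;
};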
We can also define a function for updating a node in the database:
const updateNode = (state: Map<string, Node>, node: Node) => {
  return new Map(state).set(node.id, node);
};
It can be useful to have a function for retrieving a single node from the database:
const getNode = (state: Map<string, Node>, id: string) => {
  return state.get(id);
};
We can also define a function for retrieving the children of a given node:
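const getChildren = (state: Map<string, Node>, id: string) => {
  const node = state.get(id);
  if (!node || !node.children) return [];
  // Skip ids with no matching node; the type guard narrows the result to Node[].
  return node.children
    .map((childId) => state.get(childId))
    .filter((child): child is Node => child !== undefined);
};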
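The sketch below shows how getChildren behaves when a directory lists an id that has no entry in the map. FsNode is a hypothetical stand-in for the tutorial's Node type, and the ids are made up for the example.

```typescript
// Stand-in for the tutorial's `Node` type (an assumption for this sketch).
type FsNode = {
  id: string;
  type: 'FILE' | 'DIRECTORY';
  name: string;
  parent?: string;
  children?: string[];
};

const getChildren = (state: Map<string, FsNode>, id: string): FsNode[] => {
  const node = state.get(id);
  if (!node || !node.children) return [];
  return node.children
    .map((childId) => state.get(childId))
    .filter((child): child is FsNode => child !== undefined);
};

const state = new Map<string, FsNode>([
  ['root', { id: 'root', type: 'DIRECTORY', name: '/', children: ['a', 'missing'] }],
  ['a', { id: 'a', type: 'FILE', name: 'a.txt', parent: 'root' }],
]);

// 'missing' has no entry in the map, so it is skipped rather than returned as undefined.
const names = getChildren(state, 'root').map((child) => child.name);
console.log(names); // ['a.txt']
```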
Choosing the Storage
Now that we have implemented the functions for manipulating the nodes in our database, we need a way to store the database itself.
There are several options to consider when choosing the storage for our database:
- Local filesystem
- Cloud storage bucket (e.g. Amazon S3)
- Distributed file storage system (e.g. IPFS)
For this tutorial, we will choose IPFS as our storage solution. IPFS (InterPlanetary File System) is a peer-to-peer network for sharing and storing files and can be used to create a distributed filesystem.
Integrating with IPFS
To use IPFS, we will install the ipfs-http-client library and use its API to store and retrieve files from the IPFS network.
First, install the library with yarn add ipfs-http-client.
Next, create a new file src/ipfs.ts and import the create function, which recent versions of the library export instead of a default class:
import { create } from 'ipfs-http-client';
Then, create a client that connects to an IPFS node over its HTTP API:
// Hosted endpoints such as Infura may require credentials in an authorization header.
const ipfs = create({ host: 'ipfs.infura.io', port: 5001, protocol: 'https' });
Now we can use the ipfs instance to interact with the IPFS network.
Storing the Database
To store the database in IPFS, we will serialize it to a JSON string and store that string as a file in IPFS.
First, define a function to convert the database to a JSON string:
const serializeDatabase = (state: Map<string, Node>) => {
  const nodes = Array.from(state.values());
  const json = JSON.stringify(nodes);
  return json;
};
Next, define a function to store the database in IPFS:
const storeDatabase = async (state: Map<string, Node>) => {
  const json = serializeDatabase(state);
  const buffer = Buffer.from(json);
  // Recent versions of ipfs-http-client resolve to a single result with a `cid` field.
  const { cid } = await ipfs.add(buffer);
  return cid.toString();
};
This function converts the database to a JSON string, creates a buffer from the string, and stores the buffer in IPFS using the add method of the ipfs instance. The returned hash is the content identifier (CID) of the stored file in IPFS.
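The identifier is derived from the stored bytes, so storing the same JSON again with the same settings gives the same identifier, and any change to the database produces a new one. The sketch below illustrates that idea with a plain SHA-256 digest from Node's crypto module. This is only a stand-in: contentId is a hypothetical helper, and real IPFS identifiers use a different encoding, so the values will not match what IPFS returns.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical stand-in for a content identifier: a plain SHA-256 hex digest.
// Real IPFS CIDs are encoded differently, so these values will not match them.
const contentId = (data: string): string =>
  createHash('sha256').update(data).digest('hex');

const first = contentId('[{"id":"1"}]');
const second = contentId('[{"id":"1"}]');
const changed = contentId('[{"id":"2"}]');

console.log(first === second); // true: identical content, identical identifier
console.log(first === changed); // false: any change produces a new identifier
```

One consequence for this design is that every saved version of the database has its own hash, so the application has to keep track of which hash is the latest.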
Retrieving the Database
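To retrieve the database from IPFS, we will retrieve the file using its hash and convert it back to a Map.
First, define a function to retrieve the file from IPFS:
const retrieveDatabase = async (hash: string) => {
  // ipfs.cat yields the file as a stream of chunks; collect them before decoding.
  const chunks: Uint8Array[] = [];
  for await (const chunk of ipfs.cat(hash)) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks).toString('utf8');
};
This function reads the file from IPFS using the cat method of the ipfs instance, collects the chunks it yields, and decodes them into a string. The for await loop needs the async iteration types, so set target to es2018 or later in tsconfig.json (or add esnext.asynciterable to lib).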
Next, define a function to convert the JSON string to a Map:
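const deserializeDatabase = (json: string) => {
  // JSON.parse returns `any`, so state the expected shape explicitly.
  const nodes = JSON.parse(json) as Node[];
  const state = new Map<string, Node>(nodes.map((node): [string, Node] => [node.id, node]));
  return state;
};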
These functions allow us to store and retrieve the database from IPFS.
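Serialization and deserialization can be checked without touching IPFS at all. The following sketch round-trips a small database through both functions; FsNode is a hypothetical stand-in for the tutorial's Node type, and the sample node is made up.

```typescript
// Stand-in for the tutorial's `Node` type (an assumption for this sketch).
type FsNode = { id: string; type: 'FILE' | 'DIRECTORY'; name: string; content?: string };

const serializeDatabase = (state: Map<string, FsNode>): string =>
  JSON.stringify(Array.from(state.values()));

const deserializeDatabase = (json: string): Map<string, FsNode> => {
  const nodes = JSON.parse(json) as FsNode[];
  return new Map(nodes.map((node): [string, FsNode] => [node.id, node]));
};

const original = new Map<string, FsNode>([
  ['1', { id: '1', type: 'FILE', name: 'a.txt', content: 'Hello' }],
]);

// Serialize to JSON, then rebuild the Map from that JSON.
const restored = deserializeDatabase(serializeDatabase(original));

console.log(restored.size); // 1
console.log(restored.get('1')?.content); // 'Hello'
```

Note that the cast in deserializeDatabase trusts the stored JSON; data from an untrusted source would need validation before use.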
Adding Query Support
To enable querying the database, we will define a function that takes a predicate function and returns an array of nodes that match the predicate.
First, define the Query type:
type Query = (node: Node) => boolean;
This type represents a function that takes a Node and returns a boolean indicating whether the node matches the query.
Next, define the query function:
const query = (state: Map<string, Node>, predicate: Query) => {
  const nodes = Array.from(state.values());
  const matches = nodes.filter(predicate);
  return matches;
};
This function takes the current state of the database and a predicate function and returns an array of nodes that match the predicate.
For example, we can use the query function to retrieve all nodes with a given name:
const nameQuery = (name: string) => (node: Node) => node.name === name;
const nodesWithName = query(state, nameQuery('my-node'));
We can also use the query function to retrieve all nodes of a given type:
const typeQuery = (type: NodeType) => (node: Node) => node.type === type;
const fileNodes = query(state, typeQuery('FILE'));
const directoryNodes = query(state, typeQuery('DIRECTORY'));
Because a query is just a predicate function, the query function can retrieve nodes based on arbitrary criteria.
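Since predicates are plain functions, they can also be combined. The sketch below adds two small combinators, and and not, which are hypothetical helpers rather than part of the tutorial's code. FsNode again stands in for the tutorial's Node type.

```typescript
// Stand-in for the tutorial's `Node` type (an assumption for this sketch).
type FsNode = { id: string; type: 'FILE' | 'DIRECTORY'; name: string; content?: string };
type Query = (node: FsNode) => boolean;

const query = (state: Map<string, FsNode>, predicate: Query): FsNode[] =>
  Array.from(state.values()).filter(predicate);

// Hypothetical helpers for combining predicates.
const and = (...queries: Query[]): Query => (node) => queries.every((q) => q(node));
const not = (q: Query): Query => (node) => !q(node);

const isFile: Query = (node) => node.type === 'FILE';
const hasContent: Query = (node) => node.content !== undefined;

const state = new Map<string, FsNode>([
  ['1', { id: '1', type: 'FILE', name: 'a.txt', content: 'hi' }],
  ['2', { id: '2', type: 'FILE', name: 'empty.txt' }],
  ['3', { id: '3', type: 'DIRECTORY', name: 'docs' }],
]);

// Files that have no content field.
const emptyFiles = query(state, and(isFile, not(hasContent)));
console.log(emptyFiles.map((node) => node.name)); // ['empty.txt']
```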
Usage Guide
Now that we have implemented the functions for manipulating and querying the database, let’s see how to use them in a Node.js application.
First, import the functions and types from the database module:
import { Node, NodeType, addNode, removeNode, updateNode, getNode, getChildren, query } from './database';
Next, create a Map to represent the state of the database. Declare it with let, because every operation returns a new Map instead of modifying the existing one:
let state = new Map<string, Node>();
Now we can use the functions to manipulate and query the database:
// Add a node to the database
const node: Node = { id: '1', type: 'FILE', name: 'my-file', content: 'Hello, world!' };
state = addNode(state, node);
// Update a node in the database
state = updateNode(state, { ...node, content: 'Updated content' });
// Retrieve a single node from the database
const retrievedNode = getNode(state, '1');
// Retrieve the children of a node (empty here, because the node is a file)
const children = getChildren(state, '1');
// Query the database
const nameQuery = (name: string) => (node: Node) => node.name === name;
const nodesWithName = query(state, nameQuery('my-file'));
// Remove a node from the database
state = removeNode(state, '1');
These are the basic operations for manipulating and querying the database.
Read Optimization and Indexes
It can be useful to optimize read performance by creating indexes on the database. An index is a data structure that allows for faster lookup of specific nodes in the database.
For example, we can create an index on the name field of the Node type:
const nameIndex = new Map<string, string[]>();
const updateNameIndex = (state: Map<string, Node>) => {
  nameIndex.clear();
  state.forEach((node, id) => {
    const name = node.name;
    if (!nameIndex.has(name)) {
      nameIndex.set(name, []);
    }
    nameIndex.get(name)!.push(id);
  });
};
updateNameIndex(state);
This index allows us to quickly retrieve the ids of all nodes with a given name by looking up the name in the nameIndex map.
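The index stores ids, so a lookup still has to resolve them back to nodes. A minimal sketch of that step follows. buildNameIndex and findByName are hypothetical helpers, and FsNode stands in for the tutorial's Node type.

```typescript
// Stand-in for the tutorial's `Node` type (an assumption for this sketch).
type FsNode = { id: string; type: 'FILE' | 'DIRECTORY'; name: string };

// Build a name -> ids index in one pass over the state.
const buildNameIndex = (state: Map<string, FsNode>): Map<string, string[]> => {
  const index = new Map<string, string[]>();
  state.forEach((node, id) => {
    const ids = index.get(node.name) ?? [];
    ids.push(id);
    index.set(node.name, ids);
  });
  return index;
};

// Hypothetical lookup helper: resolve the ids stored in the index back to nodes.
const findByName = (
  state: Map<string, FsNode>,
  index: Map<string, string[]>,
  name: string,
): FsNode[] =>
  (index.get(name) ?? [])
    .map((id) => state.get(id))
    .filter((node): node is FsNode => node !== undefined);

const state = new Map<string, FsNode>([
  ['1', { id: '1', type: 'FILE', name: 'notes.txt' }],
  ['2', { id: '2', type: 'FILE', name: 'todo.txt' }],
  ['3', { id: '3', type: 'FILE', name: 'notes.txt' }],
]);
const index = buildNameIndex(state);

console.log(findByName(state, index, 'notes.txt').map((node) => node.id)); // ['1', '3']
console.log(findByName(state, index, 'unknown').length); // 0
```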
We can also create an index on the parent field to allow for faster lookup of the children of a given node:
const parentIndex = new Map<string, string[]>();
const updateParentIndex = (state: Map<string, Node>) => {
  parentIndex.clear();
  state.forEach((node, id) => {
    if (!node.parent) return;
    if (!parentIndex.has(node.parent)) {
      parentIndex.set(node.parent, []);
    }
    parentIndex.get(node.parent)!.push(id);
  });
};
updateParentIndex(state);
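These indexes turn name and child lookups into a single map access instead of a scan over every node. They are only accurate as long as they are rebuilt whenever the state changes.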
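Rebuilding an index from scratch costs a pass over all nodes. One possible alternative, sketched here as an assumption rather than as part of the tutorial's code, is to update the index incrementally when a single node is added. indexNode is a hypothetical helper, and FsNode stands in for the tutorial's Node type.

```typescript
// Stand-in for the tutorial's `Node` type (an assumption for this sketch).
type FsNode = { id: string; type: 'FILE' | 'DIRECTORY'; name: string; parent?: string };

const parentIndex = new Map<string, string[]>();

// Hypothetical helper: record one newly added node instead of rebuilding the whole index.
const indexNode = (index: Map<string, string[]>, node: FsNode): void => {
  if (!node.parent) return; // root nodes have no parent entry
  const siblings = index.get(node.parent) ?? [];
  if (!siblings.includes(node.id)) siblings.push(node.id); // avoid duplicate ids
  index.set(node.parent, siblings);
};

indexNode(parentIndex, { id: 'a', type: 'FILE', name: 'a.txt', parent: 'root' });
indexNode(parentIndex, { id: 'b', type: 'FILE', name: 'b.txt', parent: 'root' });
indexNode(parentIndex, { id: 'root', type: 'DIRECTORY', name: '/' });

console.log(parentIndex.get('root')); // ['a', 'b']
```

Unlike the database functions, this helper mutates the index in place. Removing or moving a node would need a matching update, which is omitted here.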
Creating a DB Server and Adding RPC Endpoints
To make it easier to use the database in a distributed environment, we can create a server that exposes Remote Procedure Call (RPC) endpoints for manipulating and querying the database. We can use the JSON-RPC protocol and let a library handle the message format.
First, install a JSON-RPC library with yarn add @ersinfotech/json-rpc.
Then, create a new instance of the library's Server class and start listening for connections:
const server = new Server();
server.listen(3000);
Now we can define RPC endpoints for manipulating and querying the database. For example, here is an endpoint for adding a node to the database:
server.method('addNode', (node: Node) => {
  // addNode returns a new Map, so `state` must be declared with `let` and reassigned.
  state = addNode(state, node);
});
This endpoint takes a Node as an argument and adds it to the database using the addNode function from the database module.
We can define similar endpoints for the other database functions, such as removeNode, updateNode, getNode, getChildren, and query.
With these endpoints, clients can easily manipulate and query the database over a network using the JSON-RPC protocol.
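For readers unfamiliar with the protocol, the sketch below shows the shape of a JSON-RPC 2.0 request and response and how a method name is mapped to a handler. It is independent of any particular library: dispatch, the handler map, and the getName method are hypothetical names chosen for the example.

```typescript
type RpcId = number | string | null;
type RpcRequest = { jsonrpc: '2.0'; method: string; params?: unknown; id: RpcId };
type RpcResponse =
  | { jsonrpc: '2.0'; result: unknown; id: RpcId }
  | { jsonrpc: '2.0'; error: { code: number; message: string }; id: RpcId };
type Handler = (params: unknown) => unknown;

// Hypothetical dispatcher: look up the handler by method name and wrap its result.
const dispatch = (handlers: Map<string, Handler>, request: RpcRequest): RpcResponse => {
  const handler = handlers.get(request.method);
  if (!handler) {
    // -32601 is the "Method not found" error code defined by JSON-RPC 2.0.
    return { jsonrpc: '2.0', error: { code: -32601, message: 'Method not found' }, id: request.id };
  }
  return { jsonrpc: '2.0', result: handler(request.params), id: request.id };
};

// A tiny in-memory lookup table standing in for the database.
const names = new Map<string, string>([['1', 'my-file']]);
const handlers = new Map<string, Handler>([
  ['getName', (params) => names.get((params as { id: string }).id) ?? null],
]);

const ok = dispatch(handlers, { jsonrpc: '2.0', method: 'getName', params: { id: '1' }, id: 1 });
const missing = dispatch(handlers, { jsonrpc: '2.0', method: 'nope', id: 2 });

console.log(JSON.stringify(ok)); // {"jsonrpc":"2.0","result":"my-file","id":1}
console.log('error' in missing); // true
```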
Adding WebSocket and REST API Endpoints
In addition to RPC endpoints, we can expose the database functions over WebSocket and REST API protocols. This can be useful for connecting to the database from web browsers or mobile applications.
To add the REST API endpoints, we can use a library like Express to handle the HTTP protocol.
First, install the Express library with yarn add express.
Next, create a new file src/api.ts and import the library:
import express, { Router, Request, Response } from 'express';
Then, create a router and define the endpoints:
const router = Router();
router.post('/addNode', (req: Request, res: Response) => {
  const node = req.body as Node;
  // addNode returns a new Map, so reassign `state` (which must be declared with `let`).
  state = addNode(state, node);
  res.send();
});
router.post('/removeNode', (req: Request, res: Response) => {
  const id = req.body.id as string;
  state = removeNode(state, id);
  res.send();
});
// define similar endpoints for updateNode, getNode, getChildren, and query
To add the endpoints to the Express app, create the application with express() and attach the router with the use method:
const app = express();
app.use(express.json());
app.use('/api', router);
app.listen(3001);
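The handlers above cast req.body to Node without checking it, so a malformed request could put invalid data into the database. A runtime check along the following lines could be applied before calling addNode. This is a sketch: isFsNode is a hypothetical helper, and FsNode stands in for the tutorial's Node type.

```typescript
// Stand-in for the tutorial's `Node` type (an assumption for this sketch).
type FsNode = {
  id: string;
  type: 'FILE' | 'DIRECTORY';
  name: string;
  parent?: string;
  children?: string[];
  content?: string;
};

// Hypothetical runtime check for untrusted input such as a parsed request body.
const isFsNode = (value: unknown): value is FsNode => {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === 'string' &&
    (v.type === 'FILE' || v.type === 'DIRECTORY') &&
    typeof v.name === 'string' &&
    (v.parent === undefined || typeof v.parent === 'string') &&
    (v.content === undefined || typeof v.content === 'string') &&
    (v.children === undefined ||
      (Array.isArray(v.children) && v.children.every((c) => typeof c === 'string')))
  );
};

console.log(isFsNode({ id: '1', type: 'FILE', name: 'a.txt' })); // true
console.log(isFsNode({ id: 1, type: 'FILE', name: 'a.txt' })); // false, id is not a string
console.log(isFsNode(null)); // false
```

In a route handler, a failed check would typically be answered with a 400 status instead of calling addNode.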
Adding WebSocket Endpoints
To add WebSocket endpoints, we can use a library like ws to handle the WebSocket protocol.
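First, install the ws library and its type definitions with yarn add ws @types/ws.
Next, create a new file src/websocket.ts, import the library (version 8 or later exports WebSocketServer), and handle incoming messages:
import { WebSocketServer } from 'ws';
const wss = new WebSocketServer({ port: 3002 });
wss.on('connection', (ws) => {
  ws.on('message', (message) => {
    // Messages arrive as raw data, so convert them to a string before parsing.
    const { method, params } = JSON.parse(message.toString());
    switch (method) {
      case 'addNode':
        // Reassign `state` (declared with `let`), because addNode returns a new Map.
        state = addNode(state, params as Node);
        break;
      case 'removeNode':
        state = removeNode(state, params.id as string);
        break;
      // define similar cases for updateNode, getNode, getChildren, and query
    }
  });
});
Each connection listens for the message event and expects a JSON payload containing a method name and its params.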
With these endpoints, clients can easily manipulate and query the database using WebSocket connections.
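One way to keep the message handling testable is to separate it from the socket: a function takes the raw message text and the current state, and returns the next state together with a JSON reply. The sketch below shows this under a few assumptions: handleMessage and the reply format are hypothetical, and FsNode stands in for the tutorial's Node type.

```typescript
// Stand-in for the tutorial's `Node` type (an assumption for this sketch).
type FsNode = { id: string; type: 'FILE' | 'DIRECTORY'; name: string; content?: string };
type Message = { method: string; params: unknown };
type Result = { state: Map<string, FsNode>; reply: string };

// Hypothetical handler: pure function from (state, raw message) to (next state, reply).
const handleMessage = (state: Map<string, FsNode>, raw: string): Result => {
  let message: Message;
  try {
    message = JSON.parse(raw) as Message;
  } catch {
    return { state, reply: JSON.stringify({ error: 'invalid JSON' }) };
  }
  switch (message.method) {
    case 'addNode': {
      const node = message.params as FsNode; // assumes the payload was validated
      return { state: new Map(state).set(node.id, node), reply: JSON.stringify({ ok: true }) };
    }
    case 'getNode': {
      const { id } = message.params as { id: string };
      return { state, reply: JSON.stringify({ node: state.get(id) ?? null }) };
    }
    default:
      return { state, reply: JSON.stringify({ error: `unknown method: ${message.method}` }) };
  }
};

const empty = new Map<string, FsNode>();
const added = handleMessage(
  empty,
  JSON.stringify({ method: 'addNode', params: { id: '1', type: 'FILE', name: 'a.txt' } }),
);
const fetched = handleMessage(added.state, JSON.stringify({ method: 'getNode', params: { id: '1' } }));

console.log(added.reply); // {"ok":true}
console.log(fetched.reply); // {"node":{"id":"1","type":"FILE","name":"a.txt"}}
```

Inside the message event, the server would call this function, store the returned state, and pass the reply to ws.send.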
End-to-End Encryption
To ensure the privacy and security of the data in our database, we can add end-to-end encryption. This means the data is encrypted before it is stored in IPFS and can only be decrypted by the client that encrypted it.
We can use a library like libsodium-wrappers to handle the encryption.
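First, install the library with yarn add libsodium-wrappers.
Next, update the serializeDatabase function to encrypt the content of each node before converting the nodes to a JSON string:
import sodium from 'libsodium-wrappers';
// Call `await sodium.ready` once before using any of the functions below.
type EncryptedNode = Omit<Node, 'content'> & { content?: string; nonce?: string; encrypted: boolean };
const serializeDatabase = (state: Map<string, Node>, key: Uint8Array) => {
  const nodes = Array.from(state.values());
  const encryptedNodes: EncryptedNode[] = nodes.map((node) => {
    if (node.content === undefined) return { ...node, encrypted: false };
    // Use a fresh random nonce per node and store it next to the ciphertext.
    const nonce = sodium.randombytes_buf(sodium.crypto_secretbox_NONCEBYTES);
    const ciphertext = sodium.crypto_secretbox_easy(node.content, nonce, key);
    // Base64-encode the binary values so they survive JSON.stringify.
    return { ...node, content: sodium.to_base64(ciphertext), nonce: sodium.to_base64(nonce), encrypted: true };
  });
  return JSON.stringify(encryptedNodes);
};
This function encrypts the content field of each node using a new nonce and the given key, stores the nonce next to the ciphertext so that the content can be decrypted later, and converts the encrypted nodes to a JSON string. The key must be 32 bytes long.
We can also update the deserializeDatabase function to decrypt the data after parsing the JSON string:
const deserializeDatabase = (json: string, key: Uint8Array) => {
  const encryptedNodes = JSON.parse(json) as EncryptedNode[];
  const state = new Map<string, Node>();
  encryptedNodes.forEach(({ encrypted, nonce, content, ...rest }) => {
    // Decrypt only nodes that were encrypted; decryption throws if the key or data is wrong.
    const plaintext = encrypted && content !== undefined && nonce !== undefined
      ? sodium.to_string(sodium.crypto_secretbox_open_easy(sodium.from_base64(content), sodium.from_base64(nonce), key))
      : content;
    state.set(rest.id, { ...rest, content: plaintext });
  });
  return state;
};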
We can also update the storeDatabase and retrieveDatabase functions to use the serializeDatabase and deserializeDatabase functions with the encryption key:
const storeDatabase = async (state: Map<string, Node>, key: Uint8Array) => {
  const json = serializeDatabase(state, key);
  const buffer = Buffer.from(json);
  // Recent versions of ipfs-http-client resolve to a single result with a `cid` field.
  const { cid } = await ipfs.add(buffer);
  return cid.toString();
};
const retrieveDatabase = async (hash: string, key: Uint8Array) => {
  // ipfs.cat yields the file as a stream of chunks; collect them before decoding.
  const chunks: Uint8Array[] = [];
  for await (const chunk of ipfs.cat(hash)) {
    chunks.push(chunk);
  }
  const json = Buffer.concat(chunks).toString('utf8');
  return deserializeDatabase(json, key);
};
Now, the client can use the storeDatabase and retrieveDatabase functions to store and retrieve the database, respectively, with end-to-end encryption. The client should provide the same key when calling both functions to encrypt and decrypt the data properly.
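The tutorial does not say where the key comes from. One option, sketched here as an assumption, is to derive it from a passphrase with scrypt from Node's crypto module, so that the same passphrase and salt always reproduce the same 32-byte key. deriveKey and the sample passphrase are hypothetical.

```typescript
import { randomBytes, scryptSync } from 'node:crypto';

// secretbox keys are 32 bytes long (crypto_secretbox_KEYBYTES in libsodium).
const KEY_BYTES = 32;

// Hypothetical helper: derive a key from a passphrase and a salt using scrypt.
const deriveKey = (passphrase: string, salt: Uint8Array): Uint8Array =>
  new Uint8Array(scryptSync(passphrase, salt, KEY_BYTES));

// The salt is not secret, but it must be kept so the key can be derived again.
const salt = randomBytes(16);
const keyForStore = deriveKey('correct horse battery staple', salt);
const keyForRetrieve = deriveKey('correct horse battery staple', salt);

console.log(keyForStore.length); // 32
console.log(Buffer.from(keyForStore).equals(Buffer.from(keyForRetrieve))); // true
```

A different passphrase or a different salt yields a different key, in which case decryption of the stored content fails.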
Conclusion
In this tutorial, we have seen how to create a distributed filesystem database using TypeScript. We have used IPFS as the storage solution and implemented functions for manipulating and querying the database. We have also seen how to create indexes to optimize read performance, how to expose the database over JSON-RPC, REST, and WebSocket endpoints, and how to encrypt node contents before storing them.