Building Vector Databases with FastAPI and ChromaDB

A beginner's guide to ChromaDB vector stores and FastAPI with Langchain

Om Kamath
Published in Level Up Coding · 10 min read · May 7, 2024


Returning to writing after a lengthy break, I’m finally carving out some time to dive back in. With so many innovations and new AI technologies popping up, I decided to take a step back and explore some of the fundamentals. Having worked around APIs at my job, and with Python being my language of choice, I felt it was a good time to explore some API development frameworks for Python.

While studying for my final exams, I got distracted, as usual, by YouTube recommendations and came across a video by Travis Media about FastAPI. Without hesitation, I set aside my studies and started exploring FastAPI (yes, a CS undergrad in his senior year still hasn’t been taught how to develop and deploy an app from scratch).

For all those who are starting out with backend development in Python, there are mainly three frameworks available:

  1. Flask
  2. FastAPI
  3. Django

Django is a full-stack web framework catered towards building a complete web app from scratch, whereas Flask and FastAPI are micro-frameworks that are ideal for developing smaller applications and web APIs.

Why did I choose FastAPI over Flask?

To be honest, I’m not entirely sure. I’ve come across information suggesting that FastAPI offers a more API-development friendly environment, with features that we’ll explore further in this article. Being someone who tends to be indecisive, I felt it was better to simply choose one and dive in, rather than endlessly deliberating over technologies and wasting time. So, without further delay, let’s jump right into it.

What is FastAPI?

“FastAPI is a modern, fast (high-performance), web framework for building APIs with Python based on standard Python type hints.” — From the FastAPI documentation

Some features of FastAPI are:

  1. Fast
  2. Intuitive
  3. Easy to Code
  4. Standards-based: Compatible with OpenAPI and JSON Schema.
  5. Automatic Docs
  6. Based on and compatible with Pydantic

“No brainfuck” — From the FastAPI documentation

Among the listed features, only two really grabbed my attention: its ease of coding (perfect for my lazy tendencies) and its foundation built on top of Pydantic.

Pydantic is a data-validation library that allows you to declare schemas using classes and inheritance. Its main advantage lies in its built-in type-safety features, which enforce that your data conforms to the schema.

A basic Pydantic example looks like this:

from pydantic import BaseModel

# Extending BaseModel to declare a schema
class User(BaseModel):
    user: str
    age: int

user = User(user="Om", age="21")
print(user)
Output: user='Om' age=21

Even though we provided the model with age in string format, Pydantic automatically typecasts it to an integer. This is one of the advantages of using Pydantic over the built-in Python classes.
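The flip side of that coercion: if a value can’t be converted to the declared type, Pydantic raises a ValidationError rather than silently accepting it. A minimal sketch (the failing value is just an illustration):

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    user: str
    age: int

# "twenty-one" cannot be coerced to an int, so validation fails
try:
    User(user="Om", age="twenty-one")
except ValidationError as e:
    print(e)  # reports that age is not a valid integer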

Setting up FastAPI

Setting up FastAPI is pretty simple and requires just a pip installation: pip install fastapi

Once you have got FastAPI installed, you can test it out using this sample code:

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
return {"message": "Whatchamacallit"}
  • app = FastAPI(): This line creates an instance of the FastAPI class and assigns it to the variable app. This instance represents your FastAPI application.
  • @app.get("/"): This is a decorator syntax in Python, used to define a route for handling HTTP GET requests to the root URL ("/") of your API. Decorators are a way to modify or extend the behaviour of functions or methods. The @app.get decorator indicates that the following function (root()) will handle GET requests to the specified route. To simplify, you can think of it as a way of mapping a specific URL endpoint to a Python function that will handle requests made to that endpoint.
  • async def root():: This line defines a function named root using the async keyword, indicating that it is an asynchronous function. This function will handle requests to the root URL ("/") of your API.

To run the development server (FastAPI uses Uvicorn under the hood), save the file as main.py and start it with fastapi dev main.py. Depending on your version, the fastapi command may require installing the CLI extras via pip install "fastapi[standard]". The command starts a localhost server (http://127.0.0.1:8000 by default) that you can use for testing.


To test the code, we will use an HTTP client like Postman or HTTPie. You can choose whichever you prefer: HTTPie’s minimalist approach appeals to me more, but Postman is packed with more API-testing features. I won’t be diving into either tool in this article.
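If you’d rather test from Python instead of a dedicated client, a library such as httpx (my choice here, not something FastAPI requires) works just as well. A minimal sketch, assuming the dev server is running on the default http://127.0.0.1:8000:

import httpx

# Assumes `fastapi dev main.py` is running locally
response = httpx.get("http://127.0.0.1:8000/")
print(response.status_code)  # 200
print(response.json())       # {'message': 'Whatchamacallit'}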

Building a real-world API using FastAPI

Well, I don’t want to bore you with the same old tutorial of going over all the features sequentially. Instead, we will be building something interesting yet simple using Langchain, ChromaDB, and FastAPI. We will build an API that creates and deletes a vector database and fetches relevant chunks from a PDF document using semantic search.

To explain in short:

  1. Langchain: An open-source framework that helps developers build applications using large language models (LLMs). It contains all the required LLM tools as built-in functions for convenient development.
  2. ChromaDB: An open-source vector database to store all the word embeddings / chunks.

Figure: Semantic search with chunking.

Vector Databases

If you’re wondering about the purpose of vector databases: they’re incredibly powerful and play a major role in many of the AI startups that have emerged over the last two years. Vector databases store embeddings, which are vector representations of textual data that capture the meaning of the content. This enables operations such as fetching relevant data without formal queries or exact keyword matches. Data is retrieved from a vector database through similarity search, which mainly employs one of two techniques:

  1. K-Nearest Neighbours (KNN): Computes the distance between the query vector and every vector in the database, using a metric such as Euclidean distance, Manhattan distance, or cosine similarity.
  2. Approximate Nearest Neighbour (ANN) Search: Instead of computing the distance to every vector in the database, retrieves a “good guess” of the nearest neighbours, trading a little accuracy for speed. A short cosine-similarity sketch follows this list.
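To make the distance calculations concrete, here is a minimal sketch of cosine similarity with NumPy. This is purely illustrative; ChromaDB handles all of this for us under the hood, and the vectors below are made up:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Ranges from -1 (opposite) to 1 (pointing the same way)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.2, 0.9, 0.1])
doc_a = np.array([0.25, 0.85, 0.05])  # similar direction -> high score
doc_b = np.array([0.9, 0.1, 0.4])     # different direction -> lower score
print(cosine_similarity(query, doc_a))  # close to 1
print(cosine_similarity(query, doc_b))  # noticeably lower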

To delve deeper into similarity search, I recommend reading this well-written article by Rajat Tripathi.

The flow

  1. Chunking the PDF document using Langchain.
  2. Generating word embeddings for the chunks using an open-source embedding model.
  3. Uploading the word embeddings to the vector database.
  4. Fetching the chunks nearest to the user query using similarity search.
  5. Deleting the database.
  6. Creating endpoints for the functions in FastAPI.

Chunking the PDF Document using Langchain

Create a new file named functions.py, which will contain all the endpoint methods we will be calling from the main app.

To chunk the PDF document, we will load the document using PyPDF. Langchain offers multiple document loader methods, with PyPDF being one of them.

from langchain_community.document_loaders import PyPDFLoader

# Load the PDF; each page becomes a separate Document
loader = PyPDFLoader("files/samples.pdf")
pages = loader.load()
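Each element of pages is a Langchain Document holding the page text plus metadata. A quick, optional sanity check (the exact metadata values depend on your file):

print(len(pages))                   # number of pages loaded
print(pages[0].metadata)            # e.g. {'source': 'files/samples.pdf', 'page': 0}
print(pages[0].page_content[:200])  # first 200 characters of the first page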

After loading the document, the next step is to chunk it. The primary purpose of chunking documents is to break them into contextually relevant segments that can be later fed into an LLM for Retrieval Augmented Generation (RAG). Langchain provides a variety of text splitters to choose from for chunking documents.

Types of text-splitters:

  1. Recursive: Recursively splits text. Splitting text recursively serves the purpose of trying to keep related pieces of text next to each other. This is the recommended way to start splitting text.
  2. HTML: Splits text based on HTML-specific characters.
  3. Markdown: Splits text based on Markdown-specific characters.
  4. Code: Splits text based on characters specific to coding languages. 15 different languages are available to choose from.
  5. Token: Splits text on tokens. There exist a few different ways to measure tokens.
  6. Character: Splits text based on a user-defined character. One of the simpler methods.

For this application, to keep it simple, we’ll proceed with the RecursiveCharacterTextSplitter(). It’s the simplest and most effective text splitter for basic PDF or text documents. As we become familiar with the chunking process, we can experiment with different text splitters, but that’s a topic for a separate article. The RecursiveCharacterTextSplitter() splits text on a prioritized list of separators (such as ‘\n\n’, ‘\n’, and whitespace), and the chunk size can be set as a parameter.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split pages into overlapping ~1000-character chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
    length_function=len,
    is_separator_regex=False,
)
chunks = text_splitter.split_documents(pages)

To understand the RecursiveCharacterTextSplitter() in detail, this is a really good article that covers all the aspects.

Generating word embeddings for the chunks using an open-source embedding model

Word-embedding models are trained on a large corpus of text, which teaches the model to assign similar vectors to words or chunks with similar meanings/context. This is the beauty of word embeddings: they are what enable us to perform semantic search.

There are multiple embedding models available from OpenAI, Cohere, Google, etc. Since we’re not building a production-grade application, open-source models should suffice for our needs. Langchain provides a built-in function called SentenceTransformerEmbeddings(), which allows us to use the all-MiniLM-L6-v2 embedding model. This is a free open-source embedding model provided by sbert.net.

from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
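To get a feel for what this function produces, you can embed a single string directly; all-MiniLM-L6-v2 outputs 384-dimensional vectors. A quick sketch (the query text is arbitrary):

# Embed one query string; the result is a plain list of floats
vector = embedding_function.embed_query("What is a vector database?")
print(len(vector))  # 384 for all-MiniLM-L6-v2
print(vector[:5])   # the first few components of the embedding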

Uploading word embeddings to the vector database

Utilizing the Langchain ChromaDB library makes adding embeddings to the vector database a breeze. Only one line of code is needed to accomplish this task.

from langchain_chroma import Chroma

# One ID per chunk, so re-running the step won't insert duplicates
ids = [str(i) for i in range(1, len(chunks) + 1)]
Chroma.from_documents(chunks, embedding_function, persist_directory="chroma_db", ids=ids)

We assign IDs to the chunks to avoid duplication when adding them to the database. persist_directory saves the vector database to the working directory, so it can be loaded back up later while querying.

Fetching the nearest neighbouring chunks to the user query using similarity search

The Langchain ChromaDB library includes built-in similarity-search functionality; at this point, just about everything is built into Langchain, which lets us focus on building the application. The default similarity_search() function retrieves the nearest neighbours by vector distance (Chroma defaults to squared L2 distance, though cosine can be configured). The k parameter controls the number of neighbours to fetch.

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
# Load the persisted database back from disk
db = Chroma(persist_directory="chroma_db", embedding_function=embedding_function)
# `query` here is the Pydantic request model we define later in models.py
results = db.similarity_search(query.query, k=query.neighbours)

There’s also another function called similarity_search_with_score() that retrieves the neighbours along with their scores (with Chroma’s default settings, these are distances, so lower means more similar). This can be useful if you’re curious, or if you want to perform additional reranking operations on the chunks.
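A minimal sketch of the scored variant (the query string is just an example):

# Returns (Document, score) pairs; with Chroma's default metric the score
# is a distance, so smaller values mean closer matches
results_with_scores = db.similarity_search_with_score("what is the sample about?", k=3)
for doc, score in results_with_scores:
    print(round(score, 3), doc.page_content[:80])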

Deleting the database

I wasn’t able to find a specific function for deleting the entire database, so for this tutorial, I decided to implement it by deleting the persisted directory of the vector database using shutil. While not perfect, and although we could make use of collections in the vector database, I wanted to keep this tutorial simple, especially since it was my first time working with FastAPI too.

if "chroma_db" in os.listdir():
shutil.rmtree("chroma_db")
print(f"Deleted database and its contents.")
else:
raise FileNotFoundError("Database not found.")

Creating endpoints for the functions in FastAPI

Before diving into the exciting part of creating the API, let’s refactor and structure the code to create callable functions for the endpoints.

functions.py:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain_chroma import Chroma
import warnings
import shutil
import os

warnings.filterwarnings('ignore')

# Creating the database
def create_db():
    # Load and chunk the PDF
    loader = PyPDFLoader("files/samples.pdf")
    pages = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        length_function=len,
        is_separator_regex=False,
    )
    chunks = text_splitter.split_documents(pages)
    print(len(chunks))

    # One ID per chunk to avoid duplicate inserts
    ids = [str(i) for i in range(1, len(chunks) + 1)]

    # Create the open-source embedding function
    embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

    # Create the Chroma database with IDs
    Chroma.from_documents(chunks, embedding_function, persist_directory="chroma_db", ids=ids)


# Deleting the database
def delete_persisted_db():
    if "chroma_db" in os.listdir():
        shutil.rmtree("chroma_db")
        print("Deleted database and its contents.")
    else:
        raise FileNotFoundError("Database not found.")

To map these functions to endpoints, we need to import them from functions.py. After importing, we can edit the main.py file we created at the start of this article to include the new endpoints.

main.py:

from fastapi import FastAPI, HTTPException
from models import Query
from langchain_chroma import Chroma
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from functions import create_db, delete_persisted_db

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Whatchamacallit"}

# Create database
@app.get("/create/")
async def create_database():
    create_db()
    return {"message": "Database created."}

# Delete database
@app.delete("/delete/")
async def delete_database():
    try:
        delete_persisted_db()
        return {"message": "Database deleted."}
    except FileNotFoundError as e:
        raise HTTPException(status_code=404, detail=str(e))

# Fetch chunks
@app.post("/neighbours/")
async def fetch_item(query: Query):
    embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    db = Chroma(persist_directory="chroma_db", embedding_function=embedding_function)
    results = db.similarity_search(query.query, k=query.neighbours)
    return {"message": "Nearest neighbours found.", "results": results}

Another useful feature of FastAPI is its predefined HTTP errors: HTTPException can be raised directly from the FastAPI library and customized to suit your needs.
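For example, HTTPException accepts a custom detail payload and even response headers. A small sketch with a hypothetical endpoint (not part of our app):

from fastapi import HTTPException

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    # Hypothetical route: reject unknown IDs with a customized 404
    if item_id != 1:
        raise HTTPException(
            status_code=404,
            detail={"error": "Item not found", "item_id": item_id},
            headers={"X-Error": "ItemMissing"},  # optional custom header
        )
    return {"item_id": item_id}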

We will also need to define the Pydantic model for the /neighbours endpoint's request body.

models.py:

# A basic Pydantic model for the FastAPI request body

from pydantic import BaseModel

class Query(BaseModel):
    query: str
    neighbours: int = 3  # default number of chunks to fetch
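With the model in place, the /neighbours/ endpoint expects a JSON body matching Query. A sketch of a request from Python, again using httpx and assuming the dev server is running and /create/ has already been called:

import httpx

payload = {"query": "What is the document about?", "neighbours": 2}  # example body
response = httpx.post("http://127.0.0.1:8000/neighbours/", json=payload)
print(response.json()["message"])  # "Nearest neighbours found."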

Outputs

(Screenshots: the error response for a wrong method (POST instead of GET), the database-creation endpoint, fetching the nearest neighbours, and the automatic documentation page (ReDoc).)

Conclusion

That concludes this simple starter tutorial, which should give both you and me an understanding of the fundamental steps involved in building an app that uses an LLM at the backend, coupled with basic API development. FastAPI offers many more powerful features for authentication and even frontend development using FastUI, which I plan to explore in future blogs. My only gripe while working with Langchain is the documentation, which can be quite complicated due to the abundance of functionalities available. It’s both a blessing and a curse. If you’re just starting with Langchain, it can feel quite intimidating, but remember to take it one step at a time.

Exploring FastAPI and documenting the process has been quite enjoyable. I hope this article has been both informative and entertaining for you as a reader, just as it has been for me. If you have any suggestions or if you notice any mistakes, please feel free to share your feedback in the comments section.
