Building GPT with Private Data: Unlocking the Power of Secure Generative AI

Ravindra Elicherla
Published in Generative AI · May 22, 2023


Earlier, I wrote about GPT4All, an ecosystem of open-source chatbots trained on a massive collection of clean assistant data, including code, stories, and dialogue. The generative AI ecosystem is changing every day. After my previous blog on building a chatbot using private data, I started working on building the same chatbot without an OpenAI API key, and last week I came across PrivateGPT. To be clear, it is not production-ready: I found many bugs and ran into installation issues. Nevertheless, this is definitely the future (well… for a few more months), and many corporations will want to create agents and chatbots without their data being exposed to the internet.

Private GPT uses LangChain, GPT4All, LlamaCpp, Chroma, and SentenceTransformers.

What is LangChain? At its core, LangChain is a framework built around large language models (LLMs). We can use it for chatbots, agents, generative question answering, summarization, and memory.
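The "chain" idea is easy to picture in plain Python. Below is a toy sketch, not LangChain's actual API: a prompt template is filled in and passed to a model, where `fake_llm` is a stand-in for a real LLM call:

```python
# Toy sketch of the "chain" concept behind LangChain: fill a prompt
# template, then pipe it to a language model. The model here is a
# stand-in function, not a real LLM.

PROMPT_TEMPLATE = (
    "Answer the question using the context.\n"
    "Context: {context}\nQuestion: {question}\nAnswer:"
)

def fake_llm(prompt: str) -> str:
    # Placeholder: a real chain would call GPT4All, LlamaCpp, etc.
    return "It is an open-source embedding database."

def run_chain(context: str, question: str) -> str:
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return fake_llm(prompt)

print(run_chain("Chroma is an open-source embedding database.",
                "What is Chroma?"))
```

LangChain's value is that the template, model, retriever, and memory are swappable components behind interfaces like this.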

What is GPT4All? It is an ecosystem of open-source chatbots trained on a massive collection of clean assistant data, including code, stories, and dialogue. With it you can build open-source, assistant-style large language models that run locally on your CPU.

What is llama.cpp? Its main goal is to run the LLaMA model using 4-bit integer quantization on a MacBook, in a plain C/C++ implementation without dependencies.
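As a rough illustration of what 4-bit quantization means, here is a toy Python sketch that maps floats to 16 signed integer levels with a single scale factor. llama.cpp's real formats quantize weights in blocks, each with its own scale, so this is only the core idea:

```python
# Toy sketch of 4-bit integer quantization, the trick llama.cpp uses to
# shrink model weights: map each float to one of 16 signed levels.
def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7  # signed 4-bit range is -8..7
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.7, 0.33, 0.05]
q, s = quantize_4bit(w)
approx = dequantize(q, s)  # close to w, at roughly 4 bits per weight
```

Storing small integers plus a scale factor, instead of 16- or 32-bit floats, is what lets a multi-billion-parameter model fit in a laptop's RAM.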

What is Chroma? It is an open-source embedding database, billed as the fastest way to build Python or JavaScript LLM apps with memory.
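What an embedding database does can be sketched in a few lines of plain Python: store (text, vector) pairs and return the closest texts by cosine similarity. This is only the core idea; Chroma's real API also handles persistence, collections, metadata filtering, and computing the embeddings for you:

```python
# Toy sketch of an embedding database: store (text, vector) pairs and
# return the nearest stored text for a query vector, ranked by cosine
# similarity.
import math

store = []  # list of (text, vector) pairs

def add(text, vector):
    store.append((text, vector))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query(vector, k=1):
    ranked = sorted(store, key=lambda item: cosine(item[1], vector),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

add("India is in South Asia", [0.9, 0.1])
add("NATO is a military alliance", [0.1, 0.9])
print(query([0.2, 0.8]))  # → ['NATO is a military alliance']
```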

What is SentenceTransformers? SentenceTransformers is a Python framework for state-of-the-art sentence, text, and image embeddings. You can use it to compute sentence/text embeddings for more than 100 languages.

Now let's quickly get hands-on. As I mentioned earlier, the PrivateGPT repo is not stable and changes every day, so your installation may not be smooth. I am using a Mac M1 machine with 32 GB RAM; with a different configuration, speed and memory usage will vary.

Step 1: Go to the GitHub repo https://github.com/imartinez/privateGPT and click Download ZIP to download the code as "privateGPT-main.zip".

Unzip the file and you will see the project folder.

Step 2: Create a folder called "models" and download the default model, ggml-gpt4all-j-v1.3-groovy.bin, into it.

Step 3: Rename example.env to just .env.
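For reference, the example.env in the repo looked roughly like this at the time of writing (check your own copy, since names and defaults change as the project evolves):

```
PERSIST_DIRECTORY=db
MODEL_TYPE=GPT4All
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000
```

MODEL_PATH must point at the model file you downloaded in Step 2.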

Step 4: Now go to the source_documents folder. You will find state_of_the_union.txt; by default, your agent will run on this text file. Let's test this first. You can find the speech here.

Now let's run it without making any changes. Run the command below:

python3 ingest.py

You will see progress output as the documents are loaded and embedded.
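Under the hood, ingest.py loads the documents, splits them into overlapping chunks, embeds each chunk with SentenceTransformers, and persists the vectors in Chroma. The splitting step can be sketched like this (the chunk size and overlap are illustrative, not the repo's actual defaults):

```python
# Toy sketch of the chunking step in an ingestion pipeline: split a long
# document into overlapping character windows so each piece fits in the
# model's context and nothing is lost at a chunk boundary.
def split_text(text, chunk_size=500, overlap=50):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

doc = "word " * 300  # stand-in for state_of_the_union.txt
chunks = split_text(doc)
# consecutive chunks share a 50-character overlap
```

Each chunk is then embedded and written to the Chroma database on disk, so the expensive work is done once, before any question is asked.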

Step 5: Now we are ready to fly.

python3 privateGPT.py

My question was "What are NATO nations?", and the answer was printed in the terminal.
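Conceptually, privateGPT.py answers each question with retrieval-augmented generation: find the stored chunks most similar to the question and hand them to the local model as context. A stand-alone sketch, with a word-overlap retriever and a dummy model standing in for the real embedding search and GPT4All call:

```python
# Sketch of the retrieval-augmented answering loop: retrieve the chunks
# most relevant to the question, then ask the local model to answer
# from those chunks only. retrieve() and local_llm() are stand-ins.
def retrieve(question, chunks, k=2):
    # Stand-in scoring: count shared words (the real code compares
    # embedding vectors instead).
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def local_llm(prompt):
    return "(answer generated locally, no data leaves the machine)"

def answer(question, chunks):
    context = "\n".join(retrieve(question, chunks))
    prompt = (f"Use only this context:\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return local_llm(prompt)

docs = ["NATO nations include the US, UK, and France.",
        "India is a country in South Asia."]
print(answer("What are NATO nations?", docs))
```

The point of the whole setup is visible in that last stand-in: every step, from embedding to generation, runs on your own machine.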

Step 6: Now let's try a PDF document. Head to https://en.wikipedia.org/wiki/India, click Download as PDF on the right side, and save India.pdf in the source_documents folder.

Now run python3 ingest.py again, and then run python3 privateGPT.py.

That's all for now. This is very basic at this stage and the repo has multiple open issues, but I am confident it will get better with time.

