Training Your Own LLM using privateGPT

Learn how to train your own language model without exposing your private data to the provider

Wei-Meng Lee
Published in Level Up Coding · 8 min read · May 19, 2023

Photo by Richard Bell on Unsplash

One of the major concerns with using public AI services such as OpenAI’s ChatGPT is the risk of exposing your private data to the provider. For commercial use, this remains the biggest concern for companies considering adopting AI technologies.

Often, you want your own language model that is trained on your own data (such as sales insights, customer feedback, etc.), but you do not want to expose this sensitive data to an AI provider such as OpenAI. The ideal approach is to train your own LLM locally, without uploading your data to the cloud.

If your data is public and you don’t mind exposing it to ChatGPT, I have another article that shows how you can connect ChatGPT with your own data.

In this article, I will show you how to use an open-source project called privateGPT so that an LLM can answer questions (like ChatGPT) based on your custom training data, all without sacrificing the privacy of your data.

It is important to note that privateGPT is currently a proof-of-concept and is not production ready.

Downloading privateGPT

To try out privateGPT, you can go to GitHub using the following link: https://github.com/imartinez/privateGPT.

You can either download the repository as a ZIP file by clicking Code | Download ZIP on the repository page.

Or, if you have git installed on your system, use the following command in Terminal to clone the repository:

$ git clone https://github.com/imartinez/privateGPT
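
If you expect to run this step more than once (for example, from a setup script), a small wrapper around the command above can help. The following is only a minimal sketch: the function name clone_if_missing and its two arguments are hypothetical and are not part of privateGPT. It skips the clone when the target folder already exists, and reports a clear error when git is not installed.

```shell
# Hypothetical helper (not part of privateGPT): clone a repository
# only when the target directory does not exist yet.
clone_if_missing() {
    repo_url="$1"     # e.g. the GitHub URL shown above
    target_dir="$2"   # local folder to clone into

    # Nothing to do if a previous run already created the folder.
    if [ -d "$target_dir" ]; then
        echo "Skipping clone: $target_dir already exists"
        return 0
    fi

    # Fail early with a clear message when git is not installed.
    if ! command -v git >/dev/null 2>&1; then
        echo "git not found; download the ZIP from GitHub instead" >&2
        return 1
    fi

    git clone "$repo_url" "$target_dir"
}
```

To use it, define the function in your shell and then run: clone_if_missing https://github.com/imartinez/privateGPT privateGPT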

ACLP Certified Trainer | Blockchain, Smart Contract, Data Analytics, Machine Learning, Deep Learning, and all things tech (http://calendar.learn2develop.net).