Build an AI Slack Bot with Custom Data

Using Python and OpenAI embeddings

Ida Silfverskiöld
Level Up Coding


What are we building? We’re building a bot that uses a specific dataset to answer prompts: we’ll ask it a question, it will search through our file, and it will answer based on what it finds there. I’ll use OpenAI’s semantic text search with embeddings, along with the ChatGPT API, simply because it’s easy enough to implement. In hindsight, there are probably better embedding models out there.

You might also be asking why we aren’t just fine-tuning a model. Fine-tuning is a much slower process and requires more computational resources than using embeddings, because you’re actually updating the parameters of the model. With fine-tuning, it will take much longer to get the answers you want. Think of it like trying to change someone’s inherent characteristics rather than simply handing them the information they need up front.

We’ll go through this step by step so you can see exactly what it takes and how it works. Today you can also use LangChain to create a document chatbot much faster. However, if you’re new to working with embeddings, this tutorial will break down the logic of building a document-aware bot.

For this use case, I’ll use a dataset of US Supreme Court cases because it was easy to find, but the idea is to use this with a custom dataset containing technical documentation or customer Q&A.

See an example below from the SupremeCourtAI Slack bot I built as a test. The bot is using the information from the file to answer the question. I’ve told it to answer in simple terms, as if the user has never studied law. Since we’re using the ChatGPT API, we can keep asking follow-up questions to break down the information we’ve been given. I can also ask it to make the information more entertaining or to dumb it down further.

Here the bot is using a bit of outside information at the end, as I don’t think I’ve provided it anything on “learnings.” So, you will need to tweak the background content to make sure it answers the way you want it to.

To understand what’s happening under the hood, check the logs below.

We see the question being typed out (“cases on abortion”) just before we get back the results from the file. I’ve set a limit of 5 results here.

See the rows that are being grabbed from our file for the question “cases on abortion”? This is how semantic text search lets us search the file with a question. We send this data to the ChatGPT API along with the prompt. After asking the original question, the user can keep asking follow-up questions about those cases, breaking the information down into a user-friendly format.

It would take some work to build something similar with technical documentation or educational texts, but it shouldn’t be that difficult.

What’s time consuming is putting together the dataset itself which is why I used the Supreme Court one that was already available.

To achieve what we’re setting out to do, I mentioned that we’ll use embeddings. Embeddings are a way of representing words or phrases as numerical vectors. In semantic search, we use these embeddings to match a question (also represented as an embedding) to a row of data, giving us a similarity score for each row. The highest-scoring rows are then fed into OpenAI, along with the question, as background information for the bot.

If this sounds like gibberish, just keep going. It’s easier than it sounds.
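To make it concrete, here’s a minimal sketch of the matching step using made-up three-dimensional vectors (real OpenAI embeddings have 1,536 dimensions):

import numpy as np

# hypothetical tiny embeddings - in practice these come from the OpenAI API
question = np.array([0.2, 0.8, 0.1])
rows = {
    "case A": np.array([0.3, 0.7, 0.2]),
    "case B": np.array([0.9, 0.1, 0.4]),
}

# cosine similarity: the dot product of the two vectors, normalized
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# score every row against the question and sort from best to worst match
scores = {name: cosine_similarity(question, emb) for name, emb in rows.items()}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

The highest-scoring rows are the ones we’ll hand to the ChatGPT API as context.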

The only concern with this method is that the larger the file, the longer it may take to respond. If you have more than 3,000 rows, it may take more than 30 seconds for the bot to answer. For most cases, though, you’ll be working with smaller files of 500–1,000 rows. Think customer service questions or subsections of technical documentation. Each row can contain quite a bit of text, although there is a limit there as well.

What kind of tools do we need?

I usually use serverless functions when working with bots like these, as it’s more efficient. However, here we’re comparing the question embedding against every row embedding in our dataset and selecting the rows with the highest cosine similarity, and doing that with OpenAI’s utilities requires a few too many Python libraries. We’d exceed the size limit for layers in AWS Lambda, making it a bad choice here.

So instead, to make this very simple, I’ll use an online IDE called Goorm IDE. I’ve used it before, and I like how smooth it is to work with.

You can see the finished code by going to my public container here. You’ll need to add in your own .env file with the correct credentials and keys to make it work.

Before we start with this, though, you need to set up your datafile. I’ll use Google Colab to tweak the original file so it can be searched using embeddings. You can play around with it there as well to get the answers you need.

What about costs?

Adding embeddings to the original file will cost you some OpenAI tokens, depending on how large it is.

I would say up to $0.50 worth of tokens for 2,000–3,000 rows, but probably less. Every prompt after that costs around $0.013. If you are new to OpenAI, you’ll receive $18 worth of free tokens, so this endeavour will cost you nothing.

In total I’ve paid about $1 playing around with this for a few days, and that includes re-creating a few of these files.

Keeping the GoormIDE container always on will cost a bit, but for trying this out you don’t need to keep it on 24/7. The container itself is free, and you can deploy the code elsewhere later.

Steps to completion.

These are the steps that we’ll be taking to get this done.

  1. Set up the original file with the custom dataset using embeddings. We'll use Google Colab.
  2. Set up our environment in GoormIDE and create our POST route with Flask.
  3. Set up our Slack application, enable events, and verify the URL we've created.
  4. Respond to mentions in Slack via our code.
  5. Respond to mentions in Slack using the ChatGPT API.
  6. Add in semantic text search with the file we created at the start.

1. Setting Up Our Custom Dataset

To search a document using embeddings, you’ll need to add an embedding to every row of data in your dataset. I’ll do this with OpenAI’s embeddings in a Google Colab notebook.

Here I’m working with a static file. As mentioned, I grabbed a dataset of all Supreme Court cases for simplicity. If you don’t want to use the same dataset, you’ll need a CSV file of your own.

Finding examples from OpenAI on how to add embeddings to your file is simple; the example I’m showing you looks similar to what they have.

Open up the link below to see the script we’re working with.

You can also find my GitHub repo for this here.

Copy the script to your Drive to enable you to work with it. You will need to tweak a few things here before you run it.

First, make sure you add in your own OpenAI API key.

# openai api key
import openai
openai.api_key = 'SETYOURKEYHERE'

If you do not have an API key, go to platform.openai.com and sign up. Then navigate to API Keys to create a new key.

When you get to the 7th code block, you will need to tweak a few things depending on what your datafile looks like.

# load & inspect dataset - here we have a folder called datasets in my Google drive and the file itself is called justice.csv
input_datapath = "/content/drive/My Drive/datasets/justice.csv"
df = pd.read_csv(input_datapath, index_col=0)

# select only specific columns from the data frame
df = df[["name", "term", "first_party", "second_party", "facts", "first_party_winner", "disposition"]]

# drop any rows with missing values
df = df.dropna()

# create a new column with combined values
df["combined"] = (
"Name: " + df.name.str.strip() + "; Term: " + df.term.str.strip() + "; First Party: " + df.first_party.str.strip() + "; Second Party: " + df.second_party.str.strip() + "; First Party Winner: " + df.first_party_winner.str.strip() + "; Disposition: " + df.disposition.str.strip() + "; Facts: " + df.facts.str.strip()
)

# print first two rows to see what the dataset looks like
df.head(2)

First, make sure that the data path is correct. Here I have a folder called datasets in my Drive and a file called justice.csv.

We’re cleaning the file up a bit in the code above: keeping only certain columns, then creating a “combined” column with text from all the others. We’ll need this “combined” column to search through the text.

As a note, when doing this for real, you ideally want to condense the “combined” column to only what’s important. Here I just threw all the columns in.

If you’re working with your own file, change out the column names to what is relevant to you.

In the 8th code block, we’re checking that the combined column for each row doesn’t exceed the token limit for text-embedding-ada-002, and dropping any rows that do. The model’s maximum is 8,191 tokens; I don’t think any of my rows came close to the limit.

# get_encoding takes text as input and returns its corresponding token encoding
encoding = tiktoken.get_encoding(embedding_encoding)

# create a new column n_tokens
df["n_tokens"] = df.combined.apply(lambda x: len(encoding.encode(x)))

# omit reviews that are too long to embed
df = df[df.n_tokens <= max_tokens]

# print length of dataframe to make sure it isn't too long
len(df)

Lastly, we’re creating a new column called “embedding.” This is the column we’ll use later to match a question to a row of data.

Once this is finished, the script saves the new file to your Drive. You’ll find it in the Colab Notebooks folder that has been created for you.

# limit the file to the first 2,000 rows
df = df.head(2000)

# We're saving a new file with the embedding arrays - this may take a few minutes - 2000 rows cost about $0.12
df["embedding"] = df.combined.apply(lambda x: get_embedding(x, engine=embedding_model))
df.to_csv("/content/drive/My Drive/Colab Notebooks/data/justice_supreme_court_cases_new.csv")

You can run your script and try it out.

The clip below is in Swedish, but the process should look the same for you.

Once you’ve tweaked the file with your OpenAI API Key and made sure the path is correct to your datafile you can hit connect and then run all.

It will ask you to authorize your Google account; you should do so. This will let it access the csv file.

The last code block will take a few minutes. Once it has finished, navigate to the file that has been created in the Colab Notebooks folder and save it somewhere.

This is where you’ll find the dataset with embeddings saved in your Drive.

If you were to open up this new file, you would see an additional column called “embedding” with a bunch of numerical vectors. These vectors capture the meaning and context of the words so we can later match a question against them.
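One thing to note: a CSV stores each vector as a string, so when you load the file again you need to parse that column back into arrays. A minimal sketch of what that looks like (our bot does exactly this later on):

import numpy as np
import pandas as pd

# the CSV stores each embedding as a string like "[0.0023, -0.0091, ...]"
df = pd.read_csv("justice_supreme_court_cases_new.csv")

# eval turns each string back into a list, np.array turns it into a vector
df["embedding"] = df.embedding.apply(eval).apply(np.array)

print(df.embedding.iloc[0].shape)  # (1536,) for text-embedding-ada-002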

If you want to go ahead and try out semantic search in Google Colab directly, check out OpenAI’s cookbook for semantic text search here. Or, check out my script that works with this exact dataset here.

We’ll do this later in our Python script as well.

2. Setting Up Our Environment & Our HTTP POST Route

We could set up our code locally and deploy it elsewhere, but to make this really simple I’ll be using an online IDE. We’ll be using GoormIDE.

I can also set my container to public here, so you can go in and look over the code directly. You would then copy everything in there and set your own keys via your .env file.

Let’s start from the beginning though. If you don’t have a GoormIDE account, go ahead and create one on the same page. It’s quick.

Create a new container.

You can set a name and description. Keep it private.

Here you can choose to deploy to Heroku or AWS Elastic Beanstalk, but we don’t need to do that right now.

Choose python as your Stack.

Once you’re done you can click Create Container at the bottom.

After it has been set up for you, click Run Container. It takes a bit to load but you should have something like this in front of you once it has finished.

Create a new requirements.txt file by typing this in the terminal.

touch requirements.txt

You can then type this in the terminal to open it.

goorm requirements.txt

Or just double click on the new file.

Once it is open, add these requirements in there.

plotly
scipy
python-dotenv
flask_executor
slack_sdk
openai
numpy
pandas

Save the file (Command + S). It will look like this so far.

Now install this with pip.

pip install -r requirements.txt

Once done, open up the index.py file that’s in your folder.

Paste in the code below in there.

import os
import json
from flask import Flask, request, jsonify
from dotenv import load_dotenv

app = Flask(__name__)

# Credentials
load_dotenv('.env')

# get our credentials from .env
VERIFICATION_TOKEN = os.getenv('VERIFICATION_TOKEN')

# create a route for slack to hit
@app.route('/', methods=['POST'])
def index():
    # grab the json body from the request
    data = json.loads(request.data.decode("utf-8"))
    # check the token for all incoming requests
    if data["token"] != VERIFICATION_TOKEN:
        return {"status": 403}
    # confirm the challenge to slack to verify the url
    if "type" in data:
        if data["type"] == "url_verification":
            response = {"challenge": data["challenge"]}
            return jsonify(response)
    return {"status": 503}

# run our app on port 80
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

What this code does is set up a POST route for us using Flask. Once the app is running, we’ll be able to call this endpoint. Slack will want us to confirm a challenge here, so we’ve set up the code for that already.

We do still need to set the Verification Token from Slack, but we’ll do that once we’ve set up the Slack application in the next section. As you can see, we’re already importing the key from a .env file, but we haven’t created that file yet.
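For reference, the challenge request Slack sends during URL verification looks something like this (values are placeholders):

{
  "token": "your-verification-token",
  "challenge": "some-random-challenge-string",
  "type": "url_verification"
}

Our route simply echoes the challenge value back, which is all Slack needs to verify the URL.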

Save the file (Command + S) and then run the app.

python index.py

You should see something like this in your terminal. Disregard this error for now.

Let’s go find the url we’ll need for Slack now. Go to Project → Running URL and Port.

Open this to find your registered URL.

As you can see, we’re running it on port 80, which is why we have the same port number in our code.

Copy the URL and test it out somewhere if you want to, like Postman. Remember, it’s a POST route and not a GET route, so testing it in the browser won’t work. It will give you an error, as it’s not receiving any data yet, but you’ll see it is responding.
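If you’d rather use the terminal than Postman, something like this does the same job (swap in your own Running URL; the empty JSON body will trigger an error response, which is expected):

curl -X POST -H "Content-Type: application/json" -d '{}' https://your-running-url/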

We’ll need to set up the Slack application now to get our Verification Token.

3. Setting Up Our Slack Application

For this part, you’ll need admin rights to a Slack workspace.

Once you do have access to a workspace go to api.slack.com to create a new app.

You will have to choose the workspace where you’d like it installed. Once your app has been created, look for your App Credentials under Basic Information.

You are looking for something called Verification Token.

Copy this token and go back to your GoormIDE container. Create a new file called .env with this command.

touch .env

Double click on the .env file to open it up. Here you’ll set your verification token.

VERIFICATION_TOKEN="SETKEYHERE"

We’ll be adding all our keys and credentials to this file as we go on. But you can save this file now and run your application again.

This is how it should look in your GoormIDE container. As a note, I’ve since deleted the app with this specific verification token.

Go back to your Slack app. Navigate to Event Subscriptions. Enable it.

Under Request URL use the URL we got from the last section. Add a slash after it and let it get verified by Slack.

Scroll down further and click on Subscribe to Bot Events. Add in app_mention and message.im.

Save your application.

Now go to OAuth & Permissions. Scroll down to Scopes. Add chat:write as a Bot Token Scope.

Now install your application to your workspace.

This will create a Bot User OAuth Token that you can copy right away; we’ll be using it in a bit. If you don’t see it, you can copy it from OAuth & Permissions. It should start with xoxb-.

Go back to your GoormIDE container.

4. Answering To Mentions in Slack

What we want to do first here is understand what is happening if we try to mention our bot in a Slack channel.

So add in a print() for the request data when our POST route is hit.

 print(data)

See the full code below, with the print(data) just beneath the POST route handler.

import os
import json
from flask import Flask, request, jsonify
from dotenv import load_dotenv

app = Flask(__name__)

# Credentials
load_dotenv('.env')

VERIFICATION_TOKEN = os.getenv('VERIFICATION_TOKEN')

# create a route for slack to hit
@app.route('/', methods=['POST'])
def index():
    data = json.loads(request.data.decode("utf-8"))
    # look over the data being sent from slack
    print(data)
    # check the token for all incoming requests
    if data["token"] != VERIFICATION_TOKEN:
        return {"status": 403}
    # confirm the challenge to slack to verify the url
    if "type" in data:
        if data["type"] == "url_verification":
            response = {"challenge": data["challenge"]}
            return jsonify(response)
    return {"status": 200}

# run our app on port 80
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

Run your application again.

Now go to your Slack workspace and navigate to any channel. Mention your bot.

It will ask you to add it to that channel beforehand. Do so. Then mention your bot with some text after.

When you send a message that mentions your bot via your Slack workspace, you should see the payload printed in your GoormIDE terminal. See my logs below here.

We’ll be looking into this payload to see what we can use. What we’re looking for is the bot’s ID. We’ll use it to filter incoming messages, to make sure we only answer when the bot’s name is mentioned. In all honesty, you could skip filtering on mentions if you only subscribe to the app_mention event, but it might be good to have for later in any case.

So, to follow along, we’re looking for the ID in data["event"]["text"]. Mine above seems to be <@U054DHQ8726>; yours will be different, so make sure you grab the right ID. We’re also looking for the channel ID the message came from, which is in data["event"]["channel"]. We’ll use this to send a message back to the right channel.
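For orientation, the event payload for a mention looks roughly like this (heavily abridged, with placeholder IDs):

{
  "token": "your-verification-token",
  "type": "event_callback",
  "event": {
    "type": "app_mention",
    "text": "<@U054DHQ8726> cases on abortion",
    "channel": "C0123456789",
    "user": "U0987654321"
  }
}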

Go back to your index.py; we’re changing this code again now. When the bot is mentioned, we’ll send a message back to the same channel with the incoming text, via the Slack SDK.

See the code below. Remember we have all dependencies already installed.

import os
import json
from flask import Flask, request, jsonify
from dotenv import load_dotenv
from flask_executor import Executor
from slack_sdk import WebClient

app = Flask(__name__)

# Credentials
load_dotenv('.env')

# allows us to execute a function after returning a response
executor = Executor(app)

# set all our keys - use your .env file
slack_token = os.getenv('SLACK_TOKEN')
VERIFICATION_TOKEN = os.getenv('VERIFICATION_TOKEN')

# instantiating slack client
slack_client = WebClient(slack_token)

# create a route for slack to hit
@app.route('/', methods=['POST'])
def index():
    data = json.loads(request.data.decode("utf-8"))
    # look over the data being sent from slack
    print(data)
    # check the token for all incoming requests
    if data["token"] != VERIFICATION_TOKEN:
        return {"status": 403}
    # confirm the challenge to slack to verify the url
    if "type" in data:
        if data["type"] == "url_verification":
            response = {"challenge": data["challenge"]}
            return jsonify(response)
    # handle incoming mentions - (!) CHANGE "@U054DHQ8726" to your own bot's ID
    if "@U054DHQ8726" in data["event"]["text"]:
        # executor will let us send back a 200 right away
        executor.submit(handleMentions, data["event"]["channel"], data["event"]["text"].replace('<@U054DHQ8726>', '').strip())
        return {"status": 200}
    return {"status": 503}

# function to send back a message
def handleMentions(channel, text):
    # post message back to slack with the response
    slack_client.chat_postMessage(channel=channel, text=text)

# run our app on port 80
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

What this code does is simply respond to the mention in Slack, in the same channel, with the exact same text.

We’re adding flask_executor here to be able to send a 200 response back to Slack right away. Slack expects a response within three seconds and would otherwise keep re-sending the POST request until handleMentions had run all the way through. In other words, executor lets us run our code asynchronously.

To try this code, copy the text above and paste it into your index.py file, but, as I mentioned before, make sure you change @U054DHQ8726 throughout the code to your own bot’s ID. Remember, we printed the payload from Slack.

Also remember to add the SLACK_TOKEN to your .env file. This is the token we created earlier that starts with xoxb-.

Your .env file should look like this so far, with the appropriate keys added.
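With placeholder values, it should contain something like this:

VERIFICATION_TOKEN="your-verification-token"
SLACK_TOKEN="xoxb-your-bot-token"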

Next, I’ll move on to adding ChatGPT into this, so AI answers the text for us.

5. Answering To Mentions With The ChatGPT API

Last time, we found the bot’s ID by printing the payload Slack sent us when the bot was mentioned, then sent back the same text as a response, just to see that it worked.

Let’s now add the ChatGPT API into this, so the text coming from Slack is answered by AI.

This code will add in the openai module and use the incoming text as the prompt for a call to the ChatGPT API. Once we have a response, it will send it back to Slack and then append the message object to a messagesOb array. The messagesOb array is something unique to working with the ChatGPT API: it’s what lets us achieve multi-turn conversations. You don’t have to add it if you only want single-turn tasks without any conversation; in that case, you would not append the message object to the messagesOb array at the end of the handleMentions function.
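To illustrate with made-up content, after one round trip the array looks something like this, and the whole array is sent along on every subsequent API call:

messagesOb = [
    {"role": "system", "content": "You're a Law Assistant..."},  # set once at startup
    {"role": "user", "content": "cases on abortion"},            # appended before the API call
    {"role": "assistant", "content": "Here are some cases..."},  # appended from the API response
]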

You can change the index.py file again now with the code below.

import os
import json
import openai
from flask import Flask, request, jsonify
from dotenv import load_dotenv
from flask_executor import Executor
from slack_sdk import WebClient

app = Flask(__name__)

# Credentials
load_dotenv('.env')

# allows us to execute a function after returning a response
executor = Executor(app)

# set all our keys - use your .env file
slack_token = os.getenv('SLACK_TOKEN')
VERIFICATION_TOKEN = os.getenv('VERIFICATION_TOKEN')
openai.api_key = os.getenv('OPEN_AI_API_KEY')

# instantiating slack client
slack_client = WebClient(slack_token)

# background information for the bot
messagesOb = [
    {"role": "system", "content": "You're a Law Assistant to a real dummy..."}
]

# create a route for slack to hit
@app.route('/', methods=['POST'])
def index():
    data = json.loads(request.data.decode("utf-8"))
    # check the token for all incoming requests
    if data["token"] != VERIFICATION_TOKEN:
        return {"status": 403}
    # confirm the challenge to slack to verify the url
    if "type" in data:
        if data["type"] == "url_verification":
            response = {"challenge": data["challenge"]}
            return jsonify(response)
    # handle incoming mentions - change "@U054DHQ8726" to your own bot's ID
    if "@U054DHQ8726" in data["event"]["text"]:
        print(data["event"]["text"])
        # executor will let us send back a 200 right away
        executor.submit(handleMentions, data["event"]["channel"], data["event"]["text"].replace('<@U054DHQ8726>', '').strip())
        return {"status": 200}
    return {"status": 503}

# function to send back a message
def handleMentions(channel, text):
    # append user text to the messagesOb array
    messagesOb.append({"role": "user", "content": text})
    # make the openAI call
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messagesOb
    )
    # print response - see token count
    print(response)
    # post message back to slack with the response
    slack_client.chat_postMessage(channel=channel, text=response.choices[0].message.content)
    # append the assistant message to the messagesOb array
    messagesOb.append(response.choices[0].message)

# run our app on port 80
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

First, before you run this, set your OpenAI API key in your .env file as OPEN_AI_API_KEY.

These are all the keys you need to set in your .env file for the code above to work.
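With the variable names used in the code, and placeholder values:

VERIFICATION_TOKEN="your-verification-token"
SLACK_TOKEN="xoxb-your-bot-token"
OPEN_AI_API_KEY="sk-your-openai-key"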

Next, remember to change the bot’s ID to your own, like you did before: first where we filter the incoming messages, and then where we strip the bot’s name from the text. If you don’t do this, you won’t get a response in Slack, as the handleMentions function will never run. This is essential.

Once you’re done though, you can run your application.

python index.py

You can go ahead and try it in Slack now. It should work really well.

However, always look over the system content in your code to make sure you engineer your bot correctly. Remember, I set some text saying it was a law assistant for dummies. Working with AI like this, you’ll have to get good at prompt engineering: how you tell it to act means everything. So remember to tweak the system content, as well as the prompt, as you go along to make sure it acts the way you want it to. You need to be very specific.

That’s it for this part. You’ve created a ChatGPT bot; now let’s add in semantic text search with our file as well.

6. Integrating Semantic Text Search

Let’s add the last part. Remember the file we created in the first section? Grab it now and add it to your GoormIDE container.

If you don’t have a file created yet, you can take the one I prepared here.

You’ll also need to change the code in the index.py file one more time. This is the final code for the bot.

import os
import json
import openai
from flask import Flask, request, jsonify
from dotenv import load_dotenv
from flask_executor import Executor
from slack_sdk import WebClient
import numpy as np
import pandas as pd
from openai.embeddings_utils import cosine_similarity, get_embedding

app = Flask(__name__)

# Credentials
load_dotenv('.env')

# allows us to execute a function after returning a response
executor = Executor(app)

# set all our keys - use your .env file
slack_token = os.getenv('SLACK_TOKEN')
VERIFICATION_TOKEN = os.getenv('VERIFICATION_TOKEN')
openai.api_key = os.getenv('OPEN_AI_API_KEY')

# instantiating slack client
slack_client = WebClient(slack_token)

# path to datafile - should already contain an embedding column
datafile_path = "justice_supreme_court_cases_new.csv"

# read the datafile and parse the embedding strings back into arrays
df = pd.read_csv(datafile_path)
df["embedding"] = df.embedding.apply(eval).apply(np.array)

# background information for the bot
messagesOb = [
    {"role": "system", "content": "Keep the answer to less than 100 words to allow for follow-up questions. You are an assistant that provides information on supreme court cases in extremely simple terms (dumb it down to a 12 year old) for someone who has never studied law, so simplify your language. You should be helping the user answer the question, but you can only answer with the information that is given to you in the prompt or at some time in the past. This information is coming from a file that the user needs to understand. It doesn't matter if the information is incorrect. You should still ONLY reply with this information. If you are given no information at all, you can talk with the information that has been given to you before, but you can't use outside facts whatsoever. If the information given doesn't make sense, look to the information that has been given to you before to answer the question. If you have no information at all that has been given to you at any point in time from this user that makes sense, you can apologise to the user and say you cannot answer as you don't have enough information to answer correctly."}
]

# create a route for slack to hit
@app.route('/', methods=['POST'])
def index():
    data = json.loads(request.data.decode("utf-8"))
    # check the token for all incoming requests
    if data["token"] != VERIFICATION_TOKEN:
        return {"status": 403}
    # confirm the challenge to slack to verify the url
    if "type" in data:
        if data["type"] == "url_verification":
            response = {"challenge": data["challenge"]}
            return jsonify(response)
    # handle incoming mentions - change "@U054DHQ8726" to your own bot's ID
    if "@U054DHQ8726" in data["event"]["text"]:
        # executor will let us send back a 200 right away
        executor.submit(handleMentions, data["event"]["channel"], data["event"]["text"].replace('<@U054DHQ8726>', '').strip())
        return {"status": 200}
    return {"status": 503}

# function to search through the rows of data using embeddings
def search_justice(df, search):
    # embed the incoming question
    question_embedding = get_embedding(
        search,
        engine="text-embedding-ada-002"
    )
    # score every row against the question
    df["similarity"] = df.embedding.apply(lambda x: cosine_similarity(x, question_embedding))
    new = df.sort_values("similarity", ascending=False)
    # only return the rows with a similarity of 0.81 or higher
    highScores = new[new['similarity'] >= 0.81]
    return highScores

# this function sends the prompt to OpenAI, with the results we got, and then sends a message back to slack
def handleMentions(channel, text):
    # print the text from Slack
    print(text)
    # search through our dataset
    results = search_justice(df, text)
    # print the top 5 results
    print(results.head(5))
    # set up the prompt with the matched results from the dataset
    if results.empty:
        prompt = text
    else:
        prompt = "Look through this information to answer the question: " + results[['combined']].head(5).to_string(header=False, index=False).strip() + " (if it doesn't make sense you can disregard it). The question is: " + text
    messagesOb.append({"role": "user", "content": prompt})
    # make the openAI call
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messagesOb
    )
    # print response - see token count
    print(response)
    # post message back to slack with the response
    slack_client.chat_postMessage(channel=channel, text=response.choices[0].message.content)
    # append the assistant message to the messagesOb array
    messagesOb.append(response.choices[0].message)

# run our app on port 80
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

This is a longer file, which is why I tried to break it down in pieces earlier.

Try to look through the comments to understand what it’s doing. The only things we’ve added are the loading of the datafile with pandas and a search_justice function that uses the incoming text to score rows by cosine similarity. The function returns all rows with a similarity of 0.81 or higher, and we then use these in our prompt to OpenAI. How high or low this threshold should be is up to you: a lower number means the bot may receive irrelevant information, while a higher one means you’ll need to be clearer when giving it directions. I found that 0.81 did well enough.
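If you want to tune the threshold yourself, one quick way (a hypothetical snippet to run wherever you have the dataframe loaded) is to print the top scores for a few test questions, without the cutoff, and see where the relevant rows stop:

# embed a sample question once, then score and sort all rows without the 0.81 cutoff
question_embedding = get_embedding("cases on abortion", engine="text-embedding-ada-002")
df["similarity"] = df.embedding.apply(lambda x: cosine_similarity(x, question_embedding))
print(df.sort_values("similarity", ascending=False)[["name", "similarity"]].head(10))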

Remember, again, to change out your bot’s ID in this file before you try it, and make sure the datafile path is correct. Also make sure you have all your keys in your .env file.

Run your application.

python index.py

With the code we have above, the results will look something like below.

Look into your logs to understand how it is grabbing information from your dataset.

It may take some time to tweak the prompts until you get something you’re happy with. Sometimes it feels like you’re speaking to a genie when telling AI what to do: you need to be very exact, or you’ll get something you didn’t expect.

As an end note, this approach could be very useful for any document you’d like to work with. I’m not sure a Supreme Court AI bot is all that useful, considering GPT should already have much of this information on its own. But trying this out with specific educational or technical documentation would be pretty neat.
