Ditch SQL — Let Vanna Handle It!

SQL users will be replaced by this AI tool in no time. Here's how.

Prathamesh Gadekar
Published in Level Up Coding
7 min read · Sep 12, 2023


When I first started delving into SQL, I must admit it felt quite overwhelming.

Despite being tech-savvy and usually quick to grasp programming languages, my experience with SQL was quite the rollercoaster ride. Initially, it appeared straightforward as I effortlessly retrieved data from tables using SELECT statements. However, as I delved deeper into its intricacies, things became increasingly complex.

I vividly recall spending hours attempting to understand why my JOIN wasn’t producing the expected results. After staring at my laptop screen for what seemed like an eternity, a eureka moment finally arrived.

It turned out to be a simple mistake: I had referenced the key of the wrong table. It was a humbling experience that taught me some valuable lessons.

Enter Vanna — AI SQL Agent:

I have encountered my share of challenges with SQL in the past. But recently, I came across Vanna, an AI SQL agent that can apparently query your data not with SQL but with PLAIN ENGLISH!

I was skeptical, but I decided to try it out anyway.

How Does Vanna Work?

Workflow of Vanna (Source: Vanna.ai)

Vanna makes it simple to work with databases, in just two steps:

  1. Train a Model on Your Data:
    The first step is to train a custom model on your dataset. The more high-quality data you provide, the better your model will be.
  2. Ask Questions:
    Once your model is trained, you can start asking questions in plain English. Vanna uses the model to generate SQL queries that retrieve the desired data from your database.

Getting Started with Vanna

Let’s walk through the process of getting started with Vanna using a sample dataset. The dataset used has the following schema:

Schema of the dataset (Source: Vanna.ai)

Installation and Setup

The installation and setup can be done in 4 simple steps:

# 1. Install Vanna
%pip install vanna
import vanna as vn

# 2. Log In to Vanna
api_key = vn.get_api_key('my-email@example.com') # Your email here
vn.set_api_key(api_key)

# 3. Set Model (Example: 'chinook' for a public model)
vn.set_model('chinook')

# 4. Connect to the Database (Example: SQLite)
vn.connect_to_sqlite('https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite')

Asking Questions

Now that you’re set up, you can start querying your database:

# Ask a Question
vn.ask("What are the top 5 artists by sales?")
Output of the Code (Source: Vanna)

For a detailed guide on how to get started with Vanna, please refer to the documentation.

Training Vanna

To tailor the model to your dataset, you can train it manually. First, set up your model:

# Log In to Vanna
api_key = vn.get_api_key('my-email@example.com')
vn.set_api_key(api_key)

# Set Your Model (Choose a unique model name)
vn.set_model('my-model')

The model can be trained in three ways:

  1. SQL Commands:
  • SQL commands form the backbone of database interactions. Training your Vanna model on question/SQL pairs helps it correctly understand and generate SQL queries.

You can insert SQL commands into your model’s training data using the function vn.add_sql(question, sql, tag), where:

  • question: Represents a natural language question or query.
  • sql: The SQL query that answers the question.
  • tag (Optional): Provides a tag or label for the training data.
# Example
vn.add_sql(
    question="What is the average salary of employees?",
    sql="SELECT AVG(salary) FROM employees"
)

2. DDL (Data Definition Language):

  • DDL statements are used for defining and managing the structure of a database, including creating tables, altering schemas, and defining constraints.
  • You can improve your Vanna model’s understanding of database structures by adding DDL statements to its training data using the function vn.add_ddl(ddl).
# Example
vn.add_ddl(ddl="CREATE TABLE employees (id INT, name VARCHAR(255), salary INT)")

Training with DDL statements familiarizes your model with the database’s structure and schema, so that it can generate SQL queries related to database operations.

3. Documentation:

  • Documentation contextualizes and explains the various elements of a database, such as table definitions, key metrics, and business rules.
  • It can be appended to the training data by using the vn.add_documentation(documentation) method.
# Example
vn.add_documentation(
    documentation="Our organization's definition of sales is the discount "
                  "price of an item multiplied by the quantity sold."
)

Adding documentation gives your model the context it needs to generate more insightful responses alongside its SQL queries.
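Before uploading question/SQL pairs, it can help to sanity-check them locally. Here is a minimal, self-contained sketch; the validation rules are purely illustrative and are not part of Vanna:

```python
def validate_training_pair(question: str, sql: str) -> list:
    """Return a list of problems found in a question/SQL training pair."""
    problems = []
    if not question.strip().endswith("?"):
        problems.append("question should read as a natural-language question")
    if not sql.strip().upper().startswith(("SELECT", "WITH")):
        problems.append("sql should be a read-only query")
    return problems

pairs = [
    ("What is the average salary of employees?",
     "SELECT AVG(salary) FROM employees"),
    ("drop everything", "DROP TABLE employees"),
]

# Keep only the clean pairs; each kept pair would then go to vn.add_sql(...).
clean = [(q, s) for q, s in pairs if not validate_training_pair(q, s)]
```

Filtering like this keeps accidental DDL or malformed questions out of the model's training set.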

Refer to the documentation for more.

Exploring Training Data

Exploring training data means inspecting the quality, quantity, and diversity of the dataset used to train the model. This ensures the data is accurate, covers a wide range of scenarios, and stays up to date.

# View Training Data
vn.get_training_data()
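Once you can see the training data, one concrete way to gauge its diversity is to check which tables the stored queries actually touch. A rough sketch using a naive regex (the helper is my own, not a Vanna API, and real SQL parsing is more involved):

```python
import re

def tables_touched(sqls):
    """Collect identifiers that follow FROM or JOIN in each query (naive)."""
    tables = set()
    for sql in sqls:
        tables.update(re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)",
                                 sql, re.IGNORECASE))
    return tables

training_sqls = [
    "SELECT AVG(salary) FROM employees",
    "SELECT * FROM orders JOIN customers ON orders.cust_id = customers.id",
]
```

If important tables never show up in the result, the model probably needs more question/SQL pairs covering them.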

Throughout, Vanna keeps its interactive interface available, so users can ask questions and get SQL queries or responses based on the current model.

Automatic Training

If you are lazy, you can also train the model using Vanna’s automatic training feature (currently available only for Snowflake). It works in three steps:

  1. Connecting the Database:
  • Connect to the database that you want to train your model on.
# Connect to the database (e.g., Snowflake)
vn.connect_to_snowflake(
    account='my-account',
    username='my-username',
    password='my-password',
    database='my-database'
)
  • Once connected, Vanna automatically explores the database to gather metadata.

2. Generating Training Plan:

  • Vanna generates a training plan from the collected metadata; the plan specifies which parts of the database schema and which queries should be incorporated into the model during training.
  • This defines the scope of automatic training.
# Generate a training plan based on the connected database
training_plan = vn.get_training_plan_experimental(
    filter_databases=['my-database'],   # Optional: filter databases
    filter_schemas=['public'],          # Optional: filter schemas
    include_information_schema=False,   # Optional: include system schema
    use_historical_queries=True         # Optional: include historical queries
)

3. Training Execution:

  • Vanna uses this training plan to train the model automatically, incorporating SQL queries, Data Definition Language (DDL) statements, and database documentation into its knowledge base.
  • This step helps the model understand the structure and semantics of the database.
# Train the model using the generated training plan
vn.train(plan=training_plan)

Removing Useless Datapoints:

The quality of the training data determines how well your model performs, so make sure to remove useless data points.

Also, avoid feeding sensitive or outdated information into these steps, as it can degrade your model’s behavior.

# Remove Training Data (if needed)
vn.remove_training_data(id='my-training-data-id')
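To decide which ids to remove, you might audit the entries locally first. A sketch over plain dicts; the field names id, question, and added_at are my assumptions about the shape of the training-data records, so adapt them to what vn.get_training_data() actually returns:

```python
from datetime import date

def stale_entry_ids(entries, cutoff):
    """Return ids of entries that have no question or predate the cutoff."""
    ids = []
    for entry in entries:
        if not entry.get("question") or entry["added_at"] < cutoff:
            ids.append(entry["id"])
    return ids

entries = [
    {"id": "td-1", "question": "Top artists by sales?", "added_at": date(2023, 8, 1)},
    {"id": "td-2", "question": "", "added_at": date(2023, 8, 15)},
    {"id": "td-3", "question": "Revenue by region?", "added_at": date(2021, 1, 1)},
]

# Each returned id could then be passed to vn.remove_training_data(id=...).
stale = stale_entry_ids(entries, cutoff=date(2022, 1, 1))
```

Auditing in a loop like this beats deleting ids one at a time by hand, especially as the training set grows.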

Connecting To Databases

Vanna can also integrate with several types of databases, including but not limited to Snowflake, BigQuery, and Postgres.

To connect to a given type of database, you need the corresponding Python database connector library, which establishes the connection to the database server.

# Connecting to Snowflake
# Set your Snowflake credentials
account = "your-account-url"
username = "your-username"
password = "your-password"
database = "your-database"

# Connect to Snowflake
vn.connect_to_snowflake(
    account=account,
    username=username,
    password=password,
    database=database
)
# Connecting to SQLite
# Specify the URL to your SQLite database file
sqlite_url = "sqlite:///path/to/your/database.sqlite"

# Connect to SQLite
vn.connect_to_sqlite(sqlite_url)
# Connecting to Postgres
# Set your Postgres credentials
host = "your-host"
dbname = "your-database"
user = "your-username"
password = "your-password"
port = 5432  # The default port for Postgres

# Connect to Postgres
vn.connect_to_postgres(
    host=host,
    dbname=dbname,
    user=user,
    password=password,
    port=port
)

Different databases suit different use cases: SQLite works well for local development and testing, while web applications might benefit from Postgres.

Vanna’s versatile support for these databases covers the overwhelming majority of project requirements.
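When a project needs to switch between these backends, the connection logic can be centralized behind a small dispatcher. This is my own convention, not part of Vanna's API; in real use you would pass the actual vn.connect_to_* functions as the connectors:

```python
def connect_from_config(config, connectors):
    """Call the connector named by config['backend'] with the remaining keys."""
    backend = config["backend"]
    if backend not in connectors:
        raise ValueError("unsupported backend: " + backend)
    kwargs = {k: v for k, v in config.items() if k != "backend"}
    return connectors[backend](**kwargs)

# Demo with a stand-in connector instead of the real vn.connect_to_sqlite:
calls = []
fakes = {"sqlite": lambda url: calls.append(("sqlite", url))}
connect_from_config({"backend": "sqlite", "url": "chinook.sqlite"}, fakes)
```

Keeping credentials and the backend choice in one config dict makes it easy to move from SQLite in development to Postgres or Snowflake in production.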

Results

In summary, Vanna has come a long way in just one month: over that period, its user count increased by about 600%.

Companies as large as Fortune 1000 corporations have started adopting Vanna for their data-querying needs. This corporate adoption makes the platform look like a robust solution for abstracting database interactions.

Vanna has decided to open-source its framework to increase accessibility and versatility. On top of that, the team is building a cloud-hosted web application to make the tool easier to use and reach a wider audience.

Future plans for Vanna include integrating additional Large Language Models (LLMs). This step shows how committed the company is to keeping up with AI advancements while giving users more options.

To stay updated on Vanna’s recent achievements, including user growth and open-sourcing plans, visit the documentation.

Conclusion

This blog delved into the facets of utilizing Vanna, ranging from setup and model training to connecting with databases and leveraging its features.

To put it briefly, Vanna streamlines the process of generating SQL queries and interacting with databases, making it an invaluable tool for data professionals, analysts, and developers.

Its adaptability and user-friendly nature make it an enticing option for anyone seeking to augment their database-related tasks with AI-driven assistance.


A Computer Science Enthusiast. I write about machine learning and data science.