How I Made One of the World’s First 100% AI-Generated Podcasts

Created by machine learning, not humans

Eric Borgos
Level Up Coding


Podcasting

A few weeks ago, an AI-generated podcast named “Joe Rogan interviews Steve Jobs” went viral because of how realistic it sounded. It was an impressive accomplishment, but it was really just an advanced combination of technologies I was already working with (such as GPT-3 and voice cloning), so I decided to make my own version.

There are some potential problems, though, with creating AI models based on famous people. Legally, celebrities have a “right of publicity,” which restricts unauthorized use of their name, photo, likeness, and voice. You can sometimes get around this if you are doing it as a parody or for educational purposes, but that did not apply to my situation.

Instead, this is what I came up with:

BoredHumans.com AI Podcast

No celebrities involved. It was inspired by Podbot.ai, but my version is more automated because it comes up with the podcast episode title on its own. Other similar AI podcasts include Lexman and Roborah.

The first step in the programming for this project was having the AI “think” of a topic that would then be used as the title of the podcast. I decided that training an ML model on blog posting titles would work better than training it on podcast titles, so I put 100,000 blog posting titles into open-source aitextgen for it to learn to write good ones, but that turned out horribly (I am not sure why). So instead I switched to the NLPCloud.com Text Generation API, which uses GPT-J (similar to GPT-3). No fine-tuning was needed. Instead, I relied on experimenting with various prompts to see what worked best. Some good ones I tried were: “I wrote a blog posting. It is called:”, “My blog posting is called:”, “I wrote an article named:”, “I thought of a name for my blog posting. It is:”, “The top 3 blog posting names are:”, and “The top 3 article names are:”.
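Here is a minimal sketch of how the prompt approach could be wired up. It is not my production code: `generate` is a hypothetical stand-in for whatever function sends a prompt to the text generation API and returns the model's continuation, and `clean_title` is an illustrative helper for tidying the raw output.

```python
import random
import re
from typing import Callable

# Prompts from the experiments described above.
PROMPTS = [
    "I wrote a blog posting. It is called:",
    "My blog posting is called:",
    "I wrote an article named:",
    "I thought of a name for my blog posting. It is:",
    "The top 3 blog posting names are:",
    "The top 3 article names are:",
]


def clean_title(raw: str) -> str:
    """Return the first non-empty line, without list numbering or quotes."""
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop list numbering such as "1." or "2)" produced by "top 3" prompts.
        line = re.sub(r"^\d+[.)]\s*", "", line)
        return line.strip(" \"'“”")
    return ""


def generate_title(generate: Callable[[str], str], rng=random) -> str:
    """Pick a random prompt and turn the model's continuation into a title.

    `generate` is hypothetical: it is assumed to take a prompt string and
    return only the newly generated text.
    """
    prompt = rng.choice(PROMPTS)
    return clean_title(generate(prompt))
```

Passing the API call in as a function keeps the prompt logic easy to test with a fake generator before spending any API credits.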

Even though all of this was automated, I didn’t want to put out any podcasts that were offensive or R-rated, so I manually looked at each title before it was used and deleted any that were inappropriate. I also added a “bad words” filter to automatically delete any titles that contained words I would not want to use, but some inappropriate titles still got through the filter and had to be caught during manual moderation.
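A “bad words” filter of this kind can be sketched in a few lines. This is an illustration only: the blocklist entries are harmless placeholders, and matching on whole words is my assumption about a sensible default, not a description of the exact filter I used.

```python
import re

# Placeholder entries; a real blocklist would be much longer.
BLOCKLIST = {"darn", "heck"}


def is_clean(title: str, blocklist=BLOCKLIST) -> bool:
    """Return True if no whole word in the title appears in the blocklist."""
    words = re.findall(r"[a-z']+", title.lower())
    return not any(word in blocklist for word in words)


def filter_titles(titles, blocklist=BLOCKLIST):
    """Keep only the titles that pass the blocklist check."""
    return [title for title in titles if is_clean(title, blocklist)]
```

Whole-word matching avoids rejecting innocent titles (“checkmate” contains “heck”), but it also misses creative spellings and titles that are offensive without using any listed word, which is why a manual review step is still needed.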

Next, I programmed it to choose an AI host for the episode from a library of 8 hosts that I created ahead of time. The first and last name of each host was generated using open-source nameCreator (not AI), and the image of the host was randomly chosen from my “AI-Generated Faces” website. Each host was then randomly matched with an AI voice from Resemble.ai (this costs money to use, but there are open-source alternatives such as Coqui and Tortoise).
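The host library can be modeled as a small list of records built once and reused for every episode. The sketch below is illustrative: the `Host` fields and function names are hypothetical, and giving each host a distinct face and voice (sampling without replacement) is an assumption on my part.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str        # generated first and last name
    image_path: str  # AI-generated face image
    voice_id: str    # identifier of the text-to-speech voice


def build_hosts(names, images, voice_ids, rng):
    """Pair each name with a random, distinct image and voice.

    Requires at least as many images and voices as names.
    """
    chosen_images = rng.sample(images, len(names))
    chosen_voices = rng.sample(voice_ids, len(names))
    return [Host(n, i, v) for n, i, v in zip(names, chosen_images, chosen_voices)]


def pick_host(hosts, rng):
    """Choose the host for one episode."""
    return rng.choice(hosts)
```

Passing in a `random.Random` instance with a fixed seed makes the pairing reproducible, which is handy when a host needs to keep the same face and voice across episodes.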

To get the text content for the podcast, I fed the title to NLPCloud’s Blog Post Generator API. I could not find any program or API specifically for AI podcast content, so for now I am using output that is meant to be a blog posting. My code automatically augments the resulting text with SSML, making certain parts of it louder (such as the title and the name of each section) and adding pauses between paragraphs, to make the host sound more human. Text-To-Speech from the Resemble.ai API then turns this text into a voice.
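As a rough idea of what the SSML step looks like, here is a simplified sketch. It uses the standard SSML `<prosody>` and `<break>` elements; the function name, the input structure, and the 700 ms pause are illustrative assumptions, and which SSML tags a given text-to-speech service honors should be checked against its documentation.

```python
from xml.sax.saxutils import escape


def to_ssml(title, sections, pause_ms=700):
    """Wrap an article in SSML with louder headings and pauses.

    `sections` is a list of (heading, paragraphs) pairs, where
    `paragraphs` is a list of strings.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    parts = ["<speak>"]
    # Read the episode title louder than the body text.
    parts.append(f'<prosody volume="loud">{escape(title)}</prosody>')
    parts.append(pause)
    for heading, paragraphs in sections:
        # Section names are emphasized the same way as the title.
        parts.append(f'<prosody volume="loud">{escape(heading)}</prosody>')
        parts.append(pause)
        for paragraph in paragraphs:
            parts.append(f"<p>{escape(paragraph)}</p>")
            # Pause after every paragraph so the host does not rush.
            parts.append(pause)
    parts.append("</speak>")
    return "".join(parts)
```

Escaping the text matters because generated articles often contain characters such as `&` or `<`, which would otherwise make the SSML invalid.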

To make it sound like an official podcast, I gave each episode intro and outro music randomly picked from a library of short MP3 files I downloaded from Storyblocks.com. FFmpeg and MoviePy, both of which are open source, were then used to automatically mix the audio.
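The mixing step can be approximated by calling FFmpeg from Python. This sketch only covers FFmpeg's `concat` audio filter (not MoviePy), the file names are made up, and it assumes `ffmpeg` is installed and on the PATH. If the input files differ in sample rate or channel layout, they may need to be converted first.

```python
import random
import subprocess


def pick_music(library, rng=random):
    """Randomly choose intro and outro clips from a list of MP3 paths."""
    return rng.choice(library), rng.choice(library)


def build_mix_command(intro, speech, outro, output):
    """Build an FFmpeg command that plays intro, speech, and outro in order."""
    return [
        "ffmpeg", "-y",                      # overwrite the output if it exists
        "-i", intro, "-i", speech, "-i", outro,
        # Join the three audio streams end to end into one stream.
        "-filter_complex", "[0:a][1:a][2:a]concat=n=3:v=0:a=1[out]",
        "-map", "[out]",
        output,
    ]


def mix_episode(intro, speech, outro, output):
    """Run FFmpeg and raise an error if the mix fails."""
    subprocess.run(build_mix_command(intro, speech, outro, output), check=True)
```

For example, `mix_episode("intro.mp3", "speech.mp3", "outro.mp3", "episode.mp3")` would write the finished episode to `episode.mp3`.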

Cover art for the episodes was generated using Stable Diffusion (open source), which is a new text-to-image program where you give it a phrase (a “prompt”) and it converts your concept into an image using machine learning. For each podcast episode, the title was the prompt.

I could have hosted the podcast on my own server, but instead I uploaded it to BuzzSprout using their free account level and embedded their podcast player on my BoredHumans.com AI Podcast page. I did this partly to make it look more official, but it also let me submit the podcast to Apple, Spotify, Amazon, and other places where podcasts are heard.

Right now I only have 1 episode completed, but my plan is to create 100 more and write code to use the BuzzSprout API to automatically upload 1 episode a week. Podcasts are distributed using RSS feeds, which means whenever I upload a new episode it will automatically appear on all the major podcast sites.
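The scheduling half of that plan is simple to sketch. The snippet below only computes the weekly release dates; the upload itself is left out because it depends on the details of the BuzzSprout API, and the function name and start date are illustrative.

```python
from datetime import date, timedelta


def weekly_schedule(first_release, episode_count):
    """Return one release date per episode, spaced exactly one week apart."""
    return [first_release + timedelta(weeks=i) for i in range(episode_count)]


# Example: plan 100 episodes starting on a hypothetical launch date.
schedule = weekly_schedule(date(2023, 1, 2), 100)
```

Each date in `schedule` can then be paired with a pre-generated episode file, so the uploader only has to check once a day whether anything is due.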

I am also looking into turning my audio podcast into a video podcast using Synthesia (see below):

AI Video Hosts

There is a potential problem with this, though. Synthesia’s terms and conditions page lists two prohibited uses that seem to rule out what I want to do:

Restriction #1: In User Generated Content in which Stock Avatar is making any kind of statement of opinion, including expressing any personal preferences or experiences as if they are Stock Avatar’s preferences or experiences.

Restriction #2: In User Generated Content in which Stock Avatar is making any kind of statement of fact regarding religion, politics, race, gender, sexuality, or other similar topics that are known to be sensitive to certain demographics.

The solution would be for me to use another similar service or create my own text-to-video software, so that is what I am working on next.


I am the owner of Impulse Communications, Inc., which is a 25-year-old Internet company specializing in artificial intelligence and website development.