Make Your Python Program Speak!

How to use the gTTS library

Matteo
Level Up Coding


Photo by Alex Knight on Unsplash

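Have you ever wondered how to create a program that can speak? At first sight, it may seem like a really difficult task. However, most of the work has already been done for us: the gTTS library sends our text to Google Translate’s text-to-speech service, so we can create audio files from text very easily. Because the audio is generated online, you need an internet connection to use it.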

Installing the Library

First of all, we need to install the gTTS library.

I suggest that you create a new virtual environment for this project (if you don’t know how to do it, check this article). Then, we can install the library using pip:

pip install gtts

Creating mp3 Files

The easiest way to use this library is to create an mp3 file that you can then play.

The first thing to do is to create a gTTS object. This has many parameters, but for now let’s just focus on the most important one: the text that we want to read.

from gtts import gTTS
tts = gTTS("Hello world")

Now that we have specified the words we want to read, we can save the audio to an mp3 file:

tts.save('hello.mp3')
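
The two steps above can be wrapped in a small helper. This is only a sketch: text_to_mp3 and its tts_factory parameter are hypothetical names, not part of gTTS. The factory defaults to gTTS and is there only so that the helper can be tried without network access.

```python
from pathlib import Path


def text_to_mp3(text, path, tts_factory=None):
    """Convert text to speech, save it as an mp3 file and return the path.

    text_to_mp3 and tts_factory are illustrative names, not part of gTTS.
    """
    if not text.strip():
        # Catch empty input before gTTS is called.
        raise ValueError("There is no text to read")
    if tts_factory is None:
        # Imported lazily so the helper can be defined before gTTS is installed.
        from gtts import gTTS
        tts_factory = gTTS
    tts = tts_factory(text)
    tts.save(str(path))
    return Path(path)
```

Calling text_to_mp3("Hello world", "hello.mp3") should produce the same file as the two lines above.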

This is everything you need to start creating mp3 files. Easy, right?

But there is much more that we can do: changing languages, adding a particular accent, and even playing the sound directly in Python!

Some Optional Parameters

Now that we know the basics, let’s see some more parameters of the gTTS object.

Language and accent

Firstly, you may want to use a language different from English, or maybe even a specific accent (e.g. Australian, American, British). To do so we use two optional parameters:

  • tld specifies the top-level domain of the Google Translate website used to create the audio: https://translate.google.<tld>. The default value is com, but you can specify any valid domain, and this will use (in most cases) the local accent. For example, co.uk gives a British accent, and com.au an Australian one.
  • lang specifies the language of the text, using a language code (usually two letters): en (the default value) is for English, fr is for French, es is for Spanish and so on. If you want to retrieve a list of all available languages, use this code:

from gtts import lang
print(lang.tts_langs())

Here lang.tts_langs() returns a dictionary, where the keys are the language codes and the values are the corresponding language names.
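
As a sketch of how the two parameters combine, the first helper below maps an accent name to the lang and tld values mentioned above, and the second searches a dictionary of languages by name. The names ACCENT_TLDS, accent_options and find_language_codes are made up for this example, and the second helper assumes the dictionary maps codes to names (for example {"fr": "French"}).

```python
# Accent name -> top-level domain, using only the values listed above.
ACCENT_TLDS = {
    "default": "com",
    "british": "co.uk",
    "australian": "com.au",
}


def accent_options(accent):
    """Return keyword arguments for gTTS that select an English accent."""
    return {"lang": "en", "tld": ACCENT_TLDS[accent]}


def find_language_codes(languages, name_part):
    """Return the codes whose language name contains name_part.

    languages is assumed to be shaped like the result of lang.tts_langs():
    codes as keys, language names as values.
    """
    wanted = name_part.lower()
    return sorted(
        code for code, name in languages.items() if wanted in name.lower()
    )
```

With gTTS installed, gTTS("Hello world", **accent_options("british")) would request the British accent, and find_language_codes(lang.tts_langs(), "french") would look up the code for French.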

Speaking slowly

There is also a parameter to make the program speak more slowly: slow. This is False by default, but you can set it to True if you need slower speech.

tts = gTTS("Hello world", slow=True)

Preprocessing

If you want to modify the text before it is transformed into audio, you can use the pre_processor_funcs parameter. This takes a list of functions, usually drawn from the gtts.tokenizer.pre_processors sub-module of the library. Some of the more common ones are:

  • abbreviations removes the period after known abbreviations, so that it is not mistaken for the end of a sentence.
  • end_of_line re-forms words that were split by a hyphen between the end of one line and the beginning of the next.

Note: Some more complicated preprocessors are available in the official documentation of the library.

Here is an example that uses the preprocessors:

from gtts import gTTS
from gtts.tokenizer.pre_processors import abbreviations, end_of_line
tts = gTTS("Hello world", pre_processor_funcs=[abbreviations, end_of_line])
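
A pre-processor is just a function that takes a string and returns a string, so you can also write your own. The sketch below is only an illustration: collapse_whitespace and build_tts are hypothetical names, not part of gTTS. Also note that, as far as I can tell from the documentation, the list you pass replaces the library's default pre-processors rather than adding to them.

```python
import re


def collapse_whitespace(text):
    """Custom pre-processor (hypothetical): squeeze whitespace runs into one space."""
    return re.sub(r"\s+", " ", text).strip()


def build_tts(text):
    """Create a gTTS object that runs a built-in and a custom pre-processor."""
    # Imported here so collapse_whitespace can be tried without gTTS installed.
    from gtts import gTTS
    from gtts.tokenizer.pre_processors import end_of_line

    # end_of_line goes first: it needs the hyphen-newline pairs that
    # collapse_whitespace would otherwise turn into "hyphen space".
    return gTTS(text, pre_processor_funcs=[end_of_line, collapse_whitespace])
```

You can check the custom function on its own, for example with collapse_whitespace("Hello   \n world"), before passing it to gTTS.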

Playing The Audio Directly

Now we know how to create an mp3 file, but we still want to play it from inside our Python program.

To do so, we need to install a library that can play mp3 files. In this tutorial, we will use pygame, but you can choose any library you like.

First of all, let’s install Pygame:

pip install pygame

Now, to play the mp3 file we created earlier, you can use this code:

from pygame import mixer
import time
mixer.init()
mixer.music.load("hello.mp3")
mixer.music.play()
time.sleep(2)

Here we initialize the mixer from Pygame, load the audio and then play it.

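It is important that we add the time.sleep command: mixer.music.play() returns immediately, so without the pause the program would terminate before the audio has been played. Two seconds is enough for this short clip; longer audio needs a longer pause.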
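
A fixed pause only works if you know how long the audio lasts. One alternative, sketched below, is to poll mixer.music.get_busy(), which reports whether music is still playing. The helper names wait_until_finished and play_mp3 and their parameters are my own; is_busy can be any function that returns True while the sound is playing.

```python
import time


def wait_until_finished(is_busy, poll_seconds=0.1, timeout_seconds=60.0):
    """Block until is_busy() returns False or the timeout expires.

    Returns True if playback finished, False if we gave up waiting.
    """
    deadline = time.monotonic() + timeout_seconds
    while is_busy():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def play_mp3(path):
    """Play an mp3 file with Pygame and wait for it to end."""
    # Imported here so wait_until_finished can be used without Pygame.
    from pygame import mixer

    mixer.init()
    mixer.music.load(path)
    mixer.music.play()
    return wait_until_finished(mixer.music.get_busy)
```

With this in place, play_mp3("hello.mp3") would replace the playback lines and the time.sleep(2) call, whatever the length of the file.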

The Complete Code

Here you can find the complete code that we used in this article:

from gtts import gTTS
from gtts.tokenizer.pre_processors import abbreviations, end_of_line
from pygame import mixer
import time
# Create the text
text = "Hello World!"
tts = gTTS(text, slow=False, pre_processor_funcs = [abbreviations, end_of_line])
# Save the audio in an mp3 file
tts.save('hello.mp3')
# Play the audio
mixer.init()
mixer.music.load("hello.mp3")
mixer.music.play()
# Wait for the audio to be played
time.sleep(2)

Conclusion

Thank you for reading this article! If you need more information, check out these resources:


A student, with a passion for programming and in particular Machine Learning.