
If Web Apps Could Talk — Intro to the Web Speech API

Claire Froelich · Level Up Coding · Feb 1, 2020

Build a sassy pronunciation checker that uses simple JavaScript to judge your language skills

What if it were easy to add speech-to-text and text-to-speech functionality to your web app? Have no fear, There’s an API for That™. The browser-native Web Speech API surfaces two interfaces for handling voice data with ease: SpeechSynthesis for turning text into sound, and SpeechRecognition for the opposite.

Dip your toes into the Web Speech API and you will find plenty of tutorials on using either SpeechSynthesis or SpeechRecognition; this tutorial will have you killing two birds with one app, using no more than HTML, CSS, and JavaScript. Your end goal is this:

Can you say it?

Try it out here. The user selects a language, enters a phrase to attempt, and presses ‘Hear it’ to hear it spoken aloud. Next they press ‘Say it’ to activate the mic and record their best attempt. Finally, the app displays what it heard and lets them know whether their utterance matches the goal.

Disclaimer!

Before you start scheming your million dollar app that runs entirely on voice commands à la Star Trek, note the caveats.

  • The Web Speech API is experimental. Its specification is still a draft, not yet part of any standard, and could change.
  • Browser compatibility is limited. At the time of writing, SpeechSynthesis is available to about 90% of users and SpeechRecognition to a mere 68%. For simplicity’s sake we will design this app for Chrome. (Note: on mobile, SpeechRecognition is only compatible with Chrome for Android and the Android browser.)

This one’s for fun!

Now that that’s out of the way, let’s dive in.

Getting started

Make an empty project directory called am-i-saying-it-right. Inside, add a file named index.html and paste the following code for your app structure.

index.html
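A minimal sketch of the markup (the exact layout is an assumption; the IDs and structure line up with the points of interest below and with the scripts we’ll write later):

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Am I saying it right?</title>
    <!-- Animate.css, used later to animate the mic button -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/animate.css/3.7.2/animate.min.css" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body>
    <h1>Am I saying it right?</h1>
    <!-- Each value holds a language code recognized by the Web Speech API -->
    <select id="language-select">
      <option value="en-US">English</option>
      <option value="fr-FR">French</option>
      <option value="ja-JP">Japanese</option>
      <option value="de-DE">German</option>
      <option value="zh-CN">Chinese</option>
    </select>
    <input id="phrase-input" type="text" placeholder="Phrase to say" />
    <button id="speak-button">Hear it</button>
    <button id="recognize-button">Say it</button>
    <input id="result-input" type="text" placeholder="What you said" />
    <!-- Empty div that will hold the success/failure message -->
    <div id="result-message"></div>
    <script type="module" src="main.js"></script>
  </body>
</html>
```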

Points of interest:

  • Animate.css is imported in the <head> to animate our mic button later
  • The value attribute in our <option> tags for languages holds the language codes recognized by the Web Speech API (see an unofficial list here). We will use them later to tell the engine to “listen for Chinese” or “speak French”, for example.
  • The empty #result-message <div> will be used to tell the user whether their pronunciation is good or sucks.
  • main.js is imported with type="module" in the <script> tag so we can organize our scripts by concern and import them as modules in main.js

Next, bootstrap your style by creating a file style.css and pasting in this code.

Your file tree should now look like this:
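```
am-i-saying-it-right/
├── index.html
└── style.css
```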

Loading index.html in Chrome browser should give you this:

The beauty of silence.

You’ll notice that it does nothing yet. Let’s start making the app speak.

Let there be speech!

The basic syntax for SpeechSynthesis is quite simple:

Basic speechSynthesis syntax
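A minimal sketch (the phrase here is just an example):

```javascript
// Create an utterance, configure it, and hand it to the synthesizer
const utterance = new SpeechSynthesisUtterance();
utterance.text = 'omelette au fromage'; // the string to speak
utterance.lang = 'fr-FR';               // optional: override the page language
window.speechSynthesis.speak(utterance);
```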

The speechSynthesis interface only “speaks” SpeechSynthesisUtterances. To get the browser to speak, first create a SpeechSynthesisUtterance and assign the string you want spoken to its text attribute. The language spoken will default to the app’s <html lang="..." > value unless a language code is specified with the utterance’s .lang attribute.

Based on this let’s make a function called speakText() that speaks a given string aloud in a given language. In the root of the app, make a module file called textToSpeech.js and insert this code:

textToSpeech.js
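A sketch of the module (the exact wording of the unsupported-browser message is an assumption):

```javascript
// textToSpeech.js
function speakText(text, lang) {
  // Nudge users toward Chrome if their browser lacks speechSynthesis
  if (!('speechSynthesis' in window)) {
    alert('Your browser does not support speech synthesis. Try Chrome!');
    return;
  }
  // Configure an utterance with the given text and language code
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;
  utterance.lang = lang;
  window.speechSynthesis.speak(utterance);
}

export { speakText };
```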

The first few lines check whether the user’s browser supports speechSynthesis and nudge them to use Chrome if not.

The rest is familiar already — we create a new SpeechSynthesisUtterance and configure it with the given text (string) and lang (also a string) arguments. Finally we export the module to use it in a new file called main.js.

Create the new file main.js. Maybe you’ve caught on that all of our files in the project will be in the root:
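```
am-i-saying-it-right/
├── index.html
├── main.js
├── style.css
└── textToSpeech.js
```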

This is the code to put in main.js:

main.js
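A sketch, assuming the element IDs from the markup above:

```javascript
// main.js
import { speakText } from './textToSpeech.js';

const speakButton = document.querySelector('#speak-button');
const phraseInput = document.querySelector('#phrase-input');
const languageSelect = document.querySelector('#language-select');

// Read the input text aloud in the selected language on click
speakButton.addEventListener('click', () => {
  speakText(phraseInput.value, languageSelect.value);
});
```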

At the top, you import the speakText() function you’ve just made and exported. Then grab the ‘Hear it’ button, the text input field, and the language selector from the DOM. Lastly, you add a click event listener to the ‘Hear it’ button that calls speakText() to read the input text out loud.

Easier heard than said

You’re excited to hear your browser speak French, but when you try ‘Hear it’ you might get an error like this in your console:

what.

That’s my fault, because I made you use JavaScript ES6 modules, which for security reasons must be served by a web server rather than opened straight from the file system. You can read about it here if you want.

If you aren’t already serving your code on localhost for testing, now is the time to launch your server. If you aren’t sure how to do that, a simple solution is to run Python’s SimpleHTTPServer in your project folder following these quick instructions (on Python 3, the equivalent is python3 -m http.server). Once your server is on, you can access your app from the URL http://localhost:<PORT NUMBER HERE>/ and hear it speak French to you.

Before you cry ‘BUG!’ make sure your volume is on.

In fact, it will now speak any of the languages you select. Go ahead and listen to omelette au fromage in Japanese or German flavors.

Introducing SpeechRecognition

So far the app is already fun. Maybe you feel like calling it a day and using it to demystify the names of all those French wines in your cupboard, or to let your mother know she’s been pronouncing quinoa wrong all these years.

But we have yet to get to the best part — the judging factor!

SpeechSynthesis and SpeechRecognition are the meat and potatoes of this tutorial. We’ve done the meat — here’s the basic syntax for the potatoes:

Basic SpeechRecognition syntax
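A minimal sketch:

```javascript
// Create a recognizer (webkit-prefixed in Chrome) and set its language
const recognition = new webkitSpeechRecognition();
recognition.lang = 'en-US';

// 'result' fires once a word or phrase is positively recognized
recognition.addEventListener('result', (event) => {
  if (event.results.length > 0) {
    // Drill into the SpeechRecognitionResultList for the transcript
    console.log(event.results[0][0].transcript);
  }
});

recognition.start(); // fire up the mic and start listening
```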

First, you create an instance of SpeechRecognition using the webkit prefix for Chrome. After configuring this recognizer’s language with the .lang attribute, you tell it to fire the mic and start listening for speech.

There are a handful of events available to the recognizer, but the one we are mainly interested in is 'result', which fires once a word or phrase is positively recognized. The results property of this event returns a SpeechRecognitionResultList object, which we can drill through to get the transcript of the recognized text. Above we log the transcript to the console, if words were actually recognized. In case you’re curious what the SpeechRecognitionResultList object looks like, here’s a sample result of me saying “I have two eyebrows” in English:

Example of SpeechRecognitionResultList returned on ‘result’ event
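Roughly, the console output has this shape (the confidence value here is illustrative):

```
SpeechRecognitionResultList {
  0: SpeechRecognitionResult {
    0: SpeechRecognitionAlternative {
      transcript: "I have two eyebrows",
      confidence: 0.9
    },
    isFinal: true,
    length: 1
  },
  length: 1
}
```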

Two things to note here:

  1. You need to serve your code through a web server for recognition to work in Chrome. No worries — we are already set up with localhost from earlier.
  2. The user will be prompted for mic access the first time .start() is called on a SpeechRecognition object in a session. You can’t get around this. It is there to prevent apps from spying on people’s speech without their knowledge.
An annoying necessity.

Adding speech recognition to our app

Make a module in the root of the app named speechToText.js. You will create a function called recognizeSpeech() that accepts a language, starts a recognizer in that language, and returns a Promise containing the recognizer result once it finishes listening.

Note that all this recognition magic happens in a black box in the cloud, making it an asynchronous operation. We will use the async and await keywords to tell JavaScript, “Hey, this is an asynchronous call, wait to hear back from the cloud with a fulfilled promise containing our transcript before doing something with it!”

speechToText.js
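A sketch of the module (the error-handling details are assumptions):

```javascript
// speechToText.js
async function recognizeSpeech(lang) {
  // Wrap the callback-based recognizer in a Promise so we can await it
  const transcript = await new Promise((resolve, reject) => {
    if (!('webkitSpeechRecognition' in window)) {
      reject(new Error('Your browser does not support speech recognition. Try Chrome!'));
      return;
    }
    const recognition = new webkitSpeechRecognition();
    recognition.lang = lang;

    // Resolve with the transcript once a phrase is recognized
    recognition.addEventListener('result', (event) => {
      resolve(event.results[0][0].transcript);
    });
    recognition.addEventListener('error', (event) => reject(event.error));

    recognition.start(); // prompts for mic access on first use
  });
  return transcript;
}

export { recognizeSpeech };
```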

Don’t forget to export this module so we can use it in main.js.

Now we need to trigger our recognizeSpeech() when the user clicks the ‘Say it’ button. Switch over to main.js and import the recognizeSpeech function you just made, right below your import of speakText. We will grab and add an event listener to the recognizeButton, below the speakButton event listener (omitted):
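A sketch of the additions (languageSelect is the selector we grabbed earlier; the IDs are again assumed from the markup above):

```javascript
// main.js: added below the existing speakText import
import { recognizeSpeech } from './speechToText.js';

const recognizeButton = document.querySelector('#recognize-button');
const resultInput = document.querySelector('#result-input');

recognizeButton.addEventListener('click', () => {
  recognizeSpeech(languageSelect.value)
    .then((transcript) => {
      resultInput.value = transcript; // show the user what was heard
    })
    .catch((error) => console.error(error));
});
```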

What’s going on here? After getting the user’s selected language from the dropdown value, we pass it to the recognizeSpeech() function. This function returns a promise containing the recognition transcript once the user is done speaking. We take the result of this fulfilled promise and display it as the text value of the second input box so the user can see what they said. Lastly, the .catch will intercept any error that might occur in this Promise chain and display it to the console.

Let’s try it out! Make sure your server is on, click ‘Say it’, allow the mic, and you should see some kind of icon on your tab indicating the mic is listening.

Give it a go:

I watched too much Dexter’s Lab as a kid.

You have now mastered both SpeechSynthesis and SpeechRecognition, in multiple languages! One problem though — where’s the sass when you pronounce something wrong?

Final touches

Judge the user’s pronunciation by displaying a message

Remember that empty div #result-message in the HTML? That’s where we will display a message to the user letting them know whether they successfully said the word. Make a new module file named compare.js and add this code.

compare.js
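A sketch (the exact messages and styling are assumptions):

```javascript
// compare.js
const resultMessage = document.querySelector('#result-message');

function compare(target, attempt) {
  // Normalize case so the comparison isn't needlessly strict
  const success = target.trim().toLowerCase() === attempt.trim().toLowerCase();

  resultMessage.textContent = success
    ? 'Yes! You said it right!'
    : 'Nope. That was not it at all.';
  resultMessage.style.backgroundColor = success ? 'green' : 'red';

  // Reset to normal after two seconds
  setTimeout(() => {
    resultMessage.textContent = '';
    resultMessage.style.backgroundColor = '';
  }, 2000);
}

export { compare };
```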

This compare() function compares the target string and the user’s attempt, then displays a success or failure message in the div and changes its background color to green or red before resetting to normal after two seconds. Be sure to export this function, import it into main.js, then call it as an additional action in the chain of .then’s on the asynchronous recognizeSpeech(). Here is the complete main.js file reflecting these additions:

main.js (with the compare import and call added)
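A sketch of the finished file, pulling the earlier pieces together:

```javascript
// main.js
import { speakText } from './textToSpeech.js';
import { recognizeSpeech } from './speechToText.js';
import { compare } from './compare.js';

const speakButton = document.querySelector('#speak-button');
const recognizeButton = document.querySelector('#recognize-button');
const phraseInput = document.querySelector('#phrase-input');
const resultInput = document.querySelector('#result-input');
const languageSelect = document.querySelector('#language-select');

speakButton.addEventListener('click', () => {
  speakText(phraseInput.value, languageSelect.value);
});

recognizeButton.addEventListener('click', () => {
  recognizeSpeech(languageSelect.value)
    .then((transcript) => {
      resultInput.value = transcript;
      compare(phraseInput.value, transcript); // judge the attempt
    })
    .catch((error) => console.error(error));
});
```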
I’m sure you can think of something sassier.

Animate the ‘Say it’ button when active

Make it more obvious to the user when the mic is active. I used Animate.css, which animates elements simply by adding class names. I made a helper function to toggle the animation classes, then called it in speechToText.js on the ‘start’ and ‘end’ events of the recognizer.

inside recognizeSpeech() in speechToText.js
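A sketch, assuming Animate.css 3.x class names (animated, pulse, infinite):

```javascript
// Helper: toggle the Animate.css classes on the 'Say it' button
const recognizeButton = document.querySelector('#recognize-button');

function togglePulseAnimation() {
  recognizeButton.classList.toggle('animated');
  recognizeButton.classList.toggle('pulse');
  recognizeButton.classList.toggle('infinite');
}

// Inside recognizeSpeech(), pulse while the mic is listening
recognition.addEventListener('start', togglePulseAnimation);
recognition.addEventListener('end', togglePulseAnimation);
```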
Now the button pulsates when the mic is active — believe me.

Conclusion

Given its limited browser compatibility, adding voice to your app with the Web Speech API is best treated as a nice-to-have extra feature. Even so, you can get your browser talking or listening with just 3–5 lines of vanilla JavaScript. You also know how to handle transcript results with Promises, giving you the power to navigate chains of events with voice. The vocal world is your oyster, go nuts. But be sensible: you probably don’t want users yelling their passwords or social security numbers into forms on the train.

See the full code from this tutorial here, and try out the app here.
