Build your personal speaker assistant with Amplify and ChatGPT

Antonio Lagrotteria · Published in Level Up Coding · 11 min read · Mar 28, 2023

As a speaker, it can be difficult to keep track of all the information you need to convey during a presentation. In light of recent AI developments, this is where an AI speaker assistant comes in handy.

This article is based on improvements and feedback received on both a previous article I wrote about a serverless API for text-to-speech answers

and a podcast, where three humans and a bot discussed serverless and ChatGPT implications:

The result of this article is an Amplify-based web application which allows users to ask questions by voice to our ChatGPT assistant, Joanna, who replies with her own voice.

Why

A speaker assistant is an AI-powered tool that can help speakers complement their talk by, among other things:

  1. Providing information: If the presenter is unsure of a fact or statistic, they can ask the speaker assistant to look it up and provide an answer in real-time.
  2. Interacting with the audience: A speaker assistant can facilitate audience participation by taking questions from the audience and providing real-time responses. This can help to make the presentation more engaging and interactive for the audience.
  3. Offering more advanced help, such as post-talk review and setting reminders during the talk.

Architecture

A high-level architecture can be seen below:

The flow is as follows:

  • End users access a web application, hosted on CloudFront and S3, by registering as users on the platform.
  • Once authenticated (via Cognito), they can ask Joanna a question by speaking into, for instance, their laptop microphone (via the React Speech Recognition library).
  • Their question is answered by a serverless API integrated with ChatGPT (API Gateway + Lambda + OpenAI APIs).
  • Finally, the answer is translated into voice by leveraging the Amplify Predictions module, which uses Amazon Polly for its text-to-speech capability.

Let’s start.

Prerequisites

It is recommended to have an AWS account, Node.js, the Amplify CLI, and an OpenAI API key at hand.

Set up the Amplify project

Amplify comes with very good documentation about getting started. To set up an Amplify project:

  • Follow the Prerequisites to install the needed software (Node) and configure the Amplify CLI.
  • Set up a fullstack project. This initializes the Amplify backend via the init command and sets up the frontend so that it can interact with it. You can name your React app as you wish (e.g., amplify-ai-assistant); see the commands sketched below.
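
As a minimal sketch (assuming the standard create-react-app flow from the Amplify docs and the app name above), the initial commands look like this:

npx create-react-app amplify-ai-assistant
cd amplify-ai-assistant
amplify init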

Let’s focus first on general features such as hosting and authentication, so that later we can focus on the application itself.

Add hosting (CloudFront + S3)

Once you have set up a fullstack project, you will have a typical React app based on create-react-app, which you can run on localhost:

To host this dummy application on CloudFront and S3, run the add hosting command and then choose the options below:

amplify add hosting

...

? Select the plugin module to execute … (Use arrow keys or type to filter)
Hosting with Amplify Console (Managed hosting with custom domains, Continuous deployment)
❯ Amazon CloudFront and S3

...

? Select the environment setup: … (Use arrow keys or type to filter)
DEV (S3 only with HTTP)
❯ PROD (S3 with CloudFront using HTTPS)

...

? hosting bucket name » <BUCKET_NAME_HERE>

The above example uses the Create React App (CRA) project, which seems to be slowly dying.

For your own learning you could use Vite + React or Next.js. This article will stick with CRA, as the frontend tooling is not its primary focus.

The changes have not been deployed yet to AWS. Before pushing the above resources, let’s add Cognito authentication.

Add authentication (Cognito)

We need to provision the backend which will store our users. This consists of a Cognito User Pool.

To provision it, issue the add auth command and follow the instructions below:

amplify add auth

...

? Do you want to use the default authentication and security configuration? (Use arrow keys)
> Default configuration
Default configuration with Social Provider (Federation)
Manual configuration
I want to learn more.

...

? How do you want users to be able to sign in? (Use arrow keys)
> Username
Email
Phone Number
Email or Phone Number
I want to learn more.

...

? Do you want to configure advanced settings? (Use arrow keys)
> No, I am done.
Yes, I want to make some additional changes.

Before pushing/publishing the above resources, follow the Create Login UI section, where you will install the Amplify UI React library and decorate your App.js component with the withAuthenticator higher-order component, as sketched below.
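
A minimal sketch of that decoration (assuming the default Amplify UI styles):

import { withAuthenticator } from '@aws-amplify/ui-react';
import '@aws-amplify/ui-react/styles.css';

function App() {
  // Your application UI goes here.
  return <main>Hello, speaker!</main>;
}

// Wrapping App renders the Cognito sign-in/sign-up flow for unauthenticated users.
export default withAuthenticator(App);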

As a last command, we execute amplify publish. This provisions both the previous hosting components (CloudFront + S3) and the Cognito user pool configuration in AWS.

amplify publish

At this point, you have just deployed a dummy React app protected by an Amplify Cognito UI component. Below is the result:

Add API (Api Gateway + Lambda)

Our goal is to create a REST Node API exposing a POST endpoint which will contain the code to call the OpenAI APIs. The command below creates an API Gateway REST API with a single endpoint (/ask), protected by IAM authorization:

amplify add api

...

? Select from one of the below mentioned services:
GraphQL
> REST

...

? Provide a friendly name for your resource to be used as a label
for this category in the project: » <API_NAME_HERE>

...

? Provide a path (e.g., /book/{isbn}): » /ask

..

? Provide an AWS Lambda function name: askChatGtpLambda

...

? Choose the runtime that you want to use: (Use arrow keys)
.NET 6
Go
Java
> NodeJS
Python

...

? Choose the function template that you want to use: (Use arrow keys)
AppSync - GraphQL API request (with IAM)
CRUD function for DynamoDB (Integration with API Gateway)
GraphQL Lambda Authorizer
> Hello World
Lambda trigger
Serverless ExpressJS function (Integration with API Gateway)

...

? Do you want to configure advanced settings? (y/N) N

...

? Do you want to edit the local lambda function now? (Y/n) N

...

? Restrict API access? (Y/n) » Y

...

? Who should have access? ... (Use arrow keys or type to filter)
❯ Authenticated users only
Authenticated and Guest users

...

? What permissions do you want to grant to Authenticated users? ... (Use arrow keys or type to filter)
● create
● read
❯● update
○ delete
(Use <space> to select, <ctrl + a> to toggle all)

...

? Do you want to add another path? (y/N) » N

Let’s push the function to AWS:

amplify push

Now let’s think about the content of the Lambda. The function will essentially call the OpenAI API by issuing a call to the createCompletion endpoint.


const configuration = new Configuration({
  apiKey: <API_KEY_HERE>,
});

const openai = new OpenAIApi(configuration);

const completion = await openai.createCompletion({
  model: "text-davinci-003",
  prompt: JSON.parse(event.body).input.question,
  temperature: 0.6,
  echo: false,
  max_tokens: 2048
});

In order to call this API, you need to hold an OpenAI secret API key, which can be found here.

As we do not want to store such a secret directly in the Lambda code or its environment variables, we will use an SSM Parameter Store secure string instead, which Amplify supports out of the box. We need to update the function accordingly:

amplify update function

...

? Select the Lambda function you want to update (Use arrow keys)
> askChatGtpLambda

...

? Which setting do you want to update?
Resource access permissions
Scheduled recurring invocation
Lambda layers configuration
Environment variables configuration
> Secret values configuration

...

? Enter a secret name (this is the key used to look up the secret value): OPEN_API_KEY

...

? Enter the value for OPEN_API_KEY: [input is hidden] <SECRET_HERE>

...

? What do you want to do? (Use arrow keys)
Add a secret
Update a secret
Remove secrets
> I'm done

...

? Do you want to edit the local lambda function now? N

The immediate result of the above is a pre-generated, commented-out snippet in the Lambda function, which explains how to retrieve said parameter from SSM.


/*
Use the following code to retrieve configured secrets from SSM:

const aws = require('aws-sdk');

const { Parameters } = await (new aws.SSM())
  .getParameters({
    Names: ["OPEN_API_KEY"].map(secretName => process.env[secretName]),
    WithDecryption: true,
  })
  .promise();

Parameters will be of the form { Name: 'secretName', Value: 'secretValue', ... }[]
*/

You are welcome to use the above. I generally prefer sticking to SDK v3, so we will explicitly install the @aws-sdk/client-ssm package and modify the function as below.
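
Assuming the standard Amplify project layout, the function’s source lives under amplify/backend/function/askChatGtpLambda/src; install the dependencies there first:

npm install @aws-sdk/client-ssm openai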

import { SSMClient, GetParametersCommand } from "@aws-sdk/client-ssm";
import { Configuration, OpenAIApi } from "openai";

const config = {
  region: "eu-west-1"
};

export const handler = async (event) => {

  // Resolve the OpenAI API key from SSM Parameter Store.
  // Amplify stores the full parameter name in an environment variable.
  const client = new SSMClient(config);
  const command = new GetParametersCommand({
    Names: ["OPEN_API_KEY"].map(secretName => process.env[secretName]),
    WithDecryption: true
  });

  const { Parameters } = await client.send(command);

  const configuration = new Configuration({
    apiKey: Parameters[0].Value,
  });

  const openai = new OpenAIApi(configuration);

  // Forward the user's question to OpenAI.
  const completion = await openai.createCompletion({
    model: "text-davinci-003",
    prompt: JSON.parse(event.body).input.question,
    temperature: 0.6,
    echo: false,
    max_tokens: 2048
  });

  return {
    statusCode: 200,
    headers: {
      "Access-Control-Allow-Headers": "Content-Type",
      "Access-Control-Allow-Origin": "<ORIGIN_HERE>",
      "Access-Control-Allow-Methods": "OPTIONS,POST,GET"
    },
    body: JSON.stringify({ "Answer": completion.data.choices[0].text }),
  };
};

Finally push the changes to AWS by issuing

amplify push
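
As a sketch of how the frontend will later call this endpoint (the friendly API name chatGptApi is an assumption; use the one you chose during amplify add api), Amplify signs the request with the user’s Cognito credentials:

import { API } from 'aws-amplify';

// 'chatGptApi' is a hypothetical friendly name from `amplify add api`.
const completion = await API.post('chatGptApi', '/ask', {
  body: { input: { question: 'What is serverless?' } },
});

console.log(completion.Answer);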

Our API is now able to retrieve a ChatGPT answer from a given input question. Before gluing everything together in the UI, let’s add the last piece: Amazon Polly.

Add Predictions

The Predictions category provides a solution for using AI and ML cloud services to enhance your application. The use case of interest here is converting text to speech, which, behind the scenes, uses Amazon Polly.

To add predictions capability, follow below instructions:

amplify add predictions

...

? Please select from one of the categories below ... (Use arrow keys or type to filter)
Identify
❯ Convert
Interpret
Infer
Learn More

...

? What would you like to convert? ... (Use arrow keys or type to filter)
Translate text into a different language
❯ Generate speech audio from text
Transcribe text from audio

...

? Provide a friendly name for your resource » speechGenerator

...

? What is the source language? ... (Use arrow keys or type to filter)
❯ US English
Turkish
Swedish
Russian
Romanian
Portuguese
Brazilian Portuguese
Polish
Dutch
Norwegian

...

? Select a speaker ... (Use arrow keys or type to filter)
Kevin - Male
Salli - Female
Matthew - Male
Kimberly - Female
Kendra - Female
Justin - Male
Joey - Male
❯ Joanna - Female
Ivy - Female
Ruth - Female

...

? Who should have access? ... (Use arrow keys or type to filter)
❯ Auth users only
Auth and Guest users

To deploy changes to AWS, run

amplify push

Finally, in order to glue Predictions into the frontend and use it as a backend API, follow the instructions here and add the code below to the App.js component:

import { Amplify } from 'aws-amplify';
import {
  Predictions,
  AmazonAIPredictionsProvider
} from '@aws-amplify/predictions';
import awsconfig from './aws-exports';

Amplify.configure(awsconfig);
Amplify.addPluggable(new AmazonAIPredictionsProvider());

The Predictions object is particularly relevant, as it exposes an API which allows you to execute AI/ML actions. Below you can see how the answer is converted into Joanna’s voice, provided as an audio data stream which can then be fed to the Web Audio API to play the sound through the speakers.

const result = await Predictions.convert({
  textToSpeech: {
    source: {
      text: completion.Answer,
    },
    voiceId: "Joanna" // default configured in aws-exports.js
  }
});

// Use result.audioStream to pass to Web Audio APIs.

Finally, it is time to setup the frontend.

Finalize the frontend

The main areas I would like to address are:

  • React Speech Recognition transcribes the user’s voice into text. Its useSpeechRecognition hook exposes the transcript and the microphone state:

import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition';

const {
  transcript, // the current voice text as string
  listening, // whether the microphone is listening or not
  resetTranscript, // utility to reset the transcript
  browserSupportsSpeechRecognition // whether the current browser supports the Web Speech API
} = useSpeechRecognition();

...

// To start listening to speech, call the startListening function.
SpeechRecognition.startListening({ continuous: true });

// To turn the microphone off, but still finish processing
// any speech in progress, call stopListening.
SpeechRecognition.stopListening();
  • React-vertical-timeline-component is a nice React component. Its look and feel makes it a great candidate for rendering the real-time conversation between the speaker and the bot.
  • The Web Audio API is leveraged to receive the voice stream as input and decode it so it can be sent to the speakers and produce real sound.
// Fall back to the prefixed constructor for older WebKit browsers.
const AudioContext = window.AudioContext || window.webkitAudioContext;
const audioCtx = new AudioContext();
const source = audioCtx.createBufferSource();

audioCtx.decodeAudioData(result.audioStream, (buffer) => {
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.start(0);
});
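
Putting the pieces together, a minimal, hypothetical handler for the stop button could look like the sketch below; the API name chatGptApi and the playAnswer helper (which wraps the Web Audio snippet above) are assumptions:

import { API } from 'aws-amplify';
import { Predictions } from '@aws-amplify/predictions';
import SpeechRecognition from 'react-speech-recognition';

// Stop the microphone, ask ChatGPT, then speak the answer with Joanna's voice.
async function askJoanna(transcript, resetTranscript) {
  SpeechRecognition.stopListening();

  const completion = await API.post('chatGptApi', '/ask', {
    body: { input: { question: transcript } },
  });

  const result = await Predictions.convert({
    textToSpeech: { source: { text: completion.Answer } },
  });

  playAnswer(result.audioStream); // hypothetical helper: decode and play as shown above
  resetTranscript();
}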

End result

A full demo of the solution can be found here, while the GitHub repository is here. If you wish to access the demo, connect with me on LinkedIn or write a comment below the article and I will give you access. Below you can see a typical interaction by voice:

After logging in, you can click the microphone icon at the top-center of the screen. You can then start talking and converse with Joanna, who will reply after you click the green stop button. The red cancel button is used to cancel and retry.

Conclusion

Creating a speaker assistant with AWS Amplify and ChatGPT is a great way to practice and prepare for presentations. With the power of AI, you can get feedback, advice, and support anytime and anywhere.

Hope you enjoyed this article ;)
