TTS overview

Updated on 17 Mar 2025
3 Minutes to read

Who should read this article: All users

Use Voiso's text-to-speech (TTS) capabilities to improve the efficiency and flexibility of creating and deploying audio messages for contacts. Text-to-speech audio messaging saves your contact center time and money when compared to traditional audio recordings.

Introduction

Voiso text-to-speech (TTS) is a service that converts written text into natural-sounding speech. Voiso uses Amazon Polly to provide consistent voice production and personalization. You can use text-to-speech to create audio messages in your inbound call flows.

Text-to-speech replaces traditional audio recordings in interactive voice response (IVR) menus, so you can rapidly develop and manage the audio messages that your inbound callers hear.

Traditional audio messages can be time-consuming and expensive to produce, especially when hiring voice actors and reserving recording studio time. Using text-to-speech eliminates concerns about accents or poor pronunciation affecting message comprehension. All your messages are clear and precise.

Text-to-speech increases your contact center's efficiency and flexibility. You can replace or update a message without making a new audio recording: change the text, and the new message is immediately available in your IVR.

In Flow Builder, text-to-speech messages can include call flow variables, so each message is personalized without manual intervention. This is particularly useful for delivering customer-specific information, such as account balances or order statuses, which would be impractical with pre-recorded messages.
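
To illustrate the idea of personalizing a message with call flow variables, here is a minimal Python sketch of placeholder substitution. It is not Voiso's implementation: the function name render_message and the sample value are hypothetical, and the {{name}} placeholder form is taken from the example message later in this article.

```python
import re

# Matches placeholders written as {{name}}, the form used in this article's example.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_message(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with the matching variable value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than read a raw placeholder to the caller.
            raise KeyError(f"no value for call flow variable '{name}'")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


message = render_message(
    "Hello, your account balance is {{contactBalance}}.",
    {"contactBalance": "1,250 dollars"},  # hypothetical value
)
print(message)  # Hello, your account balance is 1,250 dollars.
```

The same template can serve every caller because only the variable value changes per call, which is what makes this approach impractical to reproduce with pre-recorded audio.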

Voiso's text-to-speech capability also gives you full access to Amazon Polly's multilingual support. This enables you to have messages in multiple languages based on caller preferences.

Languages and voices

Voiso text-to-speech supports all of Amazon Polly's languages and standard voices.

Language support

The following languages are available for text-to-speech synthesis:

Arabic
Catalan (Spain)
Chinese (Mandarin)
Danish (Denmark)
Dutch (Netherlands)
English (United States)
Finnish (Finland)
French (Canada)
French (France)
German (Germany)
Italian (Italy)
Japanese (Japan)
Korean (South Korea)
Norwegian Bokmål (Norway)
Polish (Poland)
Portuguese (Portugal)
Russian (Russia)
Spanish (Mexico)
Spanish (Spain)
Swedish (Sweden)

Voice support

Refer to Voice samples for a list of the supported voices and samples of what they sound like.

Speech Synthesis Markup Language (SSML)

Voiso's text-to-speech feature supports Speech Synthesis Markup Language (SSML). SSML enables you to craft the exact sound and feel of the voice messages generated through speech synthesis.

SSML lets you include speech synthesis options such as:

long pauses
variable speech rate or pitch
emphasis of specific words or phrases
the use of phonetic pronunciation
breathing sounds
whispering

Refer to the Amazon Polly Developer Guide for a complete list of supported tags.
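
As a rough illustration of the options listed above, the following Python sketch pairs each option with one Amazon Polly SSML tag and checks that every fragment is well-formed XML. The sample sentences and the names SSML_EXAMPLES and is_well_formed are hypothetical. The wrapping root element and the placeholder namespace exist only so that Python's XML parser accepts the fragments; they are an assumption of this sketch, not a statement about what Voiso requires.

```python
import xml.etree.ElementTree as ET

# One illustrative fragment per option; the sentences are made up for this sketch.
SSML_EXAMPLES = {
    "long pause": 'Please hold. <break time="2s"/> Thank you for waiting.',
    "speech rate and pitch": '<prosody rate="slow" pitch="low">Your call is important to us.</prosody>',
    "emphasis": 'Press 1 <emphasis level="strong">now</emphasis> to continue.',
    "phonetic pronunciation": 'Say <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.',
    "breathing sound": 'One moment. <amazon:breath duration="medium" volume="soft"/> Here is your balance.',
    "whispering": '<amazon:effect name="whispered">This offer is just for you.</amazon:effect>',
}


def is_well_formed(fragment: str) -> bool:
    """Return True if the SSML fragment parses as XML."""
    # The root element and namespace are added for parsing only, so that the
    # amazon: prefix is accepted by the XML parser.
    wrapped = f'<speak xmlns:amazon="urn:placeholder">{fragment}</speak>'
    try:
        ET.fromstring(wrapped)
        return True
    except ET.ParseError:
        return False


for option, fragment in SSML_EXAMPLES.items():
    print(f"{option}: {'ok' if is_well_formed(fragment) else 'malformed'}")
```

A check like this catches simple markup mistakes before a message reaches callers. For example, a mistyped closing tag such as </prosody.> makes the fragment fail to parse.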

SSML is an XML-based markup language whose tags resemble HyperText Markup Language (HTML), so if you are familiar with HTML, SSML is easy to learn and apply.

Refer to SSML syntax for some examples of how you can improve your text-to-speech voice messages by using SSML.

Flow Builder node support

The following Flow Builder nodes implement text-to-speech:

Use Case: Account balance inquiry

The following simplified flow presents a use case that you can use as a basis for your own text-to-speech flow.

Scenario

A contact calls into a bank's contact center to inquire about their account balance.

Flow

Here is an example of a simplified call flow for a contact checking their account balance.

Figure: Flow Builder call flow for the TTS account balance use case

Incoming Call: The contact dials the contact center's inbound number.
IVR Welcome Message: The DTMF node plays a pre-defined welcome message, such as "To check your account balance, press 1. To speak to a human agent, press 2."
If the contact presses 1, their phone number (ANI) is used by the HTTP Request node to query a web service and store the contact's account balance in a custom variable.
If the balance is successfully retrieved, the Play Audio node reads a voice message to the contact that includes the custom variable containing the account balance information.

Hello, your account balance is <prosody rate="slow">{{contactBalance}}</prosody>.
<break strength="medium"/>
Thank you for using our automated service!
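
The steps of this use case can be sketched in Python as follows. This is a simulation for illustration only, not Flow Builder configuration: handle_call, the action names, and the in-memory FAKE_BALANCES table that stands in for the web service are all hypothetical. The article does not say what happens when the lookup fails or the caller presses another key, so transferring to an agent and replaying the menu are assumptions.

```python
from typing import Callable, Optional

# Message text from this use case, with the variable placeholder still in place.
MESSAGE_TEMPLATE = (
    "Hello, your account balance is "
    '<prosody rate="slow">{{contactBalance}}</prosody>.\n'
    '<break strength="medium"/>\n'
    "Thank you for using our automated service!"
)

# Hypothetical stand-in for the web service queried by the HTTP Request node.
FAKE_BALANCES = {"+15550100": "1,250 dollars"}


def handle_call(
    ani: str,
    digit: str,
    lookup_balance: Callable[[str], Optional[str]],
) -> tuple[str, Optional[str]]:
    """Return (action, message) for one inbound call."""
    if digit == "2":
        return ("transfer_to_agent", None)
    if digit != "1":
        # Assumption: any other key replays the menu.
        return ("replay_menu", None)
    # Look up the balance by the caller's phone number (ANI).
    balance = lookup_balance(ani)
    if balance is None:
        # Assumption: fall back to an agent when the lookup fails.
        return ("transfer_to_agent", None)
    return ("play_audio", MESSAGE_TEMPLATE.replace("{{contactBalance}}", balance))


action, message = handle_call("+15550100", "1", FAKE_BALANCES.get)
print(action)
print(message)
```

Passing the lookup as a function keeps the sketch testable without a network call; in a real flow, the HTTP Request node performs that query.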
