Amazon Polly – Turning text into lifelike speech using deep learning

Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. It lets you create applications that talk and build entirely new categories of speech-enabled products.

With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries.

Natural Sounding Voices

Amazon Polly provides dozens of lifelike voices and supports multiple languages, including a wide range of male and female voices with a variety of accents. Amazon Polly’s fluid pronunciation of text in multiple languages enables you to deliver high-quality voice output and create applications for global users.
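As an illustration of selecting a voice, the sketch below lists the voices offered for one language and optionally filters them by gender. It assumes a boto3 Polly client (created with `boto3.client("polly")`) is passed in as `polly`. The helper name `pick_voices` is hypothetical and not part of any SDK.

```python
def pick_voices(polly, language_code, gender=None):
    """Return sorted voice IDs for a language, optionally filtered by gender.

    `polly` is assumed to be a boto3 Polly client, for example:
        import boto3
        polly = boto3.client("polly")
    """
    voices = []
    kwargs = {"LanguageCode": language_code}
    while True:
        page = polly.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        # The voice list may be paginated; follow NextToken until it is absent.
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    if gender is not None:
        voices = [v for v in voices if v["Gender"] == gender]
    return sorted(v["Id"] for v in voices)
```

For example, `pick_voices(polly, "en-US", gender="Female")` would return the IDs of the female US English voices available to your account and region.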

Easy Integration

Amazon Polly makes it easy to add voice to your website, mobile app, or device. With Amazon Polly, you just send the text you want converted to speech to the Amazon Polly API, and it immediately returns the audio stream. Unlike other solutions that require a lengthy approval process, Amazon Polly doesn’t require you to describe how you will use its speech in your application, and there are no distribution agreements to sign, so you can start right away.
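A minimal sketch of that request, assuming a boto3 Polly client is passed in as `polly`. The helper name `synthesize` and the default voice `"Joanna"` are illustrative choices, not requirements of the service.

```python
from contextlib import closing


def synthesize(polly, text, voice_id="Joanna", output_format="mp3"):
    """Send text to Polly and return the synthesized audio as bytes.

    `polly` is assumed to be a boto3 Polly client, for example:
        import boto3
        polly = boto3.client("polly")
    """
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat=output_format,
    )
    # The audio is returned as a stream; read it fully and close it.
    with closing(response["AudioStream"]) as stream:
        return stream.read()
```

Calling `synthesize(polly, "Hello from Polly")` would return MP3 bytes that can be handed to an audio player or written to disk.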

Store and Redistribute Speech

Unlike other solutions that require a royalty or charge a fee every time you replay previously generated audio, Amazon Polly allows for unlimited replays without any additional fees. These free replays extend to offline use as well. You can create speech files in a variety of standard formats, such as MP3 and OGG, and store them on devices such as mobile phones or Internet of Things (IoT) devices for offline playback.
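Storing speech for later replay could look like the sketch below, which writes the audio stream to a local file. It assumes a boto3 Polly client passed in as `polly`. The helper `save_speech` and the `EXTENSIONS` mapping are hypothetical names introduced for this example; `"mp3"` and `"ogg_vorbis"` are the `OutputFormat` values for MP3 and OGG audio.

```python
import shutil
from contextlib import closing

# File extension to use for each Polly audio output format.
EXTENSIONS = {"mp3": ".mp3", "ogg_vorbis": ".ogg"}


def save_speech(polly, text, basename, voice_id="Joanna", output_format="mp3"):
    """Synthesize text and write the audio to `basename` plus an extension.

    Returns the path of the file that was written. `polly` is assumed to be
    a boto3 Polly client created with boto3.client("polly").
    """
    path = basename + EXTENSIONS[output_format]
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat=output_format,
    )
    # Copy the stream to disk without loading it all into memory at once.
    with closing(response["AudioStream"]) as stream, open(path, "wb") as out:
        shutil.copyfileobj(stream, out)
    return path
```

The saved file can then be replayed as often as needed, or copied to a device, without calling the service again.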

Low Cost

Amazon Polly’s pay-as-you-go pricing, low cost per character converted, and unlimited replays make it a cost-effective way to enable speech synthesis in virtually any application.

Fast Response

Delivering lifelike voices and conversational user experiences requires consistently fast response times. Voice-enabled applications need to play synthesized speech without delay. Consider apps that provide spoken directions for navigation, eLearning applications that provide verbal instruction to students, and apps that engage the user through real-time dialog. These apps are most effective when responses can start without perceived delays in the conversational flow. Even when you send lengthy text to Amazon Polly’s API, it returns the audio to your application as a stream so you can play the voices immediately.

These kinds of dynamic, spoken responses require access to a much larger quantity of speech audio than is typically available to store on users’ devices. Amazon Polly is in the cloud, so you have access to a wide variety of synthesized speech. With Amazon Polly, your application can provide even more valuable responses that include real-time data.
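To show how an application might start playback before the whole response has arrived, the sketch below yields the audio in fixed-size chunks as they are read from the stream. It assumes a boto3 Polly client passed in as `polly`. The generator name `stream_audio` and the 4096-byte chunk size are arbitrary choices for illustration.

```python
from contextlib import closing


def stream_audio(polly, text, voice_id="Joanna", chunk_size=4096):
    """Yield synthesized audio in chunks so playback can begin early.

    `polly` is assumed to be a boto3 Polly client created with
    boto3.client("polly").
    """
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat="mp3",
    )
    with closing(response["AudioStream"]) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break  # end of stream
            yield chunk
```

A caller would loop over `stream_audio(polly, long_text)` and pass each chunk to its audio player or HTTP response as soon as it is available.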

Use Cases

Content Creation

Amazon Polly makes it easy to add speech to your video, presentation, or online training course. Amazon Polly can generate speech in 24 languages, making it easy to add voice to applications with a global audience. With Amazon Polly you can read your RSS feed, news, or email, and store synthesized speech in the form of audio files.


Amazon Polly enables developers to provide their applications with an enhanced visual experience such as speech-synchronized facial animation or karaoke-style word highlighting. Amazon Polly makes it easy to request an additional stream of metadata with information about when particular sentences, words, and sounds are being pronounced. Using this metadata stream alongside the synthesized speech audio stream, customers can animate avatars and highlight text in their app as it is spoken.
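The metadata stream described above is exposed as speech marks, returned as one JSON object per line when `OutputFormat` is `"json"`. The sketch below requests sentence and word marks and looks up which word is being spoken at a given playback time. It assumes a boto3 Polly client passed in as `polly`. The helper names `fetch_speech_marks` and `word_at` are hypothetical.

```python
import json
from contextlib import closing


def fetch_speech_marks(polly, text, voice_id="Joanna"):
    """Return the speech marks for `text` as a list of dicts.

    `polly` is assumed to be a boto3 Polly client created with
    boto3.client("polly").
    """
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat="json",
        SpeechMarkTypes=["sentence", "word"],
    )
    with closing(response["AudioStream"]) as stream:
        raw = stream.read().decode("utf-8")
    # Each non-empty line is a separate JSON object.
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def word_at(marks, time_ms):
    """Return the most recent word whose start time is <= time_ms, or None."""
    current = None
    for mark in marks:
        if mark["type"] == "word" and mark["time"] <= time_ms:
            current = mark["value"]
    return current
```

An app could call `word_at(marks, player_position_ms)` on each UI tick to decide which word to highlight. A separate request with an audio `OutputFormat` is still needed to obtain the audio itself.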

Customer Contact Center

With Amazon Polly, your customer contact centers can respond with natural sounding voices. You can replay Amazon Polly’s speech output through your interactive voice response (IVR) systems. Additionally, you can leverage Amazon Polly’s API to deliver automated real-time information such as service status, account and billing inquiries, addresses, and contact information.

Internet of Things (IoT)

Amazon Polly enables new Internet of Things (IoT) use cases by making it easy and inexpensive to add speech to IoT devices. IoT devices can use speech to provide natural responses and notifications, making applications more accessible and allowing users to consume information without having to rely on a screen. With Amazon Polly you can generate speech files and store them on your devices for offline playback.

Use AWS Lambda to generate pre-signed Polly URLs based on events from the AWS IoT rules engine, then use Device Gateway to send these URLs to your IoT devices to allow them to request lifelike speech.

Language Learning

Amazon Polly can be used to improve the usability of applications that teach people how to speak new languages. For example, end users can type foreign language phrases into your application, then hear them spoken by a native speaker. Amazon Polly supports 24 languages, giving teachers and students plenty of options.

With Amazon Polly you can create and distribute accessible information in the form of synthesized speech for visually impaired people. This way you can help people with sight loss consume content such as news, books, or email messages.

