Microsoft researchers unveil text-to-speech AI tool, VALL-E

January 11, 2023

Microsoft researchers have unveiled VALL-E, a text-to-speech AI model that can listen to a voice for a few seconds and then mimic it, including its emotional tone and acoustic environment, to say anything.

VALL-E is a “neural codec language model,” according to the researchers. It uses Meta’s EnCodec technology to break a recording of a human voice into discrete tokens. The AI can then estimate how that voice would sound saying words that were not in the original sample.

After being trained on 60,000 hours of English speech recordings, it can synthesize speech in a “zero-shot” setting, meaning it can mimic a voice it did not encounter during training, given only the short sample.

VALL-E produces audio codec codes from text and acoustic prompts rather than manipulating waveforms directly, as traditional methods do. The script that the user enters into the model is combined with the acoustic tokens from the original recording, and the model generates new acoustic tokens that the neural codec decoder converts into the final waveform.
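The flow described above can be illustrated with a toy sketch. This is not VALL-E's code: every function name, the codebook size, the character-level text tokenizer, and the placeholder rule standing in for the trained model are assumptions made purely for illustration. It only shows the general shape of the idea, in which script tokens and the voice sample's acoustic tokens are merged into one prompt and new acoustic tokens are generated from it.

```python
from typing import List

# All names and values here are hypothetical; this is not VALL-E's code.
CODEBOOK_SIZE = 1024  # assumed number of distinct acoustic tokens


def text_to_tokens(script: str) -> List[int]:
    """Toy text tokenizer: one integer per character (a real system would differ)."""
    return [ord(ch) for ch in script.lower()]


def build_prompt(script: str, acoustic_prompt: List[int]) -> List[int]:
    """Merge the script tokens with the acoustic tokens of the short voice sample."""
    # Offset text tokens so they never collide with acoustic token ids.
    text_tokens = [CODEBOOK_SIZE + t for t in text_to_tokens(script)]
    return text_tokens + acoustic_prompt


def generate_acoustic_tokens(prompt: List[int], n_new: int) -> List[int]:
    """Stand-in for the language model: emits n_new acoustic tokens in sequence."""
    sequence = list(prompt)
    generated = []
    for _ in range(n_new):
        # Placeholder rule instead of a trained network: derive the next token
        # from everything seen so far, kept inside the codebook range.
        next_token = sum(sequence) % CODEBOOK_SIZE
        sequence.append(next_token)
        generated.append(next_token)
    return generated


voice_sample_tokens = [12, 873, 45, 601]  # made-up codec tokens for a short recording
prompt = build_prompt("hi", voice_sample_tokens)
new_tokens = generate_acoustic_tokens(prompt, n_new=5)
# A neural codec decoder would turn new_tokens into a waveform; omitted here.
```

In a real system the placeholder rule would be a trained neural network and the final step would be a codec decoder that produces audio; both are left out because the article does not describe them in detail.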

When combined with other generative AI models like GPT-3, the researchers believe VALL-E could be used for high-quality text-to-speech applications, for speech editing in which a recording of a person is altered by editing its text transcript (making them say something they did not originally say), and for audio content creation.

The tool is not currently available to the general public.

The sources for this piece include an article in Ars Technica.


Jim Love

Jim is an author and podcast host with over 40 years in technology.
