Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

Found 22 days ago at Arstechnica

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything, and do it in a way that attempts to preserve the speaker's emotional tone. Its creators speculate that VALL-E could be used for high...

