Microsoft's AI can mimic anyone's voice from just a 3-second sample.

Last updated: 15/6/2026

Microsoft aims to integrate Vall-E into software for high-quality text-to-speech conversion.

Vall-E, Microsoft's AI model, can replicate the tone and speech patterns of a real person after hearing just three seconds of their voice, though the output can carry a slightly robotic undertone.

Microsoft describes Vall-E as a 'neural codec language model.' A codec uses algorithms to compress audio or video into a byte stream for storage, then decompresses it for playback or other purposes.
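The compress-then-decompress round trip that a codec performs can be sketched in a few lines of Python. This is only an illustration, not Vall-E's codec: it uses classic mu-law companding, a fixed formula from telephony, whereas a neural codec learns its compression from data. The function names, the 440 Hz test tone, and the 8 kHz sample rate are all assumptions chosen for the example.

```python
import math

MU = 255  # companding constant used by classic 8-bit mu-law telephony audio

def encode_sample(x: float) -> int:
    """Compress one sample in [-1.0, 1.0] into a single byte value (0..255)."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    compressed = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((compressed + 1.0) / 2.0 * 255))

def decode_sample(b: int) -> float:
    """Expand one byte back into an approximate sample in [-1.0, 1.0]."""
    compressed = b / 255 * 2.0 - 1.0
    magnitude = math.expm1(abs(compressed) * math.log1p(MU)) / MU
    return math.copysign(magnitude, compressed)

def encode(samples) -> bytes:
    """Audio samples -> byte stream (one byte per sample)."""
    return bytes(encode_sample(s) for s in samples)

def decode(stream: bytes):
    """Byte stream -> approximate audio samples."""
    return [decode_sample(b) for b in stream]

# Illustrative input: 80 samples of a 440 Hz tone at an assumed 8 kHz sample rate.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(80)]
stream = encode(tone)      # compressed form, ready to store or transmit
restored = decode(stream)  # lossy reconstruction
max_error = max(abs(a - b) for a, b in zip(tone, restored))
print(len(stream), "bytes, worst-case sample error:", max_error)
```

The reconstruction is close to the original but not identical, which is the usual trade-off of a lossy audio codec.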

Vall-E builds on EnCodec, a machine-learning audio codec that Meta released in 2022. It analyzes how a person sounds and uses EnCodec to break that information into discrete components called tokens. This differs from earlier text-to-speech approaches, which typically synthesized speech by manipulating waveforms.
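To make the idea of turning audio into tokens concrete, here is a minimal sketch of vector quantization, the general technique of replacing each short frame of audio with the index of its closest match in a codebook. It is a toy under stated assumptions, not EnCodec itself: the four-entry codebook is hand-written and hypothetical, the frame size is arbitrary, and a real neural codec learns both its encoder and its codebooks from data.

```python
FRAME_SIZE = 4  # samples per frame (toy value chosen for readability)

# Hypothetical hand-written codebook; a neural codec would learn these entries.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],      # token 0: silence
    [0.5, 0.5, 0.5, 0.5],      # token 1: positive plateau
    [-0.5, -0.5, -0.5, -0.5],  # token 2: negative plateau
    [0.5, -0.5, 0.5, -0.5],    # token 3: rapid oscillation
]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def tokenize(samples):
    """Map each complete frame to the index of its nearest codebook entry."""
    tokens = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        nearest = min(range(len(CODEBOOK)),
                      key=lambda i: squared_distance(frame, CODEBOOK[i]))
        tokens.append(nearest)
    return tokens

def detokenize(tokens):
    """Rebuild an approximate waveform by concatenating codebook entries."""
    samples = []
    for token in tokens:
        samples.extend(CODEBOOK[token])
    return samples

# Sixteen made-up samples: near-silence, a plateau, an oscillation, a dip.
audio = [0.02, -0.01, 0.0, 0.03,
         0.45, 0.55, 0.5, 0.48,
         0.6, -0.4, 0.5, -0.55,
         -0.5, -0.45, -0.52, -0.6]
tokens = tokenize(audio)
print(tokens)  # [0, 1, 3, 2]
```

Once speech is a sequence of integers like this, a language model can treat it much as it would treat text, predicting the next token instead of the next waveform sample.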

Vall-E then draws on its training data to match what it 'knows' about speech intonation to the sampled voice, enabling it to speak phrases that were not in the sample.

The mimicry requires only a three-second sample of the target voice. Microsoft's researchers report that it outperforms earlier systems of this kind in naturalness and in similarity to the original speaker.

Microsoft trained Vall-E on an audio library containing 60,000 hours of English speech from more than 7,000 speakers. The library is expected to be expanded over time and to cover more languages.

However, Vall-E raises concerns among experts that it could be exploited for malicious purposes. This AI could be used by malevolent actors to impersonate voices for fraudulent activities, such as extortion. When combined with deepfake videos, the potential danger escalates significantly.

This potential for misuse highlights the importance of implementing safeguards and regulations to limit the harm Vall-E could cause.
