VibeVoice: A Frontier Open-Source Text-to-Speech Model
📄 Report · Code · 🤗 Hugging Face · Demo
VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking.
A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokeniz...
Read more at microsoft.github.io
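To make the 7.5 Hz figure concrete, here is a minimal sketch that counts how many tokenizer frames a stretch of audio produces and how many raw samples each frame summarizes. Two things are assumptions made only for illustration and do not come from this page: the 24 kHz input sample rate (`ASSUMED_SAMPLE_RATE_HZ`), and rounding any partial trailing frame up to a whole frame. The helper names `num_frames` and `samples_per_frame` are hypothetical and are not part of the VibeVoice codebase.

```python
import math

# Frame rate of VibeVoice's continuous (acoustic and semantic) tokenizers, as stated above.
TOKENIZER_FRAME_RATE_HZ = 7.5

# ASSUMPTION: input audio sample rate, chosen only for illustration.
ASSUMED_SAMPLE_RATE_HZ = 24_000


def num_frames(duration_s: float, frame_rate_hz: float = TOKENIZER_FRAME_RATE_HZ) -> int:
    """Tokenizer frames needed to cover `duration_s` seconds of audio.

    ASSUMPTION: a partial trailing frame is rounded up to a whole frame.
    """
    return math.ceil(duration_s * frame_rate_hz)


def samples_per_frame(
    sample_rate_hz: float = ASSUMED_SAMPLE_RATE_HZ,
    frame_rate_hz: float = TOKENIZER_FRAME_RATE_HZ,
) -> float:
    """Raw audio samples represented by one tokenizer frame (the downsampling factor)."""
    return sample_rate_hz / frame_rate_hz


if __name__ == "__main__":
    print(num_frames(60))        # frames for one minute of audio
    print(num_frames(3600))      # frames for one hour of audio
    print(samples_per_frame())   # samples per frame at the assumed 24 kHz
```

At 7.5 Hz, one minute of audio maps to 450 frames and one hour to 27,000 frames, and under the assumed 24 kHz sample rate each frame stands in for 3,200 samples. This arithmetic illustrates why a low frame rate keeps long-form audio sequences short enough for a model to process.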