AI API Commercializer

Microsoft VibeVoice: Open-Source Speech AI Launches on Hugging Face for Long-Context ASR and TTS

Key Questions

What is Microsoft VibeVoice?

Microsoft VibeVoice is an MIT-licensed, open-source speech AI suite comprising a 7B ASR model, a 1.5B TTS model, and a 0.5B realtime model. It is available on Hugging Face with support for Transformers, vLLM, and LoRA.

What are the capabilities of VibeVoice?

VibeVoice processes 60-90 minutes of audio with speaker diarization and supports multilingual, long-context ASR and TTS. The project has earned 44k stars and an ICLR 2026 oral acceptance. Its release coincides with a broader surge in voice AI, which also includes the Gemini Live API.
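
For recordings longer than the range quoted above, one option is to split the audio into windows before transcription. The sketch below only plans window boundaries; it does not call VibeVoice. The 60-minute default takes the conservative end of the 60-90 minute range from the text, while the 30-second overlap and the names `Window` and `plan_windows` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start_s: float  # window start, in seconds from the beginning of the recording
    end_s: float    # window end, in seconds


def plan_windows(duration_s: float,
                 max_window_s: float = 60 * 60,   # assumed limit: 60 minutes
                 overlap_s: float = 30.0) -> list[Window]:  # assumed overlap
    """Split a recording into windows no longer than max_window_s.

    Consecutive windows overlap by overlap_s so that speech cut at a
    boundary appears whole in at least one window.
    """
    if not 0 <= overlap_s < max_window_s:
        raise ValueError("overlap_s must be non-negative and shorter than max_window_s")
    if duration_s <= 0:
        return []

    windows: list[Window] = []
    start = 0.0
    while True:
        end = min(start + max_window_s, duration_s)
        windows.append(Window(start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so the next window re-covers the boundary.
        start = end - overlap_s
    return windows


if __name__ == "__main__":
    # A three-hour recording, split under the assumed defaults.
    for w in plan_windows(3 * 60 * 60):
        print(f"{w.start_s:8.0f}s -> {w.end_s:8.0f}s")
```

A recording that already fits in one window is returned as a single window, so the same code path can serve both short and long files.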

How can VibeVoice be used?

It enables independent developers to build low-cost B2C SaaS products for transcription, speech synthesis, and audiobook creation, hosted on platforms such as Replicate and HF Spaces. The models are optimized for efficient deployment in voice agent applications.
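
To get a rough sense of what hardware each model needs, the weight footprint can be estimated from the nominal parameter counts quoted on this page. This is a back-of-the-envelope sketch, not a measurement: it counts weights only (no activations, caches, or audio tokenizer overhead), and the helper name `weight_memory_gb` and the dtype table are assumptions made for illustration.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal sizes as quoted in the text; actual checkpoints may differ.
    for name, size_b in [("ASR", 7.0), ("TTS", 1.5), ("realtime", 0.5)]:
        print(f"{name:9s} {size_b:>4}B  ~{weight_memory_gb(size_b):.1f} GB in bf16")
```

Under these assumptions the 7B model needs about 14 GB for weights in 16-bit precision, while the 0.5B model needs about 1 GB, which is the kind of gap that decides whether a small hosted GPU is enough.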

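In summary: Microsoft VibeVoice is an MIT-licensed open-source suite (7B ASR, 1.5B TTS, 0.5B realtime) on Hugging Face with Transformers, vLLM, and LoRA support. It handles 60-90 minutes of multilingual audio with diarization, has 44k stars and an ICLR 2026 oral, and arrives amid a voice AI surge that also includes the Gemini Live API, making it a fit for low-cost indie B2C transcription, synthesis, and audiobook SaaS on Replicate or HF Spaces.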

Updated Apr 30, 2026