Transforming speech technology with WhisperLive

Kara Bembridge
May 28, 2024

The world of AI has come on in leaps and bounds, but some adjustments are still needed to get the best results. In the realm of conversational AI, VoxAI had already developed a platform to capture customer orders. Its response time and speech capabilities needed improvement, however, and this is where Collabora stepped in with WhisperLive.

WhisperLive, a real-time transcription service powered by OpenAI's Whisper model, departed from traditional speech recognition methods by incorporating voice activity detection (VAD). VAD identifies when speech is present, so that only the audio containing speech is transmitted. This enhances transcription accuracy while optimizing data handling.
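
To illustrate the idea of selective transmission, here is a minimal Python sketch. It is not WhisperLive's implementation: it stands in a simple energy threshold for a real VAD model, and the names `rms` and `speech_frames`, the frame size, and the `threshold` value are all hypothetical choices made for this example.

```python
import math
from typing import Iterable, Iterator, List

Frame = List[float]  # one frame of audio samples scaled to [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Return the root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_frames(frames: Iterable[Frame], threshold: float = 0.02) -> Iterator[Frame]:
    """Yield only the frames whose energy reaches the threshold.

    The threshold is an assumed value for illustration; a production
    system would typically use a trained VAD model instead.
    """
    for frame in frames:
        if rms(frame) >= threshold:
            yield frame


# Synthetic input: 10 ms frames at an assumed 16 kHz sample rate.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]

stream = [silence, tone, silence, tone, tone]
to_send = list(speech_frames(stream))

# Only the frames that pass the gate would be sent on for transcription.
print(f"sending {len(to_send)} of {len(stream)} frames")
```

In this toy stream the two silent frames are dropped before transmission, which is the bandwidth and accuracy benefit the paragraph above describes, just in a much cruder form.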

At the same time, Collabora employed a fine-tuned Mistral model for the NLP component. Renowned for its efficiency and versatility, Mistral is reported to be six times faster than the Llama 2 70B model while matching or exceeding it across benchmarks. The model also supports multiple languages and has inherent coding capabilities.

As we look to WhisperLive's promising possibilities, our Machine Learning Lead, Marcus Edel, puts it best:

"The future of customer interaction lies in the harmonious fusion of sophisticated AI and powerful communication technologies. As we continue our mission and build fully in the open, WhisperLive, and now the award-nominated WhisperFusion, are poised to make an impact in the communication technology landscape."

To learn more about how this project came to life, take a look at our case study.

If you're eager to implement your own transcription service, please get in touch! Our machine learning team is ready to assist you with your AI needs.
