Ultra-low latency conversations with a drive-thru AI chatbot

WhisperLive and its open-source speech-to-text models meet VoxAI

VoxAI, a company recognized for its innovative approach to capturing customer orders through a human-like voice interface, embarked on a project to enhance its transcription capabilities and refine its natural language processing (NLP) backend. The endeavor focused on developing two key components: an efficient transcription model and a streamlined NLP process. Given the complexity of these technologies and the scarcity of resources on taking them to production, the challenge was formidable.


The transcription component leveraged WhisperLive, a real-time transcription service developed by Collabora and powered by OpenAI's Whisper model. This approach departed from conventional speech recognition technologies by employing voice activity detection (VAD). VAD identifies when speech is present, so that only audio containing speech is sent to the Whisper model, which reduces the amount of data to process and improves transcription accuracy.
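The gating idea described above can be illustrated with a minimal sketch. It is not WhisperLive's implementation: the energy threshold, the `hangover` parameter, and the function names are assumptions chosen for illustration, and a production system would more likely use a trained VAD model than a simple energy check.

```python
import math
from typing import Iterable, Iterator, List

Frame = List[float]  # one frame of mono audio samples scaled to [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(
    frames: Iterable[Frame],
    threshold: float = 0.02,  # assumed energy threshold, tune per microphone
    hangover: int = 1,        # quiet frames tolerated inside a segment
) -> Iterator[List[Frame]]:
    """Yield runs of frames judged to contain speech.

    Only these segments would be forwarded to the transcription model;
    frames outside any segment are dropped.
    """
    segment: List[Frame] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            segment.append(frame)
            quiet = 0
        elif segment:
            quiet += 1
            if quiet > hangover:
                # Too much silence: close the current segment.
                yield segment
                segment, quiet = [], 0
            else:
                # Keep a short pause so words are not clipped.
                segment.append(frame)
    if segment:
        yield segment
```

Each yielded segment would then be handed to the transcription model, while leading silence and long pauses never leave the client.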

In tandem, for the NLP component, Collabora utilized a fine-tuned Mistral model, renowned for its efficiency and versatility. This model, reported to be six times faster than Llama 2 70B while matching or outperforming it across benchmarks, supports multiple languages and possesses inherent coding capabilities. A specialized data generation pipeline was established, combining customer data with advanced modeling techniques to fine-tune the Mistral model to meet VoxAI's specific operational needs.
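The source does not describe how the data generation pipeline works. As a rough illustration of the general idea only, the sketch below produces synthetic prompt/completion pairs from a small menu using templates. The menu, the templates, the field names, and the functions `make_example` and `generate` are all hypothetical; a real pipeline would draw on the customer's own data and more sophisticated generation techniques.

```python
import json
import random
from typing import Dict, List

# Hypothetical menu: item name -> available sizes.
MENU: Dict[str, List[str]] = {
    "burger": ["single", "double"],
    "fries": ["small", "medium", "large"],
    "soda": ["small", "medium", "large"],
}

# Hypothetical phrasings a customer might use at the drive-thru.
TEMPLATES = [
    "Can I get {qty} {size} {item}?",
    "I'd like {qty} {size} {item}, please.",
    "Uh, {qty} {size} {item}.",
]


def make_example(rng: random.Random) -> Dict[str, str]:
    """Build one training pair: a spoken-style order and its structured form."""
    item = rng.choice(sorted(MENU))
    size = rng.choice(MENU[item])
    qty = rng.randint(1, 3)
    utterance = rng.choice(TEMPLATES).format(qty=qty, size=size, item=item)
    order = {"item": item, "size": size, "quantity": qty}
    return {"prompt": utterance, "completion": json.dumps(order, sort_keys=True)}


def generate(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Generate n examples reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]
```

Writing each returned dictionary as one JSON line gives a file in the prompt/completion layout that many fine-tuning tools accept.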

Future Directions

VoxAI's collaboration with Collabora signified a pivotal shift from reliance on proprietary, outdated, and unsupported pipelines to a robust open-source framework. Collabora filled critical gaps in free and open-source software (FOSS) projects to align with VoxAI's operational needs, thereby enabling the company to concentrate on creating value-added services with the reassurance of community support for addressing common challenges.

Guidance and Ongoing Support

A cornerstone of Collabora's engagement is its commitment to providing continual support and guidance. This commitment stems from a long-standing relationship with clients, fostering a deep understanding of their needs. Collabora's dedication to removing barriers to FOSS adoption, particularly for clients transitioning from subpar vendor-specific solutions, ensures that clients such as VoxAI receive comprehensive support throughout their journey to market success with FOSS-based products.

"The future of customer interaction lies in the harmonious fusion of sophisticated AI and powerful communication technologies. As we continue our mission and build fully in the open, WhisperLive, and now the award-nominated WhisperFusion, are poised to make an impact in the communication technology landscape."
Marcus Edel, Machine Learning Lead, Collabora

