
Triple Threat: The Power of Transcription, Summary, and Translation

Marcus Edel
August 03, 2023

At Collabora, we're committed to bringing people together. That's why we're pushing state-of-the-art machine-learning techniques like Large Language Models, Speech Recognition, and Speech Synthesis.

Collabora is a massive advocate of open source, which has long been the backbone of AI. The principle of taking code and publishing it for all to see and tinker with has remained unquestioned among the AI research community and has been credited with supercharging the technology's development.

That said, to us, the most compelling use cases of these technologies will come from starting with a real human need. Today we'll show you how we leveraged open-source models to solve a problem that we encounter on a daily basis. Using state-of-the-art natural language processing techniques, we developed an AI-driven automatic transcription, summarization, and translation pipeline. By applying these techniques and learnings, the community will be able to automate the boring work of transcription, summarization, and translation, enabling them to spend more time on creative and stimulating work.

Curious to see how this works? Check out our demo page to generate your own transcription, summary, and translation, or use our browser extension to get live transcriptions.

Transcription

We use OpenAI's Whisper as it is currently one of the best-performing models for audio transcription. Moreover, it's readily available and comes in different model sizes. Using the small model, we achieved decent results even on non-English audio. In addition, it's resource-efficient enough to be run on a CPU without falling behind the stream. You could deploy the transcription server on a DataCrunch CPU instance for less than $50 per month, serving multiple users.
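Whether a transcriber keeps up with a live stream can be expressed as its real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean the model stays ahead of the incoming audio. A minimal sketch, with a short sleep standing in for the actual Whisper call:

```python
import time

def real_time_factor(transcribe, audio_seconds: float) -> float:
    """Return processing_time / audio_duration; values below 1.0
    mean the transcriber keeps up with a live stream."""
    start = time.perf_counter()
    transcribe()
    return (time.perf_counter() - start) / audio_seconds

# Hypothetical stand-in for a Whisper call on a 30-second audio chunk.
rtf = real_time_factor(lambda: time.sleep(0.03), audio_seconds=30.0)
print(f"RTF: {rtf:.3f}")
```

In practice you would pass the real transcription call as `transcribe` and check that the RTF stays below 1.0 on your target CPU.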

Some of our meetings are technical and use terminology that OpenAI's Whisper fails to get right. We fine-tuned the model on our own meetings, which eliminated these transcription errors.

We implemented a simple Python client and backend that takes care of all the heavy lifting. That's a good reminder to appreciate the hard work that open source developers put in regularly. If you want to learn more about this specific implementation, I recommend checking out the repository.

In addition, we implemented a browser plugin (Chrome/Firefox) that connects to the backend and delivers live transcription for any media content. This enables out-of-the-box live transcriptions for a number of web-conferencing applications and web-video platforms.

Summarization

For the summarization part, we used LangChain, another open-source framework for developing applications powered by language models. LangChain lets us swap out the large language model without changing much code. We tested different LLMs for the summarization task and settled on Falcon-40B-Instruct, but as mentioned, it would be easy to replace it with ChatGPT, Claude, or Instruct-GPT-J. When selecting an instruction-tuned LLM, make sure it understands the prompt format.

We implemented a simple Python script that takes a meeting transcript and generates a summary.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import HuggingFacePipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the transcript into overlapping chunks that fit the model's context.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
texts = text_splitter.create_documents([transcript])

model_repo = 'tiiuae/falcon-40b-instruct'

# Load Falcon-40B-Instruct in 8-bit to reduce GPU memory usage.
tokenizer = AutoTokenizer.from_pretrained(model_repo)
model = AutoModelForCausalLM.from_pretrained(model_repo,
                                             load_in_8bit=True,
                                             device_map='auto',
                                             torch_dtype=torch.float16,
                                             low_cpu_mem_usage=True,
                                             trust_remote_code=True
                                            )
max_len = 2048
task = "text-generation"

pipe = pipeline(
    task=task,
    model=model,
    tokenizer=tokenizer,
    max_length=max_len,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15,
    pad_token_id=11  # Falcon has no dedicated pad token; reuse the EOS token id
)

llm = HuggingFacePipeline(pipeline=pipe)

# Summarize each chunk, then combine the partial summaries (map-reduce).
chain = load_summarize_chain(llm=llm, chain_type="map_reduce", verbose=True)
summary = chain.run(texts)
print("summary", summary)
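The map_reduce chain type above can be sketched in plain Python: split the transcript into overlapping chunks, summarize each chunk independently (map), then summarize the combined partial summaries (reduce). Here `summarize` is a trivial stand-in for the LLM call, and the chunker only mimics the splitter's size and overlap parameters:

```python
def chunk(text: str, size: int = 2000, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows, similar in spirit to
    LangChain's RecursiveCharacterTextSplitter."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def summarize(text: str) -> str:
    """Stand-in for an LLM call; here it just keeps the first sentence."""
    return text.split(". ")[0].strip() + "."

def map_reduce_summary(transcript: str) -> str:
    partial = [summarize(c) for c in chunk(transcript)]  # map step
    return summarize(" ".join(partial))                  # reduce step

transcript = "The team met on Tuesday. Action items were assigned. " * 200
print(map_reduce_summary(transcript))
```

Because each chunk is summarized independently, the map step parallelizes well; the trade-off is that context spanning chunk boundaries can be lost, which the overlap partially mitigates.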

Meeting Overview

To show the summary and transcription, we implemented another simple script that outputs an HTML page with the transcription, speaker diarization, and summary.
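A minimal sketch of such a script, using only the standard library; the `(speaker, utterance)` segment format is an assumption for illustration, not our actual diarization schema:

```python
from html import escape

def meeting_page(summary: str, segments: list[tuple[str, str]]) -> str:
    """Render a minimal HTML page with the summary and a diarized
    transcript. `segments` is a list of (speaker, utterance) pairs."""
    rows = "\n".join(
        f"<p><b>{escape(speaker)}:</b> {escape(text)}</p>"
        for speaker, text in segments
    )
    return (
        "<!DOCTYPE html><html><body>"
        f"<h1>Meeting Overview</h1><h2>Summary</h2><p>{escape(summary)}</p>"
        f"<h2>Transcript</h2>{rows}"
        "</body></html>"
    )

page = meeting_page("Short recap.", [("Alice", "Hello"), ("Bob", "Hi & welcome")])
```

Escaping every field keeps transcribed speech (which can contain arbitrary characters) from breaking the markup.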

Outlook

Our goal is not to solve this problem alone: it's to engage with the community and push the envelope of what's possible. We open-source our code and believe the community can build on it to create something even better.

In today's increasingly international landscape, language technology (transcription, summarization, and translation) will continue to play a vital role in helping people around the world connect across many platforms. It will change how people live, do business, and learn. At Collabora, we keep that mission at the heart of everything we do.
