
Unleashing gst-python-ml: Python-powered ML analytics for GStreamer pipelines

Aaron Boxer
May 12, 2025

Creating powerful video analytics pipelines is easy if you have the right tools. In this post, we will show you how to effortlessly build a broad range of machine learning (ML)-enabled video pipelines using just two building blocks: GStreamer and Python. We will focus on simplicity and functionality, deferring performance tuning to a future deep dive.

The core of our pipeline is GStreamer, everyone's favorite multimedia framework. Over the past few years, Collabora has contributed extensive ML capabilities to upstream GStreamer, adding support for ONNX and LiteRT inference and introducing a fine-grained, extensible metadata framework to persist model outputs.

We now take the next step by unleashing gst-python-ml: a pure Python framework that makes it easy to build powerful ML-enabled GStreamer pipelines using standard Python packages. With just a few lines of Python, or a single gst-launch-1.0 command, you can now run complex models across multiple streams, complete with tracking, captioning, speech and text processing, and much more.
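
To give a flavour, here is a minimal sketch of what "a few lines of Python" can look like: it drives the same kind of tracking pipeline used in the gst-launch-1.0 example later in this post via Gst.parse_launch(). The video path is a placeholder.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

Gst.init(None)

# Same elements as the command-line example further down: YOLO tracking plus overlay.
pipeline = Gst.parse_launch(
    'filesrc location=/path/to/video ! decodebin ! videoconvert ! '
    'pyml_yolo model-name=yolo11m track=True ! pyml_overlay ! '
    'videoconvert ! autovideosink'
)

loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect('message::eos', lambda *_: loop.quit())
bus.connect('message::error', lambda *_: loop.quit())

pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)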

Features

The framework consists of a set of base classes that can be easily extended to create new ML elements, along with a set of tested, fully functional elements that support the following features and models (a sketch of the underlying element pattern follows the list):

  1. Object Detection with Yolo, FasterRCNN, MaskRCNN, or any TorchVision object detection model
  2. Segmentation with MaskRCNN or Yolo
  3. Tracking with Yolo 
  4. Video Captioning with Phi3.5 Vision
  5. Translation with Marian 
  6. Transcription with Whisper
  7. Speech to Text with Whisper
  8. Text to Speech with WhisperSpeech
  9. Text to Image with Stable Diffusion
  10. Bird's eye view of sports matches
  11. Batch inference
  12. Multiple incoming streams
  13. Large Language Models (LLMs) with any HuggingFace Hub LLM
  14. Serializing ML metadata to Kafka server for real-time post-processing
  15. Overlay to display ML metadata such as bounding boxes and tracks
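
New elements are created by extending the framework's base classes; the exact API is best explored in the repository. As a rough, minimal sketch of the mechanism those base classes build on, here is the standard pattern for a GStreamer element written in Python and registered through gstreamer1.0-python3-plugin-loader. The names used here (PymlPassthrough, pyml_passthrough) are hypothetical.

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstBase', '1.0')
from gi.repository import Gst, GObject, GstBase

# GStreamer is already initialized when the Python plugin loader imports
# this file from a plugin directory.

class PymlPassthrough(GstBase.BaseTransform):
    # Hypothetical minimal element: passes raw video through unchanged.
    # A real ML element would run inference in do_transform_ip() and
    # attach the results to the buffer as metadata.
    __gstmetadata__ = ('pyml_passthrough', 'Filter/Video',
                       'Minimal example element', 'gst-python-ml example')
    __gsttemplates__ = (
        Gst.PadTemplate.new('sink', Gst.PadDirection.SINK,
                            Gst.PadPresence.ALWAYS,
                            Gst.Caps.from_string('video/x-raw')),
        Gst.PadTemplate.new('src', Gst.PadDirection.SRC,
                            Gst.PadPresence.ALWAYS,
                            Gst.Caps.from_string('video/x-raw')),
    )

    def do_transform_ip(self, buf):
        # Run a model on buf here and attach its output as metadata.
        return Gst.FlowReturn.OK

# Registration hooks picked up by the Python plugin loader.
GObject.type_register(PymlPassthrough)
__gstelementfactory__ = ('pyml_passthrough', Gst.Rank.NONE, PymlPassthrough)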

Running a Pipeline

For a taste of the ease and simplicity of gst-python-ml, we present a few sports analytics sample pipelines.

1. Here are all the steps needed to run a Yolo tracking pipeline on Ubuntu:

apt install -y gstreamer1.0-plugins-base gstreamer1.0-plugins-base-apps \
    gstreamer1.0-plugins-good gstreamer1.0-plugins-bad \
    gir1.2-gst-plugins-bad-1.0 python3-gst-1.0 gstreamer1.0-python3-plugin-loader

pip install pygobject pycairo torch torchvision transformers numpy ultralytics gst-python-ml

gst-launch-1.0 filesrc location=/path/to/video ! decodebin ! videoconvert ! \
    pyml_yolo model-name=yolo11m track=True ! pyml_overlay ! videoconvert ! autovideosink

2. Here is a soccer match processed with this pipeline:

3. Multiple video sources are also supported.

[Video: Collabora soccer tracking]
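
One simple way to handle two sources in a single pipeline is to give each its own pyml_yolo/pyml_overlay branch and composite the results side by side. The sketch below is only an illustration (file names and sizes are placeholders, and gst-python-ml's own batching support may offer a better route); the description string can be dropped into the earlier Gst.parse_launch() snippet or run with gst-launch-1.0.

# Two branches, composited side by side. Placeholder file names and sizes.
description = (
    'compositor name=mix sink_1::xpos=640 ! videoconvert ! autovideosink '
    'filesrc location=match_cam1.mp4 ! decodebin ! videoconvert ! videoscale ! '
    'video/x-raw,width=640,height=360 ! '
    'pyml_yolo model-name=yolo11m track=True ! pyml_overlay ! videoconvert ! mix. '
    'filesrc location=match_cam2.mp4 ! decodebin ! videoconvert ! videoscale ! '
    'video/x-raw,width=640,height=360 ! '
    'pyml_yolo model-name=yolo11m track=True ! pyml_overlay ! videoconvert ! mix. '
)
pipeline = Gst.parse_launch(description)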

4. Another supported sports analytics feature is the creation of a bird's eye view of a game, to show a quick overview of the field:

[Video: Collabora bird's eye view]

5. gst-python-ml shows its true power when using hybrid vision + language models to enable features that are simply not available in any other GStreamer-based analytics framework, whether open source or commercial. For example, video captioning is supported using the Phi3.5 Vision model. Each video frame can be automatically captioned, and these captions can be further processed to automatically summarize a game or to detect significant events such as goals.

[Video: Collabora video captioning]
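
If the captions are serialized alongside the rest of the ML metadata to a Kafka server (feature 14 above), this kind of post-processing can live entirely outside the pipeline. The sketch below is only an illustration: the topic name and JSON layout are assumptions, to be matched to however your pipeline serializes its metadata.

import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    'pyml-metadata',                       # hypothetical topic name
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
)

for message in consumer:
    # Assumed layout: each record carries a per-frame caption string.
    caption = message.value.get('caption', '')
    if 'goal' in caption.lower():
        print('Possible goal detected:', caption)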

These are just a few of the features we have built with gst-python-ml - the possibilities are endless.

Development

gst-python-ml is distributed as a PyPI package. All elements are first-class GStreamer elements that can be added to any GStreamer pipeline, and they work with any Linux distribution's GStreamer packages from version 1.24 onward.
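
A quick way to confirm the elements are visible to your distribution's GStreamer is to look them up from Python; here is a small sanity-check sketch using the pyml_yolo element from the earlier example.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

# If this prints None, check that gstreamer1.0-python3-plugin-loader is
# installed and that the gst-python-ml elements are on the plugin path.
print(Gst.ElementFactory.find('pyml_yolo'))
print('GStreamer version:', Gst.version_string())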

Development takes place on our GitHub repository — we welcome contributions, feedback and new ideas.

As we continue building gst-python-ml, we are actively looking for collaborators and partners. Our goal is to make ML workflows in GStreamer powerful and accessible — whether for real-time media analysis, content generation, or intelligent pipelines in production environments.

If you would like to know more about Collabora's work on GStreamer ML, please contact us.
